A Study on Measuring the Risk of Re-identification of Personal Information in Conversational Text Data using AI

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(10), pp.77-87
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : August 5, 2024
  • Accepted : October 7, 2024
  • Published : October 31, 2024

Dong-Hyun Kim 1, Ye-Seul Cho 2, Tae-Jong Kim 2

1 Halla University
2 월드버텍 Co., Ltd.

Accredited

ABSTRACT

Recent advances in artificial intelligence have produced a wide range of chatbots that efficiently handle everyday tasks such as hotel bookings, news updates, and legal consultations. In particular, generative chatbots such as ChatGPT are broadening their applicability by generating original content in fields such as education, research, and the arts. However, training these AI chatbots requires large volumes of conversational text data, such as customer service records, and the use of unrefined data has led to privacy infringement cases both domestically and internationally. This study proposes a methodology for quantitatively assessing the re-identification risk of personal information contained in the conversational text data used to train AI chatbots. To validate the proposed methodology, we conducted a case study using synthetic conversational data and surveyed 220 external experts, confirming the significance of the proposed approach.

This paper was written with support from the National Research Foundation of Korea.