A Study on Measuring the Risk of Re-identification of Personal Information in Conversational Text Data using AI (AI를 활용한 대화형 텍스트 데이터의 개인정보 재식별 위험성 측정방안에 관한 연구)

Dong Hyun Kim (김동현); Ye-Seul Cho (조예슬); Tae-Jong Kim (김태종)

doi:10.9708/jksci.2024.29.10.077

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2024, 29(10), pp.77~87
DOI : 10.9708/jksci.2024.29.10.077
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : August 5, 2024
Accepted : October 7, 2024
Published : October 31, 2024

Dong Hyun Kim ¹, Ye-Seul Cho ², Tae-Jong Kim ²

¹Halla University (한라대학교)
²월드버텍(주)

Accredited

ABSTRACT

With recent advances in artificial intelligence, a variety of chatbots have emerged that efficiently perform everyday tasks such as hotel bookings, news updates, and legal consultations. In particular, generative chatbots such as ChatGPT are broadening their applicability by generating original content in fields such as education, research, and the arts. However, training these AI chatbots requires large volumes of conversational text data, such as customer service records, and the use of unrefined data has led to privacy infringement cases both domestically and internationally. This study proposes a methodology for quantitatively assessing the re-identification risk of personal information contained in conversational text data used to train AI chatbots. To validate the proposed methodology, we conducted a case study using synthetic conversational data and surveyed 220 external experts, and the results confirmed the significance of the proposed approach.

KEYWORDS

Personal Information, AI Chatbot, Conversational Data, Risk Measurement

This paper was written with support from the National Research Foundation of Korea.

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81
