@article{ART003338913,
author={Sunghoon Jeong and Yujin Noh and Jeongeun Hwang and Jinsun Kim and Hajin Kim and Hyoji Ha},
title={A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”},
journal={Journal of The Korea Society of Computer and Information},
issn={1598-849X},
year={2026},
volume={31},
number={5},
pages={165--178}
}
TY - JOUR
AU - Sunghoon Jeong
AU - Yujin Noh
AU - Jeongeun Hwang
AU - Jinsun Kim
AU - Hajin Kim
AU - Hyoji Ha
TI - A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”
JO - Journal of The Korea Society of Computer and Information
PY - 2026
VL - 31
IS - 5
PB - The Korean Society Of Computer And Information
SP - 165
EP - 178
SN - 1598-849X
AB - This study constructs and analyzes structured data by applying digital humanities methodologies to Casanova's memoirs, considered the most extensive autobiographical record of the 18th century. Based on the text of the memoirs, we designed a data refinement pipeline that integrates NLP technologies (such as Stanza, spaCy, and NRCLex) with generative AI. Specifically, to resolve the complex naming conventions and title issues, a rule-based algorithm was introduced to verify data accuracy.
Through this process, we constructed structured data for a total of 1,924 individuals, encompassing seven attributes including gender, mention frequency, and associated emotion words. The analysis of the data revealed a distinct alternation between sections peaking with large-scale influxes of new characters and sections where a select few individuals repeatedly appeared, creating dense relationship networks.
Notably, in sections centered around public and institutional events, as well as in the latter half of the narrative, the proportion of female characters plummeted to below half, demonstrating a pattern where the narrative converges upon a core group of male figures. To validate the efficacy of this methodology, the identical system was applied to Benjamin Franklin's autobiography. The results demonstrated stable operation and achieved higher accuracy in areas such as regional classification compared to conventional methods, thereby proving its accessibility and scalability.
KW - Rule-based text mining;Casanova memoirs;NER;Digital humanities;Tabular data
DO -
UR -
ER -
Sunghoon Jeong, Yujin Noh, Jeongeun Hwang, Jinsun Kim, Hajin Kim and Hyoji Ha. (2026). A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”. Journal of The Korea Society of Computer and Information, 31(5), 165-178.
Sunghoon Jeong, Yujin Noh, Jeongeun Hwang, Jinsun Kim, Hajin Kim and Hyoji Ha. 2026, "A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”", Journal of The Korea Society of Computer and Information, vol.31, no.5 pp.165-178.
Sunghoon Jeong, Yujin Noh, Jeongeun Hwang, Jinsun Kim, Hajin Kim, Hyoji Ha "A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”" Journal of The Korea Society of Computer and Information 31.5 pp.165-178 (2026) : 165.
Sunghoon Jeong, Yujin Noh, Jeongeun Hwang, Jinsun Kim, Hajin Kim, Hyoji Ha. A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”. Journal of The Korea Society of Computer and Information. 2026; 31(5), 165-178.
Sunghoon Jeong, Yujin Noh, Jeongeun Hwang, Jinsun Kim, Hajin Kim and Hyoji Ha. "A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”" Journal of The Korea Society of Computer and Information 31, no.5 (2026) : 165-178.
Sunghoon Jeong; Yujin Noh; Jeongeun Hwang; Jinsun Kim; Hajin Kim; Hyoji Ha. A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”. Journal of The Korea Society of Computer and Information, 2026, 31(5), 165-178.
Sunghoon Jeong; Yujin Noh; Jeongeun Hwang; Jinsun Kim; Hajin Kim; Hyoji Ha. A Study on Rule-Based Text Mining and Named Entity Recognition Approaches for Data Formalization of “The Memoirs of Casanova”. Journal of The Korea Society of Computer and Information. 2026; 31(5) 165-178.