A study on the similarity of inter-regional transmission trends of oral tales using text mining

  • The Research of the Korean Classic
  • 2024, (64), pp.157-188
  • Publisher : The Research Of The Korean Classic
  • Research Area : Humanities > Korean Language and Literature > Korean Literature > Korean classic prose
  • Received : January 13, 2024
  • Accepted : February 7, 2024
  • Published : February 29, 2024

Yujin Han 1

1 Ewha Womans University

Accredited

ABSTRACT

This paper analyzes similarities in the inter-regional transmission trends of oral tales handed down in nine regions, using two data collections: 『Comprehensive Korean Oral Literature』 and 『Complementary Edition of Comprehensive Korean Oral Literature』. To this end, text mining techniques were applied in a four-stage process: "data collection → regional information preprocessing → regional narrative analysis → visualization". First, 26,542 tale titles were collected from the digital archive of 〈Comprehensive Korean Oral Literature〉, and regional information that was not organized by "province"-level administrative district was preprocessed. The data were then divided into nine regions and further classified by year of recording. Next, a corpus consisting only of the titles from the preprocessed data was morphologically analyzed to extract the 100 most frequent nouns for each region. The extracted noun frequencies were then normalized so that regional proportions could be compared accurately. Using these normalized values, the cosine similarity between regions was calculated to compare the distribution of stories across regions. The analysis covered 384 nouns extracted from 『Comprehensive Korean Oral Literature』 and 435 nouns from 『Complementary Edition of Comprehensive Korean Oral Literature』. The results are presented as a word cloud for each of the nine regions, the numerical cosine similarity values between regions, and maps visualizing those values. They indicate that in 『Comprehensive Korean Oral Literature』, setting Jeju aside, narratives transmitted in the Gyeonggi region show relatively low similarity to those of other regions, making Gyeonggi the most heterogeneous region nationwide in terms of transmission tendencies. In the 『Complementary Edition of Comprehensive Korean Oral Literature』, by contrast, Chungcheongbuk-do and Jeollabuk-do exhibit the most heterogeneous transmission tendencies, while the Gyeonggi region shows relatively higher similarity to other regions.
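
The normalization and cosine similarity steps described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract does not state the normalization formula, so dividing each noun count by the regional total is an assumption, and the region names, nouns, and counts below are hypothetical toy data. The function names `normalize` and `cosine_similarity` are likewise illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def normalize(counts):
    """Convert raw noun counts into proportions of the regional total.

    Assumption: the paper's exact normalization is not given in the abstract;
    relative frequency is used here purely for illustration.
    """
    total = sum(counts.values())
    return {noun: n / total for noun, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse noun-frequency profiles."""
    shared = set(a) & set(b)  # nouns absent from either profile contribute 0
    dot = sum(a[noun] * b[noun] for noun in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy counts of nouns in tale titles (not data from the paper).
region_counts = {
    "region_a": Counter({"tiger": 6, "mountain": 3, "son": 1}),
    "region_b": Counter({"tiger": 3, "mountain": 2, "sea": 5}),
    "region_c": Counter({"sea": 8, "grandmother": 2}),
}

profiles = {region: normalize(c) for region, c in region_counts.items()}

# Pairwise similarity between every pair of regions.
similarity = {
    (r1, r2): cosine_similarity(profiles[r1], profiles[r2])
    for r1, r2 in combinations(sorted(profiles), 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

In this sketch, regions that share no title nouns receive a similarity of 0, and a region compared with itself receives 1. Cosine similarity is unaffected by rescaling a vector by a positive constant, so under the relative-frequency assumption the normalization matters mainly when the proportions themselves are compared across regions.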
