Classification of Literary Works (Novels) Using Text Mining

  • PHILOSOPHY·THOUGHT·CULTURE
  • 2021, (35), pp.381~407
  • DOI : 10.33639/ptc.2021..35.016
  • Publisher : Research Institute for East-West Thought
  • Research Area : Humanities > Other Humanities
  • Received : December 6, 2020
  • Accepted : January 29, 2021
  • Published : January 31, 2021

Wonil Chung 1, Bahng, Seunghee 2, Park, Myung Kwan 1

1 Dongguk University
2 Kookmin University

Accredited

ABSTRACT

This paper introduces a quantitative text analysis of literary works archived in Project Gutenberg, treated as a big-data source, and a classification of those works using text mining techniques. After preprocessing the data in the programming language R, we measured the cosine similarity between chapters of the same novel and between chapters of different novels in order to classify the novels. Cosine similarity was relatively high between chapters of the same novel and comparatively low between chapters of different novels. Furthermore, cluster analysis, an unsupervised machine learning task, showed strong cohesion in terms of semantic distance, and classification analysis, a supervised machine learning task, achieved high accuracy. We also confirmed that children's novels can be characterized as easy-to-read works, given the large cosine similarity values and small semantic distances between their chapters. Quantitative text analysis using text mining techniques is therefore expected to serve as a foundation for qualitative text analysis.
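
As an illustration of the chapter-level cosine similarity measure described in the abstract, the following is a minimal sketch and not the authors' implementation. The paper reports using R; Python is used here only for illustration. The function names `term_frequencies` and `cosine_similarity`, the tokenization rule, the use of raw term counts (the abstract does not state a weighting scheme), and the three toy "chapters" are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, extract alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty chapter is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy "chapters"; real input would be preprocessed chapter text.
chapters = {
    ("novel_a", 1): "the whale swam past the ship and the captain watched the whale",
    ("novel_a", 2): "the captain of the ship chased the whale across the sea",
    ("novel_b", 1): "she danced at the ball and spoke of marriage with her sister",
}
vectors = {key: term_frequencies(text) for key, text in chapters.items()}

# Similarity between two chapters of the same novel ...
within = cosine_similarity(vectors[("novel_a", 1)], vectors[("novel_a", 2)])
# ... and between chapters of two different novels.
between = cosine_similarity(vectors[("novel_a", 1)], vectors[("novel_b", 1)])

print(f"within-novel similarity:  {within:.3f}")
print(f"between-novel similarity: {between:.3f}")
```

On this toy data the within-novel value is higher than the between-novel value, which mirrors the pattern the abstract reports. A real analysis would also need the preprocessing steps the paper applies, which are not detailed in the abstract.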

This paper was written with support from the National Research Foundation of Korea.