Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2017, 22(3), pp.107-113
  • DOI : 10.9708/jksci.2017.22.03.107
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : March 7, 2017
  • Accepted : March 23, 2017
  • Published : March 31, 2017

Hee-Jeong Ahn¹, Keewon Kim¹, Seung-Hoon Kim¹

¹ Dankook University

Accredited

ABSTRACT

Most online bookstores provide users with bibliographic information about a book rather than more concrete information such as its thematic words and atmosphere. Thematic words in particular help users understand a book and broaden their search. In this paper, we propose an efficient method for extracting thematic words from book text by applying compound noun and noun phrase synthesis. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts thematic words from book text by recognizing two types of noun phrases: single nouns combined with single nouns, and compound nouns combined with single nouns. The recognized single nouns, compound nouns, and noun phrases are weighted with TF-IDF and extracted as main words. In addition, rather than simply summing the frequencies of all occurrences of a noun in the TF-IDF calculation, this paper suggests counting a noun's frequencies in the subject, object, and other roles separately. Experiments are carried out on books in the field of economics and management, and the extracted thematic words are verified through a survey and a book search. In 9 of the 10 experimental cases used in this study, the results indicate that the thematic words extracted by the proposed method are more effective for understanding a book's content. It is also confirmed that the thematic words extracted by the proposed method yield better book search results.
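The abstract names two computational steps, synthesizing compound-noun candidates from adjacent nouns and weighting candidates by TF-IDF with role-separated frequencies, but gives no formulas. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: sentences are assumed to arrive pre-tagged as (word, is_noun, role) triples from some upstream morphological analyzer, a compound inherits the role of its last noun, the values in `ROLE_WEIGHTS` are hypothetical, and IDF is the plain log(N/df). The function names are illustrative, and noun phrases beyond runs of adjacent nouns are not modeled.

```python
import math
from collections import Counter

# HYPOTHETICAL role weights for illustration only; the abstract does not
# say how subject/object/other frequencies are combined.
ROLE_WEIGHTS = {"subj": 1.5, "obj": 1.2, "other": 1.0}


def synthesize_terms(sentence):
    """Return (term, role) candidates for one tokenized sentence.

    `sentence` is a list of (word, is_noun, role) triples (assumed input).
    Every noun is kept as a single noun; each run of two or more adjacent
    nouns is also merged into one compound candidate, which takes the role
    of its last noun (assumption).
    """
    terms, run = [], []

    def flush():
        if not run:
            return
        terms.extend(run)  # keep the single nouns
        if len(run) > 1:
            compound = " ".join(word for word, _ in run)
            terms.append((compound, run[-1][1]))
        run.clear()

    for word, is_noun, role in sentence:
        if is_noun:
            run.append((word, role))
        else:
            flush()  # a non-noun ends the current noun run
    flush()
    return terms


def role_counts(book):
    """Count each candidate separately per role: {term: Counter(role -> n)}."""
    counts = {}
    for sentence in book:
        for term, role in synthesize_terms(sentence):
            counts.setdefault(term, Counter())[role] += 1
    return counts


def weighted_tf(per_role):
    # Combine the separately counted role frequencies instead of a plain sum.
    return sum(ROLE_WEIGHTS[role] * n for role, n in per_role.items())


def thematic_words(books, index, top_k=5):
    """Rank candidates of books[index] by role-weighted TF times log(N/df)."""
    all_counts = [role_counts(book) for book in books]
    df = Counter(term for counts in all_counts for term in counts)
    n_books = len(books)
    scores = {
        term: weighted_tf(per_role) * math.log(n_books / df[term])
        for term, per_role in all_counts[index].items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


books = [
    [  # book 0: two tokenized sentences
        [("stock", True, "subj"), ("market", True, "subj"), ("rises", False, "other")],
        [("investors", True, "subj"), ("buy", False, "other"), ("stock", True, "obj")],
    ],
    [  # book 1
        [("company", True, "subj"), ("hires", False, "other"), ("investors", True, "obj")],
    ],
]

for term, score in thematic_words(books, 0):
    print(f"{term}: {score:.3f}")
```

In this toy data, "stock" ranks first because it is counted once as a subject and once as an object, the compound "stock market" is scored alongside its parts, and "investors" scores 0 because it occurs in both books. Counting roles separately lets a noun that tends to appear as a subject outrank one with the same raw frequency in other roles; whether the paper weights roles this way is not stated in the abstract.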
