Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2021, 26(8), pp.47-54
  • DOI : 10.9708/jksci.2021.26.08.047
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : June 10, 2021
  • Accepted : August 4, 2021
  • Published : August 31, 2021

In-Su Kang 1

1 Kyungsung University

Accredited

ABSTRACT

Given a document, keyphrase extraction automatically extracts the words or phrases that topically represent its content. Unsupervised keyphrase extraction approaches first extract candidate words or phrases from the input document, then calculate a score for each candidate, and finally select keyphrases based on those scores. For the scoring step, this study proposes a method that adjusts candidate scores according to candidate type: word-type (single-word) or phrase-type (multi-word). To this end, the type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document and used to adjust the candidate scores. In experiments on four keyphrase extraction evaluation datasets constructed from full-text English articles, the proposed method outperformed a baseline method and the comparison methods on three of the datasets.
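The per-type statistics named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the tiny stopword list, the n-gram candidate extraction, and the use of -log2 of relative frequency as "information content" are all assumptions made for illustration. The abstract does not give the formula that combines these values into adjusted scores, so that step is left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is",
             "of", "on", "the", "to", "which", "with"}


def extract_candidates(text, max_len=3):
    """Split text into runs of non-stopword words, then collect n-grams.

    Returns two lists of candidate occurrences (tokens, not types):
    single-word candidates and multi-word candidates (2..max_len words).
    """
    words, phrases = [], []
    run = []
    # Words made of letters, or any single other non-space character.
    tokens = re.findall(r"[^\W\d_]+|\S", text.lower())
    for tok in tokens + ["."]:  # trailing sentinel flushes the last run
        if tok.isalpha() and tok not in STOPWORDS:
            run.append(tok)
            continue
        # A stopword, digit, or punctuation mark ends the current run.
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                gram = " ".join(run[i:i + n])
                (words if n == 1 else phrases).append(gram)
        run = []
    return words, phrases


def type_token_ratio(occurrences):
    """Number of distinct candidates divided by number of occurrences."""
    return len(set(occurrences)) / len(occurrences) if occurrences else 0.0


def information_content(occurrences, top_n=5):
    """-log2(relative frequency) for the top_n most frequent candidates."""
    counts = Counter(occurrences)
    total = len(occurrences)
    return {c: -math.log2(k / total) for c, k in counts.most_common(top_n)}


if __name__ == "__main__":
    doc = ("Keyphrase extraction selects keyphrase candidates. "
           "Keyphrase extraction scores keyphrase candidates.")
    word_occ, phrase_occ = extract_candidates(doc, max_len=2)
    print("word TTR:", type_token_ratio(word_occ))
    print("phrase TTR:", type_token_ratio(phrase_occ))
    print("word IC:", information_content(word_occ, top_n=2))
```

Running the script prints the type-token ratio of each candidate type and the information content of the most frequent word-type candidates. How these values would feed into the score adjustment depends on the paper's own formulas.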
