Efficient Keyword Extraction from Social Big Data Based on Cohesion Scoring

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2020, 25(10), pp.87-94
  • DOI : 10.9708/jksci.2020.25.10.087
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : August 19, 2020
  • Accepted : October 12, 2020
  • Published : October 30, 2020

Hyeon Gyu Kim 1

1 Sahmyook University

Accredited

ABSTRACT

Social reviews such as SNS feeds and blog articles are widely used to extract keywords that reflect users' opinions and complaints, and they often include proper nouns or new words reflecting recent trends. These words are generally not in a dictionary, so conventional morphological analyzers may fail to detect and extract them properly. In addition, the high processing time of such analyzers makes them inadequate for delivering analysis results in a timely manner. This paper presents a method for efficient keyword extraction from social reviews based on the notion of cohesion scoring. Cohesion scores are calculated from word frequencies, so keyword extraction with them requires no dictionary. On the other hand, their accuracy can degrade when the input data has poor spacing. To address this, we present an algorithm that improves the existing cohesion scoring mechanism using the structure of a word tree. Our experimental results show that the proposed method took only 0.008 seconds to extract keywords from 1,000 reviews, with an error ratio of 15.5%, which is better than that of the existing morphological analyzers.
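
The abstract does not give the scoring formula, so the following is only a minimal sketch of frequency-based cohesion scoring under an assumed, commonly used formulation: the cohesion of a prefix of length n is (count(prefix) / count(first character)) raised to the power 1 / (n - 1). The function names `count_prefixes`, `cohesion`, and `best_prefix` and the sample tokens are hypothetical, and the sketch does not reproduce the paper's word-tree improvement for poorly spaced input.

```python
from collections import Counter


def count_prefixes(tokens):
    """Count how often each leading substring (prefix) occurs across tokens."""
    counts = Counter()
    for token in tokens:
        for end in range(1, len(token) + 1):
            counts[token[:end]] += 1
    return counts


def cohesion(prefix, counts):
    """Assumed formulation: (count(prefix) / count(first char)) ** (1 / (n - 1))."""
    n = len(prefix)
    if n < 2 or counts[prefix[0]] == 0:
        return 0.0  # a single character has no internal cohesion to measure
    return (counts[prefix] / counts[prefix[0]]) ** (1.0 / (n - 1))


def best_prefix(token, counts):
    """Return the prefix of token (length >= 2) with the highest cohesion score."""
    candidates = [token[:end] for end in range(2, len(token) + 1)]
    if not candidates:
        return token
    return max(candidates, key=lambda p: cohesion(p, counts))


if __name__ == "__main__":
    # Hypothetical whitespace-separated tokens; no dictionary is consulted.
    tokens = ["서울에서", "서울은", "서울의", "서류"]
    counts = count_prefixes(tokens)
    print(cohesion("서울", counts))       # 3 of 4 tokens starting with "서" continue as "서울"
    print(best_prefix("서울에서", counts))
```

With these sample tokens, "서울" scores 0.75 and is selected as the keyword candidate for "서울에서", since the longer prefixes occur only once. Because the counts come from whitespace-separated tokens, this simple version depends on reasonable spacing in the input, which is the weakness the abstract says the proposed algorithm targets.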
