Evaluation of English Term Extraction based on Inner/Outer Term Statistics

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2020, 25(4), pp.141-148
  • DOI : 10.9708/jksci.2020.25.04.141
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : March 9, 2020
  • Accepted : April 20, 2020
  • Published : April 30, 2020

In-Su Kang 1

1 Kyungsung University

Accredited

ABSTRACT

Automatic term extraction recognizes domain-specific terms in a collection of domain-specific text. Previous term extraction methods operate effectively in an unsupervised manner, first extracting candidate terms and then assigning importance scores to them. For the calculation of term importance scores, this study focuses on the sets of inner and outer terms of a candidate term. The inner terms of a candidate term are the shorter terms contained in it as components, and its outer terms are the longer terms that contain it as a component. This work presents various functions that compute the term strength of a candidate term from either its set of inner terms or its set of outer terms. In addition, a term importance scoring method is devised that combines the C-value score with the term strength values obtained from the inner and outer term sets. Experimental evaluations on the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method outperformed the baseline method by up to 1% on GENIA and up to 3% on ACL RD-TEC 2.0.
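The inner/outer term definitions can be illustrated with a minimal Python sketch. It assumes each candidate term is a tuple of words and that "component" means a contiguous word subsequence; the helper names (`contains`, `inner_terms`, `outer_terms`, `c_value`) and the toy frequencies are hypothetical. The `c_value` function follows the commonly cited C-value formulation of Frantzi et al., not necessarily the exact variant used in the paper, and the paper's own term strength functions are not given in the abstract, so they are not reproduced here.

```python
import math

def contains(longer, shorter):
    """True if `shorter` is a contiguous word sequence strictly inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def inner_terms(term, candidates):
    """Shorter candidates that occur inside `term` as components."""
    return {c for c in candidates if contains(term, c)}

def outer_terms(term, candidates):
    """Longer candidates that include `term` as a component."""
    return {c for c in candidates if contains(c, term)}

def c_value(term, freq):
    """Standard C-value: log2(length) * (frequency - mean frequency of outer terms).

    Note: under this formulation a single-word term scores 0, since log2(1) = 0.
    """
    outers = outer_terms(term, freq)
    penalty = sum(freq[o] for o in outers) / len(outers) if outers else 0.0
    return math.log2(len(term)) * (freq[term] - penalty)

# Toy candidate frequencies (invented for illustration only).
freq = {
    ("cell",): 15,
    ("t", "cell"): 10,
    ("human", "t", "cell"): 4,
    ("t", "cell", "line"): 2,
}

target = ("t", "cell")
print(inner_terms(target, freq))
print(outer_terms(target, freq))
print(c_value(target, freq))
```

In this toy example, ("t", "cell") has one inner term, ("cell",), and two outer terms. A term strength function in the sense of the abstract would take one of these two sets as input and return a number, which is then combined with the C-value score.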
