Evaluation of English Term Extraction based on Inner/Outer Term Statistics

In-Su Kang (강인수)

doi:10.9708/jksci.2020.25.04.141

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2020, 25(4), pp.141~148
Publisher : The Korean Society of Computer and Information
Research Area : Engineering > Computer Science
Received : March 9, 2020
Accepted : April 20, 2020
Published : April 30, 2020

In-Su Kang ¹

¹Kyungsung University

Accredited

ABSTRACT

Automatic term extraction recognizes domain-specific terms in a collection of domain-specific text. Previous term extraction methods operate effectively in an unsupervised manner, typically by extracting candidate terms and assigning importance scores to them. For the calculation of term importance scores, this study focuses on the sets of inner and outer terms of a candidate term. The inner terms of a candidate term are the shorter terms that it contains as components, and its outer terms are the longer terms that contain it as a component. This work presents various functions that compute the term strength of a candidate term from either its set of inner terms or its set of outer terms. In addition, a term importance scoring method is devised that is based on the C-value score and the term strength values obtained from the inner and outer term sets. Experimental evaluations on the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method outperformed the baseline method by up to 1% and 3% on the GENIA and ACL RD-TEC 2.0 datasets, respectively.
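The inner and outer term sets described above can be illustrated with a minimal Python sketch. It assumes candidate terms are given as token tuples with corpus frequencies, and it uses the standard C-value definition of Frantzi et al. (2000) as the base score. The term strength functions and the combined scoring method proposed in the paper are not specified in this abstract, so they are not reproduced here. The function names and the toy frequencies are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Term = Tuple[str, ...]


def is_component(short: Term, long: Term) -> bool:
    """True if `short` occurs as a contiguous, strictly shorter part of `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def inner_terms(term: Term, candidates: List[Term]) -> List[Term]:
    """Shorter candidates that the given term contains as components."""
    return [c for c in candidates if is_component(c, term)]


def outer_terms(term: Term, candidates: List[Term]) -> List[Term]:
    """Longer candidates that contain the given term as a component."""
    return [c for c in candidates if is_component(term, c)]


def c_value(term: Term, freq: Dict[Term, int]) -> float:
    """Standard C-value (Frantzi et al., 2000).

    The term frequency is reduced by the mean frequency of its outer terms
    and weighted by log2 of the term length. With this original form,
    single-word terms always score 0.
    """
    outers = outer_terms(term, list(freq))
    adjusted = float(freq[term])
    if outers:
        adjusted -= sum(freq[o] for o in outers) / len(outers)
    return math.log2(len(term)) * adjusted


# Hypothetical candidate frequencies, for illustration only.
freq: Dict[Term, int] = {
    ("t", "cell"): 10,
    ("human", "t", "cell"): 4,
    ("t", "cell", "line"): 2,
    ("human", "t", "cell", "line"): 1,
}

for term in freq:
    print(" ".join(term),
          "| inner:", len(inner_terms(term, list(freq))),
          "| outer:", len(outer_terms(term, list(freq))),
          "| C-value:", round(c_value(term, freq), 3))
```

In this toy data, "t cell" has no inner terms and three outer terms, while "human t cell line" has three inner terms and no outer terms. A strength function over either set, such as those the paper evaluates, would take these sets as its input.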

KEYWORDS

Term extraction, Inner term set, Outer term set, Term importance score, Domain term

REFERENCES

[other] N. Astrakhantsev / 2016 / ATR4S : Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala / CoRR abs/1611.07804

[other] Z. Zhang / 2017 / SemRe-Rank : Incorporating Semantic Relatedness to Improve Automatic Term Extraction Using Personalized PageRank / CoRR abs/1711.03373

[confproc] T. Koutropoulou / 2019 / TMG-BoBI: Generating Back-of-the-Book Indexes with the Text-to-Matrix-Generator / Proceedings of 10th International Conference on Information, Intelligence, Systems and Applications

[confproc] Z. Wu / 2013 / Can back-of-the-book indexes be automatically created? / Proceedings of the 22nd ACM International Conference on Information and Knowledge Management : 1745~1750

[confproc] N. Simon / 2018 / Automatic Term Extraction in Technical Domain using Part-of-Speech and Common-Word Features / Proceedings of the ACM Symposium on Document Engineering

[book] G. Petasis / 2011 / Knowledge-Driven Multimedia Information Extraction and Ontology Evolution : 134~166

[journal] M. Asim / 2018 / A survey of ontology learning techniques and applications / Database 2018

[journal] K. Frantzi / 2000 / Automatic recognition of multi-word terms: the C-value/NC-value method / International Journal on Digital Libraries 3(2) : 115~130

[confproc] G. Bordea / 2013 / Domain-independent term extraction through domain modelling / Proceedings of the 10th International Conference on Terminology and Artificial Intelligence

[book] S. Rose / 2010 / Text Mining: Applications and Theory / John Wiley & Sons Ltd

[confproc] H. Nakagawa / 2002 / A Simple but Powerful Automatic Term Extraction Method / COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

[thesis] N. Astrakhantsev / 2015 / Methods and software for terminology extraction from domain specific text collection / Ph.D. / Institute for System Programming of Russian Academy of Sciences

[confproc] Z. Zhang / 2016 / JATE 2.0: Java Automatic Term Extraction with Apache Solr / Proceedings of the Tenth International Conference on Language Resources and Evaluation

[journal] K. Meijer / 2014 / A semantic approach for extracting domain taxonomies from text / Decision Support Systems 62 : 78~93

[confproc] K. Ahmad / 1999 / University of Surrey Participation in TREC8: Weirdness Indexing for Logical Document Extrapolation and Retrieval (WILDER) / Proceedings of The Eighth Text REtrieval Conference

[confproc] J. Ventura / 2013 / Combining c-value and keyword extraction methods for biomedical terms extraction / International Symposium on Languages in Biology and Medicine : 45~49

[confproc] B. QasemiZadeh / 2016 / The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods / Proceedings of the Tenth International Conference on Language Resources and Evaluation

[journal] J. Kim / 2003 / GENIA corpus - a semantically annotated corpus for bio-textmining / ISMB (Supplement of Bioinformatics) 19(suppl. 1) : i180~i182

[confproc] A. Sajatovic / 2019 / Basic : Evaluating Automatic Term Extraction Methods on Individual Documents / Proceedings of the Joint Workshop on Multiword Expressions and WordNet : 149~154

[web] / SpaCy / https://spacy.io/

[journal] M. Marcus / 1993 / Building a Large Annotated Corpus of English : The Penn Treebank / Computational Linguistics 19(2) : 313~330

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81
