Automatic term extraction aims to recognize domain-specific terms in a collection of domain-specific text. Previous term extraction methods work effectively in an unsupervised manner, typically by extracting candidate terms and then assigning importance scores to them. For computing term importance scores, this study focuses on using the sets of inner and outer terms of a candidate term. For a candidate term, its inner terms are shorter terms contained in the candidate term as components, and its outer terms are longer terms that contain the candidate term as a component.
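The inner/outer relation defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that candidate terms are already extracted and represented as token tuples, and that "component" means a contiguous token span that is itself a candidate term. The names `is_subterm` and `inner_outer_terms` are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of tokens (assumed representation)


def is_subterm(short: Term, long: Term) -> bool:
    """True if `short` is strictly shorter than `long` and occurs in it as a contiguous span."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def inner_outer_terms(candidates: Sequence[Term]) -> Dict[Term, Tuple[Set[Term], Set[Term]]]:
    """Map each candidate to (inner terms, outer terms), both restricted to the candidate set.

    Hypothetical helper: inner = shorter candidates contained in the term,
    outer = longer candidates that contain the term.
    """
    cands = set(candidates)
    result: Dict[Term, Tuple[Set[Term], Set[Term]]] = {c: (set(), set()) for c in cands}
    for a in cands:
        for b in cands:
            if is_subterm(a, b):
                result[b][0].add(a)  # a is an inner term of b
                result[a][1].add(b)  # b is an outer term of a
    return result


candidates = [("t", "cell"), ("cell",), ("t", "cell", "receptor"), ("receptor",)]
relations = inner_outer_terms(candidates)
inner, outer = relations[("t", "cell")]
print(sorted(inner), sorted(outer))
```

For "t cell", the only inner candidate is "cell" and the only outer candidate is "t cell receptor". The pairwise scan is quadratic in the number of candidates, which keeps the sketch simple; a real system would likely index candidates by their sub-spans instead.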
This work presents various functions that compute the term strength of a candidate term from either its set of inner terms or its set of outer terms. In addition, a term importance scoring method is devised based on the C-value score and the term strength values obtained from the sets of inner and outer terms.
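For reference, the standard C-value score (Frantzi, Ananiadou and Mima) already uses a candidate's outer terms. Let $T_a$ be the set of longer candidates containing $a$. If $T_a$ is empty, the score is $\log_2|a| \cdot f(a)$. Otherwise it is $\log_2|a| \cdot \bigl(f(a) - \tfrac{1}{|T_a|}\sum_{b \in T_a} f(b)\bigr)$. The sketch below computes only this textbook C-value from a frequency table. It does not reproduce the paper's term strength functions or how they are combined with C-value, since those are not specified here. The token-tuple representation and the names `occurs_in` and `c_value` are assumptions.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # candidate term as a token tuple (assumed representation)


def occurs_in(short: Term, long: Term) -> bool:
    """True if `short` is a strictly shorter contiguous token span of `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Textbook C-value for each candidate, given candidate corpus frequencies f(a)."""
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        outer = [b for b in freq if occurs_in(a, b)]  # T_a: longer candidates containing a
        length_weight = math.log2(len(a))  # log2|a|; equals 0 for single-word terms
        if not outer:
            scores[a] = length_weight * f_a
        else:
            mean_outer_freq = sum(freq[b] for b in outer) / len(outer)
            scores[a] = length_weight * (f_a - mean_outer_freq)
    return scores


freq = {("t", "cell"): 10, ("t", "cell", "receptor"): 4, ("cell",): 30}
scores = c_value(freq)
print(scores)
```

Here "t cell" (frequency 10) is nested in one outer term of frequency 4, so its score is $\log_2 2 \cdot (10 - 4) = 6$. As the comment notes, the original length factor gives every single-word candidate a score of 0.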
Experimental evaluations on the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method outperformed the baseline method by up to 1% on GENIA and up to 3% on ACL RD-TEC 2.0.