Corpus-based Term Extraction Methods for Translator Training

Park Myongsu 1

1 Sangmyung University

Accredited

ABSTRACT

This paper reports on how to extract terms from a small specialized corpus, the Korean Weather Corpus (KWC). The KWC was built from three sources, the Korea Herald, the Korea Times, and Arirang News, and contains 88,042 tokens. Developments in computer technology have contributed greatly to the widespread use of corpora in various disciplines, and their effects are also felt in translation studies. As part of an effort to encourage the use of corpora and corpus-based analytic approaches, the present research applied two corpus-based approaches to term extraction. The first method used a list of stopwords, which consists mainly of grammatical function words such as articles and prepositions. Filtering out these words before compiling a list of the most frequent words in the KWC produced a list in which almost all words were term candidates. The second method was based on keyword analysis. Keywords are words whose frequency is unusually high in comparison with a reference corpus. These unusually frequent words can represent the aboutness of a given text and reveal salient features of a genre. This method provided a list of positive keywords, which yields a good list of term candidates for the KWC. The suggested methods can, it is hoped, serve as alternative ways of extracting terms and contribute to the widespread use of corpora in translation studies.
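
The two approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not name the stopword list, the reference corpus, or the keyness statistic, so the abbreviated stopword set, the log-likelihood measure, the 3.84 threshold, and all function names below are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stopword list (the paper's actual list is not given).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and",
             "is", "are", "was", "will", "be", "for", "with", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequent_candidates(tokens, stopwords=STOPWORDS, top_n=10):
    """Method 1: drop stopwords, then list the most frequent remaining words."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (an assumed statistic, not from the paper).

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score


def positive_keywords(study_tokens, ref_tokens, min_score=3.84):
    """Method 2: words unusually frequent in the study corpus vs. the reference."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    results = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # "Positive" keyword: relative frequency is higher in the study corpus.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= min_score:
                results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

In use, `tokenize` would be applied to the KWC texts and to a general reference corpus of the user's choosing, and the two resulting lists would then be reviewed by hand as term candidates; both methods propose candidates rather than confirm terms.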
