
A Study on English-to-Korean Test Suites for NMT Automatic Evaluation by Linguistic Assessment Items

  • The Journal of Translation Studies
  • Abbr : JTS
  • 2020, 21(5), pp.351-371
  • DOI : 10.15749/jts.2020.21.5.012
  • Publisher : The Korean Association for Translation Studies
  • Research Area : Humanities > Interpretation and Translation Studies
  • Received : November 8, 2020
  • Accepted : November 30, 2020
  • Published : December 31, 2020

Sung-Kwon Choi 1 Ji-Eun Han 2 Gyu-Hyeun Choi 1 Youngkil Kim 1

1Electronics and Telecommunications Research Institute (ETRI)
2Hankuk University of Foreign Studies

Accredited

ABSTRACT

This paper describes an approach to automatically evaluating Neural Machine Translation (NMT) systems by linguistic assessment items. Whereas previous automatic evaluation approaches cannot identify the strengths and weaknesses of NMT systems for each linguistic assessment item, our approach can determine both intuitively. The evaluation proceeds by building test suites consisting of the source text, the expressions in the source text, and the translated word, and then checking whether the answer translation appears in the machine translation output. By applying this evaluation approach to Naver's Papago and Google's Google Translate, we identified the strengths and weaknesses of each system. The biggest weakness of Papago's English-to-Korean machine translation system is Cohesion (40.00%). The most serious weak points of Google's English-to-Korean translation system are the translation of Relative pronouns (35.00%), Spoken expressions (40.00%), Structural ambiguity (40.00%), and Cohesion (40.00%). The main purpose of automatic evaluation by linguistic assessment items is to find the various weaknesses of machine translation systems, semi-automatically collect and build a targeted corpus based on those weaknesses, and incrementally improve the performance of the systems by retraining. Although this approach has the advantage of automatically recognizing strengths and weaknesses per linguistic assessment item, the simplified evaluation measure it relies on, matching the translated word against the machine translation output, should be improved.
In this respect, future directions for this work are 1) extending the linguistic assessment items to language pairs other than English-to-Korean, 2) semi-automatically collecting the source texts targeted for evaluation, 3) extending the research to machine interpreting with speech data, and 4) including the assessment items that human translators consider.
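The matching-based evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the test-suite entries, the `translate` callable, and the substring-matching criterion are assumptions standing in for the paper's actual test suites and the Papago/Google Translate APIs.

```python
from collections import defaultdict

# Hypothetical test-suite entries:
# (assessment item, source text, acceptable translated words).
TEST_SUITE = [
    ("Relative pronoun", "The man who called you is here.",
     ["당신에게 전화한", "전화를 건"]),
    ("Cohesion", "John lost his keys. He found them later.",
     ["그것들을", "열쇠를"]),
]

def evaluate(translate, test_suite):
    """Score each linguistic assessment item: an entry counts as correct
    when any acceptable translated word appears in the MT output."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, source, answers in test_suite:
        output = translate(source)
        total[item] += 1
        if any(ans in output for ans in answers):
            correct[item] += 1
    # Percentage of correctly translated entries per assessment item.
    return {item: 100.0 * correct[item] / total[item] for item in total}

# Dummy MT system standing in for Papago or Google Translate.
def dummy_mt(text):
    return "당신에게 전화한 남자가 여기 있습니다."

scores = evaluate(dummy_mt, TEST_SUITE)
```

Items with low percentages (e.g. Cohesion at 40.00% in the paper) then indicate where a targeted corpus should be collected for retraining.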
