
A Study on English-to-Korean Test Suites for NMT Automatic Evaluation by Linguistic Assessment Items

  • The Journal of Translation Studies
  • Abbr : JTS
  • 2020, 21(5), pp.351-371
  • DOI : 10.15749/jts.2020.21.5.012
  • Publisher : The Korean Association for Translation Studies
  • Research Area : Humanities > Interpretation and Translation Studies
  • Received : November 8, 2020
  • Accepted : November 30, 2020
  • Published : December 31, 2020

Sung-Kwon Choi¹, Ji-Eun Han², Gyu-Hyeun Choi¹, Youngkil Kim¹

¹Electronics and Telecommunications Research Institute (ETRI)
²Hankuk University of Foreign Studies


ABSTRACT

This paper describes an approach to automatically evaluating Neural Machine Translation (NMT) systems by linguistic assessment items. Whereas previous automatic evaluation approaches cannot identify the strengths and weaknesses of an NMT system for each linguistic assessment item, our approach determines both intuitively. For each item, we build a test suite consisting of a source text, the expression under test within that source text, and its reference translation; a system is then scored on whether the reference translation appears in its machine translation output. Applying this evaluation to Naver's Papago and Google's Google Translate, we identified the strengths and weaknesses of each system. The biggest weakness of the Papago English-to-Korean system is Cohesion (40.00%), while the most serious weak points of the Google English-to-Korean system are Relative pronouns (35.00%), Spoken expressions (40.00%), Structural ambiguity (40.00%), and Cohesion (40.00%). The main purpose of automatic evaluation by linguistic assessment items is to find the various weaknesses of a machine translation system, semi-automatically collect and build a targeted corpus based on those weaknesses, and incrementally improve the system's performance by retraining. Although this approach has the advantage of automatically recognizing strengths and weaknesses by linguistic assessment item, the simplified evaluation it relies on, a measurement based on matching the reference translation against the machine translation output, should be improved. Future directions are therefore 1) extending the linguistic assessment items to language pairs other than English-to-Korean, 2) semi-automatically collecting source texts targeted for evaluation, 3) extending the research to machine interpreting with speech data, and 4) including assessment items that human translators consider.
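The matching-based evaluation the abstract describes reduces to a simple per-item accuracy computation. The Python sketch below illustrates it under stated assumptions: the test-suite entries and the translate() function are hypothetical placeholders (a real run would wrap an MT service such as Papago or Google Translate), and only the substring-matching criterion itself is taken from the abstract.

    """Minimal sketch: score an MT system per linguistic assessment item
    by checking whether the reference translation appears in the output."""

    # Illustrative test-suite entries (not from the paper): each entry is
    # (source text, expression under test, reference Korean translation).
    TEST_SUITES = {
        "Relative pronoun": [
            ("The man who called you is my brother.", "who called you", "전화한"),
        ],
        "Cohesion": [
            ("She bought a book. It was expensive.", "It", "그것은"),
        ],
    }

    def translate(source: str) -> str:
        """Hypothetical placeholder for an MT system call; replace with a
        real client for the system under evaluation."""
        raise NotImplementedError

    def evaluate(test_suites: dict) -> dict:
        """Return per-item accuracy: the fraction of entries whose
        reference translation occurs in the MT output."""
        scores = {}
        for item, entries in test_suites.items():
            hits = sum(
                1
                for source, _expr, reference in entries
                # Simplified criterion from the paper: the reference
                # translation must occur as a substring of the MT output.
                if reference in translate(source)
            )
            scores[item] = hits / len(entries)
        return scores

    # Example usage (requires a working translate()):
    #   for item, acc in evaluate(TEST_SUITES).items():
    #       print(f"{item}: {acc:.2%}")

Per-item scores of this form are what let the weakest items (e.g., Cohesion at 40.00%) be singled out for targeted corpus collection and retraining.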
