Significance of Recall in Automatic Metrics for HT Evaluation (인간 번역평가에서 재현도(recall)의 중요성)

CHUNG, HYE-YEON (정혜연); CHOI JISOO (최지수); Heo TakSung (허탁성); SEO SOOYOUNG (서수영)

doi:10.15749/jts.2022.23.1.003

The Journal of Translation Studies
Abbr : JTS
2022, 23(1), pp.81~100
Publisher : The Korean Association for Translation Studies
Research Area : Humanities > Interpretation and Translation Studies
Received : February 5, 2022
Accepted : March 22, 2022
Published : March 31, 2022

CHUNG, HYE-YEON ¹, CHOI JISOO ¹, Heo TakSung ², SEO SOOYOUNG ²

¹ Hankuk University of Foreign Studies
² Hallym University

Accredited

ABSTRACT

In the automatic evaluation of translations, precision and recall are two indices that show how precisely (precision) and how completely (recall) a system recognizes the well-translated portion of a translation. Ideally, the two indices would be weighted equally, since accuracy and completeness are both important criteria in the evaluation of human translations (HT). This is not easy, however, because the two indices are negatively correlated. Papineni et al. (2002), for example, opted for precision, while Lavie et al. (2005) used both indices but gave recall nine times more weight than precision. The aim of this work is to examine which of the two indices correlates better with the ratings of professional evaluators, and how much weight each should be given. For this purpose, 459 translated texts were rated by professional evaluators and scored with precision, recall, F1 (the harmonic mean of precision and recall) and Fmean (which weights recall nine times more heavily than precision). The results show that recall correlates better with human evaluation than precision in almost all cases, whereas Fmean does not correlate better than F1: the two were equivalent in all but one case. These findings indicate that recall is indeed the more important metric, but that a weight as high as nine on recall is not ideal for HT evaluation.
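The four scores named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and exact unigram matching with clipped counts; these are illustrative choices and not necessarily the procedure used in the study. The Fmean formula follows the METEOR definition in Banerjee and Lavie (2005), 10PR / (R + 9P). The function name `unigram_scores` and the example sentences are hypothetical.

```python
from collections import Counter


def unigram_scores(candidate: str, reference: str) -> dict:
    """Unigram precision, recall, F1 and Fmean of a candidate against one reference.

    Assumptions (not taken from the paper): whitespace tokenization,
    lowercasing, exact word matching.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Clipped overlap: a word counts as matched at most as often
    # as it occurs in the reference.
    matched = sum((Counter(cand) & Counter(ref)).values())

    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(ref) if ref else 0.0

    if matched == 0:
        f1 = fmean = 0.0
    else:
        # F1: harmonic mean, precision and recall weighted equally.
        f1 = 2 * precision * recall / (precision + recall)
        # Fmean (METEOR): harmonic mean with recall weighted nine times
        # more heavily than precision.
        fmean = 10 * precision * recall / (recall + 9 * precision)

    return {"precision": precision, "recall": recall, "f1": f1, "fmean": fmean}


# A translation that is accurate but incomplete: every word is correct,
# but half of the reference is missing.
scores = unigram_scores("the cat sat", "the cat sat on the mat")
print(scores)  # precision 1.0, recall 0.5, f1 ~0.667, fmean ~0.526
```

In this example the omission lowers Fmean more than F1, because Fmean stays close to recall while F1 sits midway between the two indices. This is the difference in weighting that the study compares against human ratings.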

KEYWORDS

automatic evaluation, translation quality, precision, recall, F1, Fmean

REFERENCES

[book] 권철민 / 2020 / 파이썬 머신러닝 완벽 가이드 [The Complete Guide to Python Machine Learning] / 위키북스

[journal] Chung, Hye-Yeon / 2020 / Application of Automatic Evaluation to Human Translation / The Journal of Translation Studies / The Korean Association for Translation Studies 21(1) : 9~29

[report] 박혜주 / 2007 / 문학번역 평가 시스템 연구 [A Study on an Evaluation System for Literary Translation] / Literature Translation Institute of Korea

[journal] Chung, Hye-Yeon / 2021 / Die Applikabilität der automatischen Evaluation von Humanübersetzungen / 독일언어문학 / 한국독일언어문학회 (93) : 75~95

[journal] Chung, Hye-Yeon / 2021 / Automatic Evaluation of Human Translation using Word and Sentence Embedding: Can Machines Evaluate Meaning? / 통번역학연구 / 통번역연구소 25(3) : 141~162

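[confproc] 한국외대 번역평가인증 연구팀 [HUFS Translation Assessment and Certification Research Team] / 2016 / 번역인증제도 (실무편) [Translation Certification System (Practical Part)] / 한국외대 통번역연구소 학술대회 <언어, 통번역의 평가 및 인증> 발표집 [Proceedings of the HUFS Interpreting and Translation Research Institute Conference "Evaluation and Certification of Language, Interpreting and Translation"] : 23~33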

[confproc] Banerjee, Satanjeev / 2005 / METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments / Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization : 65~72

[journal] Buckland, Michael / 1994 / The Relationship between Recall and Precision / Journal of the American Society for Information Science 45(1) : 12~19

[journal] Chung, Hye-Yeon / 2020 / Automatische Evaluation der Humanübersetzung: BLEU vs. METEOR / Lebende Sprachen 65(1) : 181~205

[confproc] Han, Lifeng / 2018 / Machine Translation Evaluation Resources and Methods: A Survey / IPRC-2018 (Ireland Postgraduate Research Conference)

[journal] Kunilovskaya, Maria / 2015 / How Far Do We Agree on the Quality of Translation? / English Studies at NBU 1(1) : 18~31

[journal] Lai, Tzu-Yun / 2011 / Reliability and Validity of a Scale-based Assessment for Translation Tests / Meta 56(3) : 713~722

[web] Lavie, Alon / 2004 / The Significance of Recall in Automatic Metrics for MT Evaluation / https://www.cs.cmu.edu/~alavie/papers/Recall-AMTA-04.pdf

[confproc] Papineni, Kishore / 2002 / BLEU: A Method for Automatic Evaluation of Machine Translation / Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) : 311~318

[web] Sasaki, Yutaka / 2007 / The Truth of the F-measure / https://www.cs.odu.edu/~mukka/cs795sum10dm/Lecturenotes/Day3/F-measure-YS-26Oct07.pdf

[journal] Waddington, Christopher / 2001 / Should Translations Be Assessed Holistically or through Error Analysis? / HERMES Journal of Language and Communication in Business 26 : 15~37

[journal] Waddington, Christopher / 2001 / Different Methods of Evaluating Student Translations: The Question of Validity / Meta 46(2) : 311~325

[book] van Rijsbergen, Cornelius / 1979 / Information Retrieval / Butterworth

[confproc] Zhang, Tianyi / 2020 / BERTScore: Evaluating Text Generation with BERT / Conference Paper at ICLR 2020 : 1~14

This paper was written with support from the National Research Foundation of Korea.


The Journal of Translation Studies 2024 KCI Impact Factor : 1.86
