Significance of Recall in Automatic Metrics for HT Evaluation

  • The Journal of Translation Studies
  • Abbr: JTS
  • 2022, 23(1), pp. 81-100
  • DOI: 10.15749/jts.2022.23.1.003
  • Publisher: The Korean Association for Translation Studies
  • Research Area: Humanities > Interpretation and Translation Studies
  • Received: February 5, 2022
  • Accepted: March 22, 2022
  • Published: March 31, 2022

Hyeyeon Chung¹, CHOI JISOO¹, Heo TakSung², SEOSOOYOUNG²

¹ Hankuk University of Foreign Studies
² Hallym University

Accredited

ABSTRACT

In the automatic evaluation of translations, precision and recall are two indices that show how precisely (precision) and how completely (recall) a system recognizes the well-translated portion of a translation. Ideally, the two indices would be weighted equally in the evaluation system, since both accuracy and completeness are important criteria in the evaluation of human translations (HT). This is not easy in practice, however, because the two indices are negatively correlated. Papineni et al. (2002), for example, opted for precision, whereas Lavie et al. (2005) used both indices but gave recall nine times more weight than precision. This study examines which of the two indices correlates better with the judgments of professional evaluators, and how much weight should be given to precision and to recall. For this purpose, 459 translated texts were scored using precision, recall, F1 (the harmonic mean of precision and recall), and Fmean (which weights recall nine times more heavily than precision), and were also rated by professional evaluators. The results show that recall correlates better with human evaluation than precision in almost all cases, whereas Fmean does not correlate better than F1: the two were equivalent in all but one case. These findings indicate that recall is indeed the more important metric, but that weighting recall as heavily as nine times precision is not ideal for HT evaluation.
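To make the difference between F1 and Fmean concrete, the sketch below computes clipped unigram precision and recall of one translation against one reference, combines them as F1 and as the METEOR-style Fmean = 10PR / (R + 9P) (recall weighted nine times more than precision), and adds a correlation helper for comparing per-text metric scores with evaluator ratings. It is a minimal illustration under stated assumptions, not the authors' implementation: whitespace tokenization, exact word matching, a single reference per text, and Pearson's r as the correlation statistic are all assumptions (the abstract does not name the coefficient used), METEOR's stemming, synonym matching, and fragmentation penalty are omitted, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def unigram_precision_recall(hypothesis: str, reference: str) -> tuple[float, float]:
    """Clipped unigram precision and recall (hypothetical helper).

    Assumes whitespace tokenization and exact, case-sensitive word matches.
    """
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    # Multiset intersection: a word counts as matched at most as often
    # as it appears in both the hypothesis and the reference.
    matches = sum((hyp & ref).values())
    hyp_len = sum(hyp.values())
    ref_len = sum(ref.values())
    precision = matches / hyp_len if hyp_len else 0.0
    recall = matches / ref_len if ref_len else 0.0
    return precision, recall


def weighted_f(precision: float, recall: float, recall_weight: float) -> float:
    """Weighted harmonic mean: (1 + w) * P * R / (R + w * P).

    recall_weight=1 gives F1 = 2PR / (P + R);
    recall_weight=9 gives the METEOR-style Fmean = 10PR / (R + 9P).
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (1 + recall_weight) * precision * recall / (recall + recall_weight * precision)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between metric scores and human ratings (assumed statistic)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy example: the hypothesis omits one "the" from the reference.
p, r = unigram_precision_recall("the cat sat on mat", "the cat sat on the mat")
print(f"P={p:.3f} R={r:.3f} F1={weighted_f(p, r, 1):.3f} Fmean={weighted_f(p, r, 9):.3f}")
# prints: P=1.000 R=0.833 F1=0.909 Fmean=0.847
```

In the toy example, precision stays perfect while recall drops because of the omission, and the recall-heavy Fmean penalizes that omission more than F1 does. In a setup like the one described, each metric's per-text scores would then be passed to `pearson` together with the evaluators' ratings, and the coefficients compared across metrics.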


This paper was written with support from the National Research Foundation of Korea.