Machine Learning Classification of Literary Translation Samples by Human and Machine Translators

  • The Journal of Translation Studies
  • Abbr : JTS
  • 2021, 22(1), pp.199-217
  • DOI : 10.15749/jts.2021.22.1.008
  • Publisher : The Korean Association for Translation Studies
  • Research Area : Humanities > Interpretation and Translation Studies
  • Received : February 7, 2021
  • Accepted : March 4, 2021
  • Published : March 31, 2021

Chang-Soo Lee 1

1 Hankuk University of Foreign Studies

Accredited

ABSTRACT

This paper reports the results of a text classification experiment on literary translation samples produced by human and machine translators. The original data consist of English translations of 28 short and long Korean novels by a set of human translators and three Web-based neural machine translators: Google Translate (Google), Bing (Microsoft), and Papago (Naver). Machine translation samples were collected twice, in February 2019 and February 2020. The one hundred most frequent words were extracted from the data and subjected to supervised classification by two machine learning algorithms, random forest (RF) and linear discriminant analysis (LDA), for cross-reference tests. The most important findings are as follows. First, both RF and LDA classified human and machine translation samples from both 2019 and 2020 with high accuracy, with prediction accuracy rates exceeding 90 percent. This indicated a clear distinction in word use patterns between human and machine translators, which did not change much over the one-year period. Second, in both RF and LDA tests, most of the 2019 machine translation samples were accurately classified according to their translators, with prediction accuracy rates ranging between 78 and 100 percent. Classification accuracy, however, fell visibly for Bing and Papago in 2020, with Papago plunging from 100 and 80 percent to 41 percent. This meant that over the one-year period the three machine translators moved closer to one another, suggesting a trend toward homogeneity in word use patterns over time.
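The feature-extraction step described in the abstract, together with the per-translator accuracy figures it reports, can be illustrated with a minimal sketch. This is not the author's code: the tokenization rule, the use of relative frequencies, and the function names `tokenize`, `top_words`, `feature_matrix`, and `per_class_accuracy` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed, simplified rule)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(samples, n=100):
    """Return the n most frequent words across all samples combined."""
    counts = Counter()
    for text in samples:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def feature_matrix(samples, vocabulary):
    """Represent each sample by the relative frequency of each vocabulary word.

    Normalizing by sample length is an assumption; the abstract does not say
    whether raw or relative frequencies were used.
    """
    rows = []
    for text in samples:
        tokens = tokenize(text)
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty samples
        rows.append([counts[word] / total for word in vocabulary])
    return rows


def per_class_accuracy(true_labels, predicted_labels):
    """Share of samples of each true class that were predicted correctly."""
    totals, hits = Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        hits[truth] += truth == guess
    return {label: hits[label] / totals[label] for label in totals}
```

The rows returned by `feature_matrix`, paired with translator labels, could then be passed to any random forest or linear discriminant analysis implementation, and `per_class_accuracy` would summarize the predictions separately for each translator, in the manner of the per-translator rates quoted above.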

This paper was written with support from the National Research Foundation of Korea.