Machine Learning Classification of Literary Translation Samples by Human and Machine Translators (기계학습 알고리즘을 활용한 문학번역에서의 기계 번역과 인간 번역 결과물 분류 연구)

Lee Chang-soo (이창수)

doi:10.15749/jts.2021.22.1.008

The Journal of Translation Studies
Abbr : JTS
2021, 22(1), pp.199~217
Publisher : The Korean Association for Translation Studies
Research Area : Humanities > Interpretation and Translation Studies
Received : February 7, 2021
Accepted : March 4, 2021
Published : March 31, 2021

Lee Chang-soo ¹

¹ Hankuk University of Foreign Studies (한국외국어대학교)

Accredited

ABSTRACT

This paper reports the results of a text classification experiment on literary translation samples produced by human and machine translators. The original data consist of English translations of 28 short and long Korean novels by a set of human translators and three Web-based neural machine translators: Google Translate (Google), Bing (Microsoft), and Papago (Naver). Machine translation samples were collected twice, in February 2019 and February 2020. The one hundred most frequent words were extracted from the data and subjected to supervised classification by two machine learning algorithms, random forest (RF) and linear discriminant analysis (LDA), for cross-checking. The most important findings are as follows. First, both RF and LDA classified human and machine translation samples from both 2019 and 2020 with high accuracy, with prediction accuracy rates exceeding 90 percent. This indicates a clear distinction in word use patterns between human and machine translators, one that did not change much over the one-year period. Second, in both RF and LDA tests, most of the 2019 machine translation samples were accurately classified according to their translators, with prediction accuracy rates ranging between 78 and 100 percent. Classification accuracy, however, fell visibly for Bing and Papago in 2020, with Papago plunging from 100 and 80 percent to 41 percent. This means that over the one-year period the three machine translators moved closer to one another, suggesting a trend toward homogeneity in word use patterns over time.
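The feature extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not describe the tokenizer, normalization, or software used, so the regular-expression tokenizer, the use of relative frequencies, and all function names here (`tokenize`, `most_frequent_words`, `relative_frequencies`, `per_class_accuracy`) are assumptions made for illustration. The classifiers themselves are not shown; in practice the resulting feature vectors could be passed to a library implementation of RF or LDA.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(samples, n=100):
    # Pool all samples and keep the n most frequent word types.
    counts = Counter()
    for text in samples:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(text, vocabulary):
    # One feature per vocabulary word: its share of the sample's tokens.
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty samples
    counts = Counter(tokens)
    return [counts[word] / total for word in vocabulary]


def per_class_accuracy(true_labels, predicted_labels):
    # Fraction of samples of each true class that were predicted correctly.
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy data, invented purely to show the shape of the inputs and outputs.
toy_samples = ["the cat sat on the mat", "the dog sat on the log"]
vocabulary = most_frequent_words(toy_samples, n=3)
features = [relative_frequencies(s, vocabulary) for s in toy_samples]
accuracy = per_class_accuracy(
    ["human", "human", "papago", "papago"],
    ["human", "human", "papago", "bing"],
)
```

Per-class accuracy of this kind is one way to read the per-translator figures quoted in the abstract: a drop for one translator means its samples are increasingly assigned to the other classes.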

KEYWORDS

machine translation, literary translation, machine learning text classification, random forest, linear discriminant analysis

This paper was written with support from the National Research Foundation of Korea.

The Journal of Translation Studies 2025 KCI Impact Factor : 2.77

KCI Citation Counts (17)
