How Much Do Quantitative Factors Affect Qualitative Analyses in Corpus-based Translation Studies?

Jeong-Woo Kim 1

1 Kyungnam University

Accredited

ABSTRACT

This paper aims to determine what corpus size yields reliable qualitative analyses when a parallel corpus of English original texts and their Korean translations is used. To this end, we divided the corpus size into 5 levels ranging from a quarter-million to one million (phonological) words, increasing the size by one hundred fifty thousand words at each level, i.e., 250,000, 400,000, 550,000, 700,000, 850,000, and 1,000,000 words. We then examined the major differences between the levels. The results of our investigation are as follows. First, with reference to the translation source of the Korean bound noun ttaemun (reason or ground), the zero-morph translation is most frequent at the quarter-million level, while the conjunctive translation is most frequent at the seven hundred thousand level. This indicates that a corpus of at least seven hundred thousand words is necessary for a meaningful analysis of the bound noun ttaemun. Second, although the differences between the five levels are not significant, the translation of the long-form causative construction becomes more frequent at the seven hundred thousand level, while the frequency of the text-free translation decreases somewhat. Third, in the case of the translation source of the Korean conjunctive geureona (but), the translation frequency of the conjunctive ‘but’ increases by 20 percent at the four hundred thousand level, while the translation of either the zero morph or the conjunctive ‘however’ decreases by 10 percent at the same level. By contrast, in the case of the Korean conjunctive hajiman (yet or but), a significant change in translation frequency occurs at the five hundred fifty thousand level. Finally, concerning the translation of the English dash mark ‘-’ into Korean, the five hundred fifty thousand level shows a significant result: for example, the dash mark disappears in many Korean texts, or the content after the dash mark is rewritten as a new Korean sentence. In conclusion, according to our investigation, the reasonable corpus size for findings that can be developed into a hypothesis or theory ranges from a minimum of four hundred thousand words to a maximum of seven hundred thousand words. Furthermore, a corpus size above seven hundred thousand words makes no difference to the qualitative analyses of the 4 items thoroughly investigated in this paper.

This paper was written with support from the National Research Foundation of Korea.