How Much Do Quantitative Factors Affect Qualitative Analyses in Corpus-based Translation Studies?

Jeong-Woo Kim (김정우)

doi:10.15749/jts.2013.14.1.002

The Journal of Translation Studies
Abbr : JTS
2013, 14(1), pp.31~98
Publisher : The Korean Association for Translation Studies
Research Area : Humanities > Interpretation and Translation Studies

Jeong-Woo Kim ¹

¹Kyungnam University

Accredited

ABSTRACT

This paper aims to determine what corpus size yields reliable qualitative analyses when a parallel corpus of English original texts and their Korean translations is used. To this end, we divided the corpus size into 5 levels from a quarter-million to one million (phonological) words, increasing the number of words by one hundred fifty thousand at each level, i.e., 250,000, 400,000, 550,000, 700,000, 850,000, and 1,000,000 words. We then examined the major differences between the levels. The results of our investigation are as follows. First, with reference to the translation source of the Korean bound noun ttaemun (reason or ground), the zero-morph translation is most frequent at the quarter-million corpus level, while the conjunctive translation is most frequent at the seven hundred thousand corpus level. This indicates that a corpus of at least seven hundred thousand words is necessary for a meaningful analysis of the bound noun ttaemun. Second, although the differences between the five levels are not significant, the translation of the long-form causative construction becomes more frequent at the seven hundred thousand corpus level, while the frequency of the text-free translation decreases somewhat. Third, in the case of the translation source of the Korean conjunctive geureona (but), the translation frequency of the conjunctive ‘but’ increases by 20 percent at the four hundred thousand corpus level, while the translation of either zero morph or the conjunctive ‘however’ decreases by 10 percent at the same level. In the case of the Korean conjunctive hajiman (yet or but), on the other hand, a significant change in translation frequency occurs at the five hundred fifty thousand corpus level. Finally, concerning the translation of the English dash mark ‘-’ into Korean, the five hundred fifty thousand corpus level shows a significant result: for example, the dash mark disappears in many Korean texts, or the content after the dash mark is rewritten as a new Korean sentence. In conclusion, the reasonable corpus size, one whose results can be developed into a hypothesis or theory, ranges from a minimum of four hundred thousand words to a maximum of seven hundred thousand words according to our investigation. Furthermore, a corpus size over seven hundred thousand words makes no difference to the qualitative analyses of the 4 items thoroughly investigated in this paper.
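
The level-by-level comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's procedure. It assumes that each level is the sub-corpus formed by the first N words of the full corpus, and that every occurrence of a studied item has already been annotated with a translation type. The names `shares_by_level`, `dominant_type`, and `toy_hits` are hypothetical, and the toy annotations are made-up placeholders, not data from the paper.

```python
from collections import Counter

# Corpus sizes listed in the abstract: 250,000 up to 1,000,000 words
# in increments of 150,000 words.
LEVEL_SIZES = list(range(250_000, 1_000_001, 150_000))


def shares_by_level(hits, level_sizes=LEVEL_SIZES):
    """Relative frequency of each translation type at each corpus size.

    hits: iterable of (word_offset, translation_type) pairs, where
    word_offset is the position of the item in the corpus, counted in words.
    Assumption: a level of size N is the first N words of the corpus.
    """
    hits = list(hits)  # allow generators; we iterate once per level
    result = {}
    for size in level_sizes:
        counts = Counter(kind for offset, kind in hits if offset < size)
        total = sum(counts.values())
        result[size] = (
            {kind: n / total for kind, n in counts.items()} if total else {}
        )
    return result


def dominant_type(shares):
    """Most frequent translation type at one level (None if no hits)."""
    return max(shares, key=shares.get) if shares else None


# Made-up placeholder annotations, for illustration only.
toy_hits = [
    (10, "zero"),
    (20, "zero"),
    (300_000, "conjunctive"),
    (500_000, "conjunctive"),
    (600_000, "conjunctive"),
]

by_level = shares_by_level(toy_hits)
for size, shares in by_level.items():
    print(size, dominant_type(shares), shares)
```

Comparing `dominant_type` across successive sizes shows the level at which the most frequent translation type changes, which is the kind of shift the abstract reports for ttaemun.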

KEYWORDS

translation universals, simplification, explicitation, normalization, parallel corpus, Korean translated texts from English, quantitative factors, qualitative results

This paper was written with support from the National Research Foundation of Korea.

The Journal of Translation Studies 2025 KCI Impact Factor : 2.77
