Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation

  • Journal of the Korean Society for Library and Information Science
  • 2009, 43(1), pp.313-332
  • DOI : 10.4275/KSLIS.2009.43.1.313
  • Publisher : Korean Society for Library and Information Science (한국문헌정보학회)
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : February 27, 2009
  • Accepted : March 9, 2009

Yong-Gu Lee 1

1 University of Pittsburgh

Accredited

ABSTRACT

Cross-language text categorization (CLTC) classifies documents automatically using a training set written in another language. In this study, collections suitable for CLTC were extracted from KTSET. The classification performance of several CLTC methods was compared using an SVM classifier and machine translation. In descending order of classification performance, the methods ranked as follows: poly-lingual training, training-set translation, and test-set translation. However, training-set translation can be regarded as the most useful CLTC method, because it makes efficient use of machine translation and adapts easily to general environments. Low performance, on the other hand, was attributable to feature reduction or to features lacking subject characteristics, which arose during the machine translation step of CLTC.
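
The three CLTC methods compared in the abstract differ mainly in which documents pass through machine translation before the classifier sees them. The following is a minimal sketch of that data arrangement only, not the paper's implementation. The function `cltc_setup`, the word-by-word `toy_translator`, and the romanized toy vocabulary are hypothetical names introduced for illustration. The sketch also assumes that poly-lingual training means training on the original training documents together with their translations, which the abstract does not spell out. A real experiment would replace the stub with an MT system and feed the resulting sets to an SVM.

```python
from typing import Callable, Dict, List, Tuple

Doc = Tuple[str, str]              # (text, category label)
Translator = Callable[[str], str]  # hypothetical machine-translation hook


def cltc_setup(method: str,
               train_docs: List[Doc],
               test_docs: List[Doc],
               to_test_lang: Translator,
               to_train_lang: Translator) -> Tuple[List[Doc], List[Doc]]:
    """Return (training set, test set) as the classifier would see them."""
    if method == "training-set translation":
        # Translate the labeled training documents into the test language.
        train = [(to_test_lang(text), label) for text, label in train_docs]
        return train, list(test_docs)
    if method == "test-set translation":
        # Leave the training set untouched; translate each test document.
        test = [(to_train_lang(text), label) for text, label in test_docs]
        return list(train_docs), test
    if method == "poly-lingual training":
        # Assumption: train on original plus translated training documents.
        translated = [(to_test_lang(text), label) for text, label in train_docs]
        return list(train_docs) + translated, list(test_docs)
    raise ValueError(f"unknown CLTC method: {method}")


def toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Word-by-word stand-in for a real MT system (illustration only)."""
    return lambda text: " ".join(lexicon.get(word, word) for word in text.split())


# Toy data (assumed): romanized Korean training text, English test text.
ko_to_en = toy_translator({"doseogwan": "library", "bunryu": "classification"})
en_to_ko = toy_translator({"library": "doseogwan", "classification": "bunryu"})
train_docs = [("doseogwan bunryu", "LIS")]
test_docs = [("library classification", "LIS")]

for name in ("poly-lingual training",
             "training-set translation",
             "test-set translation"):
    train, test = cltc_setup(name, train_docs, test_docs, ko_to_en, en_to_ko)
    print(name, "| train:", train, "| test:", test)
```

In this arrangement, training-set translation needs only one translation pass over a fixed labeled collection, whereas test-set translation has to translate every incoming document at classification time, which is consistent with the efficiency argument made in the abstract.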
