A Comparative Study of Feature Extraction Methods for Authorship Attribution in the Text of Traditional East Asian Medicine with a Focus on Function Words

  • The Journal Of Korean Medical Classics
  • Abbr : JKMC
  • 2020, 33(2), pp.51~59
  • DOI : 10.14369/jkmc.2020.33.2.051
  • Publisher : The Society of Korean Medical Classics (대한한의학원전학회)
  • Research Area : Medicine and Pharmacy > Korean Medicine
  • Received : April 17, 2020
  • Accepted : May 11, 2020
  • Published : May 25, 2020

Oh Junho 1

1 Korea Institute of Oriental Medicine (한국한의학연구원)

Accredited

ABSTRACT

Objectives : This study seeks to identify the most appropriate features for effective authorship attribution in texts of Traditional East Asian Medicine.

Methods : Using the ‘Variorum of the Nanjing’ as the experimental corpus, the authorship attribution performance of a Support Vector Machine (SVM) was compared by cross-validation across three choices: function words or content words, single words or collocations, and whether or not IDF weighting was applied.

Results : The combination 'function words/uni-bigram/TF' performed best, with an accuracy of 0.732, while the combination 'content words/unigram/TFIDF' showed the lowest accuracy, 0.351.

Conclusions : The results indicate the following about authorship attribution in texts of Traditional East Asian Medicine. First, function words play a more important role than content words. Second, collocations were relatively important for content words, but single words were more important for function words. Third, unlike in general text analysis, IDF weighting resulted in worse performance.
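The feature combinations compared in the abstract (function words or content words, unigram or uni-bigram, TF or TFIDF) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the study's actual procedure: the function-word list is hypothetical, each character is treated as one word, bigrams are formed from adjacent retained function words, and the smoothed IDF formula is one common variant. The resulting vectors are the kind of input an SVM classifier would be trained on; the classifier and the cross-validation are not shown.

```python
import math
from collections import Counter

# Hypothetical function-word list for Classical Chinese (not taken from the paper).
FUNCTION_WORDS = {"之", "也", "者", "而", "其", "於", "以", "則"}


def extract_terms(text, vocabulary, use_bigrams):
    """Keep characters found in `vocabulary`; emit unigrams and, optionally,
    bigrams of adjacent kept characters (an assumed definition of collocation)."""
    kept = [ch for ch in text if ch in vocabulary]
    terms = list(kept)
    if use_bigrams:
        terms += [a + b for a, b in zip(kept, kept[1:])]
    return terms


def feature_vectors(docs, vocabulary, use_bigrams=True, use_idf=False):
    """Return (feature names, one weight vector per document).

    TF is the relative frequency of a term within a document. With
    use_idf=True each TF is multiplied by a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is an assumed variant.
    """
    counts = [Counter(extract_terms(d, vocabulary, use_bigrams)) for d in docs]
    features = sorted(set().union(*counts))
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)

    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for empty docs
        row = []
        for term in features:
            tf = c[term] / total
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 if use_idf else 1.0
            row.append(tf * idf)
        vectors.append(row)
    return features, vectors


# Two made-up sample passages, used only to exercise the functions.
docs = ["脈者血之府也", "其氣之來也而不散"]

# 'function words / uni-bigram / TF'
features, tf_rows = feature_vectors(docs, FUNCTION_WORDS, use_bigrams=True, use_idf=False)

# 'function words / unigram / TFIDF'
uni_features, tfidf_rows = feature_vectors(docs, FUNCTION_WORDS, use_bigrams=False, use_idf=True)
```

Switching `vocabulary` to a content-word list, or toggling `use_bigrams` and `use_idf`, yields the other combinations named in the abstract. Note that IDF leaves terms appearing in every document at their plain TF and raises the weight of rarer terms, which is one way to see why it might not help when the most common function words carry the authorial signal.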
