Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts

  • The Journal Of Korean Medical Classics
  • Abbr : JKMC
  • 2019, 32(3), pp.47~57
  • DOI : 10.14369/jkmc.2019.32.3.047
  • Publisher : The Society of Korean Medical Classics (대한한의학원전학회)
  • Research Area : Medicine and Pharmacy > Korean Medicine
  • Received : July 22, 2019
  • Accepted : August 5, 2019
  • Published : August 25, 2019

Oh Junho 1

1 Korea Institute of Oriental Medicine (한국한의학연구원)

Accredited

ABSTRACT

Objectives : We aim to help researchers choose an appropriate word extraction method when analyzing East Asian traditional medicine texts with unsupervised learning.

Methods : To rank substrings, we tested one method for the exterior boundary value (BE: branching entropy), three methods for the interior boundary value (CS: cohesion score, TS: t-score, SL: simple-ll), and six methods that combine them (BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL).

Results : With Miss Rate (MR) as the criterion, the error was smallest when TS and SL were used together and largest when CS was used alone. When the number of segmented texts was applied as a weight, SL gave the best results and BE alone gave the worst.

Conclusions : Unsupervised-learning-based word extraction can be used to analyze texts without a prepared vocabulary. When using this approach, SL or the combination of SL and TS should be considered first.
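As a rough illustration of the two kinds of score named in the Methods, the sketch below computes a right-side branching entropy (an exterior boundary cue) and a cohesion score (an interior boundary cue) for a candidate substring. The abstract does not give the formulas, so the definitions here are assumptions based on common usage: branching entropy as the Shannon entropy of the character that follows the substring, and cohesion as the geometric mean of forward conditional probabilities. The function names and the toy input are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def substring_counts(texts, max_len):
    """Count every substring of length 1..max_len across all texts."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                if i + n <= len(text):
                    counts[text[i:i + n]] += 1
    return counts


def right_branching_entropy(texts, s):
    """Entropy (in bits) of the character following s.

    Assumed definition: a high value means many different characters
    follow s, which hints at a word boundary right after s.
    """
    followers = Counter()
    for text in texts:
        start = text.find(s)
        while start != -1:
            end = start + len(s)
            if end < len(text):
                followers[text[end]] += 1
            start = text.find(s, start + 1)  # allow overlapping matches
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


def cohesion_score(counts, s):
    """Assumed definition: (count(s) / count(s[0])) ** (1 / (len(s) - 1)).

    A high value means the characters of s tend to occur together.
    """
    if len(s) < 2 or counts[s[0]] == 0:
        return 0.0
    return (counts[s] / counts[s[0]]) ** (1 / (len(s) - 1))


if __name__ == "__main__":
    toy_texts = ["abcabd"]  # toy input, not data from the study
    counts = substring_counts(toy_texts, max_len=2)
    print(right_branching_entropy(toy_texts, "ab"))
    print(cohesion_score(counts, "ab"))
```

In the toy input, "ab" is followed once by "c" and once by "d", so its branching entropy is high, and "a" is always followed by "b", so its cohesion is also high. How the paper combines such scores (for example in BExCS) is not stated in the abstract, so no combination rule is sketched here.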
