Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method

  • The Journal Of Korean Medical Classics
  • Abbr : JKMC
  • 2019, 32(1), pp.61~74
  • DOI : 10.14369/jkmc.2019.32.1.061
  • Publisher : The Society of Korean Medical Classics (대한한의학원전학회)
  • Research Area : Medicine and Pharmacy > Korean Medicine
  • Received : January 18, 2019
  • Accepted : February 11, 2019
  • Published : February 25, 2019

Oh Junho 1

1 Korea Institute of Oriental Medicine (한국한의학연구원)

Accredited

ABSTRACT

Objectives : The purpose of this study is to help researchers select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data.

Methods : Using prescription data, which embody traditional methods in East Asian traditional medicine, we examined four count-based and two prediction-based word embedding methods. To compare these methods intuitively, we proposed a "prescription generating game" and compared its results with those obtained by applying the six methods.

Results : When adjacent vectors are extracted, the count-based word embedding methods yield the main herbs that are frequently used in conjunction with each other. The prediction-based word embedding methods, by contrast, yield synonyms of the herbs.

Conclusions : Count-based word embedding methods seem to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among the count-based methods, the TF vector tends to exaggerate the frequency effect, so the TF-IDF vector or co-word vector may be a more reasonable choice. The t-score vector may be recommended when searching for unusual information that cannot be found from frequency alone. Prediction-based embedding, on the other hand, seems to be effective when deriving the bases of similar meanings in context.
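The contrast among the count-based vectors named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy prescriptions are invented and are not the study's data, the helper names (`tf_vector`, `tfidf_vector`, `coword_vector`, `nearest`) are hypothetical, and the weighting formulas are common textbook forms that may differ from the definitions used in the paper. The t-score vector and the prediction-based methods are omitted.

```python
import math
from collections import Counter

# Toy data invented for illustration; NOT taken from the paper's corpus.
prescriptions = [
    ["ginseng", "licorice", "ginger", "jujube"],
    ["ginseng", "licorice", "atractylodes", "poria"],
    ["cinnamon", "peony", "licorice", "ginger", "jujube"],
    ["ephedra", "cinnamon", "apricot_kernel", "licorice"],
]

herbs = sorted({h for p in prescriptions for h in p})
n_docs = len(prescriptions)


def tf_vector(herb):
    """Term-frequency vector: one dimension per prescription."""
    return [float(p.count(herb)) for p in prescriptions]


def tfidf_vector(herb):
    """TF vector scaled by log(N / df), which damps herbs found in most prescriptions."""
    df = sum(1 for p in prescriptions if herb in p)
    idf = math.log(n_docs / df)
    return [tf * idf for tf in tf_vector(herb)]


def coword_vector(herb):
    """Co-occurrence vector: one dimension per herb, counting shared prescriptions."""
    counts = Counter()
    for p in prescriptions:
        if herb in p:
            counts.update(h for h in set(p) if h != herb)
    return [float(counts[h]) for h in herbs]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(herb, embed, k=3):
    """Return the k herbs whose vectors are closest to `herb` under `embed`."""
    target = embed(herb)
    scored = [(cosine(target, embed(h)), h) for h in herbs if h != herb]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [h for _, h in scored[:k]]


if __name__ == "__main__":
    for name, embed in [("TF", tf_vector), ("TF-IDF", tfidf_vector), ("co-word", coword_vector)]:
        print(name, nearest("ginger", embed))
```

In this toy corpus "licorice" appears in every prescription, so its IDF is log(4/4) = 0 and its TF-IDF vector is all zeros. That is a small-scale picture of why a raw TF vector may overstate very frequent herbs while TF-IDF damps them.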
