Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method (한의학 고문헌 데이터 분석을 위한 단어 임베딩 기법 비교: 자연어처리 방법을 적용하여)

Oh Junho (오준호)

doi:10.14369/jkmc.2019.32.1.061

The Journal Of Korean Medical Classics
Abbr : JKMC
2019, 32(1), pp.61~74
Publisher : The Society of Korean Medical Classics (대한한의학원전학회)
Research Area : Medicine and Pharmacy > Korean Medicine
Received : January 18, 2019
Accepted : February 11, 2019
Published : February 25, 2019

Oh Junho ¹

¹Korea Institute of Oriental Medicine (한국한의학연구원)

ABSTRACT

Objectives : The purpose of this study is to help researchers select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods : Using prescription data that reflect traditional methods in East Asian traditional medicine, we examined four count-based and two prediction-based word embedding methods. To compare these methods intuitively, we proposed a "prescription generating game" and compared the results obtained when each of the six methods was applied to it. Results : When adjacent vectors were extracted, the count-based word embedding methods returned the main herbs that are frequently used in conjunction with each other. The prediction-based word embedding methods, in contrast, returned synonyms of the herbs. Conclusions : Count-based word embedding methods seem to be more effective than prediction-based word embedding methods for analyzing the use of domesticated herbs. Among the count-based methods, the TF vector tends to exaggerate the effect of frequency, so the TF-IDF vector or the co-word vector may be a more reasonable choice. The t-score vector may be recommended when searching for unusual information that frequency alone does not reveal. Prediction-based embedding, on the other hand, seems to be effective for identifying terms with similar meanings in context.
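
The contrast drawn in the conclusions between frequency-driven neighbors and the "unusual information" surfaced by a t-score can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the toy prescriptions are hypothetical and not taken from the study, the helper names are invented here, and the t-score uses the common collocation form (observed - expected) / sqrt(observed), which may differ from the definition used in the paper.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy prescriptions (lists of herbs); NOT data from the study.
prescriptions = [
    ["ginseng", "licorice", "ginger"],
    ["ginseng", "licorice", "jujube"],
    ["licorice", "ginger", "jujube"],
    ["ginseng", "atractylodes", "licorice"],
    ["cinnamon", "peony", "licorice"],
]


def cooccurrence(docs):
    """Count how many prescriptions contain each herb and each herb pair."""
    freq = Counter()
    pair = Counter()
    for doc in docs:
        items = sorted(set(doc))          # sorted so pair keys are canonical
        freq.update(items)
        pair.update(combinations(items, 2))
    return freq, pair


def t_score(a, b, freq, pair, n_docs):
    """Collocation-style t-score (assumed form): (O - E) / sqrt(O)."""
    observed = pair[tuple(sorted((a, b)))]
    if observed == 0:
        raise ValueError("t-score is undefined for pairs that never co-occur")
    expected = freq[a] * freq[b] / n_docs  # co-occurrences expected by chance
    return (observed - expected) / math.sqrt(observed)


def neighbours(target, freq, pair, n_docs, use_t_score=False):
    """Rank the herbs that co-occur with `target`, best first."""
    scores = {}
    for other in freq:
        key = tuple(sorted((target, other)))
        if other == target or pair[key] == 0:
            continue
        if use_t_score:
            scores[other] = t_score(target, other, freq, pair, n_docs)
        else:
            scores[other] = pair[key]      # raw co-word count
    # Ties are broken alphabetically to keep the ranking deterministic.
    return sorted(scores, key=lambda h: (-scores[h], h))


freq, pair = cooccurrence(prescriptions)
n = len(prescriptions)
print(neighbours("ginseng", freq, pair, n))
print(neighbours("ginseng", freq, pair, n, use_t_score=True))
```

In this toy data the raw co-word count ranks licorice first for ginseng simply because licorice appears in every prescription, whereas the t-score discounts that expected overlap and moves the rarer pairing with atractylodes to the top.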

KEYWORDS

Word Embedding, East Asian Traditional Medicine, Korean Medicine, Data Analysis, Natural Language Processing

The Journal Of Korean Medical Classics 2024 KCI Impact Factor : 0.28