Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts (한의학 고문헌 텍스트 분석을 위한 비지도학습 기반 단어 추출 방법 비교)

Oh Junho (오준호)

doi:10.14369/jkmc.2019.32.3.047

The Journal Of Korean Medical Classics
Abbr : JKMC
2019, 32(3), pp.47~57
Publisher : The Society of Korean Medical Classics (대한한의학원전학회)
Research Area : Medicine and Pharmacy > Korean Medicine
Received : July 22, 2019
Accepted : August 5, 2019
Published : August 25, 2019

Oh Junho ¹

¹ Korea Institute of Oriental Medicine (한국한의학연구원)

Accredited

ABSTRACT

Objectives : We aim to help researchers choose an appropriate word extraction method when analyzing East Asian Traditional Medicine texts with unsupervised learning. Methods : To rank substrings, we tested one method for the exterior boundary value (BE: branching entropy), three methods for the interior boundary value (CS: cohesion score, TS: t-score, SL: simple-ll), and six methods formed by combining them (BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL). Results : With Miss Rate (MR) as the criterion, the error was smallest when TS and SL were used together and largest when CS was used alone. When the number of segmented texts was applied as a weight, SL gave the best results and BE alone gave the worst. Conclusions : Unsupervised-learning-based word extraction can be used to analyze texts without a prepared vocabulary. When using this approach, SL or the combination of SL and TS can be considered first.
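
As a rough illustration of how substrings can be ranked by an exterior and an interior boundary value, the sketch below computes right branching entropy (BE) and a cohesion score (CS) on a toy corpus and multiplies them. The formulas follow commonly used definitions and are assumptions here, since this record does not give the paper's exact formulas. Reading "BExCS" as a plain product is likewise an assumption, and the four toy lines, the function names, and `max_len` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def count_substrings(lines, max_len=4):
    """Count every substring of length 1..max_len in each line."""
    counts = Counter()
    for line in lines:
        for i in range(len(line)):
            for n in range(1, max_len + 1):
                if i + n <= len(line):
                    counts[line[i:i + n]] += 1
    return counts


def right_branching_entropy(lines, max_len=4):
    """Entropy of the character that follows each substring (exterior value)."""
    followers = defaultdict(Counter)
    for line in lines:
        for i in range(len(line)):
            for n in range(1, max_len + 1):
                j = i + n
                if j < len(line):  # a following character exists
                    followers[line[i:j]][line[j]] += 1
    entropy = {}
    for sub, nxt in followers.items():
        total = sum(nxt.values())
        entropy[sub] = -sum((c / total) * math.log(c / total) for c in nxt.values())
    return entropy


def cohesion(sub, counts):
    """Assumed form: (freq(c1..cn) / freq(c1)) ** (1 / (n - 1)) (interior value)."""
    n = len(sub)
    if n < 2 or counts[sub] == 0:
        return 0.0
    return (counts[sub] / counts[sub[0]]) ** (1 / (n - 1))


# Toy, unsegmented lines invented for illustration only.
lines = ["人參甘草", "人參白朮", "人參茯苓", "甘草人參"]
counts = count_substrings(lines)
entropy = right_branching_entropy(lines)

# Assumed combination: multiply the exterior and interior values.
scores = {
    sub: entropy.get(sub, 0.0) * cohesion(sub, counts)
    for sub in counts
    if len(sub) >= 2
}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:3])  # 人參 ranks first on this toy input
```

On this input, 人參 always follows 人 (cohesion 1.0) and is followed by three different characters (entropy log 3), so it gets the highest combined score. Substrings such as 參甘 score zero because only one character ever follows them.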

KEYWORDS

Text segmentation, Word extraction, Tokenization, East Asian Traditional Medicine, Korean medicine
