Word Embedding Based Clustering Method for Query Expansion of Korean Medicine Symptoms

  • Journal of Knowledge Information Technology and Systems
  • Abbr : JKITS
  • 2020, 15(5), pp.863-874
  • DOI : 10.34163/jkits.2020.15.5.028
  • Publisher : Korea Knowledge Information Technology Society
  • Research Area : Interdisciplinary Studies > Interdisciplinary Research
  • Received : September 16, 2020
  • Accepted : October 13, 2020
  • Published : October 31, 2020

Yea Sang-Jun 1, Lee Sanghun 1, Jiwon Yoon 1, Jang Ho 1

1 Korea Institute of Oriental Medicine

Accredited

ABSTRACT

In the information age, smart search is essential because of the enormous amount of accumulated information. However, most queries entered into search engines consist of general nouns with multiple meanings and average only about 2.4 words, making it difficult for search engines to grasp users' exact search intention. Query expansion mitigates this problem by helping to identify that intention. To develop query expansion for the Korean Medicine clinical decision support system (KM-CDSS), we propose a novel algorithm that consists of four steps. First, symptom names are extracted from the prescription information fetched by the initial query. Second, the extracted symptoms are embedded into a vector space. Third, the vectors are clustered, and each cluster's quality is evaluated by the silhouette coefficient. Finally, the word lying nearest to the center of each cluster is suggested as a representative symptom. In the experiments, relevance and comprehensiveness were evaluated qualitatively by KM doctors, and utility was analyzed quantitatively. The results for the three indicators were then analyzed in an integrated manner. The proposed model showed sufficiently good results in the relevance and comprehensiveness tests and the best result in the utility test. These results indicate that the proposed model is the most suitable query expansion model for KM-CDSS. If user search logs are collected in the future, improved query expansion is expected to provide related search words in more sophisticated ways.
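The last three steps of the algorithm (embedding, clustering with silhouette evaluation, and choosing the word nearest each cluster center) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation. The abstract names neither the embedding model nor the clustering algorithm, so the example assumes plain k-means and uses hand-made 2-D vectors as stand-ins for learned word embeddings. The symptom names, the vectors, and all function names are hypothetical.

```python
import math

# Hypothetical symptom names with hand-made 2-D "embeddings" (assumption:
# a real system would obtain these vectors from a trained embedding model).
names = ["headache", "indigestion", "dizziness", "migraine", "nausea", "bloating"]
vectors = [(1.0, 1.0), (5.0, 5.0), (1.4, 0.8), (0.6, 1.1), (5.5, 4.6), (4.8, 5.6)]


def centroid_of(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))


def kmeans(vecs, k, max_iters=50):
    """Plain k-means (assumed here; the abstract does not name the algorithm).
    Deterministic init: the first k vectors serve as starting centroids."""
    centroids = [tuple(v) for v in vecs[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vecs]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:
                centroids[c] = centroid_of(members)
    return labels, centroids


def silhouette_by_cluster(vecs, labels, k):
    """Mean silhouette coefficient s = (b - a) / max(a, b) per cluster (k >= 2)."""
    scores = {c: [] for c in range(k)}
    for i, v in enumerate(vecs):
        own = [math.dist(v, w) for j, w in enumerate(vecs) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is 0 by convention
            scores[labels[i]].append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(v, w) for j, w in enumerate(vecs) if labels[j] == c) / labels.count(c)
            for c in range(k) if c != labels[i] and labels.count(c) > 0
        )
        scores[labels[i]].append((b - a) / max(a, b))
    return {c: sum(s) / len(s) for c, s in scores.items() if s}


def representatives(words, vecs, labels, centroids):
    """For each cluster, pick the word whose vector lies nearest the centroid."""
    reps = {}
    for c, center in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            reps[c] = words[min(members, key=lambda i: math.dist(vecs[i], center))]
    return reps


labels, centroids = kmeans(vectors, k=2)
quality = silhouette_by_cluster(vectors, labels, k=2)
reps = representatives(names, vectors, labels, centroids)
print(reps, quality)
```

In this sketch the per-cluster silhouette score could be used to discard or deprioritize poorly separated clusters before their representative words are offered as expansion terms. How the paper actually applies the score is not stated in the abstract.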
