Improving the Retrieval Effectiveness by Incorporating Word Sense Disambiguation Process

  • Journal of the Korean Society for Information Management
  • Abbr : JKOSIM
  • 2005, 22(2), pp.125~145
  • DOI : 10.3743/KOSIM.2005.22.2.125
  • Publisher : Korean Society for Information Management
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : May 20, 2005
  • Accepted : June 20, 2005
  • Published : June 30, 2005

Young-Mee Chung 1, Yong-Gu Lee 2

1 Yonsei University
2 Keimyung University

Accredited

ABSTRACT

This paper presents a semantic vector space retrieval model that incorporates a word sense disambiguation algorithm in an attempt to improve retrieval effectiveness. Nine Korean homonyms were selected for the sense disambiguation and retrieval experiments. A total of approximately 120,000 news articles make up the raw test collection, and 18 queries that include homonyms as query words were used for the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning respectively, were used for the disambiguation process. The Naive Bayes classifier achieved 92% disambiguation accuracy, while the EM algorithm reached 67% clustering performance on average. The semantic vector space model incorporating the Naive Bayes classifier achieved 39.6% precision, an improvement of about 7.4%. However, the retrieval effectiveness of the EM algorithm-based semantic retrieval was 3% lower than that of the baseline retrieval without disambiguation. It is worth noting that disambiguation and retrieval performance depend on the distribution patterns of the homonyms to be disambiguated as well as on the characteristics of the queries.
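
To make the supervised disambiguation step concrete, the following is a minimal sketch of a Naive Bayes sense classifier over context words. It is an illustration only, not the authors' implementation: the function names, the add-one smoothing, and the English toy data for the homonym "bank" are all assumptions introduced here, and the abstract does not specify the features or smoothing actually used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count senses and context words from (sense, context_words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def disambiguate(model, context_words):
    """Return the sense with the highest log posterior for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sorted(sense_counts):
        # Log prior P(sense).
        score = math.log(sense_counts[sense] / total)
        # Add-one (Laplace) smoothing; this choice is an assumption.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_words:
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy training data for the homonym "bank" (not from the paper).
TOY_EXAMPLES = [
    ("finance", ["loan", "interest", "deposit"]),
    ("finance", ["account", "loan", "money"]),
    ("river", ["water", "fishing", "shore"]),
    ("river", ["river", "water", "flood"]),
]

model = train_naive_bayes(TOY_EXAMPLES)
print(disambiguate(model, ["loan", "money"]))   # finance
print(disambiguate(model, ["water", "flood"]))  # river
```

In a semantic retrieval setting of the kind the abstract describes, the predicted sense label could replace the ambiguous surface form as the index term, so that a query using one sense would match only documents tagged with that sense. How the paper actually integrates the senses into the vector space model is not stated in the abstract.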

This paper was written with support from the National Research Foundation of Korea.