Improving the Retrieval Effectiveness by Incorporating Word Sense Disambiguation Process (정보검색 성능 향상을 위한 단어 중의성 해소 모형에 관한 연구)

Young-Mee Chung (정영미); Yong-Gu Lee (이용구)

doi:10.3743/KOSIM.2005.22.2.125

Journal of the Korean Society for Information Management
Abbr : JKOSIM
2005, 22(2), pp.125~145
Publisher : Korean Society for Information Management
Research Area : Interdisciplinary Studies > Library and Information Science
Received : May 20, 2005
Accepted : June 20, 2005
Published : June 30, 2005

Young-Mee Chung ¹, Yong-Gu Lee ²

¹ Yonsei University
² Keimyung University

ABSTRACT

This paper presents a semantic vector space retrieval model that incorporates a word sense disambiguation algorithm in an attempt to improve retrieval effectiveness. Nine Korean homonyms were selected for the sense disambiguation and retrieval experiments. Approximately 120,000 news articles make up the raw test collection, and 18 queries containing homonyms as query words were used for the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning respectively, were used for the disambiguation process. The Naive Bayes classifier achieved 92% disambiguation accuracy, while the clustering performance of the EM algorithm averaged 67%. The semantic vector space model incorporating the Naive Bayes classifier reached 39.6% precision, an improvement of about 7.4%. However, EM algorithm-based semantic retrieval was 3% less effective than the baseline retrieval without disambiguation. Notably, disambiguation and retrieval performance depend on the distribution patterns of the homonyms to be disambiguated as well as on the characteristics of the queries.
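To make the supervised step in the abstract concrete, the following is a minimal sketch of Naive Bayes sense disambiguation. It is not the authors' implementation: the function names, the add-one smoothing, and the toy English training data for the homonym "bank" are all assumptions chosen for illustration, whereas the paper itself works with nine Korean homonyms and news articles.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_contexts):
    """Count sense priors and per-sense context-word frequencies.

    labeled_contexts: iterable of (sense_label, list_of_context_words).
    """
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sense, words in labeled_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocabulary.update(words)
    return sense_counts, word_counts, vocabulary


def disambiguate(context_words, model):
    """Return the sense with the highest log posterior (add-one smoothing)."""
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        # Log prior of the sense.
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocabulary)
        for word in context_words:
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Log likelihood of the context word given the sense.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy data: contexts of the English homonym "bank".
training = [
    ("finance", ["money", "loan", "account"]),
    ("finance", ["deposit", "interest", "money"]),
    ("river", ["water", "fishing", "shore"]),
    ("river", ["water", "flood", "mud"]),
]
model = train_naive_bayes(training)
print(disambiguate(["loan", "interest"], model))
print(disambiguate(["water", "shore"], model))
```

In a retrieval setting of the kind the abstract describes, the predicted sense label could replace the ambiguous term in the document and query vectors, so that matching happens on senses rather than surface forms. How the paper actually builds its semantic vectors is not stated in this record.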

KEYWORDS

information retrieval, word sense disambiguation, Naive Bayes classifier, EM algorithm, clustering, retrieval effectiveness

This paper was written with support from the National Research Foundation of Korea.


Journal of the Korean Society for Information Management 2025 KCI Impact Factor : 1.27
