A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation

Lee Yong-Gu


Journal of the Korean Biblia Society for Library and Information Science
2011, Vol. 22, No. 2, pp. 5–25
Publisher: Journal of the Korean Biblia Society for Library and Information Science
Research Area: Interdisciplinary Studies > Library and Information Science

Lee Yong-Gu¹

¹ Keimyung University

KCI-accredited journal

ABSTRACT

This study aims to identify the most effective statistical feature selection method and context window size for word sense disambiguation using supervised learning. Features were selected by four different methods: information gain, document frequency, chi-square, and relevancy. A comparison of these methods showed that selecting the most appropriate features can improve word sense disambiguation performance, with information gain performing best. The SVM classifier was not affected by feature selection and performed better with larger feature sets and larger context windows. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with less than 10 percent. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a smaller context window, can yield the greatest performance improvements.
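The feature selection step summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes each training instance is the set of words in a fixed context window around the ambiguous word, labeled with its sense, and the helper names (`context_window`, `feature_scores`, `select_top`) are hypothetical. Document frequency counts how many instances contain a word. Information gain measures how much knowing whether a word is present reduces uncertainty about the sense. Chi-square measures dependence between a word and a sense; taking the maximum over senses is one common choice and is assumed here. The relevancy measure is omitted because its exact formula is not given in this record.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, target_index, size):
    """Return the set of words within `size` positions on each side of the target word."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return set(left + right)


def _entropy(counts, total):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def feature_scores(instances):
    """Score every word by document frequency, information gain and chi-square.

    `instances` is a list of (word_set, sense_label) pairs (hypothetical format).
    Chi-square is aggregated as the maximum over senses (an assumption).
    """
    n = len(instances)
    sense_counts = Counter(sense for _, sense in instances)
    df = Counter()
    joint = defaultdict(Counter)  # joint[word][sense] = instances of sense containing word
    for words, sense in instances:
        for w in words:
            df[w] += 1
            joint[w][sense] += 1

    prior = _entropy(sense_counts.values(), n)
    ig, chi2 = {}, {}
    for w, n_w in df.items():
        n_absent = n - n_w
        present = [joint[w][s] for s in sense_counts]
        absent = [sense_counts[s] - joint[w][s] for s in sense_counts]
        # Expected entropy of the sense after observing presence/absence of w.
        conditional = (n_w / n) * _entropy(present, n_w)
        if n_absent > 0:
            conditional += (n_absent / n) * _entropy(absent, n_absent)
        ig[w] = prior - conditional

        best = 0.0
        for s, n_s in sense_counts.items():
            a = joint[w][s]      # word present, sense s
            b = n_w - a          # word present, other sense
            c = n_s - a          # word absent, sense s
            d = n_absent - c     # word absent, other sense
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom > 0:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        chi2[w] = best
    return dict(df), ig, chi2


def select_top(scores, fraction):
    """Keep the highest-scoring `fraction` of words (ties broken alphabetically)."""
    k = max(1, int(len(scores) * fraction))
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
```

Setting `fraction=0.1` keeps the top 10 percent of words by the chosen score, which corresponds to the feature set size at which the abstract reports the best Naive Bayes results. The context window `size` is the other parameter the study varies.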

KEYWORDS

Word Sense Disambiguation, Statistical Feature Selection, Context Size, SVM, Naive Bayes Classifier, kNN Classifier

Journal of the Korean Biblia Society for Library and Information Science, 2024 KCI Impact Factor: 1.0