Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods

  • Journal of the Korean Society for Library and Information Science
  • 2005, 39(2), pp.123-146
  • Publisher : Korean Society for Library and Information Science (한국문헌정보학회)
  • Research Area : Interdisciplinary Studies > Library and Information Science

Jae Yun Lee 1

1 Kyonggi University (경기대학교)

Accredited

ABSTRACT

This study aims to find consistent strategies for feature selection and feature weighting that improve both the effectiveness and the efficiency of a kNN text classifier. Feature selection criteria and feature weighting methods are as important as the classification algorithm itself in achieving good performance in text categorization systems. Most previous studies, however, chose conflicting strategies for feature selection and feature weighting. In this study, the performance of several feature selection criteria is measured with respect to the storage space required for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion, and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results suggest that using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method can increase classification speed by up to three to five times without losing classification accuracy.
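
The idea of using one low-frequency-preferring measure for both selection and weighting can be illustrated with a minimal Python sketch. It is not the experimental setup of the study: the toy documents, the function names (`idf_table`, `select_features`, `vectorize`, `knn_classify`), the cosine similarity, and the similarity-weighted vote are all assumptions made for illustration. IDF is used here both to choose which terms to keep and to weight them.

```python
import math
from collections import Counter

def idf_table(docs):
    """Map each term to log(N / df), where df is its document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def select_features(idf, keep):
    """Keep the `keep` terms with the highest IDF (lowest document frequency)."""
    ranked = sorted(idf, key=lambda t: (-idf[t], t))  # ties broken alphabetically
    return set(ranked[:keep])

def vectorize(doc, idf, features):
    """Build a sparse TF x IDF vector restricted to the selected features."""
    tf = Counter(t for t in doc if t in features)
    return {t: count * idf[t] for t, count in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query, train_vecs, labels, k=3):
    """Return the label with the largest summed similarity among the k nearest."""
    sims = sorted(((cosine(query, v), y) for v, y in zip(train_vecs, labels)),
                  reverse=True)
    votes = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus (already tokenized); not data from the study.
train_docs = [
    ["stock", "market", "price", "the"],
    ["market", "trade", "price", "the"],
    ["football", "match", "goal", "the"],
    ["goal", "league", "match", "the"],
]
train_labels = ["finance", "finance", "sports", "sports"]

idf = idf_table(train_docs)
features = select_features(idf, keep=8)   # the same measure selects features ...
train_vecs = [vectorize(d, idf, features) for d in train_docs]  # ... and weights them

query_vec = vectorize(["price", "market", "the"], idf, features)
predicted = knn_classify(query_vec, train_vecs, train_labels, k=3)
```

In this sketch the term that occurs in every document has an IDF of zero and is the first to be dropped by selection, so the inverted index that a kNN classifier scans becomes shorter while the remaining terms keep the weights they would have had anyway.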
