A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods

  • Journal of the Korean Society for Information Management
  • Abbr : JKOSIM
  • 2008, 25(1), pp.211~233
  • DOI : 10.3743/KOSIM.2008.25.1.211
  • Publisher : Korean Society for Information Management (한국정보관리학회)
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : February 19, 2008
  • Accepted : March 10, 2008
  • Published : March 30, 2008

Kim, Pan Jun 1

1 Silla University

Accredited

ABSTRACT

This study examines various weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA and Reuters-21578). First, three factors for weighting were identified: the document factor, the document set factor, and the category factor. The performance of the single weighting schemes based on each factor was investigated. Second, the performance of combined weighting methods built from the single schemes was examined. Among the single schemes, category-factor-based schemes showed the best performance, document-set-factor-based schemes the second best, and document-factor-based schemes the worst. Among the combined weighting schemes, the schemes (idf*cat) that combine the document set factor with the category factor performed better than both the combined schemes (tf*cat or ltf*cat) that combine the document factor with the category factor and the common schemes (tfidf or ltfidf) that combine the document factor with the document set factor. However, when the single and combined weighting schemes were compared for each collection, the category-factor-based schemes (cat only) performed best on LISA, whereas the combined schemes (idf*cat) performed best on Reuters-21578. Therefore, practical application of these weighting methods to automatic classification requires careful consideration of the categories in a collection.
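
As a rough illustration of the common schemes named in the abstract (tfidf and ltfidf), the following is a minimal sketch of a centroid-style Rocchio classifier. It rests on several assumptions that the abstract does not confirm: the document factor is taken as raw term frequency (tf) or 1 + log(tf) (ltf), the document set factor as idf = log(N/df), and the Rocchio prototype as the plain mean of the positive training vectors, with no negative-example term. The category factor (cat) is not defined in the abstract, so it is left out. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs):
    """Document set factor (assumed form): idf(t) = log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / df_t) for t, df_t in df.items()}


def weight(doc, idf, use_log_tf=False):
    """Combine the document factor (tf or ltf) with the document set factor (idf)."""
    vec = {}
    for t, f in Counter(doc).items():
        doc_factor = 1.0 + math.log(f) if use_log_tf else float(f)
        vec[t] = doc_factor * idf.get(t, 0.0)  # unseen terms get weight 0
    return vec


def normalize(vec):
    """Scale a sparse vector to unit length (returned unchanged if all-zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def train_centroids(docs, labels, idf, use_log_tf=False):
    """Build one prototype per category as the mean of its document vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for t, w in normalize(weight(doc, idf, use_log_tf)).items():
            sums[label][t] += w
    return {
        c: normalize({t: s / counts[c] for t, s in vec.items()})
        for c, vec in sums.items()
    }


def classify(doc, centroids, idf, use_log_tf=False):
    """Assign the category whose prototype has the highest cosine similarity."""
    vec = normalize(weight(doc, idf, use_log_tf))

    def score(c):
        return sum(w * centroids[c].get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=score)
```

Under these assumptions, passing `use_log_tf=True` switches from tfidf to ltfidf. A category-factor scheme such as idf*cat would presumably replace or multiply the term weight inside `weight`, but its exact form would have to come from the full paper.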
