@article{ART001242185,
author={Kim, Pan Jun},
title={A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods},
journal={Journal of the Korean Society for Information Management},
issn={1013-0799},
year={2008},
volume={25},
number={1},
pages={211-233},
doi={10.3743/KOSIM.2008.25.1.211}
}
TY - JOUR
AU - Kim, Pan Jun
TI - A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods
JO - Journal of the Korean Society for Information Management
PY - 2008
VL - 25
IS - 1
PB - 한국정보관리학회
SP - 211
EP - 233
SN - 1013-0799
AB - This study examines various term weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA and Reuters-21578). First, three weighting factors were identified (document factor, document set factor, and category factor), and the performance of the single weighting schemes based on each factor was investigated. Second, the performance of combined weighting methods built from the single schemes was examined. Among the single schemes, category-factor-based schemes showed the best performance, document-set-factor-based schemes the second best, and document-factor-based schemes the worst. Among the combined schemes, those combining the document set factor with the category factor (idf*cat) performed better than those combining the document factor with the category factor (tf*cat or ltf*cat), as well as the common schemes that combine the document factor with the document set factor (tfidf or ltfidf). However, when single and combined weighting schemes are compared per collection, category-factor-based schemes (cat only) performed best on LISA, while the combined schemes (idf*cat) performed best on Reuters-21578. Therefore, practical application of these weighting methods requires careful consideration of the categories in the collection to be classified.
KW - weighting
KW - Rocchio algorithm
KW - classifier
KW - automatic classification
KW - text categorization
KW - feature selection principles
DO - 10.3743/KOSIM.2008.25.1.211
ER -
Kim, Pan Jun. (2008). A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods. Journal of the Korean Society for Information Management, 25(1), 211-233.
Kim, Pan Jun. 2008, "A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods", Journal of the Korean Society for Information Management, vol.25, no.1, pp.211-233. Available from: doi:10.3743/KOSIM.2008.25.1.211
Kim, Pan Jun "A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods" Journal of the Korean Society for Information Management 25.1 pp.211-233 (2008) : 211.
Kim, Pan Jun. A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods. 2008; 25(1), 211-233. Available from: doi:10.3743/KOSIM.2008.25.1.211
Kim, Pan Jun. "A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods" Journal of the Korean Society for Information Management 25, no.1 (2008) : 211-233. doi: 10.3743/KOSIM.2008.25.1.211
Kim, Pan Jun. A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods. Journal of the Korean Society for Information Management, 25(1), 211-233. doi: 10.3743/KOSIM.2008.25.1.211
Kim, Pan Jun. A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods. Journal of the Korean Society for Information Management. 2008; 25(1) 211-233. doi: 10.3743/KOSIM.2008.25.1.211