@article{ART001096809,
author={Shim, Kyung and Chung, Young-Mee},
title={The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance},
journal={Journal of the Korean Society for Information Management},
issn={1013-0799},
year={2006},
volume={23},
number={2},
pages={265-285},
doi={10.3743/KOSIM.2006.23.2.265}
}
TY - JOUR
AU - Shim, Kyung
AU - Chung, Young-Mee
TI - The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance
JO - Journal of the Korean Society for Information Management
PY - 2006
VL - 23
IS - 2
PB - 한국정보관리학회
SP - 265
EP - 285
SN - 1013-0799
AB - Text categorization assumes that the labels assigned to training documents are correct to a certain level, yet little is known about how correct they are in real-world collections. This study examines the quality of pre-assigned subject categories in a real-world collection and identifies the relationship between the quality of category assignment in the training set and text categorization performance. In particular, we ask to what extent performance can be improved by enhancing the quality (i.e., correctness) of category assignment in training documents. A collection of 1,150 abstracts in computer science is re-classified by an expert group and divided into 907 training documents and 227 test documents (15 duplicates are removed). Performance before and after re-classification, on the Initial set and the Recat-1/Recat-2 sets respectively, is compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and categorization with the Initial set yields an F1 value of 17%. In contrast, the Recat-1 set yields an F1 value of 61%, which is 3.6 times that of the Initial set.
KW - Text categorization
KW - test collections
KW - kNN
KW - training sets
DO - 10.3743/KOSIM.2006.23.2.265
ER -
Shim, Kyung and Young-Mee Chung. (2006). The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance. Journal of the Korean Society for Information Management, 23(2), 265-285.
Shim, Kyung and Young-Mee Chung. 2006, "The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance", Journal of the Korean Society for Information Management, vol. 23, no. 2, pp. 265-285. Available from: doi:10.3743/KOSIM.2006.23.2.265
Shim, Kyung, Young-Mee Chung "The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance" Journal of the Korean Society for Information Management 23.2 pp.265-285 (2006) : 265.
Shim, Kyung, Young-Mee Chung. The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance. Journal of the Korean Society for Information Management. 2006; 23(2), 265-285. Available from: doi:10.3743/KOSIM.2006.23.2.265
Shim, Kyung and Young-Mee Chung. "The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance" Journal of the Korean Society for Information Management 23, no. 2 (2006): 265-285. doi: 10.3743/KOSIM.2006.23.2.265
Shim, Kyung; Young-Mee Chung. The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance. Journal of the Korean Society for Information Management, 23(2), 265-285. doi: 10.3743/KOSIM.2006.23.2.265
Shim, Kyung; Young-Mee Chung. The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance. Journal of the Korean Society for Information Management. 2006; 23(2) 265-285. doi: 10.3743/KOSIM.2006.23.2.265