A Study on Automatic DDC Classification of Documents in Technology

  • Journal of the Korean Society for Library and Information Science
  • 2026, 60(1), pp.173~194
  • Publisher : Korean Society for Library and Information Science (한국문헌정보학회)
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : January 20, 2026
  • Accepted : February 3, 2026
  • Published : February 28, 2026

KANG WOOJIN 1, SANGO NA 1, Jongwook Lee 1

1 Kyungpook National University (경북대학교)

Excellent Accredited

ABSTRACT

This study investigates the automatic classification of documents in the Dewey Decimal Classification (DDC) Technology class (600) using machine learning models, with the aim of overcoming the limitations of title-based classification approaches. To enhance classification performance, descriptive document information, such as summaries and introductions, was incorporated as additional classification features. Three machine learning models—Omikuji, FastText, and BERT—were employed, and classification performance was evaluated at both the main class and division levels. Accuracy and F1-score were used as evaluation metrics. The results demonstrate that BERT consistently outperformed FastText and Omikuji across most experimental conditions. With the exception of the division-level F1-score of the Omikuji model, all models showed improved performance when descriptive information was added. In particular, the BERT-based model achieved an accuracy of 79.52% at the division level, representing an improvement of approximately 8.62 percentage points compared to previous studies. The findings also indicate that classification performance generally improves as the volume of documents used in model training increases, underscoring the importance of data scale in addition to feature selection. These results suggest that competitive automatic classification performance can be achieved through appropriate model selection and enriched classification features, even within single-model approaches. Future research should expand the scope to all DDC classes and examine the applicability of the proposed approach to the Korean Decimal Classification (KDC), as well as explore additional features and alternative machine learning models.
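
The evaluation described in the abstract (accuracy and F1-score at two levels of the DDC hierarchy) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard DDC convention that the first digit identifies the main class and the first two digits identify the division, and it assumes macro-averaged F1, since the abstract does not state the averaging method. The helper names and the toy labels are hypothetical.

```python
def truncate_ddc(notation: str, digits: int) -> str:
    """Reduce a DDC number to its first `digits` digits, zero-padded to three.

    Assumption: digits=1 gives the main class (e.g. 600) and
    digits=2 gives the division (e.g. 610).
    """
    head = notation.split(".")[0].zfill(3)  # drop the decimal part
    return head[:digits].ljust(3, "0")


def accuracy(gold, pred):
    """Share of documents whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 (the averaging choice is an assumption)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from the study.
gold_numbers = ["611.2", "621.38", "641.5", "658.4"]
pred_numbers = ["612", "621.3", "641.5", "629.1"]

gold_div = [truncate_ddc(n, 2) for n in gold_numbers]  # 610, 620, 640, 650
pred_div = [truncate_ddc(n, 2) for n in pred_numbers]  # 610, 620, 640, 620

print(accuracy(gold_div, pred_div))            # 0.75
print(round(macro_f1(gold_div, pred_div), 4))  # 0.6667
```

In this toy case the two metrics diverge because macro F1 weights every division equally, so one missed division (650) pulls the score down more than it lowers accuracy.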
