A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2021, 26(4), pp.105-112
  • DOI : 10.9708/jksci.2021.26.04.105
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : March 4, 2021
  • Accepted : March 29, 2021
  • Published : April 30, 2021

SungJin Kim 1 NakJin Choi 1 Lee Jun Dong 1

1 Gangneung-Wonju National University

Accredited

ABSTRACT

In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analyzing data that programs could not handle in the past. In this paper, an analysis model for classifying unstructured data, which is often required in the era of big data, was designed and verified. Thesis summaries, main keywords, and sub-keywords were crawled from DBPia, a database was created using KoNLP's data dictionary, and words were tokenized through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining the training data with Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, and decision tree) to the generated analysis dataset. The classification technique proposed in this paper can be applied in various fields beyond thesis classification, such as civil complaint classification and other text-related analysis.
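The TF-IDF step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which TF-IDF variant was used, so the code assumes relative term frequency and plain (unsmoothed) inverse document frequency. The function name `tf_idf`, the token lists, and the labels are all hypothetical, and the nouns are assumed to have been extracted already by a morpheme analyzer such as KoNLP.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for pre-tokenized documents.

    docs: list of token lists (e.g. the nouns extracted from each abstract).
    Returns (vocab, matrix), where matrix[i][j] is the weight of vocab[j]
    in document i. Assumed variant: tf = count / document length,
    idf = ln(N / document frequency).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        matrix.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, matrix


# Hypothetical noun lists and made-up category labels (the "Y values").
docs = [
    ["data", "classification", "model"],
    ["network", "security", "data"],
    ["classification", "text", "model", "model"],
]
labels = ["ai", "security", "ai"]

vocab, X = tf_idf(docs)

# Analysis dataset: one row per document, TF-IDF features followed by the label.
dataset = [row + [y] for row, y in zip(X, labels)]
```

Each row of `dataset` pairs a document's feature vector with its label, which is the shape a classifier such as a random forest, SVM, or decision tree would typically be trained on.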

This paper was written with support from the National Research Foundation of Korea.