A Study on the Classification of Unstructured Data through Morpheme Analysis

SungJin Kim (김성진); NakJin Choi (최낙진); Jundong Lee (이준동)

doi:10.9708/jksci.2021.26.04.105

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2021, 26(4), pp.105~112
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : March 4, 2021
Accepted : March 29, 2021
Published : April 30, 2021

SungJin Kim ¹, NakJin Choi ¹, Jundong Lee ¹

¹ Gangneung-Wonju National University

ABSTRACT

In the era of big data, interest in data is growing explosively. In particular, the development of the Internet and social media has generated new kinds of data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also considerable demand for analysis of data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is frequently required in the era of big data. Thesis summaries, main keywords, and sub-keywords were crawled from DBPia, a database was built using KoNLP's data dictionary, and words were tokenized through morpheme analysis. Nouns were then extracted using KAIST's 9-category part-of-speech classification system, TF-IDF values were computed, and an analysis dataset was created by combining the training data with Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, and decision tree) to the generated analysis dataset. The classification model proposed in this paper can be applied in various fields beyond thesis classification, such as civil complaint classification and other text-related analysis.
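The TF-IDF step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which TF-IDF variant was used, so the sketch assumes term frequency as a relative count and inverse document frequency as log(N / df). The token lists and labels are hypothetical placeholders for the nouns that morpheme analysis would produce, and the function name `tf_idf` is introduced here for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute a TF-IDF matrix for pre-tokenized documents.

    docs: list of token lists (for example, the nouns extracted per summary).
    Returns (vocab, matrix), where matrix[i][j] is the weight of vocab[j]
    in document i. Assumed variant: tf = count / doc length,
    idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for term in vocab:
            tf = counts[term] / total if total else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Hypothetical noun lists standing in for tokenized summaries.
docs = [
    ["data", "classification", "model"],
    ["data", "morpheme", "analysis"],
    ["data", "text", "classification"],
]
# Hypothetical Y values (class labels), one per document.
labels = ["ml", "nlp", "ml"]

vocab, matrix = tf_idf(docs)
# Analysis dataset: each TF-IDF feature row paired with its Y value.
dataset = list(zip(matrix, labels))
```

A term that occurs in every document (here `data`) receives weight zero under this variant, which is why TF-IDF favors terms that distinguish one class of documents from another. The resulting rows and labels are the kind of input a random forest, SVM, or decision tree classifier would be trained on.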

KEYWORDS

Big Data, Data Analysis, Visualization, Text Mining, Modeling

This paper was written with support from the National Research Foundation of Korea.

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81
