Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal (기계학습에 유효한 데이터 요건 및 선별: 공공데이터포털 제공 데이터 사례를 통해)

Hyo-Jung Oh (오효정); Yun Bo-Hyun (윤보현)

doi:10.20465/KIOTS.2022.8.1.037

Journal of Internet of Things and Convergence
Abbr : JKIOTS
2022, 8(1), pp.37~43
Publisher : The Korea Internet of Things Society
Research Area : Engineering > Computer Science > Internet Information Processing
Received : November 20, 2021
Accepted : January 15, 2022
Published : February 28, 2022

Hyo-Jung Oh ¹, Yun Bo-Hyun ²

¹Jeonbuk National University
²Mokwon University

Accredited

ABSTRACT

The fundamental basis of AI technology is learnable data. The types and amounts of data collected and produced by the government and private companies have recently been increasing exponentially; however, this growth has not yet translated into verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning, and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and the data needed to solve each problem was collected from the public data portal. The results obtained with the data as collected are then compared with those obtained after applying valid-data screening criteria and post-processing, and the two are shown to differ. The ultimate purpose of this study is to argue for the importance of data quality management and of accumulating valid data, both of which must fundamentally precede the development of machine learning technology, the core of artificial intelligence.

KEYWORDS

Valid Data, Machine Learning, Data Discrimination, Quality of Data, Public Big Data

This paper was written with support from the National Research Foundation of Korea.

Journal of Internet of Things and Convergence 2025 KCI Impact Factor : 0.75