Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal

  • Journal of Internet of Things and Convergence
  • Abbr : JKIOTS
  • 2022, 8(1), pp.37-43
  • DOI : 10.20465/KIOTS.2022.8.1.037
  • Publisher : The Korea Internet of Things Society
  • Research Area : Engineering > Computer Science > Internet Information Processing
  • Received : November 20, 2021
  • Accepted : January 15, 2022
  • Published : February 28, 2022

Hyo-Jung Oh 1 Yun Bo-Hyun 2

1 Jeonbuk National University
2 Mokwon University

Accredited

ABSTRACT

The fundamental basis of AI technology is learnable data. The types and amounts of data collected and produced by the government and private companies are increasing exponentially, yet this growth has not translated into verified data that can be used for actual machine learning. This study discusses the conditions that data must meet to be usable for machine learning, and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and data for solving the actual problems was collected from the public data portal. We then show how the results differ when valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management and of accumulating valid data, both of which must precede the development of machine learning technology, the core of artificial intelligence.

This paper was written with support from the National Research Foundation of Korea.