A Study on the Domain Discrimination Model of CSV Format Public Open Data

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2023, 28(12), pp.129-136
  • DOI : 10.9708/jksci.2023.28.12.129
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 18, 2023
  • Accepted : December 8, 2023
  • Published : December 29, 2023

Ha-Na Jeong 1, Kim Jae Woong 1, Young-Suk Chung 1

1 Kongju National University

Accredited

ABSTRACT

The government of the Republic of Korea manages the quality of public open data through a public data quality management level evaluation. Public open data is provided in various open formats such as XML, JSON, and CSV, with CSV accounting for the majority. When diagnosing the quality of public open data in CSV format, the quality diagnosis manager determines the domain of each field, and diagnoses it, based on the field name and the data in that field. However, this takes a lot of time because quality diagnosis must be performed on a large number of open data files. Additionally, for fields whose meaning is difficult to understand, the accuracy of the diagnosis depends on how well the person in charge understands the data. This paper proposes a domain discrimination model for public open data in CSV format that uses field names and data distribution statistics. The model aims to ensure consistency and accuracy, so that diagnosis results are not influenced by the ability of the person in charge, and to help shorten diagnosis time. When the model was applied, the correct answer rate was about 77%, which is 2.8% higher than that of the file-format open data diagnostic tool provided by the Ministry of Public Administration and Security. We therefore expect the proposed model to improve accuracy when it is applied to diagnosing and evaluating the quality management level of public data.
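
The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea of combining a field name with per-field data distribution statistics to guess a domain. It is not the authors' method: the domain labels, the statistics chosen, the 0.9 threshold, the "code" name hint, and the sample CSV are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical patterns; a real tool would need far more formats.
NUMERIC_RE = re.compile(r"[+-]?\d+(\.\d+)?")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def field_profile(values):
    """Summarize the value distribution of one CSV field."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    n = len(non_empty)
    if n == 0:
        return {"numeric_ratio": 0.0, "date_ratio": 0.0, "distinct_ratio": 0.0}
    numeric = sum(1 for v in non_empty if NUMERIC_RE.fullmatch(v))
    dates = sum(1 for v in non_empty if DATE_RE.fullmatch(v))
    return {
        "numeric_ratio": numeric / n,
        "date_ratio": dates / n,
        "distinct_ratio": len(set(non_empty)) / n,
    }


def guess_domain(field_name, values, threshold=0.9):
    """Guess a domain label from the field name plus its statistics.

    The labels ("date", "number", "code", "text") and the threshold
    are illustrative assumptions, not taken from the paper.
    """
    profile = field_profile(values)
    if profile["date_ratio"] >= threshold:
        return "date"
    if profile["numeric_ratio"] >= threshold:
        # The field name separates identifiers from quantities.
        return "code" if "code" in field_name.lower() else "number"
    return "text"


def classify_csv(text):
    """Return a {field name: guessed domain} mapping for CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    fields = rows[0].keys() if rows else []
    return {f: guess_domain(f, [row[f] for row in rows]) for f in fields}


# Hypothetical sample data, invented for this sketch.
SAMPLE = """reg_date,visitor_count,district_code,remarks
2023-01-05,120,11010,open
2023-01-06,98,11020,
2023-01-07,143,11010,closed for repairs
"""

if __name__ == "__main__":
    print(classify_csv(SAMPLE))
```

In this sketch the statistics decide the broad type and the field name only breaks ties, which is one simple way to combine the two signals; a trained classifier over the same features would be another.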
