Suggestions on how to convert official documents to Machine Readable

  • The Korean Journal of Archival Studies
  • 2021, (67), pp.99-138
  • DOI : 10.20923/kjas.2021.67.099
  • Publisher : Korean Society Of Archival Studies
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : December 31, 2020
  • Accepted : January 13, 2021
  • Published : January 31, 2021

YIM JIN HEE 1

1 Myongji University

Accredited

ABSTRACT

In the era of big data, analyzing unstructured data as well as structured data is emerging as an important task. Official documents produced by government agencies are large, text-based unstructured data and are therefore also a target of big data analysis. From the perspectives of internal work efficiency, knowledge management, records management, and so on, public documents need to be analyzed at scale to derive useful implications. However, many of the public documents currently held by public institutions are not in an open format, so text must first be extracted from the bitstream in a pre-processing step before big data analysis is possible. In addition, contextual metadata is not sufficiently stored in the document files, so separate effort is required to secure metadata for high-quality analysis. In conclusion, current official documents have a low level of machine readability, which makes big data analysis expensive.
