
A Case Study on Metadata Extraction for Records Management Using ChatGPT

  • Journal of Korean Society of Archives and Records Management
  • Abbr : JRMASK
  • 2024, 24(2), pp.89~112
  • DOI : 10.14404/JKSARM.2024.24.2.089
  • Publisher : Korean Society of Archives and Records Management
  • Research Area : Interdisciplinary Studies > Library and Information Science > Archival Studies / Conservation
  • Received : April 16, 2024
  • Accepted : May 10, 2024
  • Published : May 31, 2024

Minji Kim 1 Sunghee Kang 2 Rieh, Hae-Young 3

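1 Master's degree, Data and Records Major, Graduate School of Records, Archives and Information Science, Myongji University
2 Professor, Data and Records Major, Graduate School of Records, Archives and Information Science, Myongji University
3 Myongji University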

Accredited

ABSTRACT

Metadata is a crucial component of records management and is essential to managing and understanding records properly. Where metadata cannot be assigned automatically, records professionals must enter it manually. This study aims to ease the burden of manual entry by proposing a method that uses ChatGPT to extract records management metadata elements. To this end, a Python program was developed with the LangChain library. The program analyzes PDF documents and extracts metadata from records by posing questions, using both a locally installed instance of ChatGPT and the ChatGPT online service. Multiple PDF documents were processed to test the effectiveness of the extraction. The results showed that using LangChain with ChatGPT-3.5 turbo provided a secure environment but had some limitations in retrieving metadata elements accurately. Conversely, the ChatGPT-4 online service yielded relatively accurate results but could not be used for sensitive documents for security reasons. This exploration underscores the potential of ChatGPT for extracting metadata in records management. As ChatGPT-related technologies advance, safer and more accurate results are expected. Leveraging these advantages can significantly improve the efficiency and productivity of tasks associated with managing records and metadata in archives.
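The question-based extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program and does not use LangChain: the metadata elements, the question wording, and the names `QUESTIONS`, `build_prompt`, `extract_metadata`, and `fake_ask` are all hypothetical, and the model call is replaced by a stand-in function so that only the data flow is shown.

```python
from typing import Callable, Dict

# Hypothetical metadata elements and questions (assumption: the study's
# actual element set and prompts are not given in the abstract).
QUESTIONS: Dict[str, str] = {
    "title": "What is the title of this record?",
    "creator": "Which person or organization created this record?",
    "date_created": "On what date was this record created?",
}


def build_prompt(document_text: str, question: str) -> str:
    """Combine the document text and one question into a single prompt."""
    return (
        "Answer using only the document below. "
        "If the answer is not stated, reply 'unknown'.\n\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}"
    )


def extract_metadata(
    document_text: str, ask: Callable[[str], str]
) -> Dict[str, str]:
    """Ask one question per metadata element and collect the answers."""
    return {
        element: ask(build_prompt(document_text, question)).strip()
        for element, question in QUESTIONS.items()
    }


def fake_ask(prompt: str) -> str:
    """Stand-in for a real model call (assumption: illustration only)."""
    return "unknown"
```

In practice, `document_text` would come from a PDF text extractor and `ask` would wrap a call to a language model; both are left open here because the abstract does not specify them.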
