
Research Trends in Record Management Using Unstructured Text Data Analysis

  • Journal of Korean Society of Archives and Records Management
  • Abbr : JRMASK
  • 2023, 23(4), pp. 73–89
  • DOI : 10.14404/JKSARM.2023.23.4.073
  • Publisher : Korean Society of Archives and Records Management
  • Research Area : Interdisciplinary Studies > Library and Information Science > Archival Studies / Conservation
  • Received : October 16, 2023
  • Accepted : November 6, 2023
  • Published : November 30, 2023

Hong, Deok Yong¹; Heo, Junseok²

¹ Suyeong-gu Office, Busan Metropolitan City
² AT&I Co., Ltd. (㈜에이티앤아이)

KCI Accredited

ABSTRACT

This study applies text mining techniques to the Korean-language abstracts of record management research in Korea, a form of unstructured text data. It analyzes keyword frequencies and the distances between keywords to identify research trends in the field. To this end, 1,157 articles were extracted from 7 of the 28 journals retrieved from the journal statistics (accredited and candidate journals) of the Korea Citation Index (KCI) under the major category Interdisciplinary Studies and the middle category Library and Information Science, and the 77,578 keywords from these articles were analyzed and visualized. The keywords were embedded with Word2vec, and the resulting vectors were analyzed using t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext. The analysis produced two main findings. First, keywords such as “record management” (889 occurrences), “analysis” (888), “archive” (742), “record” (562), and “utilization” (449) were treated by researchers as significant topics. Second, Word2vec generated vector representations of the keywords, and the similarity distances between them were examined and visualized with t-SNE and Scattertext. The visualization divided the record management research area into two groups. In the first (past) group, keywords such as “archiving,” “national record management,” “standardization,” “official documents,” and “record management systems” occurred frequently. In the second (current) group, keywords such as “community,” “data,” “record information service,” “online,” and “digital archives” received substantial attention.
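
The two core computations described above, keyword frequency counting and similarity between keyword vectors, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the authors' pipeline. The toy `abstracts` list is hypothetical: it reuses keyword names from the abstract but none of the study's data. Keywords are assumed to be already extracted from each abstract, because the extraction step is not described here. The hypothetical helper `cooccurrence_vectors` builds simple count-based co-occurrence vectors as a stand-in for Word2vec embeddings. Cosine similarity is used as the comparison measure, and the study's t-SNE and Scattertext visualization steps are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each inner list stands for the keywords already
# extracted from one abstract. Illustrative only, NOT the study's data.
abstracts = [
    ["record management", "archive", "analysis", "standardization"],
    ["record management", "official documents", "record management systems"],
    ["archive", "community", "digital archives", "online"],
    ["data", "record information service", "online", "analysis"],
    ["record management", "analysis", "archive", "utilization"],
]


def keyword_frequencies(docs):
    """Count how often each keyword occurs across all documents."""
    return Counter(kw for doc in docs for kw in doc)


def cooccurrence_vectors(docs):
    """Hypothetical stand-in for Word2vec: map each keyword to a sparse vector
    counting how many documents it shares with every other keyword."""
    vectors = {}
    for doc in docs:
        # Count each pair at most once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


freq = keyword_frequencies(abstracts)
print(freq.most_common(3))
# [('record management', 3), ('archive', 3), ('analysis', 3)]

vecs = cooccurrence_vectors(abstracts)
print(round(cosine(vecs["record management"], vecs["archive"]), 3))
# 0.48
```

A trained Word2vec model, such as the `Word2Vec` class in gensim, would replace the co-occurrence counts with dense learned vectors. Those vectors could then be projected to two dimensions with t-SNE for plotting. The cosine comparison would stay the same.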
