Embedding Model-Based Approach to Duplicate Verification in MARC Records (임베딩 모델 기반의 MARC 레코드 중복검증)

Soon-Young Lee (이순영); Min-Geon Song (송민건); Soo-Sang Lee (이수상)

doi:10.16981/kliss.56.4.202512.1

@article{ART003280299,
author={Soon-Young Lee and Min-Geon Song and Soo-Sang Lee},
title={Embedding Model-Based Approach to Duplicate Verification in MARC Records},
journal={Journal of Korean Library and Information Science Society},
issn={2466-2542},
year={2025},
volume={56},
number={4},
pages={1-20},
doi={10.16981/kliss.56.4.202512.1}
}

TY  - JOUR
AU  - Soon-Young Lee
AU  - Min-Geon Song
AU  - Soo-Sang Lee
TI  - Embedding Model-Based Approach to Duplicate Verification in MARC Records
JO  - Journal of Korean Library and Information Science Society
PY  - 2025
VL  - 56
IS  - 4
PB  - Korean Library And Information Science Society
SP  - 1
EP  - 20
SN  - 2466-2542
AB  - This study aimed to improve the performance of duplicate verification algorithms for MARC records by applying AI technology. To overcome the limitations of existing rule-based algorithms, we utilized AI embedding models based on semantic similarity of text to vectorize MARC records and verify duplicate records through similarity search and semantic similarity analysis. The specific research methodology consisted of two phases. First, we implemented a duplicate verification algorithm for MARC records based on vector similarity search using embedding models and evaluated its performance using the same dataset as the prior study. Second, reflecting on the evaluation results of the initial experiment, we implemented an algorithm that maximizes the advantages of the embedding approach—specifically, identifying duplicate records caused by variations in string notation. For this purpose, we evaluated the algorithm’s performance using newly constructed experimental data and evaluation metrics. The experimental dataset was designed to reflect notational variations that may occur in actual library settings, applying eight transformation rules. The results of the first experiment showed that the rate of correctly identifying identical groups as duplicates improved compared to the prior study. However, the embedding approach revealed limitations in areas requiring precise matching of numbers and special characters, such as incorrectly judging multi-volume materials with different volume information as similar. The results of the second experiment, designed to validate the advantages of the embedding approach, demonstrated 100% identification of both duplicate records and transformation rules across the entire experimental dataset.
KW  - AI;Embedding Models;Vector Similarity Search;MARC Records;Duplicate Verification
DO  - 10.16981/kliss.56.4.202512.1
ER  -
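The abstract's core idea, vectorizing MARC records and judging duplicates by vector similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The embedding model is replaced by a hypothetical character-trigram vector (`toy_embed`), and cosine similarity is assumed as the similarity measure. The field selection in `flatten`, the 0.8 threshold, the brute-force `search`, and the digit-matching guard in `is_duplicate` are likewise illustrative assumptions. The guard is included because the abstract reports that the embedding approach misjudges multi-volume records that differ only in volume information.

```python
import math
import re
from collections import Counter


def flatten(record: dict[str, str]) -> str:
    """Join the field values of a MARC-like record (tag -> text) in tag order.
    Which fields to include is an assumption, not the paper's selection."""
    return " ".join(record[tag] for tag in sorted(record))


def toy_embed(text: str, n: int = 3) -> Counter:
    """HYPOTHETICAL stand-in for a neural embedding model: counts of character
    n-grams after lowercasing and removing whitespace and punctuation."""
    norm = re.sub(r"[\W_]+", "", text.lower())
    return Counter(norm[i:i + n] for i in range(max(len(norm) - n + 1, 1)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(rec_a: dict[str, str], rec_b: dict[str, str]) -> float:
    return cosine(toy_embed(flatten(rec_a)), toy_embed(flatten(rec_b)))


def numbers(record: dict[str, str]) -> Counter:
    """Multiset of digit runs in the record; hyphens are dropped first so that
    differently hyphenated identifiers still compare equal."""
    return Counter(re.findall(r"\d+", flatten(record).replace("-", "")))


def is_duplicate(rec_a: dict[str, str], rec_b: dict[str, str],
                 threshold: float = 0.8, numeric_guard: bool = True) -> bool:
    """Duplicate if similarity >= threshold (0.8 is an assumed value).
    The optional guard also requires identical digit runs, so records that
    differ only in volume number are kept apart."""
    if numeric_guard and numbers(rec_a) != numbers(rec_b):
        return False
    return similarity(rec_a, rec_b) >= threshold


def search(query: dict[str, str], catalog: dict[str, dict[str, str]],
           k: int = 3) -> list[tuple[str, float]]:
    """Brute-force top-k similarity search; a real system would use a vector index."""
    q = toy_embed(flatten(query))
    scored = [(rid, cosine(q, toy_embed(flatten(rec)))) for rid, rec in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical sample records keyed by MARC tag 245 (title statement).
    vol1 = {"245": "Korean Library History v.1"}
    vol2 = {"245": "Korean Library History v.2"}
    variant = {"245": "korean  library-history, V.1"}  # spacing/punctuation/case variant of vol1
    print(is_duplicate(vol1, variant))                    # True
    print(is_duplicate(vol1, vol2))                       # False
    print(is_duplicate(vol1, vol2, numeric_guard=False))  # True (similarity about 0.95)
```

With the guard disabled, the two volumes score about 0.95 and would be merged, a toy analogue of the limitation the abstract reports. Requiring exact numeric agreement is one simple way to compensate, at the cost of missing duplicates whose numbers are written differently (for example, Roman numerals).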

Soon-Young Lee, Min-Geon Song and Soo-Sang Lee. (2025). Embedding Model-Based Approach to Duplicate Verification in MARC Records. Journal of Korean Library and Information Science Society, 56(4), 1-20.

Soon-Young Lee, Min-Geon Song and Soo-Sang Lee. 2025, "Embedding Model-Based Approach to Duplicate Verification in MARC Records", Journal of Korean Library and Information Science Society, vol.56, no.4, pp.1-20. Available from: doi:10.16981/kliss.56.4.202512.1

Soon-Young Lee, Min-Geon Song, Soo-Sang Lee "Embedding Model-Based Approach to Duplicate Verification in MARC Records" Journal of Korean Library and Information Science Society 56.4 pp.1-20 (2025) : 1.

Soon-Young Lee, Min-Geon Song, Soo-Sang Lee. Embedding Model-Based Approach to Duplicate Verification in MARC Records. Journal of Korean Library and Information Science Society. 2025; 56(4), 1-20. Available from: doi:10.16981/kliss.56.4.202512.1

Soon-Young Lee, Min-Geon Song and Soo-Sang Lee. "Embedding Model-Based Approach to Duplicate Verification in MARC Records" Journal of Korean Library and Information Science Society 56, no.4 (2025): 1-20. doi: 10.16981/kliss.56.4.202512.1

Soon-Young Lee; Min-Geon Song; Soo-Sang Lee. Embedding Model-Based Approach to Duplicate Verification in MARC Records. Journal of Korean Library and Information Science Society, 56(4), 1-20. doi: 10.16981/kliss.56.4.202512.1

Soon-Young Lee; Min-Geon Song; Soo-Sang Lee. Embedding Model-Based Approach to Duplicate Verification in MARC Records. Journal of Korean Library and Information Science Society. 2025; 56(4) 1-20. doi: 10.16981/kliss.56.4.202512.1

Journal of Korean Library and Information Science Society 2024 KCI Impact Factor : 0.91
