@article{ART003266109,
author={Seong-Min Kim and Won-Chan Lee and Heewan Park},
title={Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft},
journal={Journal of The Korea Society of Computer and Information},
issn={1598-849X},
year={2025},
volume={30},
number={11},
pages={191-203}
}
TY - JOUR
AU - Seong-Min Kim
AU - Won-Chan Lee
AU - Heewan Park
TI - Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft
JO - Journal of The Korea Society of Computer and Information
PY - 2025
VL - 30
IS - 11
PB - The Korean Society Of Computer And Information
SP - 191
EP - 203
SN - 1598-849X
AB - This paper proposes a novel method for accurately measuring the similarity between Python codes. Existing approaches primarily rely on k-gram based token matching, which lacks sufficient consideration of code structure and flow. To address these limitations, the proposed method introduces an s-gram segmentation technique and a weighted comparison scheme, combined with the Longest Common Subsequence (LCS) algorithm to analyze structural similarity. The approach was evaluated on various open-source Python projects collected from GitHub. Experimental results demonstrate that the proposed method effectively reduces similarity scores between codes developed by different authors while maintaining high similarity scores for codes from the same author, outperforming existing techniques. These findings suggest that the proposed method can contribute significantly to Python code theft detection and software quality management.
KW - Software birthmark;Code theft detection;Python bytecode;Bytecode similarity;Longest Common Subsequence (LCS)
ER -
Seong-Min Kim, Won-Chan Lee and Heewan Park. (2025). Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft. Journal of The Korea Society of Computer and Information, 30(11), 191-203.
Seong-Min Kim, Won-Chan Lee and Heewan Park. 2025, "Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft", Journal of The Korea Society of Computer and Information, vol.30, no.11, pp.191-203.
Seong-Min Kim, Won-Chan Lee, Heewan Park "Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft" Journal of The Korea Society of Computer and Information 30.11 pp.191-203 (2025) : 191.
Seong-Min Kim, Won-Chan Lee, Heewan Park. Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft. 2025; 30(11), 191-203.
Seong-Min Kim, Won-Chan Lee and Heewan Park. "Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft" Journal of The Korea Society of Computer and Information 30, no.11 (2025) : 191-203.
Seong-Min Kim; Won-Chan Lee; Heewan Park. Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft. Journal of The Korea Society of Computer and Information, 30(11), 191-203.
Seong-Min Kim; Won-Chan Lee; Heewan Park. Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft. Journal of The Korea Society of Computer and Information. 2025; 30(11) 191-203.