Design and Implementation of an LCS and Weight-Based Birthmark System for Detecting Python Code Theft

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(11), pp.191~203
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : September 8, 2025
  • Accepted : November 13, 2025
  • Published : November 28, 2025

Seong-Min Kim ¹, Won-Chan Lee ¹, Heewan Park ¹

¹ Halla University

Accredited

ABSTRACT

This paper proposes a novel method for accurately measuring the similarity between Python programs. Existing approaches rely primarily on k-gram-based token matching, which does not sufficiently account for code structure and flow. To address this limitation, the proposed method combines an s-gram segmentation technique and a weighted comparison scheme with the Longest Common Subsequence (LCS) algorithm to analyze structural similarity. The approach was evaluated on various open-source Python projects collected from GitHub. Experimental results show that, compared with existing techniques, the proposed method lowers similarity scores between code written by different authors while keeping similarity scores high for code written by the same author. These findings suggest that the proposed method can contribute to Python code theft detection and software quality management.
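
To make the LCS component concrete, the following is a minimal sketch of a weighted LCS similarity over normalized Python tokens. It is not the authors' implementation: the abstract does not define how s-grams are segmented or how weights are assigned, so the sketch omits s-gram segmentation entirely. The names `token_stream`, `default_weight`, `weighted_lcs`, and `similarity`, the token normalization, the weight values, and the normalization by the larger total weight are all assumptions made for illustration.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information in this sketch.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(source):
    """Tokenize Python source and normalize identifiers and literals."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")       # renaming a variable should not matter
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords and operators kept verbatim
    return out


def default_weight(token):
    """Hypothetical weights: keywords count double (an assumption)."""
    return 2.0 if keyword.iskeyword(token) else 1.0


def weighted_lcs(a, b, weight):
    """Maximum total weight of a common subsequence of a and b."""
    prev = [0.0] * (len(b) + 1)
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + weight(x))
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(src_a, src_b, weight=default_weight):
    """Weighted LCS divided by the larger total weight, in [0, 1]."""
    a, b = token_stream(src_a), token_stream(src_b)
    total = max(sum(map(weight, a)), sum(map(weight, b)))
    return weighted_lcs(a, b, weight) / total if total else 0.0


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n"
    renamed = "def g(y):\n    return y + 1\n"
    other = "for i in range(3):\n    print(i)\n"
    print(similarity(original, renamed))
    print(similarity(original, other))
```

Because LCS preserves token order, this kind of score reflects the sequence of statements rather than only the set of short token windows that k-gram matching compares. Under the assumed normalization, renaming identifiers does not change the score.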
