An Illegal Streaming Site Structure Analysis Method Based on Text Numericalization

  • Journal of Software Assessment and Valuation
  • Abbr : JSAV
  • 2024, 20(4), pp.193-201
  • Publisher : Korea Software Assessment and Valuation Society
  • Research Area : Engineering > Computer Science
  • Received : December 3, 2024
  • Accepted : December 20, 2024
  • Published : December 31, 2024

Seung-Hyun Park 1 In-Jae Yoo 1 Se-Young Jang 2 Sunhee Shin 3 Byuong-Chan Park 2 Seok-Yoon Kim 2 Youngmo Kim 2

1 Department of Computer Science, Soongsil University
2 Soongsil University
3 Kangnam University

Accredited

ABSTRACT

With the rapid growth of the OTT content market, illegal streaming sites have proliferated, causing social problems such as copyright infringement and the exposure of minors to harmful content. These sites evade blocking by changing their URLs, and they hinder detection by continuously modifying their HTML structures. This paper proposes a structural analysis method based on text numericalization that identifies and classifies illegal streaming sites even when their HTML structures differ. The method converts HTML elements into text so that they can be numericalized, tokenizes and vectorizes the CSS selectors, and computes cosine similarity to group sites with similar structures. Through this approach, the category of each site is identified, and its keywords are analyzed to classify the structures of illegal streaming sites systematically.
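
The pipeline outlined in the abstract (tokenize CSS selectors, vectorize them, compare sites by cosine similarity, group similar sites) can be illustrated with a minimal Python sketch. The abstract does not specify the tokenization rule, the vector weighting, the similarity threshold, or the grouping procedure, so everything here is an assumption made for illustration: tokens are tag, class, and id names, vectors are raw token counts, the threshold is 0.8, and grouping is a greedy comparison against the first member of each group. The site names and selectors are invented.

```python
import math
import re
from collections import Counter


def tokenize_selector(selector):
    """Split a CSS selector into lowercase tag, class, and id name tokens (assumed rule)."""
    return re.findall(r"[a-z][a-z0-9_-]*", selector.lower())


def vectorize(selectors):
    """Build a bag-of-tokens count vector for one site (assumed weighting: raw counts)."""
    counts = Counter()
    for selector in selectors:
        counts.update(tokenize_selector(selector))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_sites(sites, threshold=0.8):
    """Greedy grouping: a site joins the first group whose first member is similar enough.

    The threshold and the greedy strategy are hypothetical, not taken from the paper.
    """
    vectors = {name: vectorize(selectors) for name, selectors in sites.items()}
    groups = []
    for name in sites:
        for group in groups:
            if cosine_similarity(vectors[name], vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Invented example data: two sites share a layout, one differs.
sites = {
    "site_a": ["div.video-list > li.item", "div.player > iframe"],
    "site_b": ["div.video-list > li.item", "div.player > video"],
    "site_c": ["table.board > tr.row", "td.title > a"],
}

groups = group_sites(sites)
print(groups)
```

With this toy data, `site_a` and `site_b` differ in a single token and fall into one group, while `site_c` shares no tokens with them and forms its own group. A real analysis would likely need a weighting scheme such as TF-IDF and a proper clustering step, neither of which the abstract confirms.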
