Malware Byte Stream Analysis Using Overlapped LDA

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(12), pp.109-119
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 15, 2024
  • Accepted : November 27, 2024
  • Published : December 31, 2024

Young-Seob Jeong 1 Yeong-Jin Kim 2 Medard Edmund Mswahili 1 Jiyoung Woo 3 Kang Ah Reum 2

1 Chungbuk National University
2 Pai Chai University
3 Soonchunhyang University

Accredited

ABSTRACT

As more documents appear on online platforms, people are increasingly exposed to malicious attacks embedded in non-executable documents. Recently, data-driven approaches have shown successful results in the malware detection task. Because these approaches rely heavily on the dataset, a large amount of annotated data is needed, yet annotation is normally performed manually by domain experts. It is therefore necessary to develop a system or tool that analyzes files and helps with the annotation process. In this paper, we propose a new method that automatically analyzes files and generates byte-level labels using a modified version of overlapped latent Dirichlet allocation (LDA), which clusters the given bytes into two (e.g., malware and benign) or more groups. In experiments on our annotated dataset, the generated byte-level labels achieved high recall (95-100%). Our model suffered from low precision because the dataset is sparsely annotated, but it still has the potential to aid in finding suspicious bytes for malware analysis. We also provide sample results visualized with highlights in different colors.
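
The link between sparse annotation and low precision can be illustrated with a small sketch. It is not the paper's evaluation code: the function name `byte_level_scores`, the 100-byte file, and all offsets are hypothetical values chosen for illustration. It assumes byte-level labels are compared as sets of byte offsets.

```python
def byte_level_scores(predicted, annotated):
    """Return (precision, recall) for two sets of byte offsets."""
    true_pos = len(predicted & annotated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical 100-byte file: offsets 40-59 are actually malicious,
# but the annotators marked only offsets 40-44 (sparse annotation).
truly_malicious = set(range(40, 60))
sparse_annotation = set(range(40, 45))

# A hypothetical model that flags exactly the truly malicious bytes.
predicted = set(range(40, 60))

# Scored against the sparse annotation, 15 correct bytes count as errors.
p_sparse, r_sparse = byte_level_scores(predicted, sparse_annotation)

# Scored against the (normally unavailable) complete labels.
p_full, r_full = byte_level_scores(predicted, truly_malicious)

print(f"sparse labels: precision={p_sparse:.2f}, recall={r_sparse:.2f}")
print(f"full labels:   precision={p_full:.2f}, recall={r_full:.2f}")
```

In this toy setting recall stays at 1.0 under both label sets, while measured precision drops to 0.25 under the sparse annotation even though every flagged byte is malicious. Under these assumptions, low measured precision does not by itself show that the flagged bytes are benign.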

This paper was written with support from the National Research Foundation of Korea.