Transformer-based Android Malware Classification using Multi-stage Feature Selection

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(11), pp.179~189
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 2, 2025
  • Accepted : November 17, 2025
  • Published : November 28, 2025

Gwang-Ho Kim 1, Soo-Jin Lee 1

1 Korea National Defense University

Accredited

ABSTRACT

In API call-based Android malware detection, high-dimensional feature sets make Transformer models computationally expensive to apply. To address this, this paper proposes a multi-stage feature selection pipeline that combines LightGBM with each Transformer's tokenization method, aiming for a lightweight model without sacrificing performance. The proposed method first ranks features by LightGBM importance and then dynamically constructs the final feature set so that it fits within each Transformer's maximum input token limit. In the experiments, although the 9,503 original features were reduced to between 80 and 95, the model achieved up to 98.28% accuracy in binary classification and a Macro F1-score of 83.66% in multi-class classification. These results show that the method delivers performance comparable to previous studies with significantly fewer features, making it an effective way to achieve both efficiency and high detection rates in high-dimensional data analysis.
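
The second stage described in the abstract (building a feature set that fits a token limit from an importance ranking) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that importance scores have already been computed (for example by LightGBM), that features are API call names, and that selection stops at the first feature that would exceed the budget. The names `select_features_within_token_budget`, `toy_count_tokens`, and `reserved_tokens`, along with the toy scores, are hypothetical.

```python
from typing import Callable, Dict, List


def select_features_within_token_budget(
    importance: Dict[str, float],
    count_tokens: Callable[[str], int],
    max_tokens: int,
    reserved_tokens: int = 2,
) -> List[str]:
    """Keep the top-ranked features whose total token count fits the budget.

    importance      -- feature name -> importance score (assumed precomputed)
    count_tokens    -- returns how many tokens a feature name occupies
    max_tokens      -- the model's maximum input length in tokens
    reserved_tokens -- assumed slots for special tokens (hypothetical default)
    """
    budget = max_tokens - reserved_tokens
    # Rank feature names from most to least important.
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)

    selected: List[str] = []
    used = 0
    for name in ranked:
        cost = count_tokens(name)
        if used + cost > budget:
            break  # assumption: stop at the first feature that would overflow
        selected.append(name)
        used += cost
    return selected


def toy_count_tokens(name: str) -> int:
    # Hypothetical stand-in for a real subword tokenizer:
    # one token per dot-separated piece of the API call name.
    return len([piece for piece in name.split(".") if piece])


# Toy importance scores, invented purely for illustration.
toy_importance = {
    "android.telephony.SmsManager.sendTextMessage": 0.9,
    "java.lang.Runtime.exec": 0.7,
    "android.app.Activity.onCreate": 0.1,
    "android.net.ConnectivityManager.getActiveNetworkInfo": 0.5,
}

selected = select_features_within_token_budget(
    toy_importance, toy_count_tokens, max_tokens=12
)
print(selected)
```

Because the token cost of each feature depends on the tokenizer, the same ranking can yield feature sets of different sizes for different models, which is consistent with the 80 to 95 range reported above.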
