
An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies

  • Journal of Software Assessment and Valuation
  • Abbr: JSAV
  • 2025, 21(4), pp. 1-17
  • Publisher: Korea Software Assessment and Valuation Society
  • Research Area: Engineering > Computer Science
  • Received: December 1, 2025
  • Accepted: December 20, 2025
  • Published: December 26, 2025

Dong-Wan Kim¹, Hyun-Soo Kim², Kyung-Yeob Park¹, MinSoo Kim¹, Shin DongMyung³

¹ LSware
² LSware Inc.
³ LSware Inc.

Accredited

ABSTRACT

In large-scale information systems and web services, log-based anomaly detection is a key means of capturing early signs of ransomware and other malware. However, unsupervised methods that rely on raw log text and limited feature engineering perform poorly on real security logs, which combine imbalanced labels with multi-stage attacks. This paper proposes an LLM-based log embedding pipeline that pairs three log representations (raw logs, embeddings from pre-trained Llama language models, and embeddings fine-tuned on the security-log domain) with statistical and deep anomaly detection models, and evaluates it on about 800,000 web access and system audit log entries. Under a common data split, the embedding-based representations raise the binary F1-score of most models to roughly 2.5 times the raw-log baseline, and to more than three times the baseline for rare attack types. These results demonstrate the effectiveness of such embeddings as a common input representation for malware anomaly detection and early-warning systems.
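The abstract describes feeding interchangeable log representations into shared anomaly detectors and comparing them by binary F1-score. The Python sketch below illustrates that "common input representation" idea under stated assumptions. `toy_embed` is a hypothetical bag-of-tokens stand-in for Llama embeddings, and no language model is called. The centroid-distance detector (`fit_centroid_detector`, `predict`) is an illustrative choice, not one of the paper's statistical or deep models. `binary_f1` treats anomalies (label 1) as the positive class.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def build_vocab(lines: Sequence[str]) -> Dict[str, int]:
    """Assign each training token a dimension; index 0 is reserved for unknown tokens."""
    vocab: Dict[str, int] = {}
    for line in lines:
        for tok in line.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def toy_embed(line: str, vocab: Dict[str, int]) -> Vector:
    """Hypothetical stand-in for an LLM embedding: L2-normalized bag of tokens."""
    vec = [0.0] * (len(vocab) + 1)
    for tok in line.lower().split():
        vec[vocab.get(tok, 0)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (na * nb)


def centroid(vectors: Sequence[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_centroid_detector(
    train_lines: Sequence[str], embed: Callable[[str], Vector]
) -> Tuple[Vector, float]:
    """Fit on normal logs only: threshold = largest training distance to the centroid."""
    vecs = [embed(line) for line in train_lines]
    center = centroid(vecs)
    threshold = max(cosine_distance(v, center) for v in vecs)
    return center, threshold


def predict(
    lines: Sequence[str], embed: Callable[[str], Vector], center: Vector, threshold: float
) -> List[int]:
    """Label a line 1 (anomalous) if it lies farther from the centroid than the threshold."""
    return [1 if cosine_distance(embed(line), center) > threshold else 0 for line in lines]


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score with label 1 (anomaly) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: normal web access lines and one ransomware-style command.
normal = ["GET /index.html 200", "GET /login 200", "POST /login 302"]
vocab = build_vocab(normal)
embed = partial(toy_embed, vocab=vocab)
center, thr = fit_centroid_detector(normal, embed)

test_lines = ["GET /login 200", "vssadmin delete shadows /all /quiet"]
labels = [0, 1]
preds = predict(test_lines, embed, center, thr)
print(preds, binary_f1(labels, preds))  # prints [0, 1] 1.0
```

Because the detector and the F1 computation only see the `embed` callable, a different representation (for example, a real pre-trained or fine-tuned embedding model) could in principle be swapped in without touching the scoring or evaluation code, which is the kind of like-for-like comparison the abstract reports.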
