An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies (악성코드 이상 징후 탐지를 위한 LLM 로그 임베딩 기반 표현 학습 기법)

Dong-Wan Kim (김동완); Hyun-Soo Kim (김현수); Kyung-Yeob Park (박경엽); MinSoo Kim (김민수); Shin DongMyung (신동명)

@article{ART003277342,
author={Dong-Wan Kim and Hyun-Soo Kim and Kyung-Yeob Park and MinSoo Kim and Shin DongMyung},
title={An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies},
journal={Journal of Software Forensics},
issn={3092-541X},
year={2025},
volume={21},
number={4},
pages={1-17}
}

TY - JOUR
AU - Dong-Wan Kim
AU - Hyun-Soo Kim
AU - Kyung-Yeob Park
AU - MinSoo Kim
AU - Shin DongMyung
TI - An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies
JO - Journal of Software Forensics
PY - 2025
VL - 21
IS - 4
PB - Korea Software Assessment and Valuation Society
SP - 1
EP - 17
SN - 3092-541X
AB - In large-scale information systems and web services, log-based anomaly detection is a key means of capturing early signs of ransomware and other malware. However, unsupervised methods that rely on raw log text and limited feature engineering perform poorly on real security logs with imbalanced labels and multi-stage attacks. This paper proposes an LLM-based log embedding pipeline that combines three representations (raw logs, embeddings from pre-trained Llama language models, and domain-fine-tuned embeddings for security logs) with statistical and deep anomaly detection models, using about 800,000 web access and system audit log entries. Under a common data split, embedding-based representations raise the binary F1-score of most models to roughly 2.5 times the raw-log baseline, and to more than three times the baseline for rare attack types, demonstrating their effectiveness as a common input representation for malware anomaly detection and early-warning systems.
KW - Anomaly detection;LLM;Log embedding;Malware intrusion detection;Representation learning
DO -
UR -
ER -

Dong-Wan Kim, Hyun-Soo Kim, Kyung-Yeob Park, MinSoo Kim and Shin DongMyung. (2025). An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies. Journal of Software Forensics, 21(4), 1-17.

Dong-Wan Kim, Hyun-Soo Kim, Kyung-Yeob Park, MinSoo Kim and Shin DongMyung. 2025, "An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies", Journal of Software Forensics, vol.21, no.4 pp.1-17.

Dong-Wan Kim, Hyun-Soo Kim, Kyung-Yeob Park, MinSoo Kim, Shin DongMyung "An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies" Journal of Software Forensics 21.4 pp.1-17 (2025) : 1.

Dong-Wan Kim, Hyun-Soo Kim, Kyung-Yeob Park, MinSoo Kim, Shin DongMyung. An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies. 2025; 21(4), 1-17.

Dong-Wan Kim, Hyun-Soo Kim, Kyung-Yeob Park, MinSoo Kim and Shin DongMyung. "An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies" Journal of Software Forensics 21, no.4 (2025) : 1-17.

Dong-Wan Kim; Hyun-Soo Kim; Kyung-Yeob Park; MinSoo Kim; Shin DongMyung. An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies. Journal of Software Forensics, 21(4), 1-17.

Dong-Wan Kim; Hyun-Soo Kim; Kyung-Yeob Park; MinSoo Kim; Shin DongMyung. An LLM-based Log Embedding Representation Learning Approach for Detecting Early-stage Malware Anomalies. Journal of Software Forensics. 2025; 21(4) 1-17.

Journal of Software Forensics 2024 KCI Impact Factor : 0.32
