
Fine-Tuning Large Language Models for Security Log Data Labeling

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(10), pp.143~154
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : August 27, 2025
  • Accepted : October 13, 2025
  • Published : October 31, 2025

Doo-Yong Jeon 1

1 Yeungnam University College (영남이공대학교)

Accredited

ABSTRACT

This study proposes a data sampling method, CoreShot Filter, to address the high cost and subjective judgment involved in labeling security log data. CoreShot Filter combines the active-learning concepts of representativeness and uncertainty to select optimal data for fine-tuning large language models (LLMs). Uncertainty is defined by the discrepancy between weak-learner predictions and manual labels, while representativeness is measured by similarity to persona data generated with a genetic algorithm. From over 310,000 logs, 204 core samples were selected and used to fine-tune GPT-4o mini. Experimental results show that CoreShot Filter outperforms stratified, outlier, and coreset sampling in accuracy, recall, and F1-score. In particular, it achieved superior anomaly-detection recall (0.8901) and precision (0.9489), demonstrating that CoreShot Filter is an effective method for improving security log analysis and the efficiency of LLM-based labeling.
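The abstract describes CoreShot Filter as combining two signals per log: uncertainty (disagreement between a weak learner and the manual label) and representativeness (similarity to genetic-algorithm-generated persona data). The paper's exact formulas are not given here, so the following is a minimal sketch under assumed definitions: uncertainty as the absolute gap between the weak learner's predicted probability and the manual label, representativeness as the maximum cosine similarity to any persona embedding, and a simple weighted sum for the final score. All function names and the `alpha` weight are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def coreshot_scores(weak_probs, manual_labels, log_embeddings,
                    persona_embeddings, alpha=0.5):
    """Score each log by a weighted mix of uncertainty and representativeness.

    weak_probs         : (n,)   weak learner's P(abnormal) per log
    manual_labels      : (n,)   0/1 manual labels
    log_embeddings     : (n, d) vector representations of the logs
    persona_embeddings : (m, d) embeddings of GA-generated persona data
    alpha              : assumed mixing weight (not specified in the paper)
    """
    # Uncertainty: discrepancy between the weak learner and the manual label.
    uncertainty = np.abs(weak_probs - manual_labels)

    # Representativeness: max cosine similarity to any persona embedding.
    logs = log_embeddings / np.linalg.norm(log_embeddings, axis=1, keepdims=True)
    personas = persona_embeddings / np.linalg.norm(persona_embeddings, axis=1,
                                                   keepdims=True)
    representativeness = (logs @ personas.T).max(axis=1)

    return alpha * uncertainty + (1 - alpha) * representativeness

def select_core_samples(scores, k=204):
    """Return indices of the k highest-scoring logs (k=204 in the paper)."""
    return np.argsort(scores)[::-1][:k]
```

Under this sketch, a log that the weak learner mislabels confidently *and* that resembles a persona pattern scores highest, matching the stated goal of picking a small, informative core set for fine-tuning.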
