Preference-Aligned sLLM for Safe and Helpful RAG-Based Battlefield Analysis System

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(6), pp.31~46
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : May 15, 2025
  • Accepted : June 10, 2025
  • Published : June 30, 2025

Hansle Lee 1, Dong-Hyun Kim 1, Hyeong-Seok Kim 1, Jaesung Yoo 1

1Hanwha Systems

Accredited

ABSTRACT

In the modern battlefield environment, where vast amounts of information are distributed in real time, AI-based battlefield situation analysis systems are increasingly needed to support commanders in analyzing massive volumes of data. This study aligns the preferences of a small large language model (sLLM) tailored for a Retrieval-Augmented Generation (RAG) system designed for battlefield situation analysis. To this end, we redefine "safety" in the military domain from the perspective of minimizing hallucinations and construct a Direct Preference Optimization (DPO) dataset using a Teacher Critique-based Inference-with-Hint technique. This technique improved hallucination-related safety preferences by 47.35% under human evaluation and by 78.42% under LLM-as-Judge evaluation. Through DPO-based preference learning, we then identified β=0.9 and epoch=15 as the optimal hyperparameter configuration for battlefield environments. Under this setting, the model improved safety by +24.41% and helpfulness by +3.77% over the SFT baseline, and gained +85.58 points on the normalized safety-focused Z-score metric, demonstrating the effectiveness of the proposed method in reducing hallucinations. This study demonstrates the potential of developing an sLLM that effectively balances safety and helpfulness in defense applications.
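The DPO objective named in the abstract can be illustrated with a minimal sketch of the standard DPO loss for one preference pair. This is not the authors' implementation; the function name and inputs are illustrative, and only β=0.9 is taken from the paper. β scales the log-ratio margin and thus controls how strongly the policy is pushed away from the frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.9):
    """Standard DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trained policy (pi_*) and the frozen reference model (ref_*).
    beta=0.9 follows the configuration reported in the abstract.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_ratio = pi_chosen - ref_chosen
    rejected_ratio = pi_rejected - ref_rejected
    # Logistic loss on the beta-scaled margin: -log(sigmoid(margin))
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss is log 2; as the policy assigns relatively more probability to the chosen response than the rejected one, the margin grows and the loss falls toward zero.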
