Interpretability for Korean Language Models: Evidence from Attention Visualization

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(1), pp. 41-49
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : November 12, 2025
  • Accepted : December 29, 2025
  • Published : January 30, 2026

Jung-Kyu Shin 1, Beak-Cheol Jang 1

1 Yonsei University

Accredited

ABSTRACT

This study investigates how the agglutinative nature and morphological complexity of Korean are reflected in the internal representations of language models (LMs) by fine-tuning KLUE RoBERTa Base on named entity recognition (NER) and analyzing the resulting attention maps both qualitatively and quantitatively. The methodology comprises a stable training design based on subword–label alignment and masking that respects character-level annotations, extraction of attention weights, visualization of attention strength, and quantification of the attention distribution for each pattern. The analysis reveals three patterns: span-internal cohesion, in which entity tokens attend to span boundaries; boundary alignment, in which post-entity particles tagged O function as boundary cues; and long-distance dependencies, in which distal arguments form semantically coherent links. These findings suggest that the linguistic characteristics of Korean are structurally organized at the level of individual attention layers and heads. This work enhances the interpretability of Korean LMs and lays a foundation for interpretability research applicable to diverse downstream tasks.
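As an illustration of the attention-extraction and visualization steps described above, the sketch below loads a KLUE RoBERTa token-classification model via HuggingFace Transformers, requests per-layer attention weights, and renders one head as a heatmap. This is a minimal sketch, not the authors' released code: the layer/head indices and the example sentence are arbitrary choices, and a real experiment would load the fine-tuned NER checkpoint rather than the base model.

```python
# Minimal sketch: extract and plot one attention head from a KLUE RoBERTa
# token-classification model. Assumes the HuggingFace Transformers API;
# the fine-tuned NER checkpoint would replace "klue/roberta-base".
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "klue/roberta-base"  # base model; swap in the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# output_attentions=True makes the forward pass return all attention maps.
model = AutoModelForTokenClassification.from_pretrained(
    model_name, output_attentions=True
)
model.eval()

sentence = "연세대학교는 서울에 있다."  # illustrative Korean sentence
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
layer, head = 8, 3  # arbitrary layer/head to inspect
attn = outputs.attentions[layer][0, head]  # (seq_len, seq_len)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_yticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=90)
ax.set_yticklabels(tokens)
ax.set_xlabel("attended-to token")
ax.set_ylabel("attending token")
plt.tight_layout()
plt.show()
```

Averaging such maps over heads, or restricting the rows to entity-span subwords, is one plausible way to compute the pattern-specific attention distributions the abstract refers to.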
