KRAFT³-QA: Korean financial text-table benchmark for evaluating tool-augmented agents on QA tasks

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(8), pp.29~39
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : July 7, 2025
  • Accepted : July 30, 2025
  • Published : August 29, 2025

Seungjae Park 1 Sung-Bae Cho 1 Ha Young Kim 1

1 Yonsei University

Accredited

ABSTRACT

Periodic corporate filings are structured documents combining text and tables. Practical use of these documents requires comprehensive reasoning to integrate and interpret information across multiple sections. However, current large language models (LLMs) struggle with such reasoning, and existing financial benchmarks are insufficient for evaluating practical skills like tool usage. To address this gap, we develop KRAFT³-QA, a new benchmark based on Korean corporate filings. KRAFT³-QA consists of multiple-choice tasks that require integrating information across various sections. Model performance is evaluated using both accuracy and valid response rate. Experiments with major open LLMs demonstrate that model scale and reasoning architecture can affect performance. This study presents a real-document-based, tool-augmented QA benchmark and an evaluation framework, establishing a technical foundation for quantitatively assessing the real-world problem-solving capabilities of LLM agents.
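The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so the following is an assumption rather than the authors' scoring code: a response counts as valid when it parses to one of the option labels, valid response rate is the share of valid responses among all questions, and accuracy is the share of correct answers among all questions (invalid responses count as wrong). The function name `score_responses` and the option labels A to D are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Assumed option labels; the abstract does not state how many choices each task has.
VALID_CHOICES = frozenset({"A", "B", "C", "D"})


def score_responses(
    predictions: Sequence[Optional[str]],
    answers: Sequence[str],
) -> Dict[str, float]:
    """Compute accuracy and valid response rate for multiple-choice QA.

    Assumption: both metrics use the total number of questions as the
    denominator, so an unparseable response lowers both scores.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    total = len(answers)
    if total == 0:
        raise ValueError("at least one question is required")

    # A prediction is valid if the agent produced one of the option labels.
    valid_flags = [p is not None and p in VALID_CHOICES for p in predictions]
    n_valid = sum(valid_flags)
    n_correct = sum(
        1 for p, a, ok in zip(predictions, answers, valid_flags) if ok and p == a
    )
    return {
        "accuracy": n_correct / total,
        "valid_response_rate": n_valid / total,
    }


if __name__ == "__main__":
    # None models a response that could not be parsed; "E" is outside the options.
    print(score_responses(["A", "B", None, "E"], ["A", "C", "D", "A"]))
```

Reporting the two numbers side by side separates agents that answer incorrectly from agents that fail to produce a usable answer at all, which matters when tool calls can derail the output format.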

This paper was written with support from the National Research Foundation of Korea.