Quantitative Assessment of OCR for Complex Documents on Retrieval-Augmented Generation Performance

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(6), pp. 65-76
  • Publisher : The Korean Society of Computer and Information
  • Research Area : Engineering > Computer Science
  • Received : April 24, 2025
  • Accepted : June 10, 2025
  • Published : June 30, 2025

Minchae Song 1, Jaeyoung Park 2

1 NongHyup Financial Group Inc.
2 Pukyong National University

Accredited

ABSTRACT

Retrieval-Augmented Generation (RAG) enhances the accuracy of generative AI services by allowing Large Language Models (LLMs) to reference external knowledge bases rather than relying solely on pre-trained knowledge. This study analyzes various types of financial document images to examine the impact of document image structure on RAG effectiveness. The results reveal that, although OCR achieves high recognition accuracy even with handwritten text, the overall performance of RAG remains suboptimal. This suggests that increased structural complexity in original document images hinders contextual understanding, which in turn degrades performance across the retrieval, chunking, and generation stages of the RAG pipeline. Therefore, assuming OCR text quality exceeds a certain threshold, structuring input data into a format that is more readily interpretable by machines through post-processing plays a more critical role in enhancing RAG performance.
