
Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(9), pp.25-37
  • DOI : 10.9708/jksci.2024.29.09.025
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : July 30, 2024
  • Accepted : August 26, 2024
  • Published : September 30, 2024

Eun-Sil Choi 1

1 Korea University

Accredited

ABSTRACT

This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to improve the efficiency of the ESG (Environmental, Social, and Governance) document review process. In the OCR stage, the system improves text recognition accuracy by applying an ensemble-model-based image preprocessing algorithm and hybrid information extraction models. In the RAG pipeline, layout analysis algorithms, re-ranking algorithms, and ensemble retrievers improve the reliability of both information retrieval and answer generation. The system was evaluated on certificate images collected from online portals and on corporate internal regulations gathered from various sources, such as company websites. It achieved an accuracy of 93.8% for certificate reviews and 92.2% for company regulation reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
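The abstract mentions ensemble retrievers but does not say how their result lists are merged. One common fusion rule is weighted Reciprocal Rank Fusion (RRF), in which each retriever contributes `weight / (k + rank)` for every document it returns. The sketch below assumes RRF purely for illustration; it is not necessarily the paper's method. The function name `reciprocal_rank_fusion`, the equal default weights, and the document IDs are hypothetical. The constant `k = 60` is the value conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several best-first ranked lists of document IDs (hypothetical helper).

    rankings: list of ranked lists, e.g. one from a keyword retriever and
              one from a dense-vector retriever (assumed setup).
    weights:  optional per-retriever weights; equal weights if omitted.
    k:        RRF smoothing constant that damps the influence of top ranks.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks are 1-based: the top hit contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword-retriever output
    dense_hits = ["b", "d", "a"]    # hypothetical dense-retriever output
    print(reciprocal_rank_fusion([keyword_hits, dense_hits]))
```

Here `b` edges out `a` because both retrievers return it near the top. A documented re-ranker, as the abstract describes, would then reorder this fused candidate list before answer generation.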
