Semojum: A Multimodal AI-Based Braille Transcription Support Service for Visual Materials and Layouts

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(6), pp.91~98
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : May 20, 2026
  • Accepted : June 15, 2026
  • Published : June 30, 2026

Taemin Kim 1 Junhyeok Lee 2 Haeun Cho 3 Hyunju Kim 4 Sanghyun Kim 5 Jimin Lee 6 Won Joo Lee 7

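1 Korea Advanced Institute of Science and Technology (KAIST)
2 Konkuk University
3 Dongduk Women's University
4 Kwangwoon University
5 Hanyang University
6 Kookmin University
7 Inha Technical College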

Accredited

ABSTRACT

This paper proposes Semojum, a multimodal AI-based braille translation support service designed to provide braille books to visually impaired readers in real time. Semojum consists of a DeepSeek-OCR-2 based layout analysis module, a Semantic NMS algorithm, a table layout optimization engine, and a GPT-4o based image captioning module. The layout analysis module scales page images up to 1,536 pixels and, using specialized markup tokens, outputs element types, OCR text, bounding boxes, and reading order in a single pass. To refine this output, the Semantic NMS algorithm identifies redundant elements based on the semantic inclusion relationships between the extracted text contents. The table layout optimization engine extracts structured data using the JSON Schema strict output mode of GPT-4o-mini Vision. For visual accessibility, the image captioning module uses GPT-4o Vision with adjacent text from the preceding and following pages injected as context; it classifies each image into one of seven patterns and applies a pattern-specific description strategy. Furthermore, Semojum implements a hybrid braille translation engine that combines the Braillify library with a custom LaTeX tokenizer to support the 2024 Revised Korean Mathematics Braille Regulations. By integrating a human-in-the-loop structure that presents AI outputs as verifiable drafts with a tab-based three-mode interface, the system significantly enhances the operational efficiency of professional braille translators.
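
The abstract describes Semantic NMS only as redundancy detection based on inclusion relationships between extracted text contents. As an illustration of that idea, and not the authors' implementation, the following minimal Python sketch drops an element when its normalized text is contained in the text of an already kept element. The `Element` type, the whitespace and case normalization, and the additional requirement that the two bounding boxes intersect are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical layout element; field names are illustrative only."""
    kind: str                          # element type, e.g. "text" or "table"
    text: str                          # OCR text of the element
    bbox: tuple[int, int, int, int]    # (x0, y0, x1, y1) in pixels
    order: int                         # reading order on the page


def normalize(text: str) -> str:
    # Assumption: compare texts with whitespace removed and case folded.
    return "".join(text.split()).lower()


def boxes_intersect(a: tuple[int, int, int, int],
                    b: tuple[int, int, int, int]) -> bool:
    # True when the two axis-aligned boxes share a non-empty area.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def semantic_nms(elements: list[Element]) -> list[Element]:
    # Visit longer texts first so a containing element is kept
    # before the elements whose text it includes.
    ranked = sorted(elements, key=lambda e: len(normalize(e.text)), reverse=True)
    kept: list[Element] = []
    for cand in ranked:
        cand_text = normalize(cand.text)
        redundant = bool(cand_text) and any(
            cand_text in normalize(k.text) and boxes_intersect(cand.bbox, k.bbox)
            for k in kept
        )
        if not redundant:
            kept.append(cand)
    # Restore the original reading order for downstream processing.
    return sorted(kept, key=lambda e: e.order)
```

The box intersection check is included so that a short string repeated elsewhere on the page is not suppressed only because a longer element happens to contain the same words. Elements with empty text, such as images, are always kept in this sketch.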
