Designing a Voice-Video Based UI Pipeline to Improve Digital Signage Usability for Digitally Vulnerable Groups

  • Journal of Software Assessment and Valuation
  • Abbr : JSAV
  • 2025, 21(4), pp. 197-209
  • Publisher : Korea Software Assessment and Valuation Society
  • Research Area : Engineering > Computer Science
  • Received : December 5, 2025
  • Accepted : December 20, 2025
  • Published : December 26, 2025

Jeong-Hyun Kim¹, Jong-Seob Park¹, Chol Yong Soo¹

¹Shinhan University

Accredited

ABSTRACT

This paper proposes a speech-image UI pipeline to improve the usability of digital signage for the mobility-challenged. The pipeline consists of two main stages: visual speech detection and auditory signal enhancement. First, the image processing module uses Google MediaPipe Holistic and a hybrid MobileNetV2-GRU model to analyze the user's lip movements and capture "speech intent" in real time. Specifically, data augmentation and a circular buffer prevent the loss of speech onsets, achieving 100% speech detection accuracy. Second, the speech processing module adopts an adaptive signal-to-noise ratio (SNR) noise removal algorithm based on the "Do No Harm" principle. Because deep learning models such as SepFormer can distort Korean speech signals, an SNR threshold of 6 dB is used: noise removal is applied in low-SNR environments and skipped in high-SNR environments to prevent speech loss. In particular, omitting speech processing for 79.4% of the total speech data in the low-SNR setting yielded a 1.07% decrease in word error rate (WER). Furthermore, applying data augmentation in visual speech detection achieved an accuracy of 1.0 (100%) and a loss of 0.0004.
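The abstract names two mechanisms without showing how they fit together: a circular (ring) buffer that keeps the frames captured just before speech intent is detected, and an SNR gate that runs noise removal only below 6 dB. The Python sketch below illustrates both under stated assumptions. The power-subtraction SNR estimator, the noise-only reference segment, and the names `estimate_snr_db`, `do_no_harm_gate`, `PreRollBuffer`, and the `enhance` callback are all hypothetical; `enhance` merely stands in for a denoiser such as SepFormer, and none of this is the authors' implementation.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Sequence

# Threshold value reported in the abstract; everything else here is assumed.
SNR_THRESHOLD_DB = 6.0


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a signal (mean of squared samples)."""
    return sum(s * s for s in samples) / max(len(samples), 1)


def estimate_snr_db(noisy: Sequence[float], noise_ref: Sequence[float]) -> float:
    """Power-subtraction SNR estimate (assumed; the paper's estimator is not given).

    noise_ref is a noise-only segment, e.g. audio captured before speech onset.
    """
    eps = 1e-12
    p_noise = max(mean_power(noise_ref), eps)
    # Speech power is approximated as total power minus noise power.
    p_speech = max(mean_power(noisy) - p_noise, eps)
    return 10.0 * math.log10(p_speech / p_noise)


def do_no_harm_gate(
    noisy: List[float],
    noise_ref: List[float],
    enhance: Callable[[List[float]], List[float]],
    threshold_db: float = SNR_THRESHOLD_DB,
) -> List[float]:
    """Run the (hypothetical) enhancer only when estimated SNR is below threshold."""
    if estimate_snr_db(noisy, noise_ref) < threshold_db:
        return enhance(noisy)  # low SNR: denoising is expected to help
    return noisy  # high SNR: skip processing so speech is not distorted


class PreRollBuffer:
    """Fixed-size circular buffer holding the most recent frames (hypothetical).

    When speech intent is detected, flush() returns the buffered frames so the
    utterance onset can be prepended instead of being lost.
    """

    def __init__(self, max_frames: int) -> None:
        self._frames: Deque[List[float]] = deque(maxlen=max_frames)

    def push(self, frame: List[float]) -> None:
        # When full, deque(maxlen=...) silently discards the oldest frame.
        self._frames.append(frame)

    def flush(self) -> List[float]:
        """Return buffered samples in arrival order and empty the buffer."""
        out = [s for frame in self._frames for s in frame]
        self._frames.clear()
        return out


if __name__ == "__main__":
    buf = PreRollBuffer(max_frames=3)
    for i in range(5):
        buf.push([float(i)])
    print(buf.flush())  # [2.0, 3.0, 4.0]
```

Keeping the denoiser behind a callback makes the gate independent of the enhancement model; in this sketch the 6 dB threshold is the only parameter taken from the abstract, and the buffer length would in practice be chosen from the detector's latency.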
