Enhancing Voice Recognition Accuracy through Sequential Application of Visual Speech Detection and Noise Reduction

  • Journal of Software Assessment and Valuation
  • Abbr : JSAV
  • 2025, 21(4), pp.211~222
  • Publisher : Korea Software Assessment and Valuation Society
  • Research Area : Engineering > Computer Science
  • Received : December 5, 2025
  • Accepted : December 20, 2025
  • Published : December 26, 2025

Hyeon-Ji Ko 1 Won-Hu Seo 1 Chol Yong Soo 1

1 Shinhan University

Accredited

ABSTRACT

Traditional energy-based voice activity detection (VAD) fails to clearly distinguish speech from non-speech in noisy environments and leads to unnecessary computation. This study presents a pre-processing pipeline that addresses the loss of speech recognition accuracy caused by background noise on low-spec edge devices. It designs a four-stage sequential pipeline that uses visual voice activity detection as a computational gating mechanism. The proposed system performs noise reduction and speech-to-text (STT) only on verified speech segments, demonstrating that real-time performance can be achieved and non-speech noise effectively blocked using only a basic combination of algorithms, without deep learning-based models. Experimental results show that the system maintains a real-time factor (RTF) of 0.134 while improving noise reduction performance in speech-active segments to 15.67 dB; as background noise is removed, a Speech Loss of 14.59 dB is observed. Overall, the system shows improved performance compared to conventional noise-removal-based VAD.
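
The gating idea and the RTF figure in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the four stages, so the code below only assumes that a visual VAD yields one boolean flag per frame and that later stages (noise reduction, STT) are passed in as a single callable. The names `speech_segments`, `run_gated`, and `real_time_factor` are hypothetical, and RTF is taken in its usual sense of processing time divided by audio duration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def speech_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive True flags into half-open (start, end) frame ranges."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i  # a speech run begins
        elif not active and start is not None:
            segments.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(flags)))  # run extends to the last frame
    return segments


def run_gated(
    frames: Sequence[Frame],
    flags: Sequence[bool],
    process: Callable[[Sequence[Frame]], Result],
) -> List[Result]:
    """Apply `process` only to frame ranges flagged as speech (assumed gating)."""
    if len(frames) != len(flags):
        raise ValueError("frames and flags must have the same length")
    return [process(frames[s:e]) for s, e in speech_segments(flags)]


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

Under these assumptions, frames flagged as non-speech never reach `process`, which is where the computational saving would come from. An RTF of 0.134 would correspond, for example, to about 1.34 seconds of processing for 10 seconds of audio.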
