MLM-based Misrecognized Word Correction for Speech Recognition (Korean title, translated: MLM-Based Misrecognized Word Correction Technique for Improved Speech Recognition)

Yonghun Jang (장용훈); Jung Min Lim (임정민); Seong-Guk Nam (남성국); Minhyung Ryu (류민형); Eunjin Yoo (유은진); Myung-Sub Lee (이명섭); Jong Wook Kwak (곽종욱)

doi:10.9708/jksci.2025.30.11.079

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2025, 30(11), pp.79~89
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : September 22, 2025
Accepted : October 24, 2025
Published : November 28, 2025

Yonghun Jang ¹, Jung Min Lim ², Seong-Guk Nam ¹, Minhyung Ryu ¹, Eunjin Yoo ¹, Myung-Sub Lee ³, Jong Wook Kwak ²

¹Near Networks (니어네트웍스)
²Yeungnam University
³Yeungnam University College

Accredited

ABSTRACT

We propose an integrated approach that improves the accuracy of Korean speech recognition by correcting misrecognitions caused by phonetic similarity. The system combines three components: (1) frequency-domain noise reduction, using Minimum Mean Square Error (MMSE)-based log-spectral estimation and a high-pass emphasis filter, to raise the signal-to-noise ratio; (2) detection of contextually inappropriate words using KoBERT-based Masked Language Modeling (MLM); and (3) selection of the final correction word using a Jamo-level Levenshtein distance, which reflects the phonetic characteristics of Korean. In an experiment on 1,000 Korean sentences containing misrecognized words, the proposed method reduced the Word Error Rate (WER) from the baseline's 9.2% to 4.7% and detected misrecognized words with a maximum accuracy of 96.4%. These results indicate that the method can substantially improve the performance of real-world speech recognition systems.
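
As a rough illustration of component (3) and of the WER metric, the sketch below decomposes Hangul syllables into Jamo using standard Unicode arithmetic, computes a plain Levenshtein distance over the resulting Jamo sequences, and picks the candidate closest to the misrecognized word. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`to_jamo`, `levenshtein`, `pick_correction`, `wer`) are hypothetical, the candidate list is assumed to come from some MLM step that is not shown, and all edit operations are given unit cost, which the paper may or may not do.

```python
# Illustrative sketch only; names and unit edit costs are assumptions,
# not taken from the paper.

# Jamo tables in the order used by the Unicode Hangul syllable block.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 medial vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals (index 0 = none)

HANGUL_BASE = 0xAC00       # first precomposed syllable
HANGUL_COUNT = 19 * 21 * 28  # 11,172 precomposed syllables


def to_jamo(text):
    """Split each precomposed Hangul syllable into its Jamo; keep other characters as-is."""
    out = []
    for ch in text:
        idx = ord(ch) - HANGUL_BASE
        if 0 <= idx < HANGUL_COUNT:
            out.append(CHO[idx // (21 * 28)])
            out.append(JUNG[(idx % (21 * 28)) // 28])
            if idx % 28:  # a final consonant is present
                out.append(JONG[idx % 28])
        else:
            out.append(ch)
    return out


def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (works for Jamo lists or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def pick_correction(misrecognized, candidates):
    """Return the candidate whose Jamo sequence is closest to the misrecognized word."""
    src = to_jamo(misrecognized)
    return min(candidates, key=lambda c: levenshtein(src, to_jamo(c)))


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by the reference length."""
    ref = reference.split()
    return levenshtein(ref, hypothesis.split()) / len(ref)
```

Comparing at the Jamo level means that two syllables differing only in their final consonant are one edit apart, whereas a syllable-level comparison would treat them as entirely different characters. For the figures quoted above, the drop from 9.2% to 4.7% WER is an absolute reduction of 4.5 percentage points, or roughly 49% relative to the baseline.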

KEYWORDS

Speech Recognition, Speech to Text, Error Correction, Language Model, Korean NLP

Journal of The Korea Society of Computer and Information 2025 KCI Impact Factor : 1.01
