A Study on the Impact of Speech Data Quality on Speech Recognition Models (음성 데이터 품질이 음성인식 모델에 미치는 영향 연구)

Yeong-Jin Kim (김영진); Hyun-Jong Cha (차현종); Kang Ah Reum (강아름)

doi:10.9708/jksci.2024.29.01.041

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2024, 29(1), pp.41~49
Publisher : The Korean Society of Computer and Information
Research Area : Engineering > Computer Science
Received : October 30, 2023
Accepted : January 25, 2024
Published : January 31, 2024

Yeong-Jin Kim ¹, Hyun-Jong Cha ¹, Kang Ah Reum ¹

¹Pai Chai University (배재대학교)

Accredited

ABSTRACT

Speech recognition technology continues to advance and is widely used across many fields. This study investigates how speech data quality affects speech recognition models by comparing two conditions: the entire dataset and the top 70% of the data ranked by Signal-to-Noise Ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we transcribed the speech with each model and evaluated the transcripts using the Levenshtein distance. Seamless M4T scored 13.6 on the high-SNR data, lower than its score of 16.6 on the entire dataset. Google Cloud Speech-to-Text, however, scored 8.3 on the entire dataset, indicating lower performance than on the high-SNR data. These results suggest that using high-SNR data when training a new speech recognition model can affect its performance, and that the Levenshtein distance can serve as a metric for evaluating speech recognition models.
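The pipeline described in the abstract (rank clips by SNR, keep the top 70%, score transcripts by Levenshtein distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how SNR was estimated, whether the distance was computed over characters or words, or how scores were aggregated, so the sketch assumes SNR in decibels from known signal and noise power and a character-level edit distance. The function names `snr_db`, `top_fraction_by_snr`, and `levenshtein` are hypothetical.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """SNR in decibels, 10 * log10(P_signal / P_noise).

    Assumption: both powers are already known; the abstract does not
    describe how they were estimated from the recordings.
    """
    return 10.0 * math.log10(signal_power / noise_power)


def top_fraction_by_snr(clips, fraction=0.7):
    """Return the highest-SNR `fraction` of (clip_name, snr) pairs.

    `clips` is a list of (name, snr_in_db) tuples. The 0.7 default
    mirrors the "top 70%" split mentioned in the abstract.
    """
    ranked = sorted(clips, key=lambda pair: pair[1], reverse=True)
    keep = round(len(ranked) * fraction)
    return ranked[:keep]


def levenshtein(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute).

    Uses the standard dynamic-programming recurrence with two rows.
    """
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + substitution_cost,   # substitution
                )
            )
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3. A distance of 0 means the transcript matches the reference exactly, and larger values mean more edits are needed to turn one string into the other. In practice a library such as the Levenshtein package listed in the paper's references could replace the hand-written function.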

KEYWORDS

Speech Recognition, Signal-to-Noise Ratio (SNR), Levenshtein Distance Algorithm, Meta Seamless M4T, Google Cloud Speech-to-Text

This paper was written with support from the National Research Foundation of Korea.

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81
