A Study on the Impact of Speech Data Quality on Speech Recognition Models

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(1), pp.41-49
  • DOI : 10.9708/jksci.2024.29.01.041
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 30, 2023
  • Accepted : January 25, 2024
  • Published : January 31, 2024

Yeong-Jin Kim 1 Hyun-Jong Cha 1 Ah Reum Kang 1

1 Pai Chai University

Accredited

ABSTRACT

Speech recognition technology continues to advance and is widely used across many fields. This study investigates how speech data quality affects speech recognition models by comparing two sets: the entire dataset and the top 70% of it ranked by Signal-to-Noise Ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we examined each model's transcription output and evaluated it with the Levenshtein Distance. Seamless M4T scored 13.6 on the high-SNR data, lower than its score of 16.6 on the entire dataset. Google Cloud Speech-to-Text, however, scored 8.3 on the entire dataset, indicating lower performance than on the high-SNR data. These results suggest that using high-SNR data when training a new speech recognition model can affect its performance, and that the Levenshtein Distance can serve as a metric for evaluating speech recognition models.
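
The two operations the abstract relies on, keeping the top 70% of clips by SNR and scoring a transcript with the Levenshtein Distance, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how SNR was measured or whether the distance was computed over characters or words, so the sketch assumes each clip already carries an SNR value in dB and compares character sequences. The names `levenshtein` and `top_fraction_by_snr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,          # delete item_a
                current[j - 1] + 1,       # insert item_b
                previous[j - 1] + cost,   # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def top_fraction_by_snr(
    clips: List[Tuple[str, float]], fraction: float = 0.7
) -> List[Tuple[str, float]]:
    """Keep the highest-SNR share of (clip_id, snr_db) pairs.

    Assumption: SNR values are precomputed; the cut-off count is rounded
    to the nearest whole clip.
    """
    ranked = sorted(clips, key=lambda clip: clip[1], reverse=True)
    keep = round(len(ranked) * fraction)
    return ranked[:keep]


if __name__ == "__main__":
    # Hypothetical clips with made-up SNR values, for illustration only.
    clips = [(f"clip{i}", float(i)) for i in range(10)]
    subset = top_fraction_by_snr(clips)
    print(len(subset), "of", len(clips), "clips kept")
    print(levenshtein("kitten", "sitting"))
```

A lower distance means the model's transcript is closer to the reference text. Passing lists of words instead of strings to `levenshtein` would give a word-level distance with the same code.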

This paper was written with support from the National Research Foundation of Korea.