Multi-Domain ESQ Metrics for Quality Assessment of Augmented Emotional Speech

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(11), pp.37~62
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : September 17, 2025
  • Accepted : November 17, 2025
  • Published : November 28, 2025

Do Kyung Shin 1, Young Dae Kim 1

1 LIG Nex1 Co., Ltd.

Accredited

ABSTRACT

Recent advances in Automatic Speech Recognition (ASR) have driven active research on Speech Emotion Recognition (SER) applications. SER performance depends heavily on data quality and quantity, yet data scarcity remains a persistent challenge, which makes data augmentation essential. Existing speech quality metrics such as PESQ and STOI are single-dimensional, so they cannot comprehensively evaluate audio with high-dimensional, multi-faceted characteristics such as emotional speech. This study proposes ESQ (Emotion-Specific Quality Assessment) metrics for evaluating the quality of augmented emotional speech data. To validate the ESQ metrics, we used the EMO dataset augmented across quality levels with MetricGAN. Experimental results show that scores improve consistently across all seven groups as the quality level increases, with an overall average improvement rate of 90.75%.
