Improvement of Naturalness for a HMM-based Korean TTS using the prosodic boundary information

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2012, 17(9), pp.75-84
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science

임기정 1 Jungchul Lee 1

1 University of Ulsan

Accredited

ABSTRACT

HMM-based text-to-speech systems generally use context-dependent triphone units drawn from a large speech corpus to improve the quality of synthetic speech. To reduce the size of the corpus database, acoustically similar triphone units are clustered with a decision tree that uses context-dependent information. This information includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, intonation patterns, and segmental duration. However, if the prosodic information is too complex, many context-dependent phonemes have no examples in the training data, and clustering yields smoothed features that produce unnatural synthetic speech. In this paper, instead of complicated prosodic information, we propose three simple prosodic boundary types, together with decision tree questions based on rising, falling, and monotonic tones, to improve naturalness. Experimental results show that the proposed method improves the naturalness of an HMM-based Korean TTS system and achieves a high MOS in the perception test.
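
The idea of asking decision tree questions about a three-way boundary tone can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `TriphoneContext`, the question names, and the toy units are hypothetical, and a real system would choose each split by a likelihood criterion over acoustic statistics rather than apply a question by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The three boundary tone types named in the abstract.
TONES = ("rising", "falling", "monotonic")


@dataclass(frozen=True)
class TriphoneContext:
    """Hypothetical context label: a triphone plus one boundary tone tag."""
    left: str
    center: str
    right: str
    boundary_tone: str  # assumed to be one of TONES


Question = Callable[[TriphoneContext], bool]

# Hypothetical yes/no questions, one per boundary tone type.
QUESTIONS: Dict[str, Question] = {
    f"Boundary_is_{tone}": (lambda c, t=tone: c.boundary_tone == t)
    for tone in TONES
}


def split(units: List[TriphoneContext],
          question: Question) -> Tuple[List[TriphoneContext], List[TriphoneContext]]:
    """Partition units into (yes, no) groups according to one question."""
    yes = [u for u in units if question(u)]
    no = [u for u in units if not question(u)]
    return yes, no


# Toy data: the same triphone observed under different boundary tones.
units = [
    TriphoneContext("k", "a", "n", "rising"),
    TriphoneContext("k", "a", "n", "falling"),
    TriphoneContext("k", "a", "n", "monotonic"),
    TriphoneContext("m", "a", "l", "rising"),
]

yes, no = split(units, QUESTIONS["Boundary_is_rising"])
```

With only three tone values, each question divides the data into a small number of well-populated groups, which is the motivation the abstract gives for preferring a simple prosodic label set over a complicated one.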
