Singing Voice Synthesis Using HMM Based TTS and MusicXML

Najeeb Ullah Khan (칸 나지브 울라); Jungchul Lee (이정철)

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2015, 20(5), pp.53~63
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science

Najeeb Ullah Khan ¹, Jungchul Lee ¹

¹University of Ulsan

Accredited

ABSTRACT

Singing voice synthesis is the generation of a song using a computer given its lyrics and musical notes. Hidden Markov models (HMMs) have proven to be the models of choice for text-to-speech synthesis. HMMs have also been used in singing voice synthesis research; however, training HMMs for singing voice synthesis requires a huge database. In addition, commercially available singing voice synthesis systems, which use piano roll music notation, need to adopt the easy-to-read standard music notation to be suitable for singing learning applications. To overcome these problems, we train context-dependent HMMs on a speech database and use them for singing voice synthesis. Pitch and duration control methods are devised to modify the parameters of the speech-trained HMMs so that they can serve as the synthesis units for the singing voice. This work describes a singing voice synthesis system that uses a MusicXML-based music score editor as the front-end interface for entering the notes and lyrics to be synthesized, and an HMM-based text-to-speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of the proposed system.
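
As a rough illustration of what a MusicXML front end hands to pitch and duration control, the sketch below reads a small hand-written MusicXML measure and derives a target fundamental frequency and a target duration for each sung note. It is a minimal sketch under stated assumptions, not the method of the paper: the score fragment, the function name note_targets, the fixed tempo of 120 beats per minute, and the use of twelve-tone equal temperament with A4 = 440 Hz are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural note names within an octave (C = 0).
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Hypothetical one-measure fragment; a real MusicXML file carries far more
# metadata (parts, clefs, key and time signatures, and so on).
SCORE = """
<measure number="1">
  <attributes><divisions>2</divisions></attributes>
  <note>
    <pitch><step>A</step><octave>4</octave></pitch>
    <duration>2</duration>
    <lyric><text>la</text></lyric>
  </note>
  <note>
    <pitch><step>C</step><alter>1</alter><octave>5</octave></pitch>
    <duration>1</duration>
    <lyric><text>la</text></lyric>
  </note>
</measure>
"""


def note_targets(measure_xml, tempo_bpm=120.0):
    """Return (lyric, f0_hz, seconds) for every pitched note in a measure.

    Assumes equal temperament with A4 = 440 Hz and a constant tempo given
    in quarter notes per minute (both are assumptions of this sketch).
    """
    measure = ET.fromstring(measure_xml)
    # <divisions> is the number of duration units per quarter note.
    divisions = int(measure.findtext("attributes/divisions"))
    targets = []
    for note in measure.findall("note"):
        pitch = note.find("pitch")
        if pitch is None:
            continue  # rests have no <pitch> element
        step = pitch.findtext("step")
        alter = float(pitch.findtext("alter", default="0"))
        octave = int(pitch.findtext("octave"))
        # MIDI note number: C4 is 60, A4 is 69.
        midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
        f0_hz = 440.0 * 2.0 ** ((midi - 69) / 12)
        quarter_notes = int(note.findtext("duration")) / divisions
        seconds = quarter_notes * 60.0 / tempo_bpm
        lyric = note.findtext("lyric/text", default="")
        targets.append((lyric, f0_hz, seconds))
    return targets


if __name__ == "__main__":
    for lyric, f0_hz, seconds in note_targets(SCORE):
        print(f"{lyric}: {f0_hz:.2f} Hz for {seconds:.2f} s")
```

In a system of the kind the abstract describes, per-note targets like these would steer the pitch and duration parameters of the speech-trained HMMs; how that modification is actually carried out is not specified here.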

KEYWORDS

TTS, HMM, Singing Voice Synthesis, Score Editor


Journal of The Korea Society of Computer and Information 2025 KCI Impact Factor : 1.01