Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data (고령 화자 음성데이터 기반 한국어 음성합성 모델 파인 튜닝)

YeongJu Kim (김영주); Kwangmoon Cho (조광문); Do Hyun Lee (이도현)

@article{ART003306276,
author={YeongJu Kim and Kwangmoon Cho and Do Hyun Lee},
title={Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data},
journal={Journal of Internet of Things and Convergence},
issn={2466-0078},
year={2026},
volume={12},
number={1},
pages={9-16}
}

TY - JOUR
AU - YeongJu Kim
AU - Kwangmoon Cho
AU - Do Hyun Lee
TI - Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data
JO - Journal of Internet of Things and Convergence
PY - 2026
VL - 12
IS - 1
PB - The Korea Internet of Things Society
SP - 9
EP - 16
SN - 2466-0078
AB - This study proposes a method for building a Text-to-Speech (TTS) model under limited-data conditions using Korean voice data from elderly speakers aged 60 to 90. We collected approximately 250 minutes of speech from 50 elderly speakers (25 male, 25 female), an average of about 5 minutes per speaker, and fine-tuned the XTTS (Cross-lingual Text-to-Speech) model on this data. In preprocessing, we refined speech segments and transcription quality using Automatic Speech Recognition (ASR) with the Whisper large-v3 model together with Voice Activity Detection (VAD). Mixed-precision training and a cosine annealing scheduler improved training efficiency and stability. Experimental results show that, with optimal hyperparameter settings, most speakers achieved low Word Error Rate (WER) and Character Error Rate (CER). Speakers who showed high error rates in the initial training phase improved significantly after retraining. The study demonstrates the feasibility of a Korean TTS system that reflects the voice characteristics of elderly speakers and provides an efficient few-shot learning methodology for settings with extremely limited data per speaker.
KW - Text-to-Speech;TTS;XTTS;Fine-tuning;Elderly Speaker;Few-shot Learning
DO -
UR -
ER -
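
The abstract reports results as Word Error Rate (WER) and Character Error Rate (CER). The record does not say how the authors computed them, so the following is only a minimal sketch of the standard definitions (edit distance divided by reference length). The function names are illustrative, and dropping spaces before computing CER is an assumption rather than something taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical example: the target text vs. an ASR transcript of the synthesized audio.
    target = "오늘 날씨가 좋습니다"
    transcript = "오늘 날씨는 좋습니다"
    print(f"WER={wer(target, transcript):.3f}  CER={cer(target, transcript):.3f}")
```

In a TTS evaluation of this kind, the reference would typically be the input text and the hypothesis an ASR transcript of the synthesized speech, so lower values indicate more intelligible output.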

YeongJu Kim, Kwangmoon Cho and Do Hyun Lee. (2026). Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data. Journal of Internet of Things and Convergence, 12(1), 9-16.

YeongJu Kim, Kwangmoon Cho and Do Hyun Lee. 2026, "Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data", Journal of Internet of Things and Convergence, vol.12, no.1 pp.9-16.

YeongJu Kim, Kwangmoon Cho, Do Hyun Lee "Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data" Journal of Internet of Things and Convergence 12.1 pp.9-16 (2026) : 9.

YeongJu Kim, Kwangmoon Cho, Do Hyun Lee. Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data. 2026; 12(1), 9-16.

YeongJu Kim, Kwangmoon Cho and Do Hyun Lee. "Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data" Journal of Internet of Things and Convergence 12, no.1 (2026) : 9-16.

YeongJu Kim; Kwangmoon Cho; Do Hyun Lee. Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data. Journal of Internet of Things and Convergence, 12(1), 9-16.

YeongJu Kim; Kwangmoon Cho; Do Hyun Lee. Fine-tuning of Korean Text-to-Speech Model Based on Elderly Speaker Voice Data. Journal of Internet of Things and Convergence. 2026; 12(1) 9-16.

Journal of Internet of Things and Convergence 2025 KCI Impact Factor : 0.75
