Implementation of CNN-WavLM-based Environmental Sound and Speech Emotion Recognition Platform

  • Journal of Internet of Things and Convergence
  • Abbr : JKIOTS
  • 2026, 12(3), 23
  • Publisher : The Korea Internet of Things Society
  • Research Area : Engineering > Computer Science > Internet Information Processing
  • Received : December 24, 2025
  • Accepted : June 22, 2026
  • Published : June 30, 2026

Hye-Rim Yoon¹, Hyun-Seung Lee¹, Taekook Kim¹

¹ Pukyong National University

Accredited

ABSTRACT

This study implements an audio-centered emotion recording platform that lets users record and recall their emotions through speech and environmental sounds. Conventional recording methods based on photos, videos, and text struggle to capture the emotional atmosphere and contextual information conveyed by everyday sounds. To address this, the study takes sounds recorded directly by users as input and applies AI-based emotion analysis to provide automatic emotion tagging. The system uses a branching structure that first determines whether the input audio contains speech. If speech is detected, a WavLM-based speech emotion recognition model is applied; otherwise, a CNN model operating on Mel-spectrograms analyzes the environmental sound. The CNN classifies environmental sounds at the scene level, and its classification results are then converted into emotion tags according to predefined scene-to-emotion mapping rules. The platform stores each emotion tag together with the time, location, and the user's own entry at the moment of recording, so users can browse and recall their records by emotion, date, and location. Through this implementation, the study integrates speech emotion recognition and environmental sound analysis into an actual service flow and demonstrates the feasibility of an emotion archiving platform built on auditory information.
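The branching flow and the archive lookup described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation. The speech detector, the WavLM emotion model, and the Mel-spectrogram CNN are passed in as plain callables. The scene labels in `SCENE_TO_EMOTION`, the `fallback` tag, and the use of a place name (rather than coordinates) for location are hypothetical, because the abstract does not give the actual mapping rules or storage schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Iterable, Optional, Sequence

Waveform = Sequence[float]

# Hypothetical scene -> emotion rules. The paper uses predefined mapping rules,
# but its actual scene labels and emotion tags are not reproduced here.
SCENE_TO_EMOTION = {
    "rain": "calm",
    "beach": "calm",
    "cafe": "content",
    "traffic": "tense",
    "festival": "excited",
}


@dataclass
class EmotionRecord:
    emotion: str           # tag produced by either branch
    branch: str            # "speech" or "environment"
    recorded_at: datetime  # capture time
    location: str          # hypothetical: a place name instead of GPS coordinates
    memo: str = ""         # the user's own entry


def tag_emotion(
    waveform: Waveform,
    contains_speech: Callable[[Waveform], bool],
    speech_emotion_model: Callable[[Waveform], str],
    scene_classifier: Callable[[Waveform], str],
    fallback: str = "neutral",
) -> tuple[str, str]:
    """Return (emotion_tag, branch) using the speech / non-speech split."""
    if contains_speech(waveform):
        # Speech branch: stands in for a WavLM-based emotion classifier.
        return speech_emotion_model(waveform), "speech"
    # Non-speech branch: stands in for a Mel-spectrogram CNN scene classifier;
    # the predicted scene label is converted to an emotion by rule lookup.
    scene = scene_classifier(waveform)
    return SCENE_TO_EMOTION.get(scene, fallback), "environment"


def find_records(
    records: Iterable[EmotionRecord],
    emotion: Optional[str] = None,
    on_date: Optional[date] = None,
    location: Optional[str] = None,
) -> list[EmotionRecord]:
    """Filter archived records by any combination of emotion, date, and location."""
    return [
        r
        for r in records
        if (emotion is None or r.emotion == emotion)
        and (on_date is None or r.recorded_at.date() == on_date)
        and (location is None or r.location == location)
    ]


if __name__ == "__main__":
    # Stub models for illustration only; real models would replace the lambdas.
    rain_clip = [0.0] * 16000
    tag, branch = tag_emotion(
        rain_clip,
        contains_speech=lambda w: False,
        speech_emotion_model=lambda w: "happy",
        scene_classifier=lambda w: "rain",
    )
    record = EmotionRecord(tag, branch, datetime(2026, 6, 30, 9, 15), "Busan")
    print(find_records([record], emotion="calm"))
```

Injecting the three models as callables keeps the routing and query logic independent of any deep-learning framework, so trained WavLM and CNN models could be swapped in without changing how tags are produced or how records are searched.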


This paper was written with support from the National Research Foundation of Korea.