
Sound Classification Performance of Deep Neural Networks in the Presence of Disturbing Sounds

  • Journal of Knowledge Information Technology and Systems
  • Abbr : JKITS
  • 2020, 15(6), pp.973-981
  • DOI : 10.34163/jkits.2020.15.6.006
  • Publisher : Korea Knowledge Information Technology Society
  • Research Area : Interdisciplinary Studies > Interdisciplinary Research
  • Received : November 27, 2020
  • Accepted : December 11, 2020
  • Published : December 31, 2020

Wongeun Oh 1, Dong Kyun Lim 2

1 Sunchon National University
2 Hanyang Cyber University

Accredited

ABSTRACT

Environmental sound classification is the task of automatically classifying sounds in our surroundings. It can be applied to home automation, security, and surveillance. Recently, deep learning approaches have been adopted as classifiers to improve performance. In this approach, a deep neural network is trained on a large amount of sound data, and after training is completed, sound picked up by a microphone is fed to the network for classification. During this stage, however, ambient noise can enter the microphone along with the sound to be identified, and the sound may not be classified properly because of this disturbing noise. The recognition rate of a deep neural network decreases as the loudness of the disturbing sound increases, but analysis of the effect of noise on classification has been limited. In this paper, we present the effect of disturbing noise on the classification rate. For this purpose, UrbanSound8K, which is composed of 10 types of urban environmental sounds, is used as the training and test data, and a VGG16-based CNN, which shows good performance in image classification, is adopted as the baseline model. For the disturbing noise, we use three types of sounds: daily noises (hairdryer, vacuum cleaner, faucet water, and hammer), voices (male, female, and synthesized), and music (cello, piano, and trumpet). In the experiment, these disturbing noises are mixed with the clean sounds so that the signal-to-noise ratio (SNR) is in the range of -50 dB to 50 dB. The mixed sound is then applied to the deep neural network to obtain a recognition rate relative to the clean case. The results show that the relative recognition rate is more than 90% of the clean-sound case when the SNR is between 10 and 15 dB, and 95% or more when the SNR is greater than 20 dB, regardless of the type of disturbing sound.
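The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes that SNR is defined as the ratio of mean-square power of the clean clip to that of the disturbing clip over the whole clip, which the abstract does not specify. The function names `power`, `mix_at_snr`, and `relative_recognition_rate` are hypothetical and are not taken from the paper.

```python
import math


def power(x):
    """Mean power of a signal: the mean of its squared samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add.

    Assumption: SNR is measured over the whole clip using mean-square power.
    Returns the mixed signal and the gain that was applied to the noise.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)); solve for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    mixed = [c + gain * n for c, n in zip(clean, noise)]
    return mixed, gain


def relative_recognition_rate(acc_mixed, acc_clean):
    """Accuracy on mixed sounds as a percentage of the clean-sound accuracy."""
    return 100.0 * acc_mixed / acc_clean
```

Under these assumptions, sweeping `snr_db` from -50 to 50 and evaluating the classifier on each mixed set would give one relative recognition rate per SNR value and noise type.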


This paper was written with support from the National Research Foundation of Korea.