
Time-Frequency Representations for Improving Environmental Sound Classification with Deep Learning Models

  • Journal of Software Assessment and Valuation
  • Abbr : JSAV
  • 2024, 20(4), pp.233-242
  • Publisher : Korea Software Assessment and Valuation Society
  • Research Area : Engineering > Computer Science
  • Received : December 3, 2024
  • Accepted : December 20, 2024
  • Published : December 31, 2024

Moon-Ki Back 1, Hyoung-Seop Shim 1

1Korea Institute of Science and Technology Information (KISTI)

Accredited

ABSTRACT

Spectrograms are widely utilized in audio signal processing research to effectively analyze the magnitude of frequency components, but they are limited in representing time-varying phase information. To overcome this limitation, this paper explores a time-frequency representation that combines Power Spectrogram (PS) and Instantaneous Frequency (IF) features and validates its effectiveness through environmental sound classification tasks using various deep learning architectures. Experiments on the ESC-50 dataset demonstrate that the ConvNeXt model, leveraging the vertical integration of PS and IF, achieves a classification accuracy of 87.16%, a 1.7% improvement over conventional methods. The confusion matrix analysis reveals that misclassifications often occur for water-related sounds and sirens, as they exhibit highly similar time-frequency patterns, making them challenging to distinguish. This study highlights the potential of the proposed approach to enhance the performance of deep learning models in audio-related tasks, particularly for small- to medium-scale datasets, and anticipates broad applicability in sound-related applications.
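The paper's exact feature pipeline is not given in the abstract; a minimal sketch of the general idea, computing a power spectrogram and an instantaneous-frequency map from the STFT phase and stacking them along the frequency axis ("vertical integration"), might look like the following. The function name, frame parameters, and log scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft

def ps_if_features(x, fs=16000, nperseg=512, noverlap=384):
    """Illustrative PS + IF feature extraction (not the paper's exact pipeline)."""
    # Complex-valued short-time Fourier transform
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Power spectrogram, log-compressed for dynamic range
    ps = np.log1p(np.abs(Z) ** 2)
    # Instantaneous frequency: finite difference of the unwrapped phase over time
    phase = np.unwrap(np.angle(Z), axis=1)
    inst_freq = np.diff(phase, axis=1, prepend=phase[:, :1])
    # "Vertical integration": stack PS and IF along the frequency axis,
    # doubling the height of the 2-D input fed to the CNN
    return np.concatenate([ps, inst_freq], axis=0)

# Example: one second of synthetic audio at 16 kHz
x = np.random.randn(16000)
feat = ps_if_features(x)  # shape (2 * (nperseg // 2 + 1), n_frames)
```

The resulting array can be treated as a single-channel image, so architectures such as ConvNeXt can consume it without modification; an alternative design would stack PS and IF as two input channels instead.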
