
Robust Audio Spectrogram Transformer for Sound Source Localization in Noisy Environments

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(10), pp.33~42
  • Publisher : The Korean Society of Computer and Information
  • Research Area : Engineering > Computer Science
  • Received : August 22, 2025
  • Accepted : October 14, 2025
  • Published : October 31, 2025

Won Jun Lee 1, Woo Jin Jung 2, Hyun-Jong Cha 1, Ah Reum Kang 1

1 Pai Chai University
2 Department of Information Security, Pai Chai University

Accredited

ABSTRACT

Conventional sound source localization methods lose significant accuracy in low-SNR (Signal-to-Noise Ratio) environments. In this paper, we propose a sound source localization model based on an audio spectrogram transformer that takes GCC (Generalized Cross Correlation) features extracted from multichannel audio signals as input. The proposed model was evaluated across various indoor environments and SNR conditions, and its performance was compared with the conventional GCC-PHAT (Generalized Cross Correlation – Phase Transform) and MUSIC (MUltiple SIgnal Classification) algorithms. Experimental results show that the proposed model achieves superior performance, with a mean angular error of 10.0163°, a mean distance error of 0.1626, and an RMSE (Root Mean Square Error) of 0.89 in a 5 m × 5 m × 5 m environment, even at 0 dB SNR. The model also remains robust under changes in room size and noise conditions. These results indicate that transformer-based models can be applied effectively to achieve reliable sound source localization in noisy environments.
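
The abstract does not describe how the GCC features are computed, so the following is only a minimal textbook sketch of GCC-PHAT for a single microphone pair, not the authors' pipeline. The function name `gcc_phat`, its parameters, and the use of NumPy are assumptions made for illustration. In a setup like the one described, the correlation curve returned for each microphone pair is the kind of feature that could be stacked and passed to a model.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration only.
    Returns (delay in seconds, cross-correlation over the searched lags).
    """
    # Zero-pad to the combined length so circular correlation does not wrap.
    n = sig.shape[0] + ref.shape[0]

    # Cross-power spectrum of the two channels.
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)

    # PHAT weighting: keep only the phase, discard the magnitude.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search window to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lag 0 sits at index `max_shift`.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The peak position gives the lag in samples; positive means `sig` lags `ref`.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc
```

The PHAT weighting whitens the cross-spectrum, which sharpens the correlation peak in reverberant rooms but also gives noisy frequency bins equal weight. This is one commonly cited reason the classical estimator degrades at low SNR, the condition the abstract targets.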
