Channel-Wise Thread Indexing for Performance Improvement of Decomposable Winograd Convolution

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(12), pp.15~24
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 21, 2025
  • Accepted : December 1, 2025
  • Published : December 31, 2025

Wonho Lee 1, Jong Wook Kwak 1

1 Yeungnam University

Accredited

ABSTRACT

Convolutional neural networks (CNNs) often require large receptive fields, making acceleration algorithms and GPU kernel configurations key factors in optimizing inference performance. In decomposable Winograd convolution algorithms, existing thread indexing methods cause thread divergence: threads within a warp are serialized because they process filters of different sizes. In this paper, we introduce a channel-wise thread indexing scheme that eliminates thread divergence by mapping each convolutional filter channel to the same warp. This approach ensures uniform filter sizes across the threads within a warp, significantly improving performance. Experiments show that the proposed method removes all potential thread divergence across diverse convolution configurations, including various filter sizes and input/output channel counts, and reduces execution time by up to 15% on state-of-the-art CNN models. These results demonstrate the potential for improving CNN computational efficiency on SIMT architectures.
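
The divergence argument in the abstract can be illustrated with a small host-side model. The following Python sketch is not the authors' kernel; it only counts how many warps would contain threads with mixed sub-filter sizes under two thread-to-work mappings. The decomposition of a 5x5 filter into four sub-filters, the function names, and the assumption that the channel count is a multiple of the warp size are all hypothetical choices made for illustration.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs

# Hypothetical decomposition of one 5x5 filter into four sub-filters
# of different sizes (an assumption for illustration only).
SUB_FILTERS = [(3, 3), (3, 2), (2, 3), (2, 2)]


def divergent_warps(size_per_thread):
    """Count warps whose threads do not all see the same sub-filter size."""
    count = 0
    for start in range(0, len(size_per_thread), WARP_SIZE):
        warp = size_per_thread[start:start + WARP_SIZE]
        if len(set(warp)) > 1:
            count += 1
    return count


def filter_major_indexing(num_channels):
    """Consecutive threads cycle through the sub-filters of one channel.

    Thread t handles channel t // n and sub-filter t % n, so every warp
    mixes all sub-filter sizes.
    """
    n = len(SUB_FILTERS)
    return [SUB_FILTERS[t % n] for t in range(num_channels * n)]


def channel_wise_indexing(num_channels):
    """Consecutive threads walk the channels of one sub-filter.

    Thread t handles sub-filter t // num_channels and channel
    t % num_channels. Assumes num_channels is a multiple of WARP_SIZE
    so that no warp straddles two sub-filters.
    """
    n = len(SUB_FILTERS)
    return [SUB_FILTERS[t // num_channels] for t in range(num_channels * n)]


if __name__ == "__main__":
    channels = 64
    print("filter-major divergent warps:",
          divergent_warps(filter_major_indexing(channels)))
    print("channel-wise divergent warps:",
          divergent_warps(channel_wise_indexing(channels)))
```

With 64 channels the model launches 256 threads in 8 warps. Under the filter-major mapping all 8 warps mix sub-filter sizes, while under the channel-wise mapping none do. In this toy model a channel count that is not a multiple of 32 would need padding to a warp boundary to keep that property.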
