
SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(12), pp.51~60
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : October 23, 2025
  • Accepted : November 24, 2025
  • Published : December 31, 2025

Jaegun Lee¹, Janghoon Choi¹

¹Kyungpook National University

Accredited

ABSTRACT

In this study, we propose SuperSmall-R1, a lightweight reasoning model specialized for mathematical problem solving, built on the compact language model DeepSeek-R1-Distill-Qwen-1.5B. Unlike conventional approaches, the proposed model improves performance solely through reinforcement learning, without supervised fine-tuning (SFT). Specifically, we introduce ZeroGRPO, a variant of the GRPO algorithm that removes the KL-divergence penalty, giving the policy greater freedom to explore. In addition, instead of a complex penalty-based reward scheme, we adopt a simple yet effective reward function based only on format consistency and answer correctness. To mitigate the sparse rewards that arise when training begins on difficult problems, we further incorporate curriculum learning, gradually increasing problem difficulty. Experimental results on the Math-500 benchmark show that our approach outperforms not only the base model but also existing GRPO- and Penalty-GRPO-based methods, confirming that mathematical reasoning ability can be effectively enhanced even in resource-constrained environments.
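The reward design described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the tag names (`<think>`/`<answer>`), the equal reward weights, and the exact string-match correctness check are assumptions, chosen to show the idea of scoring only format consistency and answer correctness, with group-relative advantages and no KL term as in ZeroGRPO.

```python
import re
import statistics

# Assumed output format: reasoning inside <think>...</think>,
# final answer inside <answer>...</answer>.
FORMAT_RE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def reward(completion: str, gold_answer: str) -> float:
    """Score a completion on format consistency plus answer correctness only.

    No penalty terms; weights (1.0 each) are illustrative assumptions.
    """
    m = FORMAT_RE.search(completion)
    fmt = 1.0 if m else 0.0                       # format-consistency reward
    if m is None:
        return fmt                                # no parsable answer -> no correctness reward
    predicted = m.group(1).strip()
    correct = 1.0 if predicted == gold_answer.strip() else 0.0
    return fmt + correct                          # total reward in {0.0, 1.0, 2.0}

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style group-normalized advantages for one prompt's sampled group.

    ZeroGRPO, as described, drops the KL-divergence penalty from the policy
    loss; the group-relative advantage computation itself is unchanged.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0       # avoid division by zero
    return [(r - mean) / std for r in rewards]
```

With this reward, a well-formatted correct completion scores 2.0, a well-formatted wrong one 1.0, and a malformed one 0.0, so the group-normalized advantages directly favor correct, parsable outputs.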


This paper was written with support from the National Research Foundation of Korea.