@article{ART003280324,
author={Jaegun Lee and Janghoon Choi},
title={SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning},
journal={Journal of The Korea Society of Computer and Information},
issn={1598-849X},
year={2025},
volume={30},
number={12},
pages={51--60}
}
TY - JOUR
AU - Jaegun Lee
AU - Janghoon Choi
TI - SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning
JO - Journal of The Korea Society of Computer and Information
PY - 2025
VL - 30
IS - 12
PB - The Korean Society of Computer and Information
SP - 51
EP - 60
SN - 1598-849X
AB - In this study, we propose SuperSmall-R1, a lightweight reasoning model specialized for mathematical problem solving, built upon the compact text language model DeepSeek-R1-Distill-Qwen-1.5B. Unlike conventional approaches, the proposed model improves performance solely through reinforcement learning without supervised fine-tuning (SFT). Specifically, we introduce ZeroGRPO, a variant of the GRPO algorithm with the KL-divergence penalty removed, thereby increasing the freedom of policy exploration. In addition, instead of employing a complex penalty-based reward scheme, we adopt a simple yet effective reward function based only on format consistency and answer correctness. To address the issue of insufficient rewards when tackling difficult problems from the beginning, we further incorporate curriculum learning, where the problem difficulty is gradually increased. Experimental results on the Math-500 benchmark demonstrate that our approach outperforms not only the base model but also existing methods based on GRPO and Penalty GRPO, confirming that mathematical reasoning ability can be effectively enhanced even under resource-constrained environments.
KW - Reinforcement learning;Policy optimization;Curriculum learning;Reward design;Large Language Model
DO -
UR -
ER -
Jaegun Lee and Janghoon Choi. (2025). SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning. Journal of The Korea Society of Computer and Information, 30(12), 51-60.
Jaegun Lee and Janghoon Choi. 2025, "SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning", Journal of The Korea Society of Computer and Information, vol.30, no.12, pp.51-60.
Jaegun Lee, Janghoon Choi "SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning" Journal of The Korea Society of Computer and Information 30.12 pp.51-60 (2025) : 51.
Jaegun Lee, Janghoon Choi. SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning. Journal of The Korea Society of Computer and Information. 2025; 30(12), 51-60.
Jaegun Lee and Janghoon Choi. "SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning" Journal of The Korea Society of Computer and Information 30, no.12 (2025) : 51-60.
Jaegun Lee; Janghoon Choi. SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning. Journal of The Korea Society of Computer and Information, 30(12), 51-60.
Jaegun Lee; Janghoon Choi. SuperSmall-R1: A Lightweight Reinforcement Learning Model for Mathematical Reasoning. Journal of The Korea Society of Computer and Information. 2025; 30(12) 51-60.