Jooeun Lee | Minseob Song | Namgyu Kim | 2026, 31(2) | pp. 31–42 | Cited: 0
With the advancement of large language models (LLMs), in-context learning has become a key approach, driving research on prompting techniques. Among them, few-shot Chain-of-Thought (CoT) prompting, which elicits explicit reasoning, demonstrates strong performance but is sensitive to how its examples are composed. While prior work has focused on diversity or uncertainty in example selection, difficulty-based approaches remain underexplored. This study proposes a method that identifies high-difficulty questions by combining pairwise difficulty comparisons, conducted by an LLM, with a Swiss tournament structure, and then constructs few-shot CoT exemplars from those questions using human reasoning annotations. Experiments on 1,319 GSM8K problems show that the proposed method outperforms random selection, uncertainty-based selection, and direct difficulty evaluation by 2.12, 1.36, and 10.16 percentage points, respectively.
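The core idea of the abstract, ranking questions by difficulty through pairwise comparisons organized as a Swiss tournament, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `compare` judge is injected as a plain function (in the paper this judgment would come from an LLM prompt), the number of rounds and the toy question pool are arbitrary assumptions, and byes for odd-sized pools are not handled.

```python
def swiss_tournament_ranking(questions, compare, n_rounds=3):
    """Rank questions by estimated difficulty via Swiss-style pairing.

    `compare(a, b)` returns True if question `a` is judged harder than `b`.
    Each question earns one point per pairwise win; each round pairs
    questions with similar running scores, as in a Swiss tournament,
    so strong candidates are repeatedly tested against each other.
    """
    scores = {q: 0 for q in questions}
    for _ in range(n_rounds):
        # Sort by current score so adjacent entries have similar records.
        standings = sorted(questions, key=lambda q: scores[q], reverse=True)
        # Pair neighbours in the standings (assumes an even-sized pool).
        for a, b in zip(standings[::2], standings[1::2]):
            if compare(a, b):
                scores[a] += 1
            else:
                scores[b] += 1
    # Highest-scoring questions are treated as the most difficult.
    return sorted(questions, key=lambda q: scores[q], reverse=True)

# Toy stand-in for the LLM judge: longer question text counts as harder.
def harder(a, b):
    return len(a) > len(b)

pool = [
    "2+2?",
    "Solve for x: 3x+5=20",
    "A train leaves at 9 am and travels 60 mph for 3 hours; how far?",
    "What is 7*8?",
]
ranked = swiss_tournament_ranking(pool, harder)
print(ranked[0])  # the question judged hardest
```

The top entries of `ranked` would then serve as candidates for the few-shot CoT exemplar set, with human-written reasoning annotations attached.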