Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models

Saurabh Srivastava, Janit Bidhan, Hao Yan, Abhishek Dey, Tanu Kansal, +5 more

2025-11-06T06:47:39Z

Abstract

Large Reasoning Models (LRMs) achieve strong performance through explicit chain-of-thought reasoning but suffer from overthinking: generating excessive reasoning tokens even for trivial queries. Beyond inflating cost, overthinking can be self-defeating: models enter recursive self-doubt loops that exhaust token budgets without producing an answer, causing API timeouts that directly hurt accuracy. We present an empirical study showing that batch prompting, originally introduced for throughput optimization, effectively suppresses overthinking at inference time. Across 13 diverse benchmarks with DeepSeek-R1 and OpenAI-o1, batch prompting reduces reasoning tokens by 76% on average (2,950 → 710) while preserving or improving accuracy. Through behavioral analysis, we find that batching induces three beneficial effects: (1) it reduces per-query reasoning effort when multiple queries share a context; (2) it enables pattern induction, where models generalize from earlier examples to solve later ones; and (3) it suppresses hedging behavior (e.g., "wait," "let me double-check") that signals metacognitive loops. We also show that explicit prompt constraints ("Use no more than 100 tokens in thinking.") fail to reduce overthinking; models either ignore them or sacrifice accuracy. These findings reframe batch prompting as more than a cost optimization: it is a practical inference-time technique that improves efficiency and reliability without model modification.
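To make the idea of batch prompting concrete, the following is a minimal Python sketch of packing several queries into one prompt and mapping the answers back to their questions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_batch_prompt`, `parse_batch_answers`, `token_reduction`), the `Q<i>`/`A<i>` labeling scheme, and the instruction wording are all hypothetical, and the abstract does not specify the prompt format used in the study. The last helper simply checks the arithmetic behind the reported average reduction.

```python
import re


def build_batch_prompt(questions):
    """Pack several questions into one prompt with indexed answer slots.

    The 'Q<i>' / 'A<i>' format is an assumption made for this sketch.
    """
    lines = [
        "Answer each question below. "
        "Reply with one line per question in the form 'A<i>: <answer>'.",
        "",
    ]
    for i, question in enumerate(questions, start=1):
        lines.append(f"Q{i}: {question}")
    return "\n".join(lines)


def parse_batch_answers(response, n_questions):
    """Map 'A<i>: <answer>' lines back to questions.

    Questions the model did not answer stay as None.
    """
    answers = [None] * n_questions
    pattern = r"^A(\d+):[ \t]*(.+)$"
    for match in re.finditer(pattern, response, flags=re.MULTILINE):
        idx = int(match.group(1)) - 1
        if 0 <= idx < n_questions:
            answers[idx] = match.group(2).strip()
    return answers


def token_reduction(before, after):
    """Fractional reduction in reasoning tokens."""
    return (before - after) / before
```

In use, the string returned by `build_batch_prompt` would be sent to the model as a single request, and the raw completion passed to `parse_batch_answers`. Calling `token_reduction(2950, 710)` gives roughly 0.759, which matches the 76% average reduction stated in the abstract.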
