Skip to content
← Back to explorer

What If We Allocate Test-Time Compute Adaptively?

Ahsan Bilal, Ahmed Mohsin, Muhammad Umer, Ali Subhan, Hassan Rizwan, Ayesha Mohsin, Dean Hougen · Feb 1, 2026 · Citations: 0

Abstract

Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. In contrast, we propose a verifier-guided adaptive framework treating reasoning as iterative trajectory generation and selection. For each problem, the agent runs multiple inference iterations. In each iteration, it optionally produces a high-level plan, selects a set of reasoning tools and a compute strategy together with an exploration parameter, and then generates a candidate reasoning trajectory. A process reward model (PRM) serves as a unified control signal: within each iteration, step-level PRM scores are aggregated to guide pruning and expansion during generation, and across iterations, aggregated trajectory rewards are used to select the final response. Across datasets, our dynamic, PRM-guided approach consistently outperforms direct test-time scaling, yielding large gains on MATH-500 and several-fold improvements on harder benchmarks such as AIME24 and AMO-Bench. We characterize efficiency using theoretical FLOPs and a compute intensity metric penalizing wasted generation and tool overhead, demonstrating that verification-guided allocation concentrates computation on high-utility reasoning paths.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Ranking
  • Expertise required: Math

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: Long Horizon
  • Quality controls: Not reported
  • Confidence: 0.50
  • Flags: ambiguous

Research Summary

Contribution Summary

  • Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking.
  • In contrast, we propose a verifier-guided adaptive framework treating reasoning as iterative trajectory generation and selection.
  • For each problem, the agent runs multiple inference iterations.

Why It Matters For Eval

  • For each problem, the agent runs multiple inference iterations.
  • Across datasets, our dynamic, PRM-guided approach consistently outperforms direct test-time scaling, yielding large gains on MATH-500 and several-fold improvements on harder benchmarks such as AIME24 and AMO-Bench.

Related Papers