- QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching · Apr 6, 2026 · Citations: 0
Automatic Metrics · Math · Coding
To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.
- CAMEL: Confidence-Gated Reflection for Reward Modeling
Zirui Zhu, Hailun Xu, Yang Luo, Yong Liu, Kanchan Sarkar · Feb 24, 2026 · Citations: 0
Automatic Metrics · General
Building on this insight, we propose CAMEL, a confidence-gated reflection framework that performs a lightweight single-token preference decision first and selectively invokes reflection only for low-confidence instances.
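The gating mechanism the abstract describes can be sketched as follows. This is an illustrative toy, not CAMEL's actual interface: the function names, the 0.9 threshold, and the stand-in judges are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class PreferenceResult:
    choice: str        # preferred response, e.g. "A" or "B"
    confidence: float  # probability mass behind the single-token decision
    reflected: bool    # whether the expensive reflection path ran

def confidence_gated_judge(fast_decision, reflect, pair, threshold=0.9):
    """Run a cheap single-token preference first; invoke the costlier
    reflection pass only when the fast judge is under-confident."""
    choice, confidence = fast_decision(pair)
    if confidence >= threshold:
        return PreferenceResult(choice, confidence, reflected=False)
    # Low-confidence instance: fall back to deliberate reflection.
    return PreferenceResult(reflect(pair), confidence, reflected=True)

# Toy stand-ins for the two judging modes.
def fast_decision(pair):
    # In practice: softmax mass on the "A" vs "B" answer tokens.
    return ("A", 0.95) if "obvious" in pair else ("A", 0.55)

def reflect(pair):
    return "B"  # pretend deliberate reasoning flips the call

print(confidence_gated_judge(fast_decision, reflect, "obvious case"))
print(confidence_gated_judge(fast_decision, reflect, "ambiguous case"))
```

The point of the gate is cost: most comparisons exit after one decoded token, and only the low-confidence minority pays for reflection.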
- "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
Amin Seffo, Aladin Djuhera, Masataro Asai, Holger Boche · Jun 4, 2025 · Citations: 0
Simulation Env · Math · Coding
Recent advancements in large language models (LLMs) have spurred interest in robotic navigation that incorporates complex spatial, mathematical, and conditional constraints from natural language into the planning problem.
- $\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution
Muyu He, Adit Jain, Anand Kumar, Vincent Tu, Soumyadeep Bakshi · Apr 1, 2026 · Citations: 0
Automatic Metrics · General
As LLM agents tackle increasingly complex tasks, a critical question is whether they can maintain strategic coherence over long horizons: planning under uncertainty, learning from delayed feedback, and adapting when early mistakes compound.
- Luna-2: Scalable Single-Token Evaluation with Small Language Models
Vatsal Goel, Rishon Dsouza, Nikhil Ega, Amey Ramesh Rambatla, Rob Friel · Feb 20, 2026 · Citations: 0
LLM As Judge · Automatic Metrics · General
We present Luna-2, a novel architecture that turns decoder-only small language models (SLMs) into a deterministic evaluation model to reliably compute complex, task-specific LLM-as-a-judge (LLMAJ) metrics (e.g. …)
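The core idea of single-token evaluation can be illustrated in a few lines. This is a generic sketch of the technique, not Luna-2's API: reading the judge model's final-position logits for a "pass" and a "fail" verdict token and normalizing over just those two gives a deterministic score, with the function name and token pair as illustrative assumptions.

```python
import math

def single_token_score(logit_pass, logit_fail):
    """Deterministic metric: softmax over only the two verdict tokens,
    read from the judge model's final-position logits."""
    m = max(logit_pass, logit_fail)      # subtract max to stabilize exp
    p = math.exp(logit_pass - m)
    f = math.exp(logit_fail - m)
    return p / (p + f)                   # probability the response "passes"

# Example: a judge strongly favoring the "pass" token.
print(round(single_token_score(2.0, -1.0), 4))
```

Because the score comes from a single forward pass and no sampling, repeated evaluations of the same input are deterministic, which is what makes such metrics cheap to run at scale.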
- GAIN: Multiplicative Modulation for Domain Adaptation
Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang · Apr 6, 2026 · Citations: 0
- VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions
Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas, Yassine Ouali, Georgios Tzimiropoulos · Mar 24, 2026 · Citations: 0
- ConsRoute: Consistency-Aware Adaptive Query Routing for Cloud-Edge-Device Large Language Models
Haoyu Qiao, Hao Zhang, Shanwen Mao, Siyao Cheng, Jie Liu · Mar 22, 2026 · Citations: 0
- Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards
Philipp Normann, Andreas Happe, Jürgen Cito, Daniel Arp · Mar 18, 2026 · Citations: 0
- Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization
Ahmet Kaplan · Mar 18, 2026 · Citations: 0
- Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs
Disha Sheshanarayana, Rajat Subhra Pal, Manjira Sinha, Tirthankar Dasgupta · Mar 16, 2026 · Citations: 0
- Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization
Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun · Mar 13, 2026 · Citations: 0
- Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability
Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh · Mar 12, 2026 · Citations: 0
- UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools
Gyutae Oh, Jitae Shin · Aug 14, 2025 · Citations: 0