Featured Papers
Popular high-signal papers with direct links to full protocol pages.
- Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
May 14, 2026 · Citations: 0
On several benchmarks, FEST outperforms baselines with orders of magnitude less SFT data, even matching their performance with the full dataset.
- The Scientific Contribution Graph: Automated Literature-based Technological Roadmapping at Scale
May 14, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Quantifying and Mitigating Premature Closure in Frontier LLMs
May 14, 2026 · Citations: 0
In open-ended evaluation, models gave inappropriate answers on an average of 30% of 861 HealthBench questions and 78% of 191 physician-authored adversarial queries.
- Explainable Detection of Depression Status Shifts from User Digital Traces
May 14, 2026 · Citations: 0
To enhance interpretability, the framework integrates a large language model to generate concise and human-readable reports that describe the evolution of mental-health signals and highlight key transitions.
- Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing
May 14, 2026 · Citations: 0
PPOW achieves average acceptance lengths of 6.29-6.52 and speedups of 3.39-4.36× across multiple model families and benchmarks under a unified decoding protocol.
- Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA
May 14, 2026 · Citations: 0
To systematically evaluate VLMs on this practical task, we propose ProcedureVQA, a novel multimodal benchmark specifically designed for visual procedural reasoning.
- Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study
May 14, 2026 · Citations: 0
We benchmark seven models from five providers on 273 validated court decisions from Ukraine's state registry (EDRSR), measuring tokenizer fertility and zero-shot performance on three tasks.
- Holistic Evaluation and Failure Diagnosis of AI Agents
May 14, 2026 · Citations: 0
We present a holistic agent evaluation framework that pairs top-down agent-level diagnosis with bottom-up span-level evaluation, decomposing analysis into independent per-span assessments.
- Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
May 13, 2026 · Citations: 0
In our work, we propose Speculative Interaction Agents to enable real-time interaction even for agents with complex multi-turn tool calling.
- Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
May 13, 2026 · Citations: 0
Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise…
- Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving
May 13, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Learning, Fast and Slow: Towards LLMs That Adapt Continually
May 12, 2026 · Citations: 0
Moreover, humans also likely learn at different time scales (e.g., System 1 vs 2).