Featured Papers
Popular, high-signal papers, each with a direct link to its full protocol page.
- LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
May 8, 2026 · Citations: 0
We further introduce beta parameterization to make the search tractable and fine-grained execution trace feedback to improve discovery efficiency by helping the agent diagnose why a TTS program fails.
- Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration
May 8, 2026 · Citations: 0
Experiments on benchmarks show that CPR significantly improves the Empirical Coverage Rate by 34% while reducing average prediction set size by 40% compared to conformal baselines.
- The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
May 8, 2026 · Citations: 0
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas.
- CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
May 8, 2026 · Citations: 0
While recent advancements in inference-time learning have improved LLM reasoning on Text-to-SQL tasks, current solutions still struggle to perform well on the most challenging tasks in the Bird-Bench (BIRD) benchmark.
- Accurate and Efficient Statistical Testing for Word Semantic Breadth
May 8, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
May 8, 2026 · Citations: 0
Uncertainty integrates three complementary principles -- distribution plausibility, sampling stability, and cross-field consistency -- to triage human review.
- Fast Byte Latent Transformer
May 8, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
May 8, 2026 · Citations: 0
A two-human-coder audit on n=30 reproduces the direction of the main finding: dedicated identification sections are absent, and validation-metric substitution is common, though exact Dim B/D counts are coding-rule sensitive.
- Tool Calling is Linearly Readable and Steerable in Language Models
May 8, 2026 · Citations: 0
When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed.
- GLiGuard: Schema-Conditioned Classification for LLM Safeguard
May 8, 2026 · Citations: 0
Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions.
- Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?
May 8, 2026 · Citations: 0
Long-horizon AI agents execute complex workflows spanning hundreds of sequential actions, yet a single wrong assumption early on can cascade into irreversible errors.
- How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
May 8, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.