Featured Papers
Popular high-signal papers with direct links to full protocol pages.
- Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech
May 17, 2026 · Citations: 0
There are not enough established benchmarks for the task of speech summarization.
- Causal Intervention-Based Memory Selection for Long-Horizon LLM Agents
May 17, 2026 · Citations: 0
Long-horizon LLM agents rely on persistent memory to support interactions across sessions, yet existing memory systems often retrieve context using semantic similarity or broad history inclusion, treating retrieved memories as uniformly…
- Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations
May 17, 2026 · Citations: 0
We test this assumption longitudinally by constructing UA-StatuteRetrieval, a benchmark that measures co-citation predictability across 20 annual snapshots (2007-2026) of 396 million codex citations from 101 million Ukrainian court…
- AI Agents May Always Fall for Prompt Injections
May 17, 2026 · Citations: 0
Prompt injection is the most critical vulnerability in deployed AI agents.
- SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
May 17, 2026 · Citations: 0
The rapid growth of online video platforms and AI-generated content has made reliable video guardrails a key challenge for safety and real-world deployment.
- Mixture of Experts for Low-Resource LLMs
May 17, 2026 · Citations: 0
Routing improvements correlate with consistent downstream benchmark gains, positioning routing entropy and expert specialization as principled diagnostics for multilingual capacity in MoE systems.
- How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
May 17, 2026 · Citations: 0
Across five language models and multiple math reasoning benchmarks, Mu-GRPO matches or exceeds the performance of standard GRPO while achieving around 2x speedup in wall-clock training time, establishing a substantially improved…
- Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models
May 17, 2026 · Citations: 0
Recent work has fine-tuned language models on chess data and reported high benchmark scores as evidence that the resulting models can understand the rules of chess, play full chess games at a professional level, or generate human-readable…
- Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
May 17, 2026 · Citations: 0
Training tool-calling agents requires large-scale trajectory data with verifiable labels, yet existing approaches either synthesize environments that diverge from real API behavior or generate tasks without ground-truth outcomes for…
- No Free Swap: Protocol-Dependent Layer Redundancy in Transformers
May 15, 2026 · Citations: 0
When researchers ask whether two transformer layers are "equivalent" for compression, they often conflate distinct tests.
- DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory
May 15, 2026 · Citations: 0
Large language model (LLM) agents require long-term memory to leverage information from past interactions.
- STS: Efficient Sparse Attention with Speculative Token Sparsity
May 15, 2026 · Citations: 0
This challenge is particularly acute for emerging agentic applications that require processing multi-million token sequences.