- AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
Liang Ding · Mar 22, 2026 · Citations: 0
Demonstrations Human EvalLlm As Judge Long Horizon
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely…
- Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah, Omkar Patil, Thao Nguyen · Mar 19, 2026 · Citations: 0
Demonstrations Simulation Env Multi Agent
To address this limitation, we propose MAPG (Multi-Agent Probabilistic Grounding), an agentic framework that decomposes language queries into structured subcomponents and queries a VLM to ground each component.
- SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart · Mar 30, 2026 · Citations: 0
Demonstrations Simulation Env Long Horizon
To address this limitation, we introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model explicitly designed to serve as the sole reward signal for online RL.
- Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation
Eeham Khan, Luis Rodriguez, Marc Queudot · Mar 10, 2026 · Citations: 0
Demonstrations Automatic Metrics
We evaluate this framework on the BioASQ and PubMedQA benchmarks, specifically analyzing the impact of dynamic in-context learning and rerank- ing under constrained token budgets.
- RoboPocket: Improve Robot Policies Instantly with Your Phone
Junjie Fang, Wendi Chen, Han Xue, Fangyuan Zhou, Tian Le · Mar 5, 2026 · Citations: 0
Demonstrations Long Horizon
To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using single consumer smartphones.
- TimeWarp: Evaluating Web Agents by Revisiting the Past
Md Farhan Ishmam, Kenneth Marino · Mar 5, 2026 · Citations: 0
Demonstrations Web Browsing
The improvement of web agents on current benchmarks raises the question: Do today's agents perform just as well when the web changes?
- State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation
Navan Preet Singh, Anurag Garikipati, Ahmed Abulkhair, Jyani Akshay Jagdishbhai, Atul Yaduvanshi · Apr 7, 2026 · Citations: 0
Demonstrations Automatic Metrics
Arabic-DeepSeek-R1 achieves the highest average score across the seven-benchmark OALL suite while establishing SOTA or near-SOTA, including dominant results on grammar-focused MadinahQA (surpassing both GPT-5.1 and the OALL leader by…
- A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems
Seou Choi, Sachin Vaidya, Caio Silva, Shiekh Zia Uddin, Sajib Biswas Shuvo · Mar 23, 2026 · Citations: 0
Demonstrations
In this work, we present a robotics framework for the autonomous construction, alignment, and maintenance of precision optical systems.
- Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
Michael Cuccarese · Apr 7, 2026 · Citations: 0
Demonstrations
This paper presents epistemic blinding in the context of an agentic system that uses large language models to reason across multiple biological datasets for drug target prioritization.
- A Survey of On-Policy Distillation for Large Language Models
Mingyang Song, Mao Zheng · Apr 1, 2026 · Citations: 0
Expert VerificationDemonstrations
We systematically analyze representative methods, examine industrial deployments, and identify open problems including distillation scaling laws, uncertainty-aware feedback, and agent-level distillation.
- In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads
Charlotte Pouw, Hosein Mohebbi, Afra Alishahi, Willem Zuidema · Apr 7, 2026 · Citations: 0
Demonstrations
In-Context Learning (ICL) has been extensively studied in text-only Language Models, but remains largely unexplored in the speech domain.
- Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
Yuning Wu, Ke Wang, Devin Chen, Kai Wei · Mar 11, 2026 · Citations: 0
Demonstrations
To resolve this dilemma, we introduce Hindsight-Anchored Policy Optimization (HAPO).
- COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Kartik Sharma, Rakshit S. Trivedi · Mar 6, 2026 · Citations: 0
Pairwise PreferenceDemonstrations
Experiments across a variety of steering tasks and benchmarks demonstrate that COLD-Steer achieves upto 95% steering effectiveness while using 50 times fewer samples compared to the best baseline.