- AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
Lingxiang Hu, Yiding Sun, Tianle Xia, Wenwei Li, Ming Xu · Feb 15, 2026 · Citations: 0
Expert Verification Simulation Env Long Horizon
While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem.
- AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
Liang Ding · Mar 22, 2026 · Citations: 0
Demonstrations Human EvalLlm As Judge Long Horizon
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely…
- Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav · Jul 15, 2025 · Citations: 0
Pairwise Preference Automatic MetricsSimulation Env Long Horizon
We evaluate MLLM verifiers across web navigation, computer use, and robotics, spanning 13+ models, 28+ designs, and thousands of trajectories from diverse agents.
- Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
Yichi Zhang, Nabeel Seedat, Yinpeng Dong, Peng Cui, Jun Zhu · Mar 3, 2026 · Citations: 0
Expert Verification Automatic Metrics Long Horizon
As LLM-powered agents have been used for high-stakes decision-making, such as clinical diagnosis, it becomes critical to develop reliable verification of their decisions to facilitate trustworthy deployment.
- \$OneMillion-Bench: How Far are Language Agents from Human Experts?
Qianyu Yang, Yang Liu, Jiaqi Li, Jun Bai, Hao Chen · Mar 9, 2026 · Citations: 0
Rubric Rating Automatic Metrics Tool Use
To this end, we introduce \OneMillion-Bench \OneMillion-Bench, a benchmark of 400 expert-curated tasks spanning Law, Finance, Industry, Healthcare, and Natural Science, built to evaluate agents across economically consequential scenarios.
- From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring
Seunghwan Kim, Tiffany H. Kung, Heena Verma, Dilan Edirisinghe, Kaveh Sedehi · Mar 10, 2026 · Citations: 0
Expert Verification Automatic Metrics Long Horizon
Results: Against a human majority-vote standard (N=467), the agent achieved 95.8% emergency sensitivity and 88.5% sensitivity for all actionable alerts (85.7% specificity).
- TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen · Apr 8, 2026 · Citations: 0
Red Team Automatic Metrics Long Horizon
As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to intermediate execution traces.
- PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning
Bingxuan Li, Jeonghwan Kim, Cheng Qian, Xiusi Chen, Eitan Anzenberg · Jan 17, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
To enable a systematic study of this question, we introduce CalConflictBench, a benchmark for long-horizon calendar conflict resolution.
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation
Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou · Apr 1, 2026 · Citations: 0
Critique Edit Simulation Env Long Horizon
As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirement or revising goals, during mid-task execution…
- LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation
Feiyu Duan, Xuanjing Huang, Zhongyu Wei · Mar 12, 2026 · Citations: 0
Pairwise Preference Simulation Env Long Horizon
However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states.
- Step-CoT: Stepwise Visual Chain-of-Thought for Medical Visual Question Answering
Lin Fan, Yafei Ou, Zhipeng Deng, Pengyu Dai, Hou Chongxian · Mar 14, 2026 · Citations: 0
Expert Verification Automatic Metrics Long Horizon
Benchmark: github.com/hahaha111111/Step-CoT.
- Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization
Jingyi Xu, Xingyu Ren, Zhoupeng Shou, Yumeng Zhang, Zhiqiang You · Jan 24, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
To address this, we propose Goal-Oriented Preference Optimization (GOPO), a hierarchical reinforcement learning framework that decouples strategy planning from response generation via an Expert Agent and a Customer Service Agent.
- APEX-Agents
Bertie Vidgen, Austin Mann, Abby Fennelly, John Wright Stanly, Lucas Rothman · Jan 20, 2026 · Citations: 0
Rubric RatingExpert Verification Automatic Metrics Long Horizon
We introduce the AI Productivity Index for Agents (APEX-Agents), a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate…
- Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz · Jan 14, 2026 · Citations: 0
Pairwise Preference Simulation Env Long Horizon
Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective to align manipulation trajectories that transfers both linguistic and visual planning capabilities for embodie
- SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart · Mar 30, 2026 · Citations: 0
Demonstrations Simulation Env Long Horizon
To address this limitation, we introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model explicitly designed to serve as the sole reward signal for online RL.
- Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright · Mar 3, 2026 · Citations: 0
Pairwise PreferenceRubric Rating Llm As JudgeSimulation Env Long Horizon
Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly…
- RAPTOR: A Foundation Policy for Quadrotor Control
Jonas Eschmann, Dario Albani, Giuseppe Loianno · Sep 15, 2025 · Citations: 0
Demonstrations Simulation Env Long Horizon
Humans are remarkably data-efficient when adapting to new unseen conditions, like driving a new car.
- Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Yunqing Liu, Nan Zhang, Zhiming Tan · Sep 1, 2025 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
We additionally contribute a CAD dataset with human preference annotations.
- Signals: Trajectory Sampling and Triage for Agentic Interactions
Shuguang Chen, Adil Hafeez, Salman Paracha · Apr 1, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
We propose a lightweight, signal-based framework for triaging agentic interaction trajectories.
- Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure
Davide Di Gioia · Mar 23, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
Autonomous agents operating in continuous environments must decide not only what to do, but when to act.
- MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation
Guoyi Li, Shihao Xu, Jiatong Ma, Yunyun Han, Jianhua Chen · Mar 4, 2026 · Citations: 0
Rubric Rating Automatic Metrics Long Horizon
Large language models (LLMs) have advanced medical dialogue systems, yet psychiatric consultation poses substantially higher demands due to subjective ambiguity and comorbidity complexity: an agent must continuously extract…
- Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji · May 7, 2025 · Citations: 0
Demonstrations Automatic Metrics Long Horizon
The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors.
- Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing
Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul · Mar 6, 2026 · Citations: 0
Human EvalAutomatic Metrics Long Horizon
We introduce Beyond Rows to Reasoning (BRTR), a multimodal agentic framework for spreadsheet understanding that replaces single-pass retrieval with an iterative tool-calling loop, supporting end-to-end Excel workflows from complex analysis…
- Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
Ajinkya Kulkarni, Sandipana Dowerah, Atharva Kulkarni, Tanel Alumäe, Mathew Magimai Doss · Mar 6, 2026 · Citations: 0
Pairwise Preference Long Horizon
We present RAPTOR, Representation Aware Pairwise-gated Transformer for Out-of-domain Recognition a controlled study of compact SSL backbones from the HuBERT and WavLM within a unified pairwise-gated fusion detector, evaluated across 14…
- Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
Xinghao Zhao · Mar 19, 2026 · Citations: 0
Automatic Metrics Long Horizon
Chain-of-thought (CoT) reasoning improves LLM accuracy, yet detecting failures cheaply remains elusive.
- ReDAct: Uncertainty-Aware Deferral for LLM Agents
Dzianis Piatrashyn, Nikita Kotelevskii, Kirill Grishchenkov, Nikita Glazkov, Ivan Nasonov · Apr 8, 2026 · Citations: 0
Simulation Env Long Horizon
Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems.
- Embodied Task Planning via Graph-Informed Action Generation with Large Language Model
Xiang Li, Ning Yan, Masood Mortazavi · Jan 29, 2026 · Citations: 0
Simulation Env Long Horizon
We propose GiG, a novel planning framework that structures embodied agents' memory using a Graph-in-Graph architecture.
- RASPRef: Retrieval-Augmented Self-Supervised Prompt Refinement for Large Reasoning Models
Rahul Soni · Mar 27, 2026 · Citations: 0
Critique Edit Long Horizon
Recent reasoning-focused language models such as DeepSeek R1 and OpenAI o1 have demonstrated strong performance on structured reasoning benchmarks including GSM8K, MATH, and multi-hop question answering tasks.
- KLong: Training LLM Agent for Extremely Long-horizon Tasks
Yue Liu, Yingwei Ma, Yibo Miao, Yanhao Li, Yuchong Xie · Feb 19, 2026 · Citations: 0
Rubric Rating Long Horizon
Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics.
- Watch and Learn: Learning to Use Computers from Online Videos
Chan Hee Song, Yiwen Song, Palash Goyal, Yu Su, Oriana Riva · Oct 6, 2025 · Citations: 0
Demonstrations Long Horizon
Computer-using agents (CUAs) must plan task workflows across diverse and evolving applications, yet progress is limited by the lack of large-scale, high-quality training data.
- Efficient Agent Training for Computer Use
Yanheng He, Jiahe Jin, Pengfei Liu · May 20, 2025 · Citations: 0
Demonstrations Long Horizon
We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations.
- From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration
Gaole He, Brian Y. Lim · Mar 12, 2026 · Citations: 0
Pairwise Preference Simulation Env Long Horizon
Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks.
- MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Chengshu Li, Mengdi Xu, Arpit Bahety, Hang Yin, Yunfan Jiang · Oct 21, 2025 · Citations: 0
Demonstrations Simulation Env Long Horizon
Imitation learning from large-scale, diverse human demonstrations has been shown to be effective for training robots, but collecting such data is costly and time-consuming.
- Tutoring Large Language Models to be Domain-adaptive, Precise, and Safe
Somnath Banerjee · Feb 14, 2026 · Citations: 0
Pairwise Preference Long Horizon
The methodological trajectory moves from classical supervised adaptation for task-specific demands to decoding-time alignment for safety, finally leveraging human feedback and preference modeling to achieve sociolinguistic acuity.
- DeceptGuard :A Constitutional Oversight Framework For Detecting Deception in LLM Agents
Snehasis Mukhopadhyay · Mar 14, 2026 · Citations: 0
Automatic MetricsSimulation Env Long Horizon
We introduce DECEPTGUARD, a unified framework that systematically compares three monitoring regimes: black-box monitors (actions and outputs only), CoT-aware monitors (additionally observing the agent's chain-of-thought reasoning trace),…
- LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in Retrieval-Augmented Generation
Koki Itai, Shunichi Hasegawa, Yuta Yamamoto, Gouki Minegishi, Masaki Otsuki · Mar 6, 2026 · Citations: 0
Llm As JudgeAutomatic Metrics Long Horizon
To bridge the gap between existing evaluations and practical use, we introduce LIT-RAGBench (the Logic, Integration, Table, Reasoning, and Abstention RAG Generator Benchmark), which defines five categories: Integration, Reasoning, Logic,…
- HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation
Jiajun Zhang, Shijia Luo, Ruikang Zhang, Qi Su · Nov 21, 2025 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
Humor, as both a creative human activity and a social binding mechanism, has long posed a major challenge for AI generation.
- BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen · Oct 31, 2025 · Citations: 0
Pairwise Preference Automatic Metrics Long Horizon
We introduce BEAT, the first framework to inject such visual backdoors into VLM-based embodied agents using objects in the environments as triggers.
- FrameRef: A Framing Dataset and Simulation Testbed for Modeling Bounded Rational Information Health
Victor De Lima, Jiqun Liu, Grace Hui Yang · Feb 17, 2026 · Citations: 0
Human EvalSimulation Env Long Horizon
Within this framework, we construct framing-sensitive agent personas by fine-tuning language models with framing-conditioned loss attenuation, inducing targeted biases while preserving overall task competence.
- FrontierFinance: A Long-Horizon Computer-Use Benchmark of Real-World Financial Tasks
Michael Krumdick, Varshini Reddy, Shivani Chaudhary, William Day, Maarij Ahmed · Apr 7, 2026 · Citations: 0
Rubric Rating Long Horizon
To address this, we introduce FrontierFinance, a long-horizon benchmark of 25 complex financial modeling tasks across five core finance models, requiring an average of over 18 hours of skilled human labor per task to complete.
- TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning
Dilina Rajapakse, Juan C. Rosero, Ivana Dusparic · Mar 23, 2026 · Citations: 0
Pairwise Preference Long Horizon
Multi-Objective Reinforcement Learning (MORL) addresses this limitation by enabling agents to optimize several objectives simultaneously, explicitly reasoning about trade-offs between them.
- MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian · Feb 13, 2026 · Citations: 0
Pairwise PreferenceRubric Rating Long Horizon
MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities.
- DEER: A Benchmark for Evaluating Deep Research Agents on Expert Report Generation
Janghoon Han, Heegyu Kim, Changho Lee, Dahm Lee, Min Hyung Park · Dec 19, 2025 · Citations: 0
Rubric RatingExpert Verification Long Horizon
However, evaluating such reports remains challenging: report quality is multifaceted, making it difficult to determine what to assess and by what criteria; LLM-based judges may miss errors that require domain expertise to identify; and…
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Yihe Deng, I-Hung Hsu, Jun Yan, Zifeng Wang, Rujun Han · Oct 29, 2025 · Citations: 0
Demonstrations Long Horizon
Beyond reasoning benchmarks, SRL generalizes effectively to agentic software engineering tasks, establishing it as a robust and versatile training framework for reasoning-oriented LLMs.
- DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling
Shicheng Liu, Yucheng Jiang, Sajid Farook, Camila Nicollier Sanchez, David Fernando Castro Pena · Apr 7, 2026 · Citations: 0
Human Eval Long Horizon
Deep research with Large Language Model (LLM) agents is emerging as a powerful paradigm for multi-step information discovery, synthesis, and analysis.
- Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai · Mar 30, 2026 · Citations: 0
Critique Edit Long Horizon
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe.
- Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang · Jan 15, 2026 · Citations: 0
Simulation Env Long Horizon
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanni
- MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue
Naifan Zhang, Ruihan Sun, Jinwei Su, Hengjie Yang, Zhengyuan Pan · Mar 6, 2026 · Citations: 0
Llm As JudgeSimulation Env Long Horizon
We propose a critic-free and efficient RL algorithm named MAPO that leverages dense process feedback from a judge model and propagates long-horizon effects through Monte Carlo returns.
- Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
Yixuan Tang, Yi Yang · Mar 15, 2026 · Citations: 0
Llm As JudgeAutomatic Metrics Long Horizon
Across four LLM backbones, DCS consistently outperforms supervised probes and LLM-as-judge baselines, achieving up to 71.1% accuracy on sentence-level hawkish--dovish classification.
- JAWS: Enhancing Long-term Rollout of Neural PDE Solvers via Spatially-Adaptive Jacobian Regularization
Fengxiang Nie, Yasuhiro Suzuki · Mar 4, 2026 · Citations: 0
Automatic MetricsSimulation Env Long Horizon
Experiments demonstrate that JAWS serves as an effective spectral pre-conditioner for trajectory optimization, allowing short-horizon, memory-efficient training to match the accuracy of long-horizon baselines.
- The Trinity of Consistency as a Defining Principle for General World Models
Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang · Feb 26, 2026 · Citations: 0
Simulation Env Long Horizon
To complement this conceptual framework, we introduce CoW-Bench, a benchmark centered on multi-frame reasoning and generation scenarios.
- Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Shuo He, Lang Feng, Qi Wei, Xin Cheng, Lei Feng · Feb 26, 2026 · Citations: 0
Simulation Env Long Horizon
Group-based reinforcement learning (RL), such as GRPO, has advanced the capabilities of large language models on long-horizon agentic tasks.
- SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
Dengjia Zhang, Xiaoou Liu, Lu Cheng, Yaqing Wang, Kenton Murray · Feb 24, 2026 · Citations: 0
Simulation Env Long Horizon
Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning.
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu · Feb 15, 2026 · Citations: 0
Simulation Env Long Horizon
The paper introduces GUI-Owl-1.5, the latest native GUI agent model that features instruct/thinking variants in multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge…
- LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan, Jingwei Ni, Jakob Merane, Yang Tian, Yoan Hermstrüwer · May 19, 2025 · Citations: 0
Llm As JudgeAutomatic Metrics Long Horizon
To address this, we introduce LEXam, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels.
- PMG: Parameterized Motion Generator for Human-like Locomotion Control
Chenxi Han, Yuheng Min, Zihao Huang, Ao Hong, Hang Liu · Feb 13, 2026 · Citations: 0
Automatic Metrics Long Horizon
Recent advances in data-driven reinforcement learning and motion tracking have substantially improved humanoid locomotion, yet critical practical challenges remain.
- Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
Aryan Kasat, Smriti Singh, Aman Chadha, Vinija Jain · Mar 23, 2026 · Citations: 0
Llm As Judge Long Horizon
Using an LLM-as-judge scoring pipeline validated across three judge models, we classify more than 600 responses from 13 LLMs spanning a range of architectures, parameter scales, and training regimes across six classical moral dilemmas, and…
- Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications
Che Chen, Lanhua Li, Shimin Gong, Yu Zhao, Yuming Fang · Mar 23, 2026 · Citations: 0
Simulation Env Long Horizon
To maximize the overall throughput, we first propose a delay-tolerant multi-agent deep reinforcement learning (MADRL) algorithm that integrates a delay-penalized reward to encourage information sharing among UAVs, while jointly optimizing…
- AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Cheng Jiayang, Dongyu Ru, Lin Qiu, Yiyang Li, Xuezhi Cao · Mar 2, 2026 · Citations: 0
Simulation Env Long Horizon
Long-horizon interactions between users and LLM-based assistants necessitate effective memory management, yet current approaches face challenges in training and evaluation of memory.
- Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
Chenyv Liu, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li · Feb 25, 2026 · Citations: 0
Simulation Env Long Horizon
Reinforcement learning enhances physical grounding through exploration yet typically relies on external reward signals that remain isolated from the agent's internal states.