Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 45

Milestone-Guided Policy Learning for Long-Horizon Language Agents

Zixuan Wang, Yuchen Yan, Hongxing Li, Teng Pan, Dingming Li, Ruiqing Zhang · May 7, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 65% High protocol signal Freshness: Hot Status: Fallback
Simulation Env Long Horizon Coding
  • While long-horizon agentic tasks require language agents to perform dozens of sequential decisions, training such agents with reinforcement learning remains challenging.
  • BEACON partitions trajectories at milestone boundaries, applies temporal reward shaping within segments to credit partial progress, and estimates advantages at dual scales to prevent distant failures from corrupting the evaluation of local…
Open paper
Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model

Chenming Tang, Hsiu-Yuan Huang, Weijie Liu, Junqiang Zheng, Saiyong Yang, Yunfang Wu · Apr 20, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 65% Moderate protocol signal Freshness: Hot Status: Fallback
Simulation Env Long Horizon General
  • Reinforcement learning (RL) has become a prevalent paradigm for training tool calling agents, which typically requires online interactive environments.
  • In this work, we propose TRUSTEE, a cost-friendly method for training tool calling agents with dynamic environments fully simulated by free open-source LMs that can be as small as 8B, including task generation, user simulation, tool…
Open paper
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu · Apr 24, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 62% Moderate protocol signal Freshness: Hot Status: Fallback
Simulation Env Long Horizon Law
  • Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities.
  • Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific…
Open paper
When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang · Apr 1, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% High protocol signal Freshness: Warm Status: Ready
Critique Edit Simulation Env Long Horizon Coding
  • As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution…
  • In this paper, we present the first systematic study of interruptible agents in long-horizon, environmentally grounded web navigation tasks, where actions induce persistent state changes.
Open paper
MolQuest: A Benchmark for Agentic Evaluation of Abductive Reasoning in Chemical Structure Elucidation

Taolin Han, Shuang Wu, Jinghang Wang, Yuhao Zhou, Renquan Lv, Bing Zhao · Mar 26, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Automatic Metrics Simulation Env Long Horizon General
  • Current scientific evaluation benchmarks predominantly rely on static, single-turn Question Answering (QA) formats, which are inadequate for measuring model performance in complex scientific tasks that require multi-step iteration and…
  • To address this gap, we introduce MolQuest, a novel agent-based evaluation framework for molecular structure elucidation built upon authentic chemical experimental data.
Open paper

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% High protocol signal Freshness: Warm Status: Ready
Demonstrations Human Eval Llm As Judge Long Horizon General
  • LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely…
  • We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories for offline data augmentation.
Open paper

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% High protocol signal Freshness: Warm Status: Ready
Automatic Metrics Simulation Env Long Horizon General
  • We introduce DECEPTGUARD, a unified framework that systematically compares three monitoring regimes: black-box monitors (actions and outputs only), CoT-aware monitors (additionally observing the agent's chain-of-thought reasoning trace),…
  • We introduce DECEPTSYNTH, a scalable synthetic pipeline for generating deception-positive and deception-negative agent trajectories across a novel 12-category taxonomy spanning verbal, behavioral, and structural deception.
Open paper
LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

Feiyu Duan, Xuanjing Huang, Zhongyu Wei · Mar 12, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% High protocol signal Freshness: Warm Status: Ready
Pairwise Preference Simulation Env Long Horizon General
  • However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states.
  • Based on LifeSim, we introduce LifeSim-Eval, a comprehensive benchmark for multi-scenario, long-horizon personalized assistance.
Open paper

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Simulation Env Long Horizon General
  • Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks.
  • We propose simulation-in-the-loop, an interaction paradigm that enables users and agents to explore simulated future trajectories before committing to decisions.
Open paper
ReDAct: Uncertainty-Aware Deferral for LLM Agents

Dzianis Piatrashyn, Nikita Kotelevskii, Kirill Grishchenkov, Nikita Glazkov, Ivan Nasonov, Ilya Makarov · Apr 8, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% High protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon General
  • Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems.
  • In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model.
Open paper
Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications

Che Chen, Lanhua Li, Shimin Gong, Yu Zhao, Yuming Fang, Dusit Niyato · Mar 23, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon General
  • To maximize the overall throughput, we first propose a delay-tolerant multi-agent deep reinforcement learning (MADRL) algorithm that integrates a delay-penalized reward to encourage information sharing among UAVs, while jointly optimizing…
Open paper
Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood · Mar 3, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Fallback
Pairwise Preference Rubric Rating Llm As Judge Simulation Env Long Horizon General
  • Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly…
  • We introduce a multi-faceted evaluation rubric that decomposes end-to-end shopping quality into structured dimensions and develop a calibrated LLM-as-judge pipeline aligned with human annotations.
Open paper
Contextualized Privacy Defense for LLM Agents

Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang · Mar 3, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon General
  • LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability.
  • These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution.
Open paper
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Cheng Jiayang, Dongyu Ru, Lin Qiu, Yiyang Li, Xuezhi Cao, Yangqiu Song · Mar 2, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 58% Moderate protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon General
  • Long-horizon interactions between users and LLM-based assistants necessitate effective memory management, yet current approaches face challenges in training and evaluation of memory.
  • To address these gaps, we introduce AMemGym, an interactive environment enabling on-policy evaluation and optimization for memory-driven personalization.
Open paper
Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 55% Moderate protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon General
  • However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior.
  • To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework.
Open paper
From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems

Thomas Stefani, Johann Maximilian Christensen, Elena Hoemann, Frank Köster, Sven Hallerbach · Apr 2, 2026

Citations: 0

Match reason: Matches selected tags (Long Horizon, Simulation Env).

Score: 55% Moderate protocol signal Freshness: Warm Status: Fallback
Simulation Env Long Horizon Math
  • While Artificial Intelligence (AI) offers transformative potential for operational performance, its deployment in safety-critical domains such as aviation requires strict adherence to rigorous certification standards.
  • Ultimately, this method enables the validation of ODD coverage in higher dimensions, advancing a Safety-by-Design approach while complying with EASA's standards.
Open paper
