OpenTrain Research Tools

Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 2

Filter by tag

All · Automatic Metrics (876) · General (528) · Coding (281) · Simulation Env (109) · Multilingual (92) · Math (90) · Long Horizon (74) · Medicine (69) · Pairwise Preference (64) · Law (43) · Multi Agent (38) · Human Eval (36) · Expert Verification (23) · Red Team (21) · Web Browsing (21) · Critique Edit (19)

SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards

Dengjia Zhang, Xiaoou Liu, Lu Cheng, Yaqing Wang, Kenton Murray, Hua Wei · Feb 24, 2026

Citations: 0

Automatic Metrics · Long Horizon · General

Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning.
We introduce SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards, a reinforcement learning framework that incorporates uncertainty directly into the reward design.

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Heiko Ludwig, Holger Boche · Feb 12, 2026

Citations: 0

Simulation Env · Long Horizon · General

Advances in large language models (LLMs) are driving a shift toward using reinforcement learning (RL) to train agents from iterative, multi-turn interactions across tasks.
By moving search from inference time to the rollout stage of training, TSR provides a simple and general mechanism for stronger multi-turn agent learning, complementary to existing frameworks and rejection-sampling-style selection methods.

Protocol Hubs

Simulation Env Papers (109) · Multilingual Papers (92) · Math Papers (90) · Automatic Metrics Papers (876) · General Papers (528) · Coding Papers (281) · Long Horizon Papers (74) · Medicine Papers (69) · Automatic Metrics + Long Horizon Papers (55) · Pairwise Preference Papers (64) · Automatic Metrics + Pairwise Preference Papers (51) · Law Papers (43) · Multi Agent Papers (38) · Human Eval Papers (36) · Automatic Metrics + Multi Agent Papers (25) · Simulation Env + Long Horizon Papers (20)
