A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research.
Every paper includes structured metadata for quick triage.
…These domains typically involve fixed content and passive consumption, where user preferences can be matched by genre or theme.
In contrast, games present distinct challenges: fast-evolving catalogs, interaction-driven preferences (e.g., skill level, mechanics, hardware), and increased risk of unsafe responses in open-ended conversation.
Recent work suggests that preference-tuning techniques -- such as Reinforcement Learning from Human Feedback (RLHF) methods like PPO and GRPO, as well as alternatives like DPO -- reduce diversity, creating a dilemma given that these models …
Using open-ended tasks that require no human intervention, we find counterintuitive results: when using diversity metrics that do not explicitly consider quality, preference-tuned models -- particularly those trained via RL -- often produce …
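The abstract contrasts quality-agnostic diversity metrics with quality-aware ones. A minimal sketch of one standard quality-agnostic measure, distinct-n (unique n-grams over total n-grams across a set of outputs), is below; the sample strings are invented, and this is not necessarily the metric the paper uses.

```python
# Quality-agnostic lexical diversity: distinct-n over a set of model outputs.
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of unique n-grams among all n-grams in the outputs."""
    ngrams = Counter()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

samples = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "a dog ran in the park",
]
print(f"distinct-2: {distinct_n(samples, n=2):.3f}")
```

Note that a mode-collapsed model that repeats one high-quality answer scores near zero here, which is exactly why such metrics can rank preference-tuned models low regardless of output quality.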
Shangheng Du, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xin Jiang, Yanhong Bai · Mar 16, 2025
Citations: 0
Simulation Env · Long Horizon · General
With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks.
However, current work typically relies on prompt design or fine-tuning strategies applied to vanilla LLMs, which often leads to limited effectiveness or suboptimal performance in complex agent-related environments.
Hanjiang Hu, Alexander Robey, Changliu Liu · Feb 28, 2025
Citations: 0
Red Team · Automatic Metrics · General
…To address this challenge, we propose a safety steering framework grounded in safe control theory, ensuring invariant safety in multi-turn dialogues.
Our method achieves invariant safety at each turn of dialogue by learning a safety predictor that accounts for adversarial queries, preventing potential context drift toward jailbreaks.
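To make the turn-level idea concrete, here is a minimal sketch of safety-filtered response selection, assuming a learned safety predictor and a candidate sampler. `generate_candidates`, `safety_score`, and the threshold are placeholders, not the paper's actual components or its control-theoretic formulation.

```python
# Sketch: at each turn, keep only candidates whose predicted safety,
# conditioned on the full dialogue history, clears a threshold.
def safe_respond(history, generate_candidates, safety_score, threshold=0.9):
    """Return the safest acceptable candidate, or refuse if none qualify."""
    candidates = generate_candidates(history)
    scored = [(safety_score(history, c), c) for c in candidates]
    safe = [(s, c) for s, c in scored if s >= threshold]
    if not safe:
        return "I can't help with that."  # refuse rather than risk drift
    return max(safe)[1]  # highest predicted safety among acceptable options
```

Conditioning the predictor on the whole history, rather than the current query alone, is what lets a scheme like this catch gradual context drift toward a jailbreak.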
Mike Zhang, Amalie Pernille Dilling, Léon Gondelman, Niels Erik Ruan Lyngdorf, Euan D. Lindsay, Johannes Bjerva · Feb 18, 2025
Citations: 0
Critique Edit · Automatic Metrics · General
Through comprehensive evaluations with three LLM judges and three human experts, across a subset of 900 outputs, we demonstrate that SEFL-tuned models outperform both their untuned counterparts and an existing baseline in terms of feedback …
The potential for societal impact is reinforced by extensive qualitative comments and ratings from human stakeholders -- both students and higher education instructors.
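For readers unfamiliar with the LLM-as-judge setup used in evaluations like this one, a minimal sketch follows. The rubric prompt, the 1-5 scale, and `call_llm` (any chat-completion client) are placeholders, not the paper's harness.

```python
# Sketch: score a batch of feedback outputs with a single LLM judge.
JUDGE_PROMPT = (
    "Rate the following assignment feedback from 1 (poor) to 5 (excellent) "
    "for actionability and accuracy. Reply with a single integer.\n\n{feedback}"
)

def judge(feedback_samples, call_llm):
    scores = []
    for text in feedback_samples:
        reply = call_llm(JUDGE_PROMPT.format(feedback=text))
        digits = [ch for ch in reply if ch.isdigit()]
        scores.append(int(digits[0]) if digits else None)  # crude parse
    return scores
```

In practice, studies like this one average over several judge models and compare against human expert ratings to control for individual judge bias.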
We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff.
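A toy multi-armed bandit makes the exploration-exploitation tradeoff concrete; in a setting like the one described, an LLM agent would take the place of the hand-coded `epsilon_greedy` policy below. Everything here is illustrative, not the paper's protocol.

```python
# Toy Gaussian-reward bandit: a policy must balance trying arms (exploration)
# against repeatedly pulling the best-looking arm (exploitation).
import random

def run_bandit(arm_means, policy, horizon=500):
    counts = [0] * len(arm_means)
    totals = [0.0] * len(arm_means)
    reward_sum = 0.0
    for t in range(horizon):
        arm = policy(counts, totals, t)
        reward = random.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum

def epsilon_greedy(counts, totals, t, eps=0.1):
    # Explore with probability eps (and until every arm is sampled once),
    # otherwise exploit the arm with the best empirical mean.
    if random.random() < eps or 0 in counts:
        return random.randrange(len(counts))
    means = [tot / c for tot, c in zip(totals, counts)]
    return means.index(max(means))

print(run_bandit([0.2, 0.5, 0.8], epsilon_greedy))
```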
Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru · Nov 19, 2024
Citations: 0
Automatic Metrics · Simulation Env · General
Experimental results on two hate-speech benchmark datasets demonstrate superior performance over state-of-the-art baselines, with improvements of 1.1% in AUROC, 7% in F1, and 35% in Recall.
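For reference, these three metrics are typically computed as follows; the labels and scores below are invented for illustration, not the paper's data.

```python
# Standard computation of AUROC, F1, and Recall for binary classification.
from sklearn.metrics import roc_auc_score, f1_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # gold labels
y_prob = [0.9, 0.2, 0.7, 0.4, 0.3, 0.1, 0.8, 0.6]   # model scores
y_pred = [int(p >= 0.5) for p in y_prob]            # thresholded predictions

print("AUROC :", roc_auc_score(y_true, y_prob))  # threshold-free ranking quality
print("F1    :", f1_score(y_true, y_pred))       # precision/recall balance
print("Recall:", recall_score(y_true, y_pred))   # fraction of positives caught
```

The large Recall gain relative to AUROC in results like these usually signals a model that catches many more true positives at the chosen operating threshold.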
Safeyah Khaled Alshemali, Daniel Bauer, Yuval Marton · Oct 19, 2024
Citations: 0
Automatic Metrics · Long Horizon · General
We set a new state of the art on thematic fit benchmarks, but show that closed- and open-weight LLMs respond differently to our prompting strategies: closed models achieve better scores overall and benefit from multi-step reasoning, but they …
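A hypothetical two-step prompt chain, in the spirit of the multi-step reasoning the abstract mentions: first elicit typical role fillers, then rate the candidate against them. `call_llm` and the exact prompt wording are assumptions, not the authors' prompts.

```python
# Sketch: two-step thematic-fit rating (e.g., how good is "knife" as the
# instrument of "cut"?) via an intermediate generation step.
def thematic_fit(verb, role, filler, call_llm):
    step1 = call_llm(
        f"List five typical {role}s of the verb '{verb}', one per line."
    )
    step2 = call_llm(
        f"Typical {role}s of '{verb}':\n{step1}\n\n"
        f"On a scale of 1-7, how plausible is '{filler}' as the {role} "
        f"of '{verb}'? Reply with a single number."
    )
    return step2.strip()
```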
Karen Li, Kopo M. Ramokapane, Awais Rashid · Aug 4, 2022
Citations: 0
Automatic Metrics · Web Browsing · General
Our results uncover various misperceptions about the security controls, and show how design constraints -- e.g., safety requirements and the lack of regular updates due to the long-term nature of such systems -- pose significant challenges to the realization of mo…