Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 321

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Truong Nguyen, Tien-Phat Nguyen, Linh Ngo Van, Duy Minh Ho Nguyen, Khoa D. Doan, Trung Le · May 12, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.
  • We introduce Token-level Bregman Preference Optimization (TBPO), which posits a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, and derive a Bregman-divergence density-ratio matching objective…
Open paper
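
The TokenRatio summary above contrasts its token-level objective with standard DPO, which scores whole sequences. As a point of reference only, here is a minimal sketch of the usual sequence-level DPO loss for a single preference pair. It is not the paper's TBPO objective; the function names, the toy log-probabilities, and the `beta` value are illustrative assumptions.

```python
import math

def sequence_logp(token_logps):
    """Collapse per-token log-probabilities into one sequence-level score."""
    return sum(token_logps)

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the policy (logp_*) or the frozen reference model (ref_logp_*).
    beta=0.1 is an assumed, illustrative value.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Toy per-token log-probabilities (made up for illustration).
chosen = sequence_logp([-0.2, -0.4, -0.1])
rejected = sequence_logp([-0.9, -1.1, -0.7])
ref_chosen = sequence_logp([-0.5, -0.5, -0.5])
ref_rejected = sequence_logp([-0.5, -0.5, -0.5])

loss = dpo_loss(chosen, rejected, ref_chosen, ref_rejected)
```

Because `sequence_logp` sums the per-token terms before the comparison, the loss only sees one number per response. That is the sequence-level modeling the summary describes.
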
How Value Induction Reshapes LLM Behaviour

Arnav Arora, Natalie Schluter, Katherine Metcalf, Maartje ter Hoeve · May 8, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model.
  • We fine-tune models using curated value subsets of existing preference datasets, measuring the impact of value induction on expression of other values, model safety, anthropomorphic language, and various QA benchmarks.
Open paper
TRACE: Tourism Recommendation with Accountable Citation Evidence

Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng, Xike Xie · May 8, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim…
  • This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking…
Open paper
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

Chunyu Li, Jingyi Kang, Ding Chen, Mengyuan Zhang, Jiajun Shen, Bo Tang · May 7, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory.
  • On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics.
Open paper
Misaligned by Reward: Socially Undesirable Preferences in LLMs

Gayane Ghazaryan, Esra Dönmez · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Reward models are a key component of large language model alignment, serving as proxies for human preferences during training.
  • We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise.
Open paper
StoryAlign: Evaluating and Training Reward Models for Story Generation

Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics Coding
  • Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences.
  • We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy.
Open paper
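
The accuracy figure in the StoryAlign summary above is a selection accuracy over preference pairs. A minimal sketch of how such a number is typically computed follows. The paper's exact protocol is not given in this listing, so the tie handling, the `toy_score` stand-in, and the example pairs are all assumptions.

```python
from typing import Callable, Iterable, Tuple

def pairwise_accuracy(score: Callable[[str], float],
                      pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (preferred, rejected) pairs where the scorer ranks
    the human-preferred item strictly higher. Ties count as errors
    (an assumption; other conventions give ties half credit)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for preferred, rejected in pairs
                  if score(preferred) > score(rejected))
    return correct / len(pairs)

def toy_score(story: str) -> float:
    """Hypothetical stand-in for a reward model: scores by word count."""
    return float(len(story.split()))

# Made-up (preferred, rejected) pairs for illustration.
pairs = [
    ("a long winding tale", "short"),
    ("brief", "a much longer story here"),
    ("two words", "one"),
]

accuracy = pairwise_accuracy(toy_score, pairs)
```

With two candidates per pair, a scorer that guesses at random lands near 50%, which is the baseline to keep in mind when reading a reported accuracy.
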
SWAN: Semantic Watermarking with Abstract Meaning Representation

Ziping Ye, Gourab Dey, Christos Christodoulopoulos, Charith Peris, Anil Ramakrishna, Weitong Ruan · May 5, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In contrast to existing watermarking methods, which typically encode signatures by adjusting token selection preferences during text generation, SWAN embeds the signature directly in the sentence's semantic representation.
  • Empirical evaluation on the RealNews benchmark shows SWAN matches state-of-the-art detection performance on unaltered watermarked text, while significantly improving robustness against paraphrasing, increasing detection AUC by up to 13.9…
Open paper
Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

Yao-Shun Chuang, Tushti Mody, Uday Pratap Singh, Shirindokht Shiraz, Chun-Teh Lee, Ryan Brandon · May 5, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics Medicine
  • Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization.
  • Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks.
Open paper
HATS: An Open data set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics

Thibault Bañeras Roux, Jane Wottawa, Mickael Rouvier, Teva Merlin, Richard Dufour · Apr 30, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • However, they remain system-oriented, even when transcripts are intended for humans.
  • In this paper, we first present Human Assessed Transcription Side-by-side (HATS), an original French data set manually annotated for human perception of transcription errors produced by various ASR systems.
Open paper
Preference Heads in Large Language Models: A Mechanistic Framework for Interpretable Personalization

Weixu Zhang, Ye Yuan, Changjiang Han, Yuxing Tian, Zipeng Sun, Linfeng Du · Apr 24, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In this work, we adopt a mechanistic interpretability perspective and hypothesize the existence of a sparse set of Preference Heads, attention heads that encode user-specific stylistic and topical preferences and exert a causal influence on…
  • We introduce Differential Preference Steering (DPS), a training-free framework that (1) identifies Preference Heads through causal masking analysis and (2) leverages them for controllable and interpretable personalization at inference time.
Open paper
Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 62% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Long Horizon General
  • We introduce MoMo, a preference-conditioned contrastive planner allowing a scalar user preference to continuously modulate plan conservativeness at inference time, without retraining.
  • Across six environments, MoMo smoothly adapts plan safety according to user preferences, yielding improved temporal and preferential consistency over state augmentation baselines.
Open paper
Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors

Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar, Marwa Abdulhai, Sergey Levine, Michel Galley · May 8, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 62% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Simulation Env Coding
  • As user simulators are increasingly used for interactive training and evaluation of AI assistants, it is essential that they represent the diverse behaviors of real users.
  • In this work, we introduce a method to measure the distributional gap between real and simulated user behaviors, validated through a human study and ablations.
Open paper
Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 62% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference LLM-as-Judge Law
  • We propose RLearner-LLM with Hybrid-DPO: an automated preference pipeline that fuses a DeBERTa-v3 NLI signal with a verifier LLM score, removing human annotation while overcoming the "alignment tax" of single-signal optimization.
  • Our Qwen3-8B RLearner-LLM wins 95% of pairwise comparisons against its own SFT baseline; GPT-4o-mini in turn wins 95% against our concise output -- alongside the 69% win the same judge gives a verbose SFT over our DPO model, this replicates…
Open paper
Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 62% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Simulation Env Coding
  • A model's ability to reliably process these sources is key to system safety.
  • Our findings reveal general patterns: most models rely more on document assertions than user assertions, and this preference is reinforced by post-training.
Open paper
Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 62% Moderate protocol signal Freshness: Hot Status: Fallback
Pairwise Preference General
  • Personalized LLMs can significantly enhance user experiences by tailoring responses to preferences such as helpfulness, conciseness, and humor.
  • In this paper, we introduce CLIPer (Classifier-guided Inference-time Personalization), a lightweight personalization approach that leverages a classifier model to steer LLM generation dynamically to different user preferences at inference…
Open paper
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models

Huatian Zhang, Zhendong Mao, Lei Zhang, Yongdong Zhang · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 58% Sparse protocol signal Freshness: Hot Status: Fallback
Pairwise Preference General
  • Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs.
  • In this work, we propose an Uncertainty-aware Exploratory Direct Preference Optimization (UE-DPO) method for MLLMs, which enables the model to uncover its cognitive deficiencies and actively explore for self-correction, guided by…
Open paper
Graph-Augmented LLMs for Swiss MP Ideology Prediction

Yifei Yuan, Luis Salamanca, Sophia Schlosser, Laurence Brandenberger · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 58% Sparse protocol signal Freshness: Hot Status: Fallback
Pairwise Preference General
  • Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences.
Open paper
SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking

Chenxi Gu, Xiaoning Du, John Grundy · Apr 24, 2026

Citations: 0

Match reason: Matches selected tags (Pairwise Preference).

Score: 58% Sparse protocol signal Freshness: Hot Status: Fallback
Pairwise Preference Math Coding
  • A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences.
Open paper
