Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 117

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Truong Nguyen, Tien-Phat Nguyen, Linh Ngo Van, Duy Minh Ho Nguyen, Khoa D. Doan, Trung Le · May 12, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.
  • We introduce Token-level Bregman Preference Optimization (TBPO), which posits a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, and derive a Bregman-divergence density-ratio matching objective…
Open paper
How Value Induction Reshapes LLM Behaviour

Arnav Arora, Natalie Schluter, Katherine Metcalf, Maartje ter Hoeve · May 8, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model.
  • We fine-tune models using curated value subsets of existing preference datasets, measuring the impact of value induction on expression of other values, model safety, anthropomorphic language, and various QA benchmarks.
Open paper
TRACE: Tourism Recommendation with Accountable Citation Evidence

Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng, Xike Xie · May 8, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim…
  • This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking…
Open paper
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

Chunyu Li, Jingyi Kang, Ding Chen, Mengyuan Zhang, Jiajun Shen, Bo Tang · May 7, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory.
  • On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics.
Open paper
Misaligned by Reward: Socially Undesirable Preferences in LLMs

Gayane Ghazaryan, Esra Dönmez · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • Reward models are a key component of large language model alignment, serving as proxies for human preferences during training.
  • We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise.
Open paper
StoryAlign: Evaluating and Training Reward Models for Story Generation

Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou · May 6, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics Coding
  • Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences.
  • We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy.
Open paper
SWAN: Semantic Watermarking with Abstract Meaning Representation

Ziping Ye, Gourab Dey, Christos Christodoulopoulos, Charith Peris, Anil Ramakrishna, Weitong Ruan · May 5, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In contrast to existing watermarking methods, which typically encode signatures by adjusting token selection preferences during text generation, SWAN embeds the signature directly in the sentence's semantic representation.
  • Empirical evaluation on the RealNews benchmark shows SWAN matches state-of-the-art detection performance on unaltered watermarked text, while significantly improving robustness against paraphrasing, increasing detection AUC by up to 13.9…
Open paper
Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

Yao-Shun Chuang, Tushti Mody, Uday Pratap Singh, Shirindokht Shiraz, Chun-Teh Lee, Ryan Brandon · May 5, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics Medicine
  • Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization.
  • Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks.
Open paper
HATS: An Open data set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics

Thibault Bañeras Roux, Jane Wottawa, Mickael Rouvier, Teva Merlin, Richard Dufour · Apr 30, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • However, they remain system-oriented, even when transcripts are intended for humans.
  • In this paper, we firstly present Human Assessed Transcription Side-by-side (HATS), an original French manually annotated data set in terms of human perception of transcription errors produced by various ASR systems.
Open paper
Preference Heads in Large Language Models: A Mechanistic Framework for Interpretable Personalization

Weixu Zhang, Ye Yuan, Changjiang Han, Yuxing Tian, Zipeng Sun, Linfeng Du · Apr 24, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics General
  • In this work, we adopt a mechanistic interpretability perspective and hypothesize the existence of a sparse set of Preference Heads, attention heads that encode user specific stylistic and topical preferences and exert a causal influence on…
  • We introduce Differential Preference Steering (DPS), a training free framework that (1) identifies Preference Heads through causal masking analysis and (2) leverages them for controllable and interpretable personalization at inference time.
Open paper
Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

Chaoran Chen, Dayu Yuan, Peter Kairouz · Apr 24, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 65% Moderate protocol signal Freshness: Hot Status: Ready
Pairwise Preference Automatic Metrics Law
  • In agentic workflows, LLMs frequently process retrieved contexts that are legally protected from further training.
  • The framework instruments preference data by pairing document triggers with feedback that rewards a distinctive stylistic response, inducing a latent trigger-conditioned preference if such data are used in training.
Open paper
HyperMem: Hypergraph Memory for Long-Term Conversations

Juwei Yue, Chuanrui Hu, Jiawei Sheng, Zuyi Zhou, Wenyuan Zhang, Tingwen Liu · Apr 9, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference LLM As Judge Automatic Metrics General
  • Long-term memory is essential for conversational agents to maintain coherence, track persistent tasks, and provide personalized interactions across extended dialogues.
  • Experiments on the LoCoMo benchmark show that HyperMem achieves state-of-the-art performance with 92.73% LLM-as-a-judge accuracy, demonstrating the effectiveness of HyperMem for long-term conversations.
Open paper
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

Qiyao Ma, Dechen Gao, Rui Cai, Boqi Zhao, Hanchu Zhou, Junshan Zhang · Apr 8, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% High protocol signal Freshness: Warm Status: Ready
Pairwise Preference Rubric Rating Human Eval Automatic Metrics General
  • Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values.
  • To bridge this gap, we introduce Personalized RewardBench, a novel benchmark designed to rigorously assess reward models' capacity to model personalized preferences.
Open paper
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng · Apr 7, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Automatic Metrics General
  • Experiments on the MMEB-V2 benchmark demonstrate that our model achieves a score of 71.2 with only 4B parameters, establishing a new state-of-the-art while significantly reducing reasoning overhead and inference latency.
Open paper
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang · Apr 6, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Automatic Metrics Law
  • Via the trigonometric series, we use the distance preference characterized by these centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation.
Open paper
Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study

Yosuke Yamagishi, Atsushi Takamatsu, Yasunori Hamaguchi, Tomohiro Kikuchi, Shouhei Hanaoka, Takeharu Yoshikawa · Apr 2, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference LLM As Judge Automatic Metrics Medicine Multilingual
  • A board-certified radiologist and a radiology resident independently performed blinded pairwise evaluations across 4 criteria: terminology accuracy, readability, overall quality, and radiologist-style authenticity.
  • Radiologist 2 rated readability as equivalent in 75% of cases and favored the human-edited translation for overall quality (40% vs 21%).
Open paper
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning

Yuhang Wu, Xiangqing Shen, Fanfan Wang, Cangqi Zhou, Zhen Wu, Xinyu Dai · Apr 2, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Automatic Metrics General
  • However, current reranking models are typically optimized on static human annotated relevance labels in isolation, decoupled from the downstream generation process.
  • To bridge this gap, we introduce ReRanking Preference Optimization (RRPO), a reinforcement learning framework that directly aligns reranking with the LLM's generation quality.
Open paper
PLOT: Enhancing Preference Learning via Optimal Transport

Liang Zhu, Yuelin Bai, Xiankun Ren, Jiaxi Yang, Lei Zhang, Feiteng Fang · Apr 2, 2026

Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Automatic Metrics General
  • Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global…
  • We introduce PLOT, which enhances Preference Learning in fine-tuning-based alignment through a token-level loss derived from Optimal Transport.
Open paper
Citations: 0

Match reason: Matches selected tags (Automatic Metrics, Pairwise Preference).

Score: 58% Moderate protocol signal Freshness: Warm Status: Ready
Pairwise Preference Automatic Metrics General
  • By contrast, zero-shot chain-of-thought on the base Gemma3-1B harms accuracy relative to direct answers, and preference optimization with a simple format+accuracy reward underperforms supervised reasoning.
  • To probe the latter, we introduce GSMClaims and a domain-specialized variant, ThinknCheck-Science, which improves across benchmarks, including 61.0% accuracy on GSMClaims.
Open paper
