Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 9 · Search mode: keyword


No exact ID match for "2203.02155". Showing results for "Training language models to follow instructions with human feedback" instead.
TiCo: Time-Controllable Training for Spoken Dialogue Models

Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu, Hung-yi Lee, James Glass · Mar 23, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 28% · Sparse protocol signal · Freshness: Warm · Status: Ready
General
  • Through an empirical evaluation of both open-source and commercial SDMs, we show that they frequently fail to satisfy time-control requirements.
  • This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality.
F-Actor: Controllable Conversational Behaviour in Full-Duplex Models

Maike Züfle, Ondrej Klejch, Nicholas Sanders, Jan Niehues, Alexandra Birch, Tsz Kin Lam · Jan 16, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 28% · Sparse protocol signal · Freshness: Warm · Status: Ready
Coding
  • Spoken conversational systems require more than accurate speech generation to have human-like conversations: to feel natural and engaging, they must produce conversational behaviour that adapts dynamically to the context.
Reasoning Up the Instruction Ladder for Controllable Language Models

Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar · Oct 30, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 33% · Moderate protocol signal · Freshness: Cold · Status: Ready
Red Team · Automatic Metrics · General
  • Our finetuned models achieve consistent improvements on instruction-following and instruction-hierarchy benchmarks, including roughly a 20% improvement on the IHEval conflict setup.
  • By treating safety issues as resolving conflicts between adversarial user inputs and predefined higher-priority policies, our trained model enhances robustness against jailbreak and prompt injection attacks, providing up to a 20% reduction…
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen · Sep 30, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 33% · Moderate protocol signal · Freshness: Cold · Status: Fallback
Pairwise Preference · LLM-as-Judge · General
  • To address this critical bottleneck, we built EditReward, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol.
  • EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks.
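EditReward's own training code isn't reproduced on this page, but reward models trained on preference pairs like those described in the entry above are commonly fit with a Bradley-Terry pairwise loss. Below is a minimal, hypothetical sketch of that generic setup in PyTorch; the function name, shapes, and random stand-in scores are all illustrative assumptions, not EditReward's implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(score_chosen: torch.Tensor,
                             score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss over scalar reward scores of shape (batch,):
    -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Illustrative usage: random scores stand in for a reward model's outputs
# on (instruction, edit) pairs labeled chosen vs. rejected by annotators.
chosen = torch.randn(8, requires_grad=True)
rejected = torch.randn(8, requires_grad=True)
loss = pairwise_preference_loss(chosen, rejected)
loss.backward()  # in real training, gradients flow into the reward model
```

Minimizing this loss pushes the model to score the human-preferred edit above the rejected one, which is what "alignment with human preferences" concretely means for a pairwise reward model.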
InstructPro: Natural Language Guided Ligand-Binding Protein Design

Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li · Jun 11, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 26% · Sparse protocol signal · Freshness: Cold · Status: Ready
General
  • To enable training and evaluation, we develop InstructProBench, a large-scale dataset of 9.6 million (function description, ligand, protein) triples.
Training with Pseudo-Code for Instruction Following

Prince Kumar, Rudra Murthy, Riyaz Bhat, Danish Contractor · May 23, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 26% · Sparse protocol signal · Freshness: Cold · Status: Fallback
Demonstrations · Math · Coding
  • We evaluate our method on 12 publicly available benchmarks spanning instruction-following, mathematical reasoning, and commonsense reasoning, across six base models.
  • Our results show that models trained with pseudo-code follow instructions more reliably, achieving relative gains of 8-21% on instruction following benchmarks, while largely preserving and in some cases improving performance on…
Instruction Following by Principled Boosting Attention of Large Language Models

Vitoria Guardieiro, Avishree Khare, Adam Stein, Eric Wong · Jun 16, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 23% · Sparse protocol signal · Freshness: Cold · Status: Ready
General
  • Yet in practice these constraints can be violated under long contexts or when user-provided context conflicts with them, creating reliability and safety risks.
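The notice at the top of the results ("No exact ID match … Showing results for … instead") and the repeated "Matched by broad semantic/index fallback" lines describe a two-stage lookup: try an exact arXiv-ID hit first, then fall back to similarity search over an embedding index. The explorer's real implementation isn't shown on this page; the sketch below is a hypothetical reconstruction of that pattern, and `embed`, `papers_by_id`, and `title_index` are made-up names.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real sentence-embedding model: a deterministic
    pseudo-random unit vector keyed on the text."""
    rng = np.random.default_rng(sum(text.encode()) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def search(query: str, papers_by_id: dict[str, str], title_index: list[str]):
    """Exact arXiv-ID match first; otherwise broad semantic fallback."""
    if query in papers_by_id:                 # stage 1: exact ID lookup
        return papers_by_id[query], "Matched by exact ID."
    q = embed(query)                          # stage 2: cosine similarity
    sims = [float(q @ embed(title)) for title in title_index]
    best = int(np.argmax(sims))
    return title_index[best], "Matched by broad semantic/index fallback."
```

Under this pattern, a query like "2203.02155" that misses the ID map surfaces whichever indexed titles embed closest to the query string, which would explain why loosely related instruction-following papers dominate the results above.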
