Researcher Tools

Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 438 Search mode: keyword Shortlist (0) RSS

Filter by tag

All Automatic Metrics (1,610) General (530) Long Horizon (319) Pairwise Preference (287) Coding (216) Simulation Env (186) Multi Agent (182) Medicine (115) Llm As Judge (106) Expert Verification (97) Human Eval (89) Rubric Rating (82) Web Browsing (79) Math (77) Demonstrations (67) Critique Edit (63)

Browse by Topic

Jump directly into tag and hub pages to crawl deeper content clusters.

Top Protocol Hubs

Start Here By Objective

Pick your immediate research objective and jump directly to high-signal pages, not generic search.

Benchmark Selection

Find papers with explicit benchmark anchors and comparable metric reporting.

Rater Protocol Design

Compare pairwise, rubric, and expert-verification setups before drafting your protocol.

LLM-as-Judge Setup

Start with established judge pipelines and then compare with human-eval references.

Scale Your Evaluation Team

Need human evaluators for your benchmark or preference study? OpenTrain sources pre-vetted domain experts into your annotation pipeline.

See How It Works →

When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools

Xingming Li, Runke Huang, Yanan Bao, Yuye Jin, Yuru Jiao, Qingyong Hu · Mar 25, 2026

Citations: 0

Match reason: Keyword overlap 1/1 across title and protocol fields.

Score: 90% High protocol signal Freshness: Hot Status: Ready

Rubric Rating Automatic Metrics General

In this paper, we investigate whether AI can serve as a scalable assessment teammate by extracting structured quality indicators and validating their alignment with human expert judgments.
Our contributions include: (1) TEPE-TCI-370h (Tracing Effective Preschool Education), the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and…

Open paper

System-Anchored Knee Estimation for Low-Cost Context Window Selection in PDE Forecasting

Wenshuo Wang, Fan Zhang · Mar 26, 2026

Citations: 0

Match reason: Title directly matches "cost".

Score: 80% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Biplab Pal, Santanu Bhattacharya · Mar 25, 2026

Citations: 0

Match reason: Title directly matches "cost".

Score: 80% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Cost-Sensitive Neighborhood Aggregation for Heterophilous Graphs: When Does Per-Edge Routing Help?

Eyal Weiss · Mar 25, 2026

Citations: 0

Match reason: Title directly matches "cost".

Score: 80% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics

Taejin Jeong, Joohyeok Kim, Jinyeong Kim, Chanyoung Kim, Seong Jae Hwang · Mar 26, 2026

Citations: 0

Match reason: Keyword overlap 1/1 across title and protocol fields.

Score: 87% Moderate protocol signal Freshness: Hot Status: Fallback

Pairwise Preference Coding

Open paper

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Hieu Xuan Le, Benjamin Goh, Quy Anh Tang · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 45% Moderate protocol signal Freshness: Hot Status: Ready

Red Team Llm As Judge General

In production, guardrails must mitigate these attacks under strict low-latency constraints, resulting in a deployment gap in which lightweight classifiers and rule-based systems struggle to generalize under distribution shift, while…
In this work, we examine whether lightweight, general-purpose LLMs can reliably serve as security judges under real-world production constraints.

Open paper

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Ben Chen, Siyuan Wang, Yufei Ma, Zihan Liang, Xuxin Zhang, Yue Lv · Mar 25, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 45% Moderate protocol signal Freshness: Hot Status: Ready

Pairwise Preference Automatic Metrics General

However, its inadequate understanding of complex queries, inefficient exploitation of latent user intents, and overfitting to narrow historical preferences have limited its further performance improvement.
It contains three key innovations: (1) a thought-augmented complex query understanding module, which enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference; (2) a reasoning-internalized…

Open paper

Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and Classification

Giampaolo Bovenzi, Domenico Ciuonzo, Jonatan Krolikowski, Antonio Montieri, Alfredo Nascita, Antonio Pescapè · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Retraining as Approximate Bayesian Inference

Harrison Katz · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

Roman Kueble, Marco Hueller, Mrunmai Phatak, Rainer Lienhart, Joerg Haehner · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Decidable By Construction: Design-Time Verification for Trustworthy AI

Houston Haynes · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models

Chengyu Fang, Heng Guo, Zheng Jiang, Chunming He, Xiu Li, Minfeng Xu · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation

Hongxu Ma, Guang Li, Shijie Wang, Dongzhan Zhou, Baoli Sun, Takahiro Ogawa · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness

Yuto Matsuo, Yoshihiro Fukuhara, Yuki M. Asano, Rintaro Yanagi, Hirokatsu Kataoka, Akio Nakamura · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Ron Litvak · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Mechanistically Interpreting Compression in Vision-Language Models

Veeraraju Elluru, Arth Singh, Roberto Aguero, Ajay Agarwal, Debojyoti Das, Hreetam Paul · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Subject-Specific Low-Field MRI Synthesis via a Neural Operator

Ziqi Gao, Nicha Dvornek, Xiaoran Zhang, Gigi Galiana, Hemant Tagare, Todd Constable · Mar 26, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Evaluating Chunking Strategies For Retrieval-Augmented Generation in Oil and Gas Enterprise Documents

Samuel Taiwo, Mohd Amaluddin Yusoff · Mar 25, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference

Shalender Singh, Vishnu Priya Singh Parmar · Mar 25, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu · Mar 25, 2026

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 35% Sparse protocol signal Freshness: Hot Status: Ready

Open paper

Protocol Hubs

Benchmark Hubs

Metric Hubs

Daily Archives

Get Started

Join the #1 Platform for AI Training Talent

Where top AI builders and expert AI Trainers connect to build the future of AI.

Self-Service

Post a Job

Post your project and get a shortlist of qualified AI Trainers and Data Labelers. Hire and manage your team in the tools you already use.

Create Account & Post a Job

Managed Service

For Large Projects

Done-for-You

We recruit, onboard, and manage a dedicated team inside your tools. End-to-end operations for large or complex projects.

Learn About Managed Service

For Freelancers

Join as an AI Trainer

Find AI training and data labeling projects across platforms, all in one place. One profile, one application process, more opportunities.

Join Now

Human Feedback and Eval Paper Explorer

Filter by tag

Featured Papers

Browse by Topic

Popular Tags

Top Protocol Hubs

Start Here By Objective

Benchmark Selection

Rater Protocol Design

LLM-as-Judge Setup

Scale Your Evaluation Team

Protocol Hubs

Benchmark Hubs

Metric Hubs

Daily Archives

Join the #1 Platform for AI Training Talent

Human Feedback and Eval Paper Explorer

Filter by tag

Featured Papers

Browse by Topic

Popular Tags

Top Protocol Hubs

Weekly Eval Paper Digest

Start Here By Objective

Benchmark Selection

Rater Protocol Design

LLM-as-Judge Setup

Scale Your Evaluation Team

Protocol Hubs

Benchmark Hubs

Metric Hubs

Daily Archives

Join the #1 Platform for AI Training Talent