Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Filter by tag

All · Automatic Metrics (2,174) · General (669) · Long Horizon (424) · Pairwise Preference (365) · Coding (287) · Simulation Env (248) · Multi Agent (228) · Medicine (143) · Llm As Judge (134) · Expert Verification (117) · Human Eval (107) · Math (107) · Rubric Rating (102) · Web Browsing (98) · Tool Use (94) · Red Team (85)

Featured Papers

Popular high-signal papers with direct links to full protocol pages.

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
Jun 18, 2026 · Citations: 0

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
Jun 18, 2026 · Citations: 0

Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood.

Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
Jun 18, 2026 · Citations: 0

We propose H-RePlan, a hierarchical replanning framework for multi-device agents with unified API–CLI–GUI execution.

Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
Jun 18, 2026 · Citations: 0

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict the human preference based on the response text.

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology
Jun 18, 2026 · Citations: 0

On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves competitive results with specialized medical VLMs.

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
Jun 18, 2026 · Citations: 0

While LLMs represent a scalable solution for assisting humans in the generation of counterspeech for both threats, zero-shot models frequently generate repetitive and vague responses, underscoring the need for high-quality examples to steer…

Token-Operations-Oriented Inference Optimization Techniques for Large Models
Jun 18, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback
Jun 18, 2026 · Citations: 0

PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer that incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling the precise estimation of student ability while maintaining psychometric…

The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse
Jun 18, 2026 · Citations: 0

We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent.

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families
Jun 18, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.

CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia
Jun 18, 2026 · Citations: 0

The dataset is designed to support the evaluation of machine translation systems that aim to preserve document formatting during translation.

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact
Jun 18, 2026 · Citations: 0

Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research.

Browse by Topic

Jump directly to tag and hub pages to explore related groups of papers.

Start Here By Objective

Pick your immediate research objective and jump straight to high-signal pages instead of running a generic search.

Benchmark Selection

Find papers with explicit benchmark anchors and comparable metric reporting.

Rater Protocol Design

Compare pairwise, rubric, and expert-verification setups before drafting your protocol.

LLM-as-Judge Setup

Start with established judge pipelines, then compare them against human-eval references.

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

Julian Minder, Clément Dumas, Stewart Slocum, Helena Casademunt, Cameron Holmes, Robert West · Oct 14, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 23% · Sparse protocol signal · Freshness: Cold · Status: Ready

General

We demonstrate that these analyses contain crucial information by creating an LLM-based interpretability agent to understand the finetuning domain.
With access to the bias, the agent performs significantly better compared to baseline agents using simple prompting.
