Skip to content

Researcher Tools

Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 151 Search mode: keyword Shortlist (0) RSS

Featured Papers

Popular high-signal papers with direct links to full protocol pages.

Browse by Topic

Jump directly into tag and hub pages to crawl deeper content clusters.

Popular Tags

Top Protocol Hubs

Weekly Eval Paper Digest

The top RLHF, evaluation, and human feedback papers — curated and summarized every Friday.

No spam. Unsubscribe anytime.

Start Here By Objective

Pick your immediate research objective and jump directly to high-signal pages, not generic search.

Scale Your Evaluation Team

Need human evaluators for your benchmark or preference study? OpenTrain sources pre-vetted domain experts into your annotation pipeline.

ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models

Jewon Lee, Wooksu Shin, Seungmin Yang, Ki-Ung Song, DongUk Lim, Jaeyeon Kim · Sep 26, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 75% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics Coding
  • For instance, ERGO surpasses Qwen2.5-VL-7B on the V* benchmark by 4.7 points while using only 23% of the vision tokens, achieving a 3x inference speedup.
Open paper
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity

Yeongbin Seo, Gayoung Kim, Jaehyung Kim, Jinyoung Yeo · Sep 23, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 75% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics MathCoding
  • Despite its simplicity, the prior-based filter achieves the highest average performance across 20 downstream benchmarks, while reducing time cost by over 1000x compared to PPL-based filtering.
Open paper
Complexity-aware fine-tuning

Andrey Goncharov, Daniil Vyazhev, Petr Sychev, Edvard Khalafyan, Alexey Zaytsev · Jun 26, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 75% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics General
  • Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
Open paper
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin · Sep 16, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 71% Sparse protocol signal Freshness: Cold Status: Ready
Tool Use General
  • Large Language Model (LLM)-based web agents excel at knowledge-intensive tasks but face a fundamental conflict between the need for extensive exploration and the constraints of limited context windows.
  • Current solutions typically rely on architectural modifications, e.g., internal memory tokens, which break compatibility with pre-existing agents and necessitate costly end-to-end retraining.
Open paper
Token Distillation: Attention-aware Input Embeddings For New Tokens

Konstantin Dobler, Desmond Elliott, Gerard de Melo · May 26, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 71% Sparse protocol signal Freshness: Cold Status: Ready
General
  • Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
Open paper
Dynamic Token Reweighting for Robust Vision-Language Models

Tanqiu Jiang, Jiacheng Liang, Rongyi Zhu, Jiawei Zhou, Fenglong Ma, Ting Wang · May 22, 2025

Citations: 0

Match reason: Keyword overlap 2/2 across title and protocol fields.

Score: 71% Sparse protocol signal Freshness: Cold Status: Fallback
Red Team Coding
  • Large vision-language models (VLMs) are highly vulnerable to multimodal jailbreak attacks that exploit visual-textual interactions to bypass safety guardrails.
  • Rather than relying on curated safety-specific data or costly image-to-text conversion, we introduce a new formulation of the safety-relevant distributional shift induced by the visual modality.
Open paper
EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis

Mohammad Hossein Samaei, Faryad Darabi Sahneh, Lee W. Cohnstaedt, Caterina Scoglio · Sep 24, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 56% High protocol signal Freshness: Cold Status: Ready
Expert Verification Llm As JudgeSimulation Env Multi Agent General
  • We introduce EpidemIQs, a novel multi-agent LLM framework that integrates user inputs and autonomously conducts literature review, analytical derivation, network modeling, mechanistic modeling, stochastic simulations, data visualization and
  • We introduce two types of agents: a scientist agent for planning, coordination, reflection, and generation of final results, and a task-expert agent to focus exclusively on one specific duty serving as a tool to the scientist agent.
Open paper
Diffusion Language Models Know the Answer Before Decoding

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen · Aug 27, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 56% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics MathCoding
  • Empirical evaluations of LLaDA-8B and Dream-7B across multiple tasks show that Prophet reduces the number of decoding steps by up to 3.4x while preserving high generation quality.
Open paper
FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution

Syed Rifat Raiyan, Md Farhan Ishmam, Abdullah Al Imran, Mohammad Ali Moni · Oct 18, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 52% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics General
  • Human communication heavily relies on laconism and inferential pragmatics, allowing listeners to successfully reconstruct rich meaning from sparse, telegraphic speech.
Open paper
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration

Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Dongchun Xie, Hanhui Li · Aug 19, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 52% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics General
  • Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
Open paper
Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models

Yuxi Lin, Yaxue Fang, Zehong Zhang, Zhouwu Liu, Siyun Zhong, Zhongfang Wang · Jul 22, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 52% Moderate protocol signal Freshness: Cold Status: Ready
Automatic Metrics Multilingual
  • Evaluated across three benchmark datasets, UTR-STCNet consistently outperforms state-of-the-art baselines in predicting mean ribosome load (MRL), a key proxy for translational efficiency.
Open paper
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering

Yang Zhao, Chengxiao Dai, Wei Zhuo, Yue Xiu, Dusit Niyato · Sep 25, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 56% High protocol signal Freshness: Cold Status: Fallback
Automatic Metrics Multi Agent General
  • We introduce CLAUSE, an agentic three-agent neuro-symbolic framework that treats context construction as a sequential decision process over knowledge graphs, deciding what to expand, which paths to follow or backtrack, what evidence to…
  • CLAUSE employs the proposed Lagrangian-Constrained Multi-Agent Proximal Policy Optimization (LC-MAPPO) algorithm to coordinate three agents: Subgraph Architect, Path Navigator, and Context Curator, so that subgraph construction,…
Open paper
PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space

Boyi Zeng, He Li, Shixiang Song, Yixuan Wang, Zitong Wang, Ziwei He · Sep 27, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 49% Sparse protocol signal Freshness: Cold Status: Ready
Coding
  • Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
Open paper
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu · Sep 27, 2025

Citations: 0

Match reason: Keyword overlap 1/2 across title and protocol fields.

Score: 49% Sparse protocol signal Freshness: Cold Status: Ready
General
  • To tackle these challenges, we present ReMemR1, which integrates the mechanism of memory retrieval into the memory update process, enabling the agent to selectively callback historical memories for non-linear reasoning.
Open paper
MARS: toward more efficient multi-agent collaboration for LLM reasoning

Xiao Wang, Jia Wang, Yijie Wang, Pengtao Dang, Sha Cao, Chi Zhang · Sep 24, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 33% High protocol signal Freshness: Cold Status: Ready
Critique Edit Automatic Metrics Multi Agent Coding
  • Large language models (LLMs) have achieved impressive results in natural language understanding, yet their reasoning capabilities remain limited when operating as single agents.
  • In this paper, we propose MARS (Multi-Agent Review System), a role-based collaboration framework inspired by the review process.
Open paper
Annotation-Efficient Universal Honesty Alignment

Shiyu Ni, Keping Bi, Jiafeng Guo, Minghao Tang, Jingtong Wu, Zengxin Han · Oct 20, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 30% Moderate protocol signal Freshness: Cold Status: Ready
General
  • To support a large-scale study, we release HonestyBench, a benchmark covering ten free-form QA datasets with 560k training and 70k evaluation instances annotated with correctness and self-consistency signals.
Open paper

Match reason: Matched by broad semantic/index fallback.

Score: 33% Moderate protocol signal Freshness: Cold Status: Fallback
Llm As JudgeAutomatic Metrics General
  • A qualitative audit by an independent LLM-as-a-judge confirms the discovery of meaningful functional axes, such as policy intent, that thematic ground-truth labels fail to capture.
Open paper
StateX: Enhancing RNN Recall via Post-training State Expansion

Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu, Maosong Sun · Sep 26, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 26% Sparse protocol signal Freshness: Cold Status: Ready
General
  • Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
Open paper
Revela: Dense Retriever Learning via Language Modeling

Fengyu Cai, Tong Chen, Xinran Zhao, Sihao Chen, Hongming Zhang, Sherry Tongshuang Wu · Jun 19, 2025

Citations: 0

Match reason: Matched by broad semantic/index fallback.

Score: 26% Sparse protocol signal Freshness: Cold Status: Ready
Coding
  • We evaluate Revela on domain-specific (CoIR), reasoning-intensive (BRIGHT), and general-domain (BEIR) benchmarks across various retriever backbones.
Open paper

Protocol Hubs

Get Started

Join the #1 Platform for AI Training Talent

Where top AI builders and expert AI Trainers connect to build the future of AI.
Self-Service
Post a Job
Post your project and get a shortlist of qualified AI Trainers and Data Labelers. Hire and manage your team in the tools you already use.
Managed Service
For Large Projects
Done-for-You
We recruit, onboard, and manage a dedicated team inside your tools. End-to-end operations for large or complex projects.
For Freelancers
Join as an AI Trainer
Find AI training and data labeling projects across platforms, all in one place. One profile, one application process, more opportunities.