HFEPX Hub

Pairwise Preference Papers (Last 45 Days)

Updated from current HFEPX corpus (Mar 1, 2026). 52 papers are grouped in this hub page.

Common evaluation modes: Automatic Metrics, Human Eval. Most common rater population: Domain Experts. Common annotation unit: Pairwise. Frequent quality control: Inter Annotator Agreement Reported. Frequently cited benchmark: LiveCodeBench. Common metric signal: cost. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new eval experiments. Newest paper in this set is from Feb 20, 2026.

Papers: 52 · Last published: Feb 20, 2026

Pairwise Preference · Last 45d

Researcher Quick Triage

This hub is best used for protocol triage and replication planning from abstract-level evidence. Quality band: Developing.

High-Signal Coverage: 100.0% (52 / 52 sampled papers are not low-signal flagged).
Replication-Ready Set: 3 papers (benchmark + metric + explicit evaluation mode present).
Judge/Human Comparability: 0 papers contain both `human_eval` and `llm_as_judge`, so none support judge-vs-human agreement analysis.
Quality Controls: 5 papers report explicit quality controls (calibration/adjudication/IAA).

Primary action: Start with the top 2 papers in “Start Here”, then validate assumptions in the protocol matrix.

Why This Matters For Eval Research

100% of papers report explicit human-feedback signals, led by pairwise preferences.
Automatic metrics appear in 36.5% of papers in this hub.
LiveCodeBench is a recurring benchmark anchor for cross-paper comparisons on this page.

Protocol Takeaways

Most common quality-control signal is inter-annotator agreement reporting (5.8% of papers).
Raters are mostly domain experts and the annotation unit is usually pairwise; use this to scope replication staffing.
Judge-human agreement drift can be quantified only from papers that report both human_eval and llm_as_judge; none in this hub currently do.
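
As a rough illustration of what an inter-annotator (or judge-vs-human) agreement figure measures on pairwise labels, the sketch below computes Cohen's kappa for two raters. The label lists are hypothetical and not drawn from any paper in this hub; papers may report other agreement statistics.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical pairwise judgments: which response ("A" or "B") each rater preferred.
human = ["A", "A", "B", "B", "A", "B", "A", "A"]
judge = ["A", "B", "B", "B", "A", "A", "A", "A"]
kappa = cohens_kappa(human, judge)
print(f"kappa = {kappa:.3f}")
```

Raw percent agreement here is 75%, but kappa is lower because both raters prefer "A" most of the time, which inflates chance agreement.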

Benchmark Interpretation

LiveCodeBench appears in 3.8% of hub papers (2/52); use this cohort for benchmark-matched comparisons.
AIME appears in 1.9% of hub papers (1/52); use this cohort for benchmark-matched comparisons.

Metric Interpretation

cost is reported in 9.6% of hub papers (5/52); compare with a secondary metric before ranking methods.
accuracy is reported in 7.7% of hub papers (4/52); compare with a secondary metric before ranking methods.
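
One way to read "compare with a secondary metric before ranking" is to keep only methods that are not dominated on both metrics at once. The sketch below does this for cost (lower is better) and accuracy (higher is better). The method names and numbers are hypothetical placeholders, not results from the hub's papers.

```python
def pareto_front(methods):
    """Return names of methods not dominated on (lower cost, higher accuracy)."""
    front = []
    for name, cost, acc in methods:
        # A method is dominated if another is at least as good on both
        # metrics and strictly better on at least one.
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for other, c, a in methods
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (name, cost, accuracy) triples for illustration only.
methods = [
    ("method_a", 1.0, 0.70),
    ("method_b", 2.0, 0.80),
    ("method_c", 2.5, 0.75),
    ("method_d", 4.0, 0.82),
]
front = pareto_front(methods)
print(front)
```

Here method_c drops out because method_b is both cheaper and more accurate; the remaining methods represent genuine cost/accuracy trade-offs rather than a single ranking.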

Researcher Checklist

Strong: Papers with explicit human feedback. Coverage is strong (100% vs 45% target).
Gap: Papers reporting quality controls. Coverage is a replication risk (9.6% vs 30% target).
Gap: Papers naming benchmarks/datasets. Coverage is a replication risk (15.4% vs 35% target).
Moderate: Papers naming evaluation metrics. Coverage is usable but incomplete (32.7% vs 35% target).
Gap: Papers with known rater population. Coverage is a replication risk (17.3% vs 35% target).
Strong: Papers with known annotation unit. Coverage is strong (36.5% vs 35% target).
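
The checklist labels can be reproduced with a simple coverage-to-target ratio rule. The page does not state its actual cutoffs, so the sketch below assumes "Strong" means coverage at or above target and "Moderate" means at least 75% of target; `coverage_band` and `moderate_ratio` are hypothetical names, and the 0.75 cutoff is only one value consistent with the six rows above.

```python
def coverage_band(coverage_pct, target_pct, moderate_ratio=0.75):
    """Classify coverage against a target (cutoffs are assumptions)."""
    ratio = coverage_pct / target_pct
    if ratio >= 1.0:
        return "Strong"
    if ratio >= moderate_ratio:  # assumed cutoff, not stated by the page
        return "Moderate"
    return "Gap"


# (checklist item, coverage %, target %) as listed in the checklist.
checklist = [
    ("explicit human feedback", 100.0, 45.0),
    ("quality controls", 9.6, 30.0),
    ("benchmarks/datasets", 15.4, 35.0),
    ("evaluation metrics", 32.7, 35.0),
    ("known rater population", 17.3, 35.0),
    ("known annotation unit", 36.5, 35.0),
]
bands = {name: coverage_band(cov, target) for name, cov, target in checklist}
for name, band in bands.items():
    print(f"{band}: {name}")
```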

Strengths

Strong human-feedback signal (100% of papers).
The hub contains both human-eval (4 papers) and LLM-as-judge (1 paper) protocols, though no single paper reports both.
Agentic evaluation appears in 25% of papers.

Known Gaps

Only 9.6% of papers report quality controls; prioritize calibration/adjudication evidence.
Rater population is under-specified (17.3% coverage).
Benchmark coverage is thin (15.4% of papers mention benchmarks/datasets).

Suggested Next Analyses

Once papers reporting both human_eval and llm_as_judge enter the hub (currently 0), compare them to quantify judge-human agreement drift.
Stratify by benchmark (LiveCodeBench vs AIME) before comparing methods.
Track metric sensitivity by reporting both cost and accuracy.

Recommended Queries

Judge vs Human Agreement
Benchmark Slice: LiveCodeBench
Metric Slice: cost
IAA-Reported Evaluations
Recent High-Signal Papers

Start Here (Best First 6)

Ranked for protocol completeness (human signal, benchmark + metric anchors, quality controls, and judge/human overlap).

SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Feb 13, 2026 · Citations: 0 · Score: 9.5

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: MT Bench · Metric: Error rate
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Feb 11, 2026 · Citations: 0 · Score: 8.0

HF: Pairwise Preference · Eval: Not reported · Benchmark: LiveCodeBench · Metric: Latency
MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks
Feb 18, 2026 · Citations: 0 · Score: 8.0

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: MemoryArena · Metric: Recall
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Feb 25, 2026 · Citations: 0 · Score: 8.0

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: LiveCodeBench · Metric: Accuracy
Validating Political Position Predictions of Arguments
Feb 20, 2026 · Citations: 0 · Score: 7.5

HF: Pairwise Preference · Eval: Human Eval · Benchmark: Not Reported · Metric: Agreement
Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
Feb 21, 2026 · Citations: 0 · Score: 7.5

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: Not Reported · Metric: Agreement

Protocol Matrix (Top 12)

Use this to quickly compare protocol ingredients instead of scanning long prose.

Paper	Date	HF Signal	Eval Modes	Benchmarks	Metrics	QC
SCOPE: Selective Conformal Optimized Pairwise LLM Judging	Feb 13, 2026	Yes: Pairwise Preference	Automatic Metrics	MT Bench, LMSYS Chatbot Arena	Error rate	Calibration
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters	Feb 11, 2026	Yes: Pairwise Preference	Not Reported	LiveCodeBench, BrowseComp	Latency, Cost	Not Reported
MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks	Feb 18, 2026	Yes: Pairwise Preference	Automatic Metrics	MemoryArena	Recall	Not Reported
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences	Feb 25, 2026	Yes: Pairwise Preference	Automatic Metrics	LiveCodeBench, Mathbench	Accuracy	Not Reported
Validating Political Position Predictions of Arguments	Feb 20, 2026	Yes: Pairwise Preference	Human Eval	Not Reported	Agreement	Gold Questions, Inter Annotator Agreement Reported
Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language	Feb 21, 2026	Yes: Pairwise Preference	Automatic Metrics	Not Reported	Agreement	Inter Annotator Agreement Reported, Adjudication
Same Words, Different Judgments: Modality Effects on Preference Alignment	Feb 26, 2026	Yes: Pairwise Preference, RLAIF or Synthetic Feedback	Automatic Metrics	Not Reported	Agreement	Inter Annotator Agreement Reported
Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models	Feb 21, 2026	Yes: Pairwise Preference	Human Eval	GSM8K, AIME	Not Reported	Not Reported
ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models	Feb 17, 2026	Yes: Pairwise Preference	Automatic Metrics	ChartEditBench	Not Reported	Not Reported
ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays	Feb 26, 2026	Yes: Pairwise Preference	Not Reported	DROP	Not Reported	Not Reported
RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind	Jan 22, 2026	Yes: Pairwise Preference, Critique Edit	Human Eval	Rebuttalbench	Not Reported	Not Reported
Tutoring Large Language Models to be Domain-adaptive, Precise, and Safe	Feb 14, 2026	Yes: Pairwise Preference	Not Reported	Not Reported	Precision	Not Reported
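
The "replication-ready" criterion (benchmark + metric + explicit evaluation mode) can be applied directly to the matrix rows. The sketch below is an assumed reading of that criterion, not the page's actual scoring code; `ProtocolRow` and `is_replication_ready` are hypothetical names, paper titles are shortened, and "Not Reported" cells are encoded as empty lists.

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolRow:
    paper: str
    eval_modes: list = field(default_factory=list)
    benchmarks: list = field(default_factory=list)
    metrics: list = field(default_factory=list)


def is_replication_ready(row):
    """True when eval mode, benchmark, and metric are all reported."""
    return bool(row.eval_modes and row.benchmarks and row.metrics)


# Rows transcribed from the protocol matrix; empty list = "Not Reported".
rows = [
    ProtocolRow("SCOPE", ["Automatic Metrics"], ["MT Bench", "LMSYS Chatbot Arena"], ["Error rate"]),
    ProtocolRow("Step 3.5 Flash", [], ["LiveCodeBench", "BrowseComp"], ["Latency", "Cost"]),
    ProtocolRow("MemoryArena", ["Automatic Metrics"], ["MemoryArena"], ["Recall"]),
    ProtocolRow("Duel-Evolve", ["Automatic Metrics"], ["LiveCodeBench", "Mathbench"], ["Accuracy"]),
    ProtocolRow("Validating Political Position Predictions", ["Human Eval"], [], ["Agreement"]),
    ProtocolRow("Yor-Sarc", ["Automatic Metrics"], [], ["Agreement"]),
    ProtocolRow("Same Words, Different Judgments", ["Automatic Metrics"], [], ["Agreement"]),
    ProtocolRow("Think^2", ["Human Eval"], ["GSM8K", "AIME"], []),
    ProtocolRow("ChartEditBench", ["Automatic Metrics"], ["ChartEditBench"], []),
    ProtocolRow("ReCoN-Ipsundrum", [], ["DROP"], []),
    ProtocolRow("RebuttalAgent", ["Human Eval"], ["Rebuttalbench"], []),
    ProtocolRow("Tutoring LLMs", [], [], ["Precision"]),
]
ready = [row.paper for row in rows if is_replication_ready(row)]
print(ready)
```

Under this reading, the three rows that pass are SCOPE, MemoryArena, and Duel-Evolve, which matches the count of 3 replication-ready papers reported for the hub.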

Protocol Diff (Top Papers)

Fast side-by-side comparison for the highest-ranked papers in this hub.

Signal	SCOPE: Selective Conformal Optimized Pairwise LLM J…	Step 3.5 Flash: Open Frontier-Level Intelligence wi…	MemoryArena: Benchmarking Agent Memory in Interdepe…
Human Feedback	Pairwise Preference	Pairwise Preference	Pairwise Preference
Evaluation Modes	Automatic Metrics	Not reported	Automatic Metrics
Benchmarks	MT Bench, LMSYS Chatbot Arena	LiveCodeBench, BrowseComp	MemoryArena
Metrics	Error rate	Latency, Cost	Recall
Quality Controls	Calibration	Not reported	Not reported
Rater Population	Unknown	Domain Experts	Unknown
Annotation Unit	Pairwise	Unknown	Unknown

Research Utility Snapshot

Human Feedback Mix

Pairwise Preference (52)
Rubric Rating (4)
Critique Edit (2)
Expert Verification (2)

Evaluation Modes

Automatic Metrics (19)
Human Eval (4)
LLM as Judge (1)

Top Benchmarks

LiveCodeBench (2)
AIME (1)
BrowseComp (1)
ChartEditBench (1)

Top Metrics

Cost (5)
Accuracy (4)
Agreement (4)
Latency (2)

Rater Population Mix

Domain Experts (8)
Mixed (1)

Quality Controls

Inter Annotator Agreement Reported (3)
Calibration (2)
Adjudication (1)
Gold Questions (1)

Coverage diagnostics (sample-based): human-feedback 100.0% · benchmarks 15.4% · metrics 32.7% · quality controls 9.6%.

Top Papers

Validating Political Position Predictions of Arguments
Jordan Robinson, Angus R. Williams, Katie Atkinson, Anthony G. Cohn · Feb 20, 2026 · Citations: 0

Pairwise Preference Human Eval

Real-world knowledge representation often requires capturing subjective, continuous attributes -- such as political positions -- that conflict with pairwise validation, the widely accepted gold standard for human evaluation.
SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Sher Badshah, Ali Emami, Hassan Sajjad · Feb 13, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation.
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao · Feb 11, 2026 · Citations: 0

Pairwise Preference Tool Use

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency.
Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization
Jingyi Xu, Xingyu Ren, Zhoupeng Shou, Yumeng Zhang, Zhiqiang You · Jan 24, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Long Horizon

To address this, we propose Goal-Oriented Preference Optimization (GOPO), a hierarchical reinforcement learning framework that decouples strategy planning from response generation via an Expert Agent and a Customer Service Agent.
MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks
Zexue He, Yu Wang, Churan Zhi, Yuanzhe Hu, Tzu-Ping Chen · Feb 18, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Web Browsing

Existing evaluations of agents with memory typically assess memorization and action in isolation.
Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov · Feb 21, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

This protocol incorporates context-sensitive interpretation and community-informed guidelines and is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages.
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu · Feb 25, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Building on this observation, we introduce Duel-Evolve, an evolutionary optimization algorithm that replaces external scalar rewards with pairwise preferences elicited from the same LLM used to generate candidates.
Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
Abraham Paul Elenjical, Vivek Hruday Kavuri, Vasudeva Varma · Feb 21, 2026 · Citations: 0

Pairwise Preference Human Eval

We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and study its integration within a lightweight…
RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Zhitao He, Zongwei Lyu, Yi R Fung · Jan 22, 2026 · Citations: 0

Pairwise Preference Critique Edit Human Eval

In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM), operationalized through a ToM-Strategy-Response (TSR) framework that models reviewer mental state, formulates persuasion…
Same Words, Different Judgments: Modality Effects on Preference Alignment
Aaron Broukhim, Nadir Weibel, Eshin Jolly · Feb 26, 2026 · Citations: 0

Pairwise Preference RLAIF or Synthetic Feedback Automatic Metrics

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored.
Tutoring Large Language Models to be Domain-adaptive, Precise, and Safe
Somnath Banerjee · Feb 14, 2026 · Citations: 0

Pairwise Preference Long Horizon

The methodological trajectory moves from classical supervised adaptation for task-specific demands to decoding-time alignment for safety, finally leveraging human feedback and preference modeling to achieve sociolinguistic acuity.
Multi-Objective Alignment of Language Models for Personalized Psychotherapy
Mehrab Beikzadeh, Yasaman Asadollah Salmanpour, Ashima Suvarna, Sriram Sankararaman, Matteo Malgaroli · Feb 17, 2026 · Citations: 0

Pairwise Preference Expert Verification Automatic Metrics

While AI systems show therapeutic promise, current alignment approaches optimize objectives independently, failing to balance patient preferences with clinical safety.
Modeling Distinct Human Interaction in Web Agents
Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou · Feb 19, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Web Browsing

In this work, we introduce the task of modeling human intervention to support collaborative web task execution.
PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training
Yuhan Cheng, Hancheng Ye, Hai Helen Li, Jingwei Sun, Yiran Chen · Feb 14, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Multi Agent

We propose PrivAct, a contextual privacy-aware multi-agent learning framework that internalizes contextual privacy preservation directly into models' generation behavior for privacy-compliant agentic actions.
Multi-Agent Comedy Club: Investigating Community Discussion Effects on LLM Humor Generation
Shiwei Hong, Lingyao Li, Ethan Z. Rong, Chenxinran Shen, Zhicong Lu · Feb 16, 2026 · Citations: 0

Pairwise Preference Rubric Rating Multi Agent

Prior work has explored multi-turn interaction and feedback for LLM writing, but evaluations still largely center on prompts and localized feedback, leaving persistent public reception in online communities underexamined.
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian · Feb 13, 2026 · Citations: 0

Pairwise Preference Rubric Rating Long Horizon

MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities.
Open Rubric System: Scaling Reinforcement Learning with Pairwise Adaptive Rubric
Ruipeng Jia, Yunyi Yang, Yuxin Wu, Yongbo Gai, Siyuan Tao · Feb 15, 2026 · Citations: 0

Pairwise Preference Rubric Rating LLM as Judge

To operationalize this view, we present the Open Rubric System (OpenRS), a plug-and-play, rubrics-based LLM-as-a-Judge framework built around Pairwise Adaptive Meta-Rubrics (PAMR) and lightweight Pointwise Verifiable Rubrics (PVRs), which…
CAMEL: Confidence-Gated Reflection for Reward Modeling
Zirui Zhu, Hailun Xu, Yang Luo, Yong Liu, Kanchan Sarkar · Feb 24, 2026 · Citations: 0

Pairwise Preference Critique Edit Automatic Metrics

Building on this insight, we propose CAMEL, a confidence-gated reflection framework that performs a lightweight single-token preference decision first and selectively invokes reflection only for low-confidence instances.
Who can we trust? LLM-as-a-jury for Comparative Assessment
Mengjie Qian, Guangzhi Sun, Mark J. F. Gales, Kate M. Knill · Feb 18, 2026 · Citations: 0

Pairwise Preference

Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment often using pairwise comparative judgements.
Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
Kunihiro Miyazaki, Takanobu Kawahara, Stephen Roberts, Stefan Zohren · Feb 26, 2026 · Citations: 0

Pairwise Preference Multi Agent

While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and…
Decentralized Ranking Aggregation: Gossip Algorithms for Borda and Copeland Consensus
Anna Van Elst, Kerrian Le Caillec, Igor Colin, Stephan Clémençon · Feb 26, 2026 · Citations: 0

Pairwise Preference Multi Agent

The concept of ranking aggregation plays a central role in preference analysis, and numerous algorithms for calculating median rankings, often originating in social choice theory, have been documented in the literature, offering theoretical…
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He · Feb 17, 2026 · Citations: 0

Pairwise Preference Multi Agent

Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and…
Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback
Chenyang Zhao, Vinny Cahill, Ivana Dusparic · Feb 24, 2026 · Citations: 0

Pairwise Preference RLAIF or Synthetic Feedback Human Eval

Preference-based RL offers an appealing alternative by learning from human preferences over pairs of behavioural outcomes.
ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models
Manav Nitin Kapadnis, Lawanya Baghel, Atharva Naik, Carolyn Rosé · Feb 17, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

In practice, users iteratively refine visualizations through multi-turn interactions that require maintaining common ground, tracking prior edits, and adapting to evolving preferences.
RLHFless: Serverless Computing for Efficient RLHF
Rui Wei, Hanfei Yu, Shubham Jain, Yogarajan Sivakumar, Devesh Tiwari · Feb 26, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences.
DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
Yanbin Wei, Jiangyue Yan, Chun Kang, Yang Chen, Hua Liu · Feb 25, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

This "one-size-fits-all" strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or over-lengthy responses to graph-related queries.
Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering
Maryam Amirizaniani, Alireza Salemi, Hamed Zamani · Feb 22, 2026 · Citations: 0

Pairwise Preference Long Horizon

Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context.
Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Somnath Banerjee, Rima Hazra, Animesh Mukherjee · Feb 14, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Yet safety pipelines, benchmarks, and alignment still largely target English and a handful of high-resource languages, implicitly assuming safety and factuality "transfer" across languages.
Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications
Zhiqin Qian, Ryan Diaz, Sangwon Seo, Vaibhav Unhelkar · Feb 20, 2026 · Citations: 0

Pairwise Preference Long Horizon

When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed.
CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
Zhijiang Tang, Linhua Wang, Jiaxin Qi, Weihao Jiang, Peng Hou · Feb 25, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references.
Rethinking Metrics for Lexical Semantic Change Detection
Roksana Goworek, Haim Dubossarsky · Feb 17, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and…
Investigation for Relative Voice Impression Estimation
Kenichi Fujita, Yusuke Ijima · Feb 15, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

The estimation target is a low-dimensional vector derived from subjective evaluations, quantifying the perceptual shift of the second utterance relative to the first along an antonymic axis (e.g., "Dark-Bright").
ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays
Aishik Sanyal · Feb 26, 2026 · Citations: 0

Pairwise Preference

Inspired by Humphrey's ipsundrum hypothesis, we implement ReCoN-Ipsundrum, an inspectable agent that extends a ReCoN state machine with a recurrent persistence loop over sensory salience Ns and an optional affect proxy reporting…
HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders
Kun Yuan, Junyu Bi, Daixuan Cheng, Changfa Wu, Shuwen Xiao · Feb 24, 2026 · Citations: 0

Pairwise Preference

Modern recommender systems leverage ultra-long user behavior sequences to capture dynamic preferences, but end-to-end modeling is infeasible in production due to latency and memory constraints.
Simplifying Outcomes of Language Model Component Analyses with ELIA
Aaron Louis Eidt, Nils Feldhus · Feb 20, 2026 · Citations: 0

Pairwise Preference

The effectiveness of this approach was empirically validated through a mixed-methods user study, which revealed a clear preference for interactive, explorable interfaces over simpler, static visualizations.
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai · Feb 18, 2026 · Citations: 0

Pairwise Preference

The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment.
Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing
Fred Zimmerman · Feb 16, 2026 · Citations: 0

Pairwise Preference

We present a system for autonomous book ideation that replaces human focus groups with synthetic reader panels -- diverse collections of LLM-instantiated reader personas that evaluate book concepts through structured tournament…
Why Code, Why Now: Learnability, Computability, and the Real Limits of Machine Learning
Zhimin Zhao · Feb 15, 2026 · Citations: 0

Pairwise Preference

We propose a five-level hierarchy of learnability based on information structure and argue that the ceiling on ML progress depends less on model size than on whether a task is learnable at all.
The Subjectivity of Respect in Police Traffic Stops: Modeling Community Perspectives in Body-Worn Camera Footage
Preni Golazizian, Elnaz Rahmati, Jackson Trager, Zhivar Sourati, Nona Ghazizadeh · Feb 10, 2026 · Citations: 0

Pairwise Preference Rubric Rating

By sampling annotators from police-affiliated, justice-system-impacted, and non-affiliated Los Angeles residents, we enable the systematic study of perceptual differences across diverse communities.
The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Hyo Jin Kim · Feb 25, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Although initially formulated for human truth-telling under asymmetric stakes, the same phase-dynamic architecture extends to AI systems operating under policy constraints and alignment filters.
Probing Graph Neural Network Activation Patterns Through Graph Topology
Floriano Tori, Lorenzo Bini, Marco Sorbi, Stéphane Marchand-Maillet, Vincent Ginis · Feb 24, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

However, it remains unclear how the topology of a graph interacts with the learned preferences of GNNs.
Moral Preferences of LLMs Under Directed Contextual Influence
Phil Blandfort, Tushar Karayil, Urja Pawar, Robert Graham, Alex McKenzie · Feb 26, 2026 · Citations: 0

Pairwise Preference

Moral benchmarks for LLMs typically use context-free prompts, implicitly assuming stable preferences.
ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection
Changjiang Gao, Zixian Huang, Kaichen Yang, Jiajun Chen, Jixing Li · Feb 25, 2026 · Citations: 0

Pairwise Preference

Analysis shows that, by enabling on-policy thinking language selection as an action during RL, ExpLang effectively extends the RL exploration space with diversified language preference and improves the RL exploitation outcome with leveraged…
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
Mengxuan Hu, Vivek V. Datla, Anoop Kumar, Zihan Guan, Sheng Li · Feb 24, 2026 · Citations: 0

Pairwise Preference Red Team

Recent advances in alignment techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) have improved the safety of large language models (LLMs).
gencat: Generative computerized adaptive testing
Wanyong Feng, Andrew Lan · Feb 23, 2026 · Citations: 0

Pairwise Preference

We train the model in a two-step process, first via Supervised Fine-Tuning and then via preference optimization for knowledge-response alignment.
Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking
Iskar Deng, Nathalia Xu, Shane Steinert-Threlkeld · Feb 19, 2026 · Citations: 0

Pairwise Preference

Recent work has shown that language models (LMs) trained on synthetic corpora can exhibit typological preferences that resemble cross-linguistic regularities in human languages, particularly for syntactic phenomena such as word order.
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
Serin Kim, Sangam Lee, Dongha Lee · Feb 19, 2026 · Citations: 0

Pairwise Preference

Large language models have advanced web agents, yet current agents lack personalization capabilities.
Learning Personalized Agents from Human Feedback
Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi · Feb 18, 2026 · Citations: 0

Pairwise Preference

We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory.
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu · Feb 17, 2026 · Citations: 0

Pairwise Preference

Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms.
How to Train Your Long-Context Visual Document Model
Austin Veselka · Feb 16, 2026 · Citations: 0

Pairwise Preference

We systematically study continued pretraining, supervised finetuning, and preference optimization for 24B and 32B parameter models, backed by extensive LC evaluations and ablations to bridge this gap, and achieve state-of-the-art…
Cold-Start Personalization via Training-Free Priors from Structured World Models
Avinandan Bose, Shuyue Stella Li, Faeze Brahman, Pang Wei Koh, Simon Shaolei Du · Feb 16, 2026 · Citations: 0

Pairwise Preference

Cold-start personalization requires inferring user preferences through interaction when no user-specific historical data is available.
LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
Jizheng Chen, Weiming Zhang, Xinyi Dai, Weiwen Liu, Kounianhua Du · Feb 15, 2026 · Citations: 0

Pairwise Preference

LogitsCoder iteratively generates and refines reasoning steps by first steering token selection toward statistically preferred patterns via Logits Preference Decoding, then selecting and aggregating diverse reasoning paths using Logits Rank…
