
Less is More: Improving LLM Alignment via Preference Data Selection

Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang, Xiangnan He · Feb 20, 2025 · Citations: 0

Abstract

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from the largely overlooked but critical aspect of data selection. Specifically, we address the issue of parameter shrinkage caused by noisy data by proposing a novel margin-maximization principle for dataset curation in DPO training. To further mitigate the noise in different reward models, we propose a Bayesian Aggregation approach that unifies multiple margin sources (external and implicit) into a single preference probability. Extensive experiments in diverse settings demonstrate the consistently high data efficiency of our approach. Remarkably, by using just 10% of the Ultrafeedback dataset, our approach achieves 3% to 8% improvements across various Llama, Mistral, and Qwen models on the AlpacaEval2 benchmark. Furthermore, our approach seamlessly extends to iterative DPO, yielding a roughly 3% improvement with 25% online data, revealing the high redundancy in this presumed high-quality data construction manner. These results highlight the potential of data selection strategies for advancing preference optimization.
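To make the pipeline concrete, below is a minimal sketch of the two computable pieces the abstract names: a per-pair margin from each reward source, and a Bayesian aggregation of those margins into a single preference probability, followed by top-fraction selection. This is one plausible reading, assuming Bradley-Terry style margins and conditionally independent sources (so Bayesian pooling with a uniform prior reduces to a sigmoid over summed logits); the function names, the keep_ratio parameter, and the exact aggregation rule are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def aggregate_preference_prob(margins: np.ndarray) -> np.ndarray:
    """Fuse per-source margins into one preference probability.

    Assumption: each source s emits a Bradley-Terry margin m_s for a
    (chosen, rejected) pair, i.e. p_s(chosen > rejected) = sigmoid(m_s).
    With a uniform prior and conditionally independent sources, Bayesian
    pooling of these probabilities reduces to a sigmoid of the summed logits.
    """
    pooled_logit = margins.sum(axis=1)          # margins: (n_pairs, n_sources)
    return 1.0 / (1.0 + np.exp(-pooled_logit))  # aggregated p(chosen > rejected)

def select_by_margin(margins: np.ndarray, keep_ratio: float = 0.10) -> np.ndarray:
    """Margin-maximization selection: keep the top keep_ratio fraction of
    pairs by aggregated preference probability (0.10 mirrors the 10% of
    Ultrafeedback used in the abstract)."""
    probs = aggregate_preference_prob(margins)
    n_keep = max(1, int(len(probs) * keep_ratio))
    return np.argsort(-probs)[:n_keep]          # indices of highest-margin pairs

# Toy usage: 6 pairs scored by two external reward models plus one
# implicit (DPO policy) margin source; all values are synthetic.
rng = np.random.default_rng(0)
toy_margins = rng.normal(size=(6, 3))
print(select_by_margin(toy_margins, keep_ratio=0.5))
```

One design note: under this pooling rule a pair is kept only when the external reward models and the implicit margin jointly assign it a confident label, which is how the abstract's margin-maximization principle filters out noisy preference pairs; the authors' actual weighting of the sources may differ.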

HFEPX Relevance Assessment

This paper has direct human-feedback and/or evaluation protocol signal and is likely useful for eval pipeline design.

Eval-Fit Score

50/100 • Medium

Useful as a secondary reference; validate protocol details against neighboring papers.

Human Feedback Signal

Detected

Evaluation Signal

Weak / implicit signal

HFEPX Fit

High-confidence candidate

Human Data Lens

  • Uses human feedback: Yes
  • Feedback types: Pairwise Preference
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: General
  • Extraction source: Persisted extraction

Evaluation Lens

  • Evaluation modes: None extracted
  • Agentic eval: None
  • Quality controls: Not reported
  • Confidence: 0.55
  • Flags: ambiguous

Protocol And Measurement Signals

Benchmarks / Datasets

AlpacaEval 2.0

Reported Metrics

No metric terms were extracted from the available abstract.

Research Brief

Deterministic synthesis

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. HFEPX signals include Pairwise Preference with confidence 0.55. Updated from the current HFEPX corpus.

Generated Mar 4, 2026, 7:23 AM · Grounded in abstract + metadata only

Key Takeaways

  • Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences.
  • To further mitigate the noise in different reward models, we propose a Bayesian Aggregation approach that unifies multiple margin sources (external and implicit) into a single preference probability.

Researcher Actions

  • Compare its human-feedback setup against pairwise and rubric hubs.
  • Cross-check benchmark overlap: AlpacaEval 2.0.
  • Verify metric definitions before comparing against your eval pipeline.

Caveats

  • Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
  • Extraction confidence is probabilistic and should be validated for critical decisions.

Research Summary

Contribution Summary

  • Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences.
  • To further mitigate the noise in different reward models, we propose a Bayesian Aggregation approach that unifies multiple margin sources (external and implicit) into a single preference probability.
  • Remarkably, by using just 10% of the Ultrafeedback dataset, our approach achieves 3% to 8% improvements across various Llama, Mistral, and Qwen models on the AlpacaEval2 benchmark.

Why It Matters For Eval

  • The method curates pairwise human-preference data via reward-model margins, so its selection criteria bear directly on how feedback labels are filtered before preference optimization.
  • Reported gains are measured on AlpacaEval 2.0 across Llama, Mistral, and Qwen models, making benchmark overlap the main point of contact with an eval pipeline.

Researcher Checklist

  • Pass: Human feedback protocol is explicit

    Detected: Pairwise Preference

  • Gap: Evaluation mode is explicit

    No clear evaluation mode extracted.

  • Gap: Quality control reporting is present

    No calibration/adjudication/IAA control explicitly detected.

  • Pass: Benchmark or dataset anchors are present

    Detected: AlpacaEval 2.0

  • Gap: Metric reporting is present

    No metric terms extracted.

Related Papers

Papers are ranked by protocol overlap, extraction signal alignment, and semantic proximity.
