- SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Sher Badshah, Ali Emami, Hassan Sajjad · Feb 13, 2026 · Citations: 0
Pairwise Preference · Automatic Metrics
Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation.
- Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
Jing Zhao, Ting Zhen, Junwei Bao, Hongfei Jiang, Yang Song · Feb 14, 2026 · Citations: 0
Pairwise Preference · Automatic Metrics
Current alignment methods for Large Language Models (LLMs) rely on compressing vast amounts of human preference data into static, absolute reward functions, leading to data scarcity, noise sensitivity, and training instability.
- WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie · Jan 5, 2026 · Citations: 0
Pairwise Preference · LLM As Judge
However, building a benchmark for LLM-generated web apps remains challenging due to the need for real-world user requirements, generalizable evaluation metrics without relying on ground-truth implementations or test cases, and interpretable…
- DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment
James Wedgwood, Aashiq Muhamed, Mona T. Diab, Virginia Smith · Mar 23, 2026 · Citations: 0
Pairwise Preference · Automatic Metrics
Preference alignment is usually achieved by weight-updating training on preference data, which adds substantial alignment-stage compute and provides limited mechanistic visibility.
- Evaluation of Large Language Models via Coupled Token Generation
Nina Corvelo Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco · Feb 3, 2025 · Citations: 0
Pairwise Preference
In this work, we argue that the evaluation and ranking of large language models should control for the randomization underpinning their functioning.
- TARo: Token-level Adaptive Routing for LLM Test-time Alignment
Arushi Rai, Qiang Zhang, Hanqing Zeng, Yunkai Zhang, Dipesh Tamboli · Mar 19, 2026 · Citations: 0
Pairwise Preference
Recent test-time alignment methods offer a lightweight alternative, but have been explored mainly for preference alignment rather than reasoning.
- Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang · Mar 12, 2026 · Citations: 0
Pairwise Preference
Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked.
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Junming Yang, Ning Xu, Biao Liu, Shiqi Qiao, Xin Geng · Sep 27, 2025 · Citations: 0
Pairwise Preference
To bridge this gap, we propose Meta-Weighted Adaptive Preference Optimization (MetaAPO), a novel framework that dynamically couples data generation with model training.
- Less is More: Improving LLM Alignment via Preference Data Selection
Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang · Feb 20, 2025 · Citations: 0
Pairwise Preference
Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences.
- Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence
Niklas Herbster, Martin Zborowski, Alberto Tosato, Gauthier Gidel, Tommaso Tosato · Apr 9, 2026 · Citations: 0
- When LLM Judge Scores Look Good but Best-of-N Decisions Fail
Eddie Landesberg · Mar 12, 2026 · Citations: 0
- SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Jiahao Zhao, Feng Jiang, Shaowei Qin, Zhonghui Zhang, Junhao Liu · Feb 26, 2026 · Citations: 0
- Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Shijing Hu, Jingyang Li, Zhihui Lu, Pan Zhou · Sep 26, 2025 · Citations: 0
- An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications
Eduardo C. Garrido-Merchán, Álvaro López López · Jun 5, 2023 · Citations: 0