HFEPX Benchmark Hub
LMSYS Chatbot Arena + Coding Benchmark Papers
Updated from the current HFEPX corpus (Mar 8, 2026). Two papers are grouped on this benchmark page.
Frequently cited benchmark: LMSYS Chatbot Arena. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new eval experiments. The newest paper in this set is from Sep 27, 2025.