HFEPX Archive Slice
HFEPX Weekly Archive: 2025-W09
Updated from current HFEPX corpus (Mar 1, 2026). 8 papers are grouped in this weekly page.
Common evaluation mode: Automatic Metrics. Common annotation unit: Pairwise. Frequent quality control: Calibration. Common metric signal: Helpfulness. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new evaluation experiments. The newest paper in this set is from Feb 28, 2025.