HFEPX Hub
CS.MA + General Papers
Updated from the current HFEPX corpus (Feb 27, 2026). This hub page groups 5 papers. Common evaluation modes: Automatic Metrics, Simulation Env. Common annotation unit: Freeform. Frequent quality control: Adjudication. Common metric signal: accuracy. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new eval experiments. The newest paper in this set is from Feb 25, 2026.