HFEPX Metric Hub
Pass@1 In CS.LG Papers
Updated from current HFEPX corpus (Mar 8, 2026). 6 papers are grouped in this metric page.
Common evaluation modes are automatic metrics and simulation environments. The most common rater population is domain experts, and a common annotation unit is the multi-dimensional rubric. A frequently cited benchmark is Ad-Bench, and the common metric signal is pass@1. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new evaluation experiments. The newest paper in this set is from Feb 15, 2026.
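For readers comparing protocols, it may help to fix what pass@1 usually means. The sketch below assumes the standard unbiased pass@k estimator (n samples per problem, c of them correct, pass@k = 1 - C(n-c, k) / C(n, k)), which reduces to c/n when k = 1. Papers in this set may define or sample pass@1 differently, so check each paper's protocol. The function names and the example tallies are hypothetical and are not taken from any paper on this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int = 1) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem tallies (n samples, c correct); illustrative only.
example = [(10, 3), (10, 0), (10, 10)]
score = mean_pass_at_k(example, k=1)
```

With k = 1 the per-problem score is simply the fraction of correct samples, so a difference in reported pass@1 between two papers can come from the number of samples drawn or the decoding settings as much as from the model itself.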