HFEPX Daily Archive: 2026-02-15
Updated from the current HFEPX corpus (Feb 27, 2026). This daily page groups 6 papers. Common evaluation modes: Automatic Metrics and Human Eval. Most common rater population: Domain Experts. Common annotation unit: Pairwise. Frequent quality control: Adjudication. Frequently cited benchmark: MMBench. Common metric signal: accuracy. The newest paper in this set is from Feb 15, 2026.