HFEPX Archive Slice
HFEPX Daily Archive: 2025-11-18
Updated from the current HFEPX corpus (Apr 9, 2026). This daily page groups 9 papers.
Common evaluation mode: Automatic Metrics. Common annotation unit: Ranking. Frequent quality control: Adjudication. Frequently cited benchmark: Finagentbench. Common metric signal: accuracy. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new evaluation experiments. The newest paper in this set is from Nov 18, 2025.