
HFEPX Hub

General Papers

Updated from the current HFEPX corpus (Feb 26, 2026). This hub page groups 528 papers. Common evaluation modes: Automatic Metrics, Simulation Env. Most frequently cited benchmark: retrieval. Most common reported metric: accuracy. The newest paper in this set is from Feb 25, 2026.


Why This Matters For Eval Research

  • Common evaluation patterns here: Automatic Metrics, Simulation Env.
  • Benchmark signals emphasize: retrieval, MMLU.
  • Top reported metrics include: accuracy, cost.

Research Utility Snapshot

Human Feedback Mix

  • Pairwise Preference (38)
  • Red Team (15)
  • Critique Edit (14)
  • Rubric Rating (10)

Evaluation Modes

  • Automatic Metrics (459)
  • Simulation Env (63)
  • Human Eval (19)
  • Llm As Judge (6)

Top Benchmarks

  • Retrieval (55)
  • MMLU (6)
  • AIME (2)
  • Mle Bench (2)

Top Metrics

  • Accuracy (102)
  • Cost (37)
  • Recall (17)
  • F1 (15)
