HFEPX Hub

Law Papers

Updated from the current HFEPX corpus (Feb 26, 2026). This hub page groups 43 papers. Common evaluation modes: Automatic Metrics, Simulation Env. Most frequently cited benchmark: MATH. Most commonly reported metric: accuracy. The newest paper in this set is from Feb 24, 2026.

Papers: 43 · Last published: Feb 24, 2026

Why This Matters For Eval Research

  • Common evaluation patterns here: Automatic Metrics, Simulation Env.
  • Benchmark signals emphasize: MATH, AdvBench.
  • Top reported metrics include: accuracy, cost.

Research Utility Snapshot

Human Feedback Mix

  • Expert Verification (3)
  • Pairwise Preference (2)
  • Red Team (2)
  • Rubric Rating (2)

Evaluation Modes

  • Automatic Metrics (37)
  • Simulation Env (5)
  • Human Eval (3)

Top Benchmarks

  • MATH (2)
  • AdvBench (1)
  • GSM8K (1)
  • LawBench (1)

Top Metrics

  • Accuracy (7)
  • Cost (4)
  • F1 (2)
  • Agreement (1)

Top Papers

Related Hubs