
HFEPX Hub

Simulation Env Papers

Updated from the current HFEPX corpus (Feb 26, 2026). This hub groups 109 papers. Common evaluation modes: Simulation Env and Automatic Metrics. Most frequently cited benchmark: retrieval. Most common metric signal: accuracy. The newest paper in this set is from Feb 25, 2026.


Why This Matters For Eval Research

  • Common evaluation patterns here: Simulation Env, Automatic Metrics.
  • Benchmark signals emphasize: retrieval, MATH.
  • Top reported metrics include: accuracy, cost.

Research Utility Snapshot

Human Feedback Mix

  • Pairwise Preference (5)
  • Critique Edit (2)
  • Demonstrations (2)
  • Expert Verification (2)

Evaluation Modes

  • Simulation Env (109)
  • Automatic Metrics (20)
  • Human Eval (3)
  • LLM-as-Judge (2)

Top Benchmarks

  • Retrieval (8)
  • MATH (2)
  • SWE-bench (4)

Top Metrics

  • Accuracy (18)
  • Cost (12)
  • F1 (4)
  • Latency (4)
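
The snapshot above is a per-tag frequency count over the papers in this hub. A minimal sketch of how such counts can be produced, assuming each paper record carries lists of tags (the field names and sample records here are illustrative, not the real HFEPX schema):

```python
from collections import Counter

# Hypothetical paper records; field names are assumptions for illustration.
papers = [
    {"eval_modes": ["Simulation Env", "Automatic Metrics"], "metrics": ["accuracy", "cost"]},
    {"eval_modes": ["Simulation Env"], "metrics": ["accuracy"]},
    {"eval_modes": ["Simulation Env", "Human Eval"], "metrics": ["F1", "latency"]},
]

def tag_counts(papers, field):
    """Count how many papers carry each tag in the given field."""
    return Counter(tag for paper in papers for tag in paper[field])

# Most common tags first, mirroring the ordering of the lists above.
print(tag_counts(papers, "eval_modes").most_common())
```

A `Counter` keeps this robust to papers with zero or many tags per field, and `most_common()` yields the descending ordering used in the snapshot lists.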

Top Papers

Related Hubs