
HFEPX Hub

Simulation Env + Long Horizon Papers

This hub groups 20 papers from the current HFEPX corpus (updated Feb 26, 2026). The most common evaluation modes are Simulation Env and Automatic Metrics; the most frequently cited benchmark is retrieval, and the most common metric signal is cost. The newest paper in this set is from Feb 25, 2026.

Papers: 20 · Last published: Feb 25, 2026

Tags: Simulation Env, Long Horizon

Why This Matters For Eval Research

  • Common evaluation patterns: Simulation Env, Automatic Metrics.
  • Emphasized benchmark signals: retrieval, arlarena.
  • Top reported metrics: cost, success rate.

Research Utility Snapshot

Human Feedback Mix

  • Pairwise Preference (2)
  • Demonstrations (1)
  • Expert Verification (1)
  • Rubric Rating (1)

Evaluation Modes

  • Simulation Env (20)
  • Automatic Metrics (1)
  • Human Eval (1)

Top Benchmarks

  • Retrieval (2)
  • Arlarena (1)
  • MATH (1)
  • MLE-Bench (1)

Top Metrics

  • Cost (2)
  • Success rate (2)
  • Accuracy (1)
  • Latency (1)
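The snapshot above is a set of tag frequency counts over the papers in this hub. As a minimal sketch of how such counts can be derived, the following uses `collections.Counter` over hypothetical paper records; the field names and example data are illustrative assumptions, not the actual HFEPX corpus schema.

```python
from collections import Counter

# Hypothetical paper records; "eval_modes" and "metrics" are
# illustrative field names, not the real HFEPX schema.
papers = [
    {"eval_modes": ["Simulation Env"], "metrics": ["cost"]},
    {"eval_modes": ["Simulation Env", "Automatic Metrics"],
     "metrics": ["success rate"]},
    {"eval_modes": ["Simulation Env", "Human Eval"],
     "metrics": ["cost", "accuracy"]},
]

# Tally how many papers report each evaluation mode and metric.
eval_modes = Counter(m for p in papers for m in p["eval_modes"])
metrics = Counter(m for p in papers for m in p["metrics"])

# most_common() yields (tag, count) pairs sorted by frequency,
# matching the "Tag (N)" lists shown on this page.
print(eval_modes.most_common())
print(metrics.most_common())
```

With real corpus data, each `most_common()` list maps directly onto a "Tag (N)" bullet list like those above.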

Top Papers
