
HFEPX Hub

Multilingual Papers

Updated from the current HFEPX corpus (Feb 26, 2026). This hub groups 92 papers. Common evaluation modes are Automatic Metrics and Human Eval. The most frequently cited benchmark is retrieval, and the most commonly reported metric is accuracy. The newest paper in this set is from Feb 25, 2026.

Papers: 92 · Last published: Feb 25, 2026

Why This Matters For Eval Research

  • Common evaluation patterns here: Automatic Metrics and Human Eval.
  • Benchmark signals emphasize retrieval and MMLU.
  • Top reported metrics include accuracy and cost.

Research Utility Snapshot

Human Feedback Mix

  • Pairwise Preference (4)
  • Expert Verification (3)
  • Red Team (3)
  • Critique Edit (1)

Evaluation Modes

  • Automatic Metrics (83)
  • Human Eval (6)
  • Simulation Env (6)

Top Benchmarks

  • Retrieval (7)
  • MMLU (2)
  • Afri Semeval (1)
  • Banglasummeval (1)

Top Metrics

  • Accuracy (21)
  • Cost (7)
  • Precision (4)
  • Agreement (3)

Top Papers

Related Hubs