Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars: 2,808
- Last push: Feb 20, 2026 (120d ago)
Risk flags
- Repository archived
- No tagged releases
- No Docker setup
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
Efficient multi-prompt evaluation of LLMs is the primary contribution described in this paper.
felipemaiapolo/prompteval is the strongest maintained implementation based on ranking signals. License is declared (MIT). Dependency/environment manifests are present.
Evidence graph: 4 refs, 4 links.
Utility signals: depth 55/100, grounding 85/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Matched via arXiv identifier search · Partial overlap with paper title keywords
Efficient multi-prompt evaluation of LLMs
Preserved for provenance. Not recommended as the default path for new builds.
Dependencies pinned, manual setup needed
Quick start
git clone https://github.com/felipemaiapolo/prompteval.git
cd prompteval
pip install -e .
No benchmark numbers could be verified, so reproduction correctness cannot be validated against published numbers.
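Since no published numbers are available to check against, a basic sanity check after the quick-start install is to confirm that the package is registered in the active environment. The following is a minimal sketch using only the Python standard library. The distribution name `prompteval` is an assumption based on the repository name and should be checked against the repository's packaging metadata; `installed_version` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # ASSUMPTION: the distribution is named after the repository ("prompteval").
    # Adjust this if the repository's packaging metadata declares another name.
    name = "prompteval"
    version = installed_version(name)
    if version is None:
        print(f"{name}: not installed in this environment")
    else:
        print(f"{name}: installed, version {version}")
```

This only confirms that the editable install succeeded. It does not validate that results match the paper.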
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy demo spaces right now.
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.