Results & Benchmarks
Benchmark data is not yet available for this paper.
Hardware Requirements
- Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Best Implementation
loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
154 39 Apr 2026 NOASSERTION
License ✓
CI ✓
Deps –
Docker –
- Selected stan-dev/loo as the strongest maintained implementation for new work.
- Includes CI workflow signals.
- Repository activity is within the last 24 months.
- Official repository is preserved separately as historical context.
Reproduction Path
- 1
Start with stan-dev/loo and validate setup instructions in README.
- 2
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
- 3
Log exact dependency versions and runtime environment for reproducibility.
Time to first repro: a few daysDependency manifest is missing
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No direct paper-linked artifacts were found. Showing strongest curated related artifacts.
Curated Related
- vectara/hallucination_evaluation_model80.1k 348