Implementation starting point
Benchmarks: thin evidence
Time to repro: a few days
1 risk flag

Results & Benchmarks

Freshness tier: cold
Direct + Inferred Evidence

Some benchmark signal exists in the extracted evidence, but it is not yet structured well enough to support a confident benchmark decision.

The primary contribution described in this paper is practical Bayesian model evaluation using leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (WAIC).
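
To make WAIC concrete, here is a minimal Python sketch that computes it from posterior draws. It assumes the pointwise log-likelihood has already been evaluated as an S × N list of lists, `log_lik[s][i] = log p(y_i | θ^s)`. The function names `logsumexp` and `waic` are hypothetical and are not the loo package's R API. The log pointwise predictive density (lppd) averages the likelihood over draws on the probability scale, and the penalty p_waic sums the per-observation sample variances (divisor S − 1) of the log-likelihood.

```python
import math
import statistics
from typing import List, Tuple


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def waic(log_lik: List[List[float]]) -> Tuple[float, float, float]:
    """Return (elpd_waic, p_waic, waic) from an S x N pointwise log-likelihood matrix.

    Hypothetical helper: log_lik[s][i] = log p(y_i | theta^s) for posterior
    draw s and observation i.
    """
    S = len(log_lik)
    N = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(N):
        column = [log_lik[s][i] for s in range(S)]
        # log of the posterior-mean likelihood for observation i
        lppd += logsumexp(column) - math.log(S)
        # sample variance over draws (divisor S - 1) of the log-likelihood
        p_waic += statistics.variance(column)
    elpd_waic = lppd - p_waic
    # WAIC on the deviance scale
    return elpd_waic, p_waic, -2.0 * elpd_waic
```

For example, `waic([[0.0], [-2.0]])` (two draws, one observation) gives p_waic = 2.0, because the two log-likelihood values differ by 2.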

Use This Implementation Because…

Confidence: high

stan-dev/loo is the strongest maintained implementation according to the ranking signals. CI workflows are present. A license is declared, but it is reported as NOASSERTION, meaning its type was not automatically identified.

Open stan-dev/loo

Reproduction Risks

  • Dependency manifest is missing

Hardware Notes

Based on current guidance, expect multi-day setup and compute time for a meaningful reproduction.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 95/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

stan-dev/loo
best maintained
Maintenance: Active
Confidence: High
Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 154
Last push: Apr 27, 2026 (4d ago)
CI · Releases

Risk flags

  • No Docker setup
  • Dependency manifest missing

avehtari/PSIS
historical official
Maintenance: Stale
Confidence: High
Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 82
Last push: Mar 13, 2024 (779d ago)

Risk flags

  • No push in 12+ months
  • No CI pipeline detected
  • No tagged releases

jgabry/loo
alternative
Maintenance: Active
Confidence: Low
Reproducibility: Moderate

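Strong overlap with paper title keywords · Community adoption signal (154 stars). Its star count and last push date match stan-dev/loo exactly, which suggests it may be the same repository under an earlier owner.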

Stars: 154
Last push: Apr 27, 2026 (4d ago)
CI · Releases

Risk flags

  • No Docker setup
  • Dependency manifest missing
  • Low confidence match

Best implementation now

stan-dev/loo
Confidence: High
Reproducibility: Moderate

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)

Stars: 154
Forks: 39
Last push: Apr 27, 2026
License: NOASSERTION
Official implementation from Papers with Code
Repository link is mentioned in the paper metadata
Strong overlap with paper title keywords
Community adoption signal (154 stars)
License ✓
CI ✓
Deps –
Docker –
  • Selected stan-dev/loo as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Repository activity is within the last 24 months.
  • Official repository is preserved separately as historical context.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

avehtari/PSIS
Stars: 82
Last push: Mar 13, 2024

Reproduction readiness

Major Work
Time to first repro: days
Last checked: Apr 30, 2026

Hardware requirements

  • Based on current guidance, expect multi-day setup and compute time for a meaningful reproduction.

No dependency manifest detected: manual reconstruction may be required

  • stan-dev/loo has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
  • Because loo is an R package, check its DESCRIPTION file (Depends, Imports, and Suggests fields) first; reverse-engineer dependencies from the source code only if that file is incomplete.

Additional implementations

No additional verified repositories beyond the primary recommendation.

Other candidate repositories had low-confidence matching signals and are hidden by default.

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

Datasets

No trustworthy dataset matches right now.


Research context

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.

