What is the best open-source implementation of "Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning"?

The best-maintained implementation is google/uncertainty-baselines, with 1,567 stars on GitHub. Confidence: High. Reproducibility: Moderate.

What framework is used to implement "Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning"?

The primary implementation uses TensorFlow (tf).

Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

How reproducible is "Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning"?

Estimated time to first reproduction: a few days. Risk flags: Dependency manifest is missing. Start with google/uncertainty-baselines and validate setup instructions in README.

Published: Jun 1, 2021

Best maintained implementation now

Evidence: Direct

Domain fit: AI-adjacent

Verified repos: 1

Top repo stars: 1,567

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: tf

Time to first repro: a few days

1 risk flag

Technical details

Canonical key: arxiv-2106.04015

Cache status: Fresh

Generated at: Mar 7, 2026, 10:32 AM

Artifact coverage: direct

HF provider: ok (public)

PWC source used: Yes

LLM status: ready

LLM model: openai/gpt-5.1-20251113

LLM generated: Mar 6, 2026, 3:25 AM

LLM content type: researcher_benchmark_brief

HF policy: hf-relevance-v27

LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_2], evidencePack.paperSections[id=paper_5], evidencePack.paperSections[id=paper_8], researcherSummary.benchmarkSnapshot[0], researcherSummary.benchmarkSnapshot[1], evidencePack.paperSections[id=paper_7], evidencePack.paperSections[id=paper_10], guidance.riskFlags[0], repos[0].fullName, researcherSummary.hardwareNotes[0], researcherSummary.timeToFirstMeaningfulRun, paper.title, summary.hasReliableImplementation

Researcher verdict

Recommended implementation path available

implementation baseline

Benchmark trust: thin evidence

Quality tier: researcher ready

This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on google/uncertainty-baselines. Use it as an implementation baseline, then validate benchmark parity before adapting it.

Why this page is still worth reading

A concrete repository path exists via google/uncertainty-baselines, so this page can act as a practical starting point.
Reproduction risks are surfaced explicitly, which helps decide whether the paper is worth immediate prototyping.

Benchmark trust

Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.

Use this page as

Start here when you need the most practical implementation path quickly.

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning is the primary contribution described in this paper.

Use This Implementation Because…

Confidence: high

google/uncertainty-baselines is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

Dependency manifest is missing

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 60/100, grounding 75/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

google/uncertainty-baselines

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,567
Last push: Feb 2, 2026 (33d ago)

Risk flags

No tagged releases
No Docker setup
Dependency manifest missing

Teddy-J-J/uncertainty-baselines

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Partial overlap with paper title keywords

Stars: 0
Last push: Feb 24, 2026 (11d ago)

Risk flags

No tagged releases
No Docker setup
Dependency manifest missing

masamasa59/uncertainty-paper

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search · Repository appears stale (>24 months since last push)

Stars: 17
Last push: Nov 6, 2022 (1217d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
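The "days ago" figures and the stale flag in the comparison above can be checked with simple date arithmetic. The sketch below is illustrative only: it assumes the page's generation date (Mar 7, 2026) as the reference day and approximates the ">24 months" staleness rule as 730 days. The function and constant names are hypothetical, not part of any ranking tool.

```python
from datetime import date

# Reference day: the "Generated at" date shown on this page.
SNAPSHOT = date(2026, 3, 7)

# Assumption: ">24 months since last push" is approximated as more than 730 days.
STALE_AFTER_DAYS = 730

# Last-push dates copied from the comparison above.
LAST_PUSH = {
    "google/uncertainty-baselines": date(2026, 2, 2),
    "Teddy-J-J/uncertainty-baselines": date(2026, 2, 24),
    "masamasa59/uncertainty-paper": date(2022, 11, 6),
}


def days_since_push(last_push, today=SNAPSHOT):
    """Number of whole days between the last push and the reference day."""
    return (today - last_push).days


def is_stale(last_push, today=SNAPSHOT):
    """True when the last push is older than the assumed staleness threshold."""
    return days_since_push(last_push, today) > STALE_AFTER_DAYS


for repo, pushed in LAST_PUSH.items():
    print(f"{repo}: {days_since_push(pushed)}d ago, stale={is_stale(pushed)}")
```

Rerunning this with the current date in place of `SNAPSHOT` gives an up-to-date view before committing to a baseline.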

Paper summary

AI-generated

AI-generated summary grounded in paper metadata and artifact signals.

The work defines Uncertainty Baselines as a collection of standardized benchmarks for uncertainty and robustness in deep learning, covering at least ImageNet and Diabetic Retinopathy tasks with multiple baseline methods. This page includes benchmark evidence for uncertainty and robustness benchmarking on ImageNet. Reproduction guidance focuses on implementation viability and concrete risk controls.

Key contributions

The work defines Uncertainty Baselines as a collection of standardized benchmarks for uncertainty and robustness in deep learning, covering at least ImageNet and Diabetic Retinopathy tasks with multiple baseline methods.
The benchmarks provide unified implementations of multiple uncertainty methods built on common backbones, including Wide ResNet for CIFAR-10/100 and ResNet-50 (plus ResNet-101/152 and EfficientNet).
For the Diabetic Retinopathy benchmark, the authors include detailed hyperparameter tuning results from two rounds of quasirandom search to help others retune their own uncertainty methods.
The paper illustrates Uncertainty Baselines’ capabilities using only one of its nine tasks, ImageNet, and deliberately omits a legend identifying specific baselines, which limits direct method-to-method performance comparison.

Implementation guidance

Start with google/uncertainty-baselines: both the repository ranking signals and the extracted evidence indicate it is a viable implementation. Follow the repo's setup path first, then confirm that you can reproduce the benchmark results before adapting the code.

Reproducibility notes

Because the repository lacks an explicit dependency manifest, environment recreation may fail or produce inconsistent results from mismatched library versions.
Insufficient compute or time can prevent full convergence on large benchmarks such as ImageNet and Diabetic Retinopathy, yielding weaker or irreproducible uncertainty estimates.
Departing from the specified data preprocessing pipelines, such as padding, cropping, and ResNet-style normalization, can invalidate comparisons with the provided baselines.
Misconfiguring or omitting the documented quasirandom hyperparameter search for the Diabetic Retinopathy benchmark can produce misleading performance and uncertainty estimates.
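To make the quasirandom search note concrete, here is a minimal sketch of sampling hyperparameter configurations from a low-discrepancy sequence. This page does not say which sequence, search space, or ranges the authors used, so the Halton sequence, the two hyperparameters, and their bounds below are all illustrative assumptions, not the repository's actual tuning setup.

```python
import math


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(count, bases=(2, 3)):
    """First `count` Halton points in [0, 1)^d (index 0 is skipped)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, count + 1)]


def log_uniform(u, low, high):
    """Map u in [0, 1) onto [low, high) on a logarithmic scale."""
    return math.exp(math.log(low) + u * (math.log(high) - math.log(low)))


def sample_configs(count):
    """Hypothetical search space: names and ranges are placeholders."""
    configs = []
    for u_lr, u_wd in halton_points(count):
        configs.append({
            "learning_rate": log_uniform(u_lr, 1e-4, 1e-1),
            "weight_decay": log_uniform(u_wd, 1e-6, 1e-3),
        })
    return configs


for config in sample_configs(4):
    print(config)
```

Unlike independent random draws, consecutive Halton points fill the search space evenly, so a second round can continue the same sequence (or restart over a narrowed range) without clustering near earlier trials.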

Best implementation now

google/uncertainty-baselines

Confidence: High

Reproducibility: Moderate

High-quality implementations of standard and SOTA methods on a variety of tasks.

Stars: 1,567

Forks: 215

Last push: Feb 2, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Matched via arXiv identifier search

Partial overlap with paper title keywords

Community adoption signal (1,567 stars)

License ✓

CI ✓

Deps –

Docker –

Selected google/uncertainty-baselines as the strongest maintained implementation for new work.
Includes CI workflow signals.
Repository activity is within the last 24 months.

Reproduction path

Direct

Follow the direct implementation path

1. Start with google/uncertainty-baselines and validate setup instructions in README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.

Time to first repro: a few days

Dependency manifest is missing
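Since the missing dependency manifest is the flagged risk, one way to follow the advice to log exact dependency versions is to snapshot the environment from inside Python before each run. The sketch below uses only the standard library; the function name and the example package list are hypothetical and should be replaced with whatever the repository's README actually asks you to install.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=None):
    """Describe the interpreter, OS, and installed package versions.

    With `packages=None`, every installed distribution is recorded;
    otherwise only the named ones, with None for any that are missing.
    """
    versions = {}
    if packages is None:
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:  # skip distributions with broken metadata
                versions[name] = dist.version
    else:
        for name in packages:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: versions[name] for name in sorted(versions)},
    }


if __name__ == "__main__":
    # Example package names only; adjust to your actual install.
    print(json.dumps(snapshot_environment(["tensorflow", "numpy"]), indent=2))
```

Saving this JSON next to each run's results makes it possible to tell later whether two runs differed in code or only in library versions.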

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (3)

These repositories had low-confidence matching signals and are hidden by default.

Teddy-J-J/uncertainty-baselines · Confidence: Low · Stars: 0
masamasa59/uncertainty-paper · Confidence: Low · Stars: 17
ep-infosec/50_google_uncertainty-baselines · Confidence: Low · Stars: 0

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2106.04015 Uncertainty Baselines

Datasets

arxiv:2106.04015 Uncertainty Baselines dataset

Spaces

arxiv:2106.04015 Uncertainty Baselines demo

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.

Research context

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
