What is the best open-source implementation of "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"?

The best maintained implementation is sunrainyg/RandOpt with 601 stars on GitHub. Confidence: medium. Reproducibility: Limited.

What framework is used to implement "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"?

The primary implementation uses PyTorch; the linked framework reference is the PyTorch Adam optimizer docs.

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

How reproducible is "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"?

Reproducibility is rated Limited, with an estimated time to first reproduction of a few hours. Risk flags: license metadata missing, no CI workflows detected. Start with sunrainyg/RandOpt and validate the setup instructions in its README.

Yulu Gan, Phillip Isola

Published: Mar 12, 2026

Best maintained implementation now

Evidence: Direct

Domain fit: AI-adjacent

Verified repos: 1

Top repo stars: 601

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: PyTorch (Adam optimizer docs)

Time to first repro: a few hours

2 risk flags

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
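The sample, select, and vote procedure described in the abstract can be illustrated with a toy sketch. This is not the code from sunrainyg/RandOpt: the linear classifier, the Gaussian noise model, the `sigma` value, and every function name below are assumptions made for illustration, and a real run would perturb the weights of a large pretrained model instead.

```python
import random
from collections import Counter


def predict(weights, x):
    """Toy binary classifier (assumed): 1 if the dot product is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def accuracy(weights, data):
    """Fraction of (x, y) pairs in `data` that `weights` classifies correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def sample_and_select(pretrained, data, n=64, k=5, sigma=0.5, seed=0):
    """Sample n random perturbations of `pretrained` and keep the k best on `data`.

    Gaussian noise with scale `sigma` is an assumption; the abstract only says
    that perturbations are sampled at random.
    """
    rng = random.Random(seed)
    candidates = [
        [w + rng.gauss(0.0, sigma) for w in pretrained]
        for _ in range(n)
    ]
    # Each candidate is scored independently, so this step parallelizes trivially.
    candidates.sort(key=lambda w: accuracy(w, data), reverse=True)
    return candidates[:k]


def majority_vote(experts, x):
    """Ensemble the selected experts by taking the most common prediction."""
    votes = Counter(predict(w, x) for w in experts)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: the label is 1 when the first feature is positive.
    data = [([1.0, 0.5], 1), ([-1.0, 0.2], 0), ([0.8, -0.3], 1), ([-0.6, -0.9], 0)]
    experts = sample_and_select([0.1, 0.0], data, n=32, k=3)
    print(majority_vote(experts, [0.9, 0.1]))
```

Using an odd `k` avoids ties in the binary vote. In practice the selection data and the evaluation data should be kept separate so that the top-`k` choice is not scored on the examples used to pick it.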

Technical details

Canonical key: arxiv-2603.12228

Cache status: Fresh

Generated at: Jun 11, 2026, 6:04 PM

Artifact coverage: direct

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Stochastic optimization · GSM8K · Accuracy: 500 · Source: paper fulltext

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Stochastic optimization	GSM8K	Accuracy	500	paper-derived	No explicit refs

Use This Implementation Because…

Confidence: medium

sunrainyg/RandOpt is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. Dependency/environment manifests are present.

Reproduction Risks

License metadata missing
No CI workflows detected

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 95/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

sunrainyg/RandOpt

best maintained

Maintenance: Active

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 601
Last push: May 20, 2026 (22d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

plugyawn/FlashOpt

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search

Stars: 1
Last push: Jun 6, 2026 (6d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Clara-X/RandOptMerge

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search

Stars: 0
Last push: May 2, 2026 (40d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

sunrainyg/RandOpt

Confidence: Medium

Reproducibility: Limited

Official Codebase for "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights" (ICML 2026 Spotlight)

Stars: 601

Forks: 65

Last push: May 20, 2026

Matched via arXiv identifier search

Strong overlap with paper title keywords

Community adoption signal (601 stars)

License –

CI –

Deps ✓

Docker –

Selected sunrainyg/RandOpt as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction readiness

Setup Required

Time to first repro: a few hours

Last checked: Jun 11, 2026

Dependencies pinned, manual setup needed

· sunrainyg/RandOpt has requirements.txt but requires manual environment setup.
· No Dockerfile — you will set up the environment manually.
· No CI pipeline — test coverage is unknown.
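The readiness signals listed above (license, CI, dependency manifest, Docker) can be re-checked against a local clone before investing setup time. The sketch below is a hypothetical helper, not part of RandOpt; it assumes that CI means GitHub Actions workflows under `.github/workflows` and that a license is a top-level `LICENSE*` or `COPYING*` file.

```python
from pathlib import Path


def readiness_report(repo_dir):
    """Report which reproducibility signals exist in a local clone.

    The file-name conventions checked here are assumptions, not rules
    taken from the repository itself.
    """
    root = Path(repo_dir)
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and (
        any(workflows.glob("*.yml")) or any(workflows.glob("*.yaml"))
    )
    return {
        "license": any(root.glob("LICENSE*")) or any(root.glob("COPYING*")),
        "ci": has_ci,
        "deps": (root / "requirements.txt").is_file(),
        "docker": (root / "Dockerfile").is_file(),
    }


if __name__ == "__main__":
    # Assumes the repository was cloned into ./RandOpt.
    for signal, present in readiness_report("RandOpt").items():
        print(f"{signal}: {'yes' if present else 'no'}")
```

A missing entry only shows that the file is absent from the clone; a project can still be licensed or tested through other means, so treat the report as a prompt to read the README.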

Quick start

git clone https://github.com/sunrainyg/RandOpt.git
cd RandOpt
pip install -r requirements.txt

Framework baselines

PyTorch Adam optimizer docs: Reference implementation of Adam in PyTorch.
Optax Adam optimizer docs: JAX/Flax baseline for Adam variants.
Keras Adam optimizer docs: TensorFlow/Keras baseline for Adam.

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (2)

These repositories had low-confidence matching signals and are hidden by default.

plugyawn/FlashOpt

Confidence: Low

Stars: 1

Clara-X/RandOptMerge

Confidence: Low

Stars: 0

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2603.12228 Neural Thickets Stochastic optimization

Datasets

arxiv:2603.12228 Stochastic optimization benchmark Neural Thickets dataset

Spaces

arxiv:2603.12228 Stochastic optimization gradio Neural Thickets demo

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.

Research context

Tasks

Stochastic optimization

Methods

Stochastic optimization

Domains

None detected

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.

Explore Similar Papers

Jump to Paper2Code search queries derived from this paper's research context.

Stochastic optimization
