Q: What is the best open-source implementation of "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"?

The best maintained implementation is rwightman/pytorch-image-models with 36,735 stars on GitHub. Confidence: high. Reproducibility: Strong.

Q: Are there pretrained models available for "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"?

Yes, 3 Hugging Face models were found. The top result is timm/vit_base_patch16_224.augreg2_in21k_ft_in1k with 526,426 downloads.
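A minimal sketch of loading that checkpoint, assuming the timm package is installed and network access is available for the weight download. The helper names split_hub_id and load_pretrained are hypothetical and exist only for illustration; the model name passed to timm.create_model is the architecture.tag part of the Hugging Face identifier above.

```python
def split_hub_id(hub_id: str) -> tuple[str, str, str]:
    """Split 'org/architecture.tag' into (org, architecture, tag)."""
    org, _, model_name = hub_id.partition("/")
    architecture, _, tag = model_name.partition(".")
    return org, architecture, tag


def load_pretrained(hub_id: str):
    """Hypothetical helper: load the checkpoint in eval mode (needs timm and network)."""
    import timm  # imported lazily so the parsing helper works without timm

    _, architecture, tag = split_hub_id(hub_id)
    model = timm.create_model(f"{architecture}.{tag}", pretrained=True)
    return model.eval()


HUB_ID = "timm/vit_base_patch16_224.augreg2_in21k_ft_in1k"
print(split_hub_id(HUB_ID))
```

Calling load_pretrained(HUB_ID) is left out of the snippet on purpose, since it triggers a download of the weights.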

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

Q: How reproducible is "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with rwightman/pytorch-image-models and validate the setup instructions in its README.

Q: What framework is used to implement "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"?

The primary implementation uses PyTorch.

Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer

Published: Jun 18, 2021

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 3

Top repo stars: 36,735

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few hours

No risk flags

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer's weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation ("AugReg" for short) when training on smaller training datasets. We conduct a systematic empirical study in order to better understand the interplay between the amount of training data, AugReg, model size and compute budget. As one result of this study we find that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data: we train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available JFT-300M dataset.
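To make the "data augmentation" half of AugReg concrete, here is a minimal sketch of Mixup on plain Python lists. Mixup is used purely as a familiar example of an augmentation component; the abstract does not say which techniques the authors combine, and the alpha value and the function name mixup_pair are assumptions made for illustration.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) examples with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy examples from different classes; a seeded generator keeps the run repeatable.
rng = random.Random(0)
x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
print(f"lam={lam:.3f} mixed_label={y}")
```

The mixed label stays a valid probability vector because both inputs are one-hot and the two weights sum to one.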

Technical details

Canonical key: arxiv-2106.10270

Cache status: Fresh

Generated at: May 2, 2026, 9:05 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Image classification · CIFAR-100 · Accuracy: 100 · Source: paper fulltext

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Image classification	CIFAR-100	Accuracy	100	paper-derived	No explicit refs
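One way to carry out the audit step is a small plausibility check over each row. The sketch below parses the tab-separated row above and flags two things: an accuracy at or above 100, and a missing evidence reference. Both rules and the function name audit_finding are assumptions chosen for illustration, not part of any tool referenced on this page.

```python
HEADER = "Task\tDataset\tMetric\tValue\tSource\tEvidence refs"
ROW = "Image classification\tCIFAR-100\tAccuracy\t100\tpaper-derived\tNo explicit refs"


def audit_finding(header: str, row: str) -> list[str]:
    """Return human-readable warnings for one benchmark row (hypothetical rules)."""
    record = dict(zip(header.split("\t"), row.split("\t")))
    warnings = []
    # A perfect or better-than-perfect accuracy usually signals an extraction error.
    if record["Metric"].lower() == "accuracy" and float(record["Value"]) >= 100.0:
        warnings.append("accuracy at or above 100 is unlikely; re-check against the paper")
    # Without a reference, the value cannot be traced to a table or figure.
    if record["Evidence refs"].lower().startswith("no "):
        warnings.append("no evidence refs; value cannot be traced to a source table")
    return warnings


for warning in audit_finding(HEADER, ROW):
    print("WARN:", warning)
```

Applied to the single finding listed here, both rules fire, which is consistent with the "thin evidence" label on the benchmark data.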

Use This Implementation Because…

Confidence: high

rwightman/pytorch-image-models is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

rwightman/pytorch-image-models

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 36,735
Last push: Apr 29, 2026 (3d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

google-research/vision_transformer

historical official

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 12,502
Last push: Mar 3, 2026 (60d ago)

Risk flags

No tagged releases
No Docker setup
Dependency manifest missing

google-research/big_vision

alternative

Maintenance: Stale risk

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 3,435
Last push: May 19, 2025 (348d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup
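The "days ago" figures in the comparison can be reproduced with simple date arithmetic. The sketch below assumes the reference date is the page's check date of May 2, 2026; the dictionary and function names are illustrative only.

```python
from datetime import date

LAST_CHECKED = date(2026, 5, 2)  # assumed reference date, taken from this page

LAST_PUSH = {
    "rwightman/pytorch-image-models": date(2026, 4, 29),
    "google-research/vision_transformer": date(2026, 3, 3),
    "google-research/big_vision": date(2025, 5, 19),
}


def days_since_push(pushed: date, checked: date = LAST_CHECKED) -> int:
    """Whole days between the last push and the check date."""
    return (checked - pushed).days


ages = {repo: days_since_push(pushed) for repo, pushed in LAST_PUSH.items()}

# List repositories from most to least recently updated.
for repo, age in sorted(ages.items(), key=lambda item: item[1]):
    print(f"{repo}: {age}d ago")
```

The results match the 3, 60, and 348 day figures shown next to each repository.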

Best implementation now

rwightman/pytorch-image-models

Confidence: High

Reproducibility: Strong

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Stars: 36,735

Forks: 5,147

Last push: Apr 29, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (36,735 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected rwightman/pytorch-image-models as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

google-research/vision_transformer

Stars: 12,502

Last push: Mar 3, 2026

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: May 2, 2026

Ready to reproduce

· Clone rwightman/pytorch-image-models and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 3 days ago.

Quick start

git clone https://github.com/rwightman/pytorch-image-models.git
cd pytorch-image-models
pip install -e .
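After the install, a short smoke test can confirm that the package imports and that AugReg ViT checkpoints are visible. This is a sketch under the assumption that the installed package is importable as timm and that timm.list_models with pretrained=True returns names that include pretrained tags (true for recent releases; older ones may return an empty match). The helper names are hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def augreg_vit_names() -> list[str]:
    """Hypothetical helper: pretrained ViT names whose tag mentions 'augreg'."""
    import timm  # imported lazily so the check above works without timm

    names = timm.list_models("vit_*", pretrained=True)
    return [name for name in names if "augreg" in name]


if is_installed("timm"):
    print(len(augreg_vit_names()), "AugReg ViT checkpoints listed")
else:
    print("timm is not importable; re-run the install step")
```

Listing model names does not download any weights, so this check works offline.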

Additional implementations

Official

google-research/big_vision
Confidence: High

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Stars: 3,435

Forks: 224

Last push: May 19, 2025

License: Apache-2.0