Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
Abstract
Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization of such de...
Best Implementation
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
- Selected rwightman/pytorch-image-models as the strongest maintained implementation for new work.
- Includes CI workflow signals.
- Includes dependency/environment manifest signals.
- Repository activity is within the last 24 months.
Reproduction Path
- 1
Start with rwightman/pytorch-image-models and validate setup instructions in README.
- 2
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
- 3
Log exact dependency versions and runtime environment for reproducibility.
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches: