OpenTrain AI
Maintained implementation availablepytorch

Going deeper with Image Transformers

Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

March 31, 2021arXiv: 2103.17239
2 repos36,625 stars~a few hours to reproduce
arXiv PDF

Abstract

Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization of such de...

Best Implementation

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

36.6k 5.1k Apr 2026 Apache-2.0
License
CI
Deps
Docker
  • Selected rwightman/pytorch-image-models as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. 1

    Start with rwightman/pytorch-image-models and validate setup instructions in README.

  2. 2

    Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. 3

    Log exact dependency versions and runtime environment for reproducibility.

Time to first repro: a few hoursNo repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Research Context