What is the best open-source implementation of "EfficientFormer: Vision Transformers at MobileNet Speed"?

The best-maintained implementation is rwightman/pytorch-image-models (36,896 stars on GitHub). Confidence: high. Reproducibility: strong.

How reproducible is "EfficientFormer: Vision Transformers at MobileNet Speed"?

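Estimated time to first reproduction: a few hours. No blocking risk flags were identified; the only flag on the recommended repository is the absence of a Docker setup. Start with rwightman/pytorch-image-models and validate the setup instructions in its README.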

Are there pretrained models available for "EfficientFormer: Vision Transformers at MobileNet Speed"?

Yes, three Hugging Face models were found. The top result is timm/efficientformer_l1.snap_dist_in1k with 1,305 downloads.

What framework is used to implement "EfficientFormer: Vision Transformers at MobileNet Speed"?

The primary implementation uses PyTorch.

EfficientFormer: Vision Transformers at MobileNet Speed

Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Published: Jun 2, 2022

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 36,896

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few hours

No risk flags

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance? To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs. Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm. Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer. Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices. Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on iPhone 12 (compiled with CoreML), which runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1), and our largest model, EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.
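The headline comparison in the abstract comes down to two pieces of arithmetic: the accuracy gap at equal latency, and the latency cost of moving to the largest model. The sketch below only restates the figures quoted in the abstract and does not re-measure anything; the class and function names are illustrative and do not come from the paper or any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedResult:
    """One (model, accuracy, latency) triple as quoted in the abstract."""
    name: str
    top1_percent: float  # ImageNet-1K top-1 accuracy, in percent
    latency_ms: float    # iPhone 12 latency (CoreML), in milliseconds


# Values copied from the abstract; nothing here is re-measured.
EFFICIENTFORMER_L1 = ReportedResult("EfficientFormer-L1", 79.2, 1.6)
EFFICIENTFORMER_L7 = ReportedResult("EfficientFormer-L7", 83.3, 7.0)
MOBILENETV2_X1_4 = ReportedResult("MobileNetV2x1.4", 74.7, 1.6)


def accuracy_gap(a: ReportedResult, b: ReportedResult) -> float:
    """Top-1 difference a - b in percentage points, rounded to one decimal."""
    return round(a.top1_percent - b.top1_percent, 1)


def latency_ratio(a: ReportedResult, b: ReportedResult) -> float:
    """How many times slower a is than b, rounded to three decimals."""
    return round(a.latency_ms / b.latency_ms, 3)


if __name__ == "__main__":
    print(accuracy_gap(EFFICIENTFORMER_L1, MOBILENETV2_X1_4))
    print(accuracy_gap(EFFICIENTFORMER_L7, EFFICIENTFORMER_L1))
    print(latency_ratio(EFFICIENTFORMER_L7, EFFICIENTFORMER_L1))
```

Read this way, EfficientFormer-L1 is 4.5 points more accurate than MobileNetV2×1.4 at the same reported 1.6 ms, and EfficientFormer-L7 adds a further 4.1 points at roughly 4.4 times the latency.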

Technical details

Canonical key: arxiv-2206.01191

Cache status: Stale (SWR served)

Generated at: Jun 18, 2026, 11:38 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Time to repro: a few hours

PyTorch

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.

Use This Implementation Because…

Confidence: high

rwightman/pytorch-image-models is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 75/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

rwightman/pytorch-image-models

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 36,896
Last push: Jun 3, 2026 (17d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

snap-research/efficientformer

historical official

Maintenance: Stale

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,111
Last push: Aug 13, 2023 (1042d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

leondgarse/keras_cv_attention_models

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Moderate

Partial overlap with paper title keywords · Community adoption signal (627 stars)

Stars: 627
Last push: Feb 15, 2026 (125d ago)

CI · Releases

Risk flags

No Docker setup
Dependency manifest missing
Low confidence match
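The "days ago" figures and the "No push in 12+ months" flag in the comparison above can be checked with simple date arithmetic. The sketch below is a minimal check under two assumptions that the page does not state: the reference date of Jun 20, 2026 is inferred from the "17d ago" figure, and 365 days is used as a stand-in for "12+ months". The function names are illustrative.

```python
from datetime import date

# Assumption: inferred from "Last push: Jun 3, 2026 (17d ago)"; not stated on the page.
REFERENCE_DATE = date(2026, 6, 20)

# Assumption: 365 days approximates the page's "No push in 12+ months" rule.
STALE_AFTER_DAYS = 365

# Last-push dates as listed in the comparison.
LAST_PUSH = {
    "rwightman/pytorch-image-models": date(2026, 6, 3),
    "snap-research/efficientformer": date(2023, 8, 13),
    "leondgarse/keras_cv_attention_models": date(2026, 2, 15),
}


def days_since_push(repo: str, today: date = REFERENCE_DATE) -> int:
    """Whole days between the repository's last push and `today`."""
    return (today - LAST_PUSH[repo]).days


def is_stale(repo: str, today: date = REFERENCE_DATE) -> bool:
    """True when the last push is older than the staleness threshold."""
    return days_since_push(repo, today) > STALE_AFTER_DAYS


if __name__ == "__main__":
    for repo in LAST_PUSH:
        print(repo, days_since_push(repo), "stale" if is_stale(repo) else "active")
```

Under these assumptions the three listed figures (17, 1042, and 125 days) are reproduced, and only snap-research/efficientformer crosses the staleness threshold.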

Best implementation now

rwightman/pytorch-image-models

Confidence: High

Reproducibility: Strong

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Stars: 36,896

Forks: 5,167

Last push: Jun 3, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (36,896 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected rwightman/pytorch-image-models as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

snap-research/efficientformer

Stars: 1,111

Last push: Aug 13, 2023

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Jun 18, 2026

Ready to reproduce

· Clone rwightman/pytorch-image-models and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 17 days ago.

Quick start

git clone https://github.com/rwightman/pytorch-image-models.git
cd pytorch-image-models
pip install -e .
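Once the install finishes, a quick forward pass can confirm that the EfficientFormer architecture builds and runs. The sketch below assumes timm and torch are importable, that the model is registered in timm as efficientformer_l1, and that it uses the default 1000-class head with 224x224 inputs; the helper names smoke_test and logits_shape_ok are illustrative. Weights are randomly initialised here, so this checks the setup only, not accuracy.

```python
def logits_shape_ok(shape, batch_size, num_classes=1000):
    """Return True if `shape` equals (batch_size, num_classes)."""
    return tuple(shape) == (batch_size, num_classes)


def smoke_test(model_name="efficientformer_l1", batch_size=2):
    """Build the model without pretrained weights and run one forward pass."""
    # Imported lazily so this file can be loaded before timm/torch are installed.
    import timm
    import torch

    model = timm.create_model(model_name, pretrained=False)
    model.eval()  # inference mode: disables dropout and uses running BN stats
    with torch.no_grad():
        # Random input standing in for a batch of 224x224 RGB images.
        logits = model(torch.randn(batch_size, 3, 224, 224))
    return logits_shape_ok(logits.shape, batch_size)


if __name__ == "__main__":
    print("forward pass OK:", smoke_test())
```

Running the file as a script after the install prints whether the output has the expected (batch, classes) shape.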

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (1)

These repositories had low-confidence matching signals and are hidden by default.

leondgarse/keras_cv_attention_models

Confidence: Low

Stars: 627

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

timm/efficientformer_l1.snap_dist_in1k

Curated Related

Downloads: 1,305

Likes: 2

timm/efficientformer_l3.snap_dist_in1k

Curated Related

Downloads: 283

Likes: 1

timm/efficientformer_l7.snap_dist_in1k

Curated Related

Downloads: 204

Likes: 1
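The three checkpoints listed above follow one naming pattern, so they can be loaded through timm's Hugging Face Hub integration. The sketch below is a minimal example, assuming timm is installed and the Hub is reachable; hub_id and load_pretrained are illustrative helper names, not part of timm.

```python
HF_ORG = "timm"
PRETRAINED_TAG = "snap_dist_in1k"
VARIANTS = ("l1", "l3", "l7")  # the three checkpoints listed on this page


def hub_id(variant):
    """Build the Hugging Face repo id for one EfficientFormer variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{HF_ORG}/efficientformer_{variant}.{PRETRAINED_TAG}"


def load_pretrained(variant="l1"):
    """Download the checkpoint and return (model, eval-time preprocessing)."""
    # Imported lazily so hub_id() stays usable without timm installed.
    import timm

    model = timm.create_model(f"hf_hub:{hub_id(variant)}", pretrained=True)
    model.eval()
    # Resolve the preprocessing (input size, mean/std) the weights expect.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)
    return model, transform


if __name__ == "__main__":
    model, transform = load_pretrained("l1")
    print(hub_id("l1"), "loaded")
```

Using the transform resolved from the checkpoint, rather than hand-written resizing and normalisation, avoids one of the more common sources of a silent accuracy drop during reproduction.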

Datasets

No trustworthy dataset matches right now.

Spaces

No trustworthy demo spaces right now.

Research context

Tasks

Image classification

Methods

Transformer

Domains

Computer vision
