EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
Han Cai, Chuang Gan, Song Han
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
High-resolution dense prediction enables many appealing real-world applications, such as computational photography and autonomous driving. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performance, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous state-of-the-art models with significant speedup on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU. Without performance loss on Cityscapes, our EfficientViT provides up to 13.9x and 6.2x GPU latency reduction over SegFormer and SegNeXt, respectively. For super-resolution, EfficientViT delivers up to 6.4x speedup over Restormer while providing 0.11dB gain in PSNR. For Segment Anything, EfficientViT delivers 48.9x higher throughput on A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.
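The abstract contrasts softmax attention with linear attention but does not spell out the computation. The sketch below illustrates generic kernelized linear attention in NumPy, assuming a ReLU feature map; that choice, the function names, and the `eps` parameter are assumptions made for illustration and are not taken from this page. It covers only the linear-attention idea (reordering the matrix products so no N x N matrix is built) and does not attempt the multi-scale part.

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """Linear attention sketch. q, k: (N, d); v: (N, d_v).

    Assumption: ReLU is used as the feature map. Cost grows linearly in N
    because K^T V is computed first and reused for every query.
    """
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    kv = k.T @ v                 # (d, d_v), shared by all queries
    k_sum = k.sum(axis=0)        # (d,), used for normalization
    numer = q @ kv               # (N, d_v)
    denom = q @ k_sum + eps      # (N,)
    return numer / denom[:, None]


def relu_quadratic_attention(q, k, v, eps=1e-6):
    """Reference version that builds the explicit N x N similarity matrix."""
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    sim = q @ k.T                # (N, N), quadratic in N
    return (sim @ v) / (sim.sum(axis=-1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((16, 8))
    k = rng.standard_normal((16, 8))
    v = rng.standard_normal((16, 4))
    fast = relu_linear_attention(q, k, v)
    slow = relu_quadratic_attention(q, k, v)
    print(np.allclose(fast, slow))
```

Both functions compute the same result; the linear version simply applies associativity of matrix multiplication, which is why it avoids the quadratic cost at high resolution.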
Results & Benchmarks
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
Implementation Evidence Summary
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Reproduction Risks
- Estimate is based on paper-only reproduction flow
Hardware Notes
For Segment Anything, EfficientViT delivers 48.9x higher throughput on A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.
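Throughput claims like the one above are ratios of items processed per second. A minimal timing sketch follows, using only the Python standard library; `measure_throughput` and `speedup` are hypothetical helper names, and the workload in the usage lines is a stand-in, not a model from the paper.

```python
import time


def measure_throughput(fn, batch, batch_size, warmup=3, iters=10):
    """Return items per second for repeated calls of fn(batch).

    Note: for GPU workloads the device must be synchronized before reading
    the clock, otherwise asynchronous kernels make the timing meaningless.
    """
    for _ in range(warmup):      # warm-up runs are not timed
        fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse for this workload")
    return batch_size * iters / elapsed


def speedup(baseline_throughput, candidate_throughput):
    """Ratio of candidate to baseline throughput (higher is faster)."""
    return candidate_throughput / baseline_throughput


if __name__ == "__main__":
    data = list(range(1000))
    items_per_second = measure_throughput(sum, data, batch_size=len(data))
    print(f"{items_per_second:.0f} items/s")
```

Comparing two models then amounts to calling `measure_throughput` on each with the same batch and hardware, and passing the two results to `speedup`.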
Evidence disclosure
Evidence graph: 2 refs, 1 links.
Utility signals: depth 65/100, grounding 58/100, status medium.
Implementation Status
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
- No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
- Start from related paper: ИСПОЛЬЗОВАНИЕ ПОТЕНЦИАЛА СОЦИАЛЬНЫХ ПАРТНЕРОВ В ПОДГОТОВКЕ БУДУЩИХ ПЕДАГОГОВ [Using the Potential of Social Partners in the Training of Future Teachers].
- Track assumptions and missing details in an experiment log before coding.
Reproduction readiness
Hardware requirements
- For Segment Anything, EfficientViT delivers 48.9x higher throughput on A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.
No verified implementation available
- No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.
No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.
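If a super-resolution reproduction is attempted, the PSNR metric mentioned in the abstract can at least be computed locally. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE); the function name and the default `max_value` of 255 are assumptions, and the paper's exact evaluation protocol (color space, cropping) is not stated on this page.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumption: pixel values lie in [0, max_value].
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")      # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A gain such as 0.11dB is then the difference between two `psnr` values computed against the same reference images.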
Framework baselines
- Hugging Face Transformers training guide
Modern transformer training baseline.
- PyTorch nn.Transformer docs
Reference transformer building block implementation.
Hugging Face artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context to locate candidate models, datasets, and demos.
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Research context
Citations: 35
References: 0
Tasks
Scale (ratio), High resolution, Linear scale, Computer science, Econometrics, Physical Sciences
Methods
None detected
Domains
Computer Vision and Pattern Recognition
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Jump to Paper2Code search queries derived from this paper's research context.
Related papers
- ИСПОЛЬЗОВАНИЕ ПОТЕНЦИАЛА СОЦИАЛЬНЫХ ПАРТНЕРОВ В ПОДГОТОВКЕ БУДУЩИХ ПЕДАГОГОВ [Using the Potential of Social Partners in the Training of Future Teachers] (2024). Semantic similarity
- Susquehanna Chorale Spring Concert "Roots and Wings" (2017). Semantic similarity
- Using DataGrid Control to Realize DataBase of Querying in VB6.0 (2000). Semantic similarity
- Study and Two Types of Typical Usage of DataGrid Web Server Control (2005). Semantic similarity
- Achieving Parameter of DBSCAN Based on Datagrid (2010). Semantic similarity
- PACWON: A parallelizing compiler for workstations on a network (1998). Semantic similarity