High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel Smith, Karen Simonyan
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
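The abstract names adaptive gradient clipping but does not spell out the rule. As a rough illustration only, the sketch below follows the unit-wise formulation usually attributed to the paper: each output unit's gradient is rescaled when the ratio of its norm to the norm of the matching weights exceeds a threshold. The function names, the list-of-rows layout, and the default values of `clipping` and `eps` are assumptions made for this example and are not taken from the text above or from the authors' repository.

```python
import math


def unitwise_norm(row):
    """L2 norm of one output unit's weights (or gradients)."""
    return math.sqrt(sum(v * v for v in row))


def adaptive_grad_clip(weights, grads, clipping=0.01, eps=1e-3):
    """Return clipped copies of `grads`, one row per output unit.

    A row is rescaled only when ||g|| / max(||w||, eps) exceeds `clipping`.
    The defaults are illustrative placeholders (assumption).
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # eps floors the weight norm so zero-initialised units can still move
        w_norm = max(unitwise_norm(w_row), eps)
        g_norm = unitwise_norm(g_row)
        max_norm = clipping * w_norm
        if g_norm > max_norm:
            scale = max_norm / g_norm  # shrink the row to norm max_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))  # small enough: leave unchanged
    return clipped


# Toy example: two output units with weight norms 5.0 and 0.5.
weights = [[3.0, 4.0], [0.3, 0.4]]
grads = [[0.03, 0.04], [0.3, 0.4]]
clipped = adaptive_grad_clip(weights, grads, clipping=0.1)
```

In this toy run the first unit's gradient is small relative to its weights and passes through untouched, while the second is scaled down so that its norm is about 0.1 times its weight norm. A real training loop would apply the same rule per parameter tensor inside the optimizer step, typically through a framework-level implementation rather than Python lists.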
Results & Benchmarks
Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.
Implementation Evidence Summary
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Reproduction Risks
- Estimate is based on paper-only reproduction flow
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence disclosure
Evidence graph: 2 refs, 1 link.
Utility signals: depth 100/100, grounding 68/100, status medium.
Implementation Status
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
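- No direct maintained implementation was found, although the abstract links to the authors' code at https://github.com/deepmind/deepmind-research/tree/master/nfnets. Check that repository first; otherwise use the paper PDF and citation graph to design a baseline reproduction.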
- Start from related paper: Foreground object segmentation from binocular stereo video.
- Track assumptions and missing details in an experiment log before coding.
Reproduction readiness
No verified implementation available
- No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.
Hugging Face artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Because direct matches are sparse, continue with targeted Hugging Face searches derived from the paper title and method context to locate candidate models, datasets, and demos.
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Research context
Citations: 256
References: 84
Tasks
Normalization (sociology), Computer science, Scale (ratio), Pattern recognition (psychology), Physical Sciences
Methods
None detected
Domains
Artificial intelligence, Computer vision, Computer Vision and Pattern Recognition
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Explore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Related papers
- Foreground object segmentation from binocular stereo video (2005) [semantic similarity]
- 6-DOF object localization by combining monocular vision and robot arm kinematics (2017) [semantic similarity]
- An Object Detection and Pose Estimation Approach for Position Based Visual Servoing (2017) [semantic similarity]
- Object-oriented stripe structured-light vision-guided robot (2017) [semantic similarity]
- Tracking in 3D: Image Variability Decomposition for Recovering Object Pose and Illumination (1999) [semantic similarity]
- Hand-eye calibration using a single image and robotic picking up using images lacking in contrast (2020) [semantic similarity]