Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars: 1,070
- Last push: Mar 1, 2026 (6d ago)
Risk flags
- No CI pipeline detected
- No Docker setup
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
By contextualizing the kernel as global as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals the representation bottleneck of modern ConvNets, where the expressive interactions have not been effectively encoded with the increased kernel size. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
Researcher verdict
This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on chengtan9907/OpenSTL. Use it as an implementation baseline, then validate benchmark parity before adapting it.
Why this page is still worth reading
Benchmark trust
Concrete benchmark findings are present and can be audited against the extracted evidence.
Use this page as
Start here when you need the most practical implementation path quickly.
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Image classification | ImageNet-1K | Top-1 Accuracy | 80.0 | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract] |
| Image classification | ImageNet-1K | Top-1 Accuracy | 87.8 | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[1] |
| Image classification | ImageNet | Accuracy | 87.8 | llm-grounded | researcherSummary.benchmarkSnapshot[1] |
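Auditing a reproduced run against the reported figures can be sketched as a small tolerance check. The reported values below come from the table; the model labels, the `check_parity` helper, and the 0.3-point tolerance are assumptions made for illustration, not part of the paper or any repository.

```python
# Reported Top-1 accuracies (%) on ImageNet-1K, as listed in the table.
REPORTED_TOP1 = {
    "5.2M-parameter model": 80.0,
    "181M-parameter model": 87.8,
}

# Assumption: an illustrative tolerance in accuracy points. Pick your own
# based on the seed-to-seed variance you observe.
DEFAULT_TOLERANCE = 0.3


def check_parity(reproduced, reported=REPORTED_TOP1, tolerance=DEFAULT_TOLERANCE):
    """Return {name: (gap, within_tolerance)} for each reproduced result."""
    report = {}
    for name, value in reproduced.items():
        # Negative gap means the reproduction is below the reported number.
        gap = round(value - reported[name], 4)
        report[name] = (gap, abs(gap) <= tolerance)
    return report


if __name__ == "__main__":
    # Hypothetical reproduced numbers, for demonstration only.
    print(check_parity({"5.2M-parameter model": 79.8}))
```

A result flagged as outside the tolerance is a prompt to inspect the training configuration and environment before adapting the code further.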
chengtan9907/OpenSTL is the strongest maintained implementation based on ranking signals. License is declared (Apache-2.0). Dependency/environment manifests are present.
LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0], researcherSummary.benchmarkSnapshot[1], repos[0].fullName, repos[1].fullName, repos[3].fullName, guidance.riskFlags[0]
Evidence graph: 3 refs, 3 links.
Utility signals: depth 90/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
AI-generated summary grounded in paper metadata and artifact signals.
MogaNet is a family of modern ConvNets that combines simple convolutions with a multi-order gated aggregation module to efficiently contextualize visual features. The paper reports competitive performance with state-of-the-art Vision Transformers and ConvNets on ImageNet and several downstream tasks, including detection and segmentation. Official Apache-2.0 implementations are available, with OpenSTL recommended as the primary entry point for reproduction, though lack of CI introduces some environment-drift risk.
Use the Apache-2.0 OpenSTL repository as the main starting point for reproducing MogaNet-related results. Clone the repo, create the conda environment from the provided environment.yml, activate it, and install the package in development mode. First reproduce a baseline configuration using the default training scripts (for example, SimVP+gSTA on Moving MNIST) to validate your setup and logging. Only after matching baseline metrics should you adapt configurations toward the specific MogaNet experiments described in the paper or related official codebases, keeping exact dependency versions and environment details recorded due to the absence of CI.
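The setup sequence described above can be sketched as a dry-run script. This is an illustration under stated assumptions: the environment name `OpenSTL` is hypothetical (the real name is whatever `environment.yml` declares), `conda run` stands in for the interactive activate step, and `pip install -e .` is assumed to be an acceptable way to do the development-mode install. Check the repository README for the commands it actually documents.

```python
import subprocess

# Repository named in the text, written as a standard GitHub clone URL.
REPO_URL = "https://github.com/chengtan9907/OpenSTL"
# Assumption: hypothetical environment name; the real one is set in environment.yml.
ENV_NAME = "OpenSTL"


def setup_commands(repo_url=REPO_URL, env_name=ENV_NAME, workdir="OpenSTL"):
    """Build (command, cwd) pairs for clone, environment creation, and dev install."""
    return [
        (["git", "clone", repo_url, workdir], None),
        (["conda", "env", "create", "-f", "environment.yml"], workdir),
        # `conda run -n` executes inside the environment without activating a shell.
        (["conda", "run", "-n", env_name, "pip", "install", "-e", "."], workdir),
    ]


def run_setup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command, cwd in setup_commands():
        print(" ".join(command), f"(cwd={cwd})" if cwd else "")
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Keeping the default as a dry run lets you review the commands before anything is cloned or installed.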
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
Preserved for provenance. Not recommended as the default path for new builds.
Follow the direct implementation path
- Start with chengtan9907/OpenSTL and validate setup instructions in README.
- Reproduce the baseline result with the provided defaults before modifying hyperparameters.
- Log exact dependency versions and runtime environment for reproducibility.
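One way to record dependency versions and the runtime environment is to write a snapshot file next to each run. The sketch below uses only the Python standard library; the function names and the `environment_snapshot.json` filename are illustrative choices, not something the repository provides.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def write_snapshot(path="environment_snapshot.json"):
    """Write the snapshot as JSON so it can be stored with the run's logs."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot_environment(), handle, indent=2)
    return path


if __name__ == "__main__":
    print(write_snapshot())
```

Because no CI pipeline was detected for the repository, comparing two such snapshots is a quick way to spot environment drift between a working run and a failing one.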
[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
- Tasks: Image classification
- Methods: None detected
- Domains: Computer vision