Results & Benchmarks
| Task | Dataset | Metric | Value |
|---|---|---|---|
| Adaptive Mixture-of-Experts at Scale | ImageNet-22K | Top-1 Accuracy | 32 |
| Adaptive Mixture-of-Experts at Scale | COCO | Accuracy | 85.5 |
| Classification | ImageNet | Accuracy | 85.5 |
Hardware Requirements
- Based on current guidance, expect setup and compute for a meaningful reproduction to take multiple days.
Best Implementation
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
Stars: 980, Forks: 107, Last push: Apr 2026, License: MIT
License ✓
CI –
Deps –
Docker –
- Selected microsoft/tutel as the strongest maintained implementation for new work.
- Repository activity is within the last 24 months.
- Official repository is preserved separately as historical context.
Reproduction Path
1. Start with microsoft/tutel and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
- Time to first repro: a few days
- No CI workflows detected
- Dependency manifest is missing
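Because no dependency manifest was detected, the version-logging step above may need to be done by hand. The following is a minimal sketch of one way to capture it, using only the Python standard library. The function name `snapshot_environment` is hypothetical, and the package names passed to it (`torch`, `tutel`, `numpy`) are assumptions about what a run might import, not a list taken from the repository.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list; replace with the distributions the run actually uses.
    snapshot = snapshot_environment(["torch", "tutel", "numpy"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this JSON next to each run's results makes it easier to tell later whether a change in numbers came from the code or from the environment. GPU driver and CUDA versions are not covered here and would need to be recorded separately.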
Additional Implementations
Official
No additional official repositories detected.
Community
- microsoft/Tutel (Confidence: low)
  Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
  Stars: 980, Forks: 107, Last push: Apr 2026, License: MIT
Hugging Face Artifacts
No direct paper-linked artifacts were found. Showing strongest curated related artifacts.
Curated Related