OpenTrain AI
Maintained implementation available | PyTorch

Tutel: Adaptive Mixture-of-Experts at Scale

June 1, 2022 | arXiv: 2206.03382
3 repos | 980 stars | ~a few days to reproduce

Abstract

Results & Benchmarks

Task                                  Dataset       Metric          Value
Adaptive Mixture-of-experts At Scale  ImageNet-22K  Top-1 Accuracy  32
Adaptive Mixture-of-experts At Scale  COCO          Accuracy        85.5
Classification                        ImageNet      Accuracy        85.5

Hardware Requirements

  • Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

Stars: 980 | Forks: 107 | Last push: Apr 2026 | License: MIT
  • Selected microsoft/tutel as the strongest maintained implementation for new work.
  • Repository activity is within the last 24 months.
  • Official repository is preserved separately as historical context.

Reproduction Path

  1. Start with microsoft/tutel and validate setup instructions in README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
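
The logging step can be sketched with the Python standard library alone. This is a minimal illustration, not part of Tutel or of this page's tooling: the helper name `snapshot_environment` is hypothetical, and the default package names (`torch`, `tutel`, `numpy`) are assumptions about what a given setup installs. Adjust them to match the actual environment.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=("torch", "tutel", "numpy")):
    """Collect interpreter, OS, and package versions as a plain dict.

    The default package names are assumptions; pass the distributions
    your own run depends on.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution is not installed here
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print a JSON record that can be saved next to the run's results.
    print(json.dumps(snapshot_environment(), indent=2, sort_keys=True))
```

Saving this record alongside each run makes it easier to tell later whether a changed result came from a hyperparameter edit or from a dependency upgrade. GPU driver and CUDA versions are not captured here and would need to be recorded separately.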

  • Time to first repro: a few days
  • No CI workflows detected
  • Dependency manifest is missing

Additional Implementations

Official

No additional official repositories detected.

Community

  • microsoft/Tutel | Confidence: low

    Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

    Stars: 980 | Forks: 107 | Last push: Apr 2026 | License: MIT

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.