Results & Benchmarks
| Task | Model | Metric | Value |
|---|---|---|---|
| Quantization | Bert-Base | E5M2 | 0.9040 |
| Quantization | Bert-Large | E5M2 | 0.6968 |
| Quantization | ResNet-50 | E5M2 | 0.7544 |
| Quantization | DenseNet-121 | E5M2 | 0.7435 |
| Quantization | Wav2Vec2 | E5M2 | 0.9632 |
| Quantization | Funnel | E5M2 | 0.9215 |
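
For readers unfamiliar with the format named in the Metric column: E5M2 is an 8-bit floating-point layout with 1 sign bit, 5 exponent bits, and 2 mantissa bits. The sketch below illustrates how coarse that grid is by rounding ordinary floats to the nearest E5M2 value. It is an illustration only, not code from the paper or from intel/neural-compressor; `round_to_e5m2` is a hypothetical helper name. It assumes inputs fit in the FP16 range, because it routes through IEEE half precision, which shares E5M2's 5-bit exponent.

```python
import math
import struct


def round_to_e5m2(x: float) -> float:
    """Round x to the nearest E5M2 value (hypothetical helper, sketch only).

    The value is first converted to IEEE half precision, so inputs beyond the
    FP16 range raise OverflowError, and rare double-rounding cases can differ
    from rounding directly to E5M2.
    """
    if math.isnan(x):
        return math.nan
    # FP16 has the same 5-bit exponent as E5M2, so an E5M2 value is exactly
    # the top byte of the 16-bit half-precision pattern.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    # Drop the low 8 mantissa bits with round-to-nearest, ties-to-even.
    bits = (bits + 0x7F + ((bits >> 8) & 1)) & 0xFF00
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


if __name__ == "__main__":
    for sample in (1.0, 1.1, 1.2, 1.125, 1.375):
        print(sample, "->", round_to_e5m2(sample))
```

Near 1.0 the representable E5M2 values are 1.0, 1.25, 1.5, and 1.75, so 1.1 rounds to 1.0 and 1.2 rounds to 1.25; the exact midpoints 1.125 and 1.375 go to 1.0 and 1.5 under ties-to-even.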
Best Implementation
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
2.6k 302 Apr 2026 Apache-2.0
License ✓
CI ✓
Deps ✓
Docker –
- Selected intel/neural-compressor as the strongest maintained implementation for new work.
- Includes CI workflow signals.
- Includes dependency/environment manifest signals.
- Repository activity is within the last 24 months.
Reproduction Path
1. Start with intel/neural-compressor and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.
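
The environment-logging step can be sketched with the Python standard library alone. This is a minimal example under stated assumptions, not part of intel/neural-compressor: `snapshot_environment` is a hypothetical helper, and the package names in the usage section are placeholders for whatever your run actually imports.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return interpreter, OS, and package versions as a JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list for illustration; adjust to your own setup.
    tracked = ["neural-compressor", "torch", "onnxruntime"]
    print(json.dumps(snapshot_environment(tracked), indent=2, sort_keys=True))
```

Saving this JSON next to each result makes it easier to tell a real regression from a change in library versions or hardware.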
Time to first repro: a few hours
No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No direct paper-linked artifacts were found. Showing strongest curated related artifacts.
Curated Related