Maintained implementation available · Pretrained models available

Efficient Post-training Quantization with FP8 Formats

September 1, 2023 · arXiv: 2309.14592

2 repos · 2,609 stars · a few hours to reproduce

Abstract

Results & Benchmarks

Task	Model	FP8 format	Value
Quantization	Bert-Base	E5M2	0.9040
Quantization	Bert-Large	E5M2	0.6968
Quantization	ResNet-50	E5M2	0.7544
Quantization	DenseNet-121	E5M2	0.7435
Quantization	Wav2Vec2	E5M2	0.9632
Quantization	Funnel	E5M2	0.9215
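
Every row above uses the E5M2 format: 1 sign bit, 5 exponent bits, and 2 mantissa bits, which makes it FP16 with the low 8 mantissa bits dropped. The following is a minimal sketch of that rounding in pure Python, not the paper's or intel/neural-compressor's implementation. The function name `to_e5m2` is hypothetical, and two simplifications are assumed: out-of-range values saturate to the largest finite E5M2 value (57344) instead of becoming infinity, and the input is first rounded to FP16, so rare double-rounding differences from a direct conversion are possible.

```python
import math
import struct

E5M2_MAX = 57344.0  # largest finite E5M2 value: 1.75 * 2**15


def to_e5m2(x: float) -> float:
    """Round x to the nearest E5M2-representable value (illustrative sketch)."""
    if math.isnan(x):
        return x
    # Assumption: saturate instead of overflowing to infinity.
    x = max(-E5M2_MAX, min(E5M2_MAX, x))
    # Reinterpret the FP16 encoding of x as a 16-bit integer.
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits & 0x8000
    magnitude = bits & 0x7FFF
    kept = magnitude >> 8       # 5 exponent bits + 2 mantissa bits
    dropped = magnitude & 0xFF  # the 8 mantissa bits E5M2 cannot store
    # Round to nearest, ties to even, on the kept bits.
    if dropped > 0x80 or (dropped == 0x80 and (kept & 1)):
        kept += 1
    rounded = sign | (kept << 8)
    return struct.unpack("<e", struct.pack("<H", rounded))[0]


if __name__ == "__main__":
    for value in (1.0, 1.1, 1.2, -0.3, 1e6):
        print(value, "->", to_e5m2(value))
```

With only 2 mantissa bits, the representable values between 1 and 2 are 1.0, 1.25, 1.5, and 1.75, so 1.1 maps to 1.0 and 1.2 maps to 1.25. This coarse spacing is why the reported values depend on how well each model tolerates reduced precision.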

Best Implementation

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

2.6k stars · 302 forks · last updated Apr 2026 · Apache-2.0

License ✓

CI ✓

Deps ✓

Docker –

Selected intel/neural-compressor as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction Path

1. Start with intel/neural-compressor and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
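
The last step above can be sketched with the Python standard library alone. This is one possible approach, not a procedure prescribed by the repository. The helper name `capture_environment` is hypothetical, and the package names passed to it (`neural-compressor`, `torch`, `numpy`) are assumptions about what a given setup installs; packages that are not installed are recorded as `None`.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Collect interpreter, OS, and installed package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names below are illustrative assumptions; adjust to your setup.
    env = capture_environment(["neural-compressor", "torch", "numpy"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving this JSON next to each result makes it easier to tell whether a later mismatch comes from the method or from a changed dependency.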

Time to first repro: a few hours

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.

Curated Related

ostris/zimage_turbo_training_adapter
50.9k 131
ostris/FLUX.1-schnell-training-adapter
1.6k 92
allenai/OLMo-2-0425-1B-early-training
1.5k 6

Research Context

Tasks

Quantization

Methods

Quantization