Maintained implementation available
pytorch

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen +5 more

June 1, 2023
arXiv: 2306.00978

4 repos
76,327 stars
~a few hours to reproduce

Abstract

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost and protect users' privacy. However, the astronomical model size and the limited hardware resource pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach fo...

Best Implementation

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Stars: 76.3k, Forks: 15.5k, Last push: Apr 2026, License: Apache-2.0

License ✓

CI ✓

Deps ✓

Docker –

Selected vllm-project/vllm as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction Path

1. Start with vllm-project/vllm and validate setup instructions in README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.
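
The logging step above can be sketched with the Python standard library alone. This is a minimal illustration, not part of any listed repository: the helper name `snapshot_environment` and the package list (`vllm`, `torch`, `transformers`) are assumptions chosen for the example, and packages that are not installed are recorded as `null` rather than raising an error.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed-package versions as a dict.

    `packages` is a caller-supplied list of distribution names; which
    packages matter for a given reproduction is an assumption left to the user.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list for an AWQ reproduction; adjust as needed.
    env = snapshot_environment(["vllm", "torch", "transformers"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving the printed JSON next to each result makes it easier to tell later whether a difference in numbers comes from the method or from the environment. It does not capture GPU driver or CUDA versions, which would need to be recorded separately.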

Time to first repro: a few hours
No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

Official

internlm/lmdeploy
Confidence: low
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Stars: 7.8k, Forks: 682, Last push: Apr 2026, License: Apache-2.0

mit-han-lab/llm-awq
Confidence: low
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Stars: 3.5k, Forks: 310, Last push: Jul 2025, License: MIT

Community

No additional community repositories detected yet.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches:

models: arxiv:2306.00978; Quantization

datasets: arxiv:2306.00978; AWQ dataset; Instruction tuning dataset

spaces: arxiv:2306.00978; AWQ demo; Instruction tuning demo

Research Context