MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition
Seoungsub Lee, In Seo Kim, Seon Wook Kim
Abstract
Large language models (LLMs) have achieved outstanding performance across a wide range of natural language processing tasks, but their enormous parameter counts impose ubstantial memory and computational overheads. This challenge is particularly critical in NPU-based on-device environments, where FP16/FP32 computation is inefficient and integer (INT) quantization is therefore essential. However, existing methods, inc...
Results & Benchmarks
| Task | Dataset | Metric | Value |
|---|---|---|---|
| Quantization | GPT2-small(0.1B) | perplexity. | 25.3825 |
Hardware Requirements
- Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Best Implementation
Maintained implementation evidence is not confirmed for this paper yet.
Use the Implementation Status and Reproduction Path sections below for the current action plan.
Reproduction Path
Follow this baseline workflow to decide if this paper is worth immediate prototyping.
- 1
Use the paper and benchmark evidence to scope a baseline reproduction plan.
- 2
Start from this likely method family: Quantization.
- 3
Track assumptions and missing details in an experiment log before coding.
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches: