SPQ: An Ensemble Technique for Large Language Model Compression
Jiamin Yao, Eren Gultepe · Feb 20, 2026
Citations: 0
- Applied to LLaMA-2-7B, SPQ achieves up to 75% memory reduction while maintaining or improving perplexity (e.g., WikiText-2 perplexity drops from 5.47 to 4.91) and preserving accuracy on downstream benchmarks such as C4, TruthfulQA, and GSM8K.
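
To put the reported figures in perspective, the short sketch below converts the 75% memory reduction into a compression ratio and expresses the WikiText-2 perplexity change in relative terms. The helper names `compression_ratio` and `relative_change` are hypothetical and used only for illustration; the only inputs are the numbers quoted above, and nothing here reflects how SPQ itself is implemented.

```python
def compression_ratio(memory_reduction: float) -> float:
    """Convert a fractional memory reduction into an original/compressed size ratio."""
    if not 0.0 <= memory_reduction < 1.0:
        raise ValueError("memory_reduction must be in [0, 1)")
    return 1.0 / (1.0 - memory_reduction)


def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


# Figures quoted in the summary above (upper-bound memory reduction, WikiText-2 perplexity).
ratio = compression_ratio(0.75)
ppl_change = relative_change(5.47, 4.91)

print(f"compression ratio: {ratio:.1f}x")
print(f"perplexity change: {ppl_change:+.1%}")
```

Read this way, "up to 75% memory reduction" means the compressed model occupies as little as one quarter of the original footprint, and the perplexity figure corresponds to a relative decrease of roughly 10%.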