No verified implementation yet

OneComp: One-Line Revolution for Generative AI Model Compression

Yuma Ichikawa, Keiji Kimura, Akihiro Yoshida, Yudai Fujimoto, Hiroki Tokura +9 more

March 30, 2026 | arXiv: 2603.28845

0 repos | ~a few days to reproduce

Abstract

Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision budgets, data-driven...

Summary

OneComp is an open-source post-training compression framework for large language models that automates model inspection, mixed-precision planning, and progressive multi-stage quantization. Given a model identifier and hardware target, it executes layer-wise compression, block-wise refinement, and global refinement stages, treating the first quantized checkpoint as a deployable pivot. Quality is measured via WikiText-2 perplexity and average zero-shot accuracy across bit-budget variants.
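
To make the layer-wise stage concrete, here is a minimal sketch. This page does not say which quantizer OneComp uses, so the example assumes plain symmetric round-to-nearest quantization applied per weight row. The function names (quantize_row, dequantize_row, reconstruction_error) are hypothetical and are not OneComp's API.

```python
from typing import List, Tuple


def quantize_row(row: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight row.

    Assumption: this is a generic baseline quantizer, not the method
    used by OneComp.
    """
    if bits < 2:
        raise ValueError("symmetric signed quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # largest positive code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to the nearest even integer.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to floating-point weights."""
    return [c * scale for c in codes]


def reconstruction_error(weights: List[List[float]], bits: int) -> float:
    """Mean squared error between a layer's weights and their quantized copy."""
    total, count = 0.0, 0
    for row in weights:
        codes, scale = quantize_row(row, bits)
        restored = dequantize_row(codes, scale)
        total += sum((w - r) ** 2 for w, r in zip(row, restored))
        count += len(row)
    return total / count


if __name__ == "__main__":
    layer = [[1.75, -0.5, 0.0, 0.25], [0.3, -0.9, 0.6, 0.1]]
    for bits in (2, 4, 8):
        print(bits, reconstruction_error(layer, bits))
```

A per-layer error like this is the kind of signal a layer-wise stage could minimize before any block-wise or global refinement; how OneComp actually defines its objectives should be taken from the paper.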

Key Contributions

Fully automated pipeline: single model identifier plus hardware spec drives mixed-precision assignment and progressive quantization without manual configuration.
Deployable pivot design: the first quantized checkpoint is production-ready; subsequent refinement stages are additive and monotonically improve quality with more compute.
Multi-stage compression: layer-wise, block-wise, and global refinement stages are explicitly separated to allow compute-quality tradeoffs at each level.
Unified evaluation protocol: WikiText-2 perplexity and average zero-shot accuracy used systematically across bit-budget and quantization variants.
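
The two metrics named in the evaluation protocol can be sketched as follows. Perplexity is the exponential of the mean per-token negative log-likelihood. The zero-shot average is assumed here to be an unweighted mean over tasks, which this page does not confirm. The function names and task names are hypothetical placeholders.

```python
import math
from typing import Dict, List


def perplexity(token_nlls: List[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def average_zero_shot_accuracy(task_accuracy: Dict[str, float]) -> float:
    """Unweighted mean accuracy across tasks.

    Assumption: equal task weighting; the paper may average differently.
    """
    if not task_accuracy:
        raise ValueError("need at least one task")
    return sum(task_accuracy.values()) / len(task_accuracy)


if __name__ == "__main__":
    # A model assigning probability 1/2 to every token has perplexity 2.
    print(perplexity([math.log(2.0)] * 4))
    # Task names below are placeholders, not the paper's benchmark list.
    print(average_zero_shot_accuracy({"task_a": 0.5, "task_b": 0.7}))
```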

Reproducibility Notes

Benchmark value for WikiText-2 perplexity (reported as '2') appears to be a partial extraction artifact; verify exact figures from the paper tables before targeting them.
Paper-only reproduction of a multi-stage progressive quantization pipeline typically requires days of implementation and compute, especially at LLM scale.
Mixed-precision planning details (layer-to-bit-width assignment heuristics) are architecture-sensitive and may require careful reading of the experimental section to replicate.
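
As a starting point for the mixed-precision note above, the sketch below shows one simple way to assign bit-widths under an average bit budget. It is a greedy heuristic chosen for illustration only and is not the assignment rule from the paper. The sensitivity scores, layer names, and the assign_bits function are all hypothetical.

```python
from typing import Dict, Sequence


def assign_bits(
    sensitivity: Dict[str, float],
    sizes: Dict[str, int],
    avg_bits: float,
    choices: Sequence[int] = (2, 3, 4, 8),
) -> Dict[str, int]:
    """Greedy bit-width assignment under an average-bits budget.

    Assumption: every layer starts at the lowest bit-width, then the most
    sensitive layer that still fits in the budget is upgraded one step at
    a time. This is an illustrative heuristic, not OneComp's planner.
    """
    levels = sorted(choices)
    bits = {name: levels[0] for name in sensitivity}
    budget = avg_bits * sum(sizes.values())  # total bits allowed
    used = sum(bits[name] * sizes[name] for name in bits)
    if used > budget:
        raise ValueError("budget is below the smallest available bit-width")

    # Most sensitive layers get the first chance at each upgrade.
    order = sorted(sensitivity, key=sensitivity.get, reverse=True)
    changed = True
    while changed:
        changed = False
        for name in order:
            idx = levels.index(bits[name])
            if idx + 1 == len(levels):
                continue  # already at the highest bit-width
            extra = (levels[idx + 1] - bits[name]) * sizes[name]
            if used + extra <= budget:
                bits[name] = levels[idx + 1]
                used += extra
                changed = True
                break  # restart from the most sensitive layer
    return bits


if __name__ == "__main__":
    plan = assign_bits(
        sensitivity={"attn": 3.0, "embed": 2.0, "mlp": 1.0},
        sizes={"attn": 100, "embed": 100, "mlp": 200},
        avg_bits=4.0,
    )
    print(plan)
```

When replicating the paper, the sensitivity scores and the set of allowed bit-widths are the parts most likely to differ by architecture, so record them in the experiment log alongside the resulting plan.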

Results & Benchmarks

Benchmark data is not yet available for this paper.

Hardware Requirements

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

Maintained implementation evidence is not confirmed for this paper yet.

Use the Implementation Status and Reproduction Path sections below for the current action plan.

Reproduction Path

Follow this baseline workflow to decide if this paper is worth immediate prototyping.

1. Use the paper and benchmark evidence to scope a baseline reproduction plan.
2. Start from this likely method family: Quantization.
3. Track assumptions and missing details in an experiment log before coding.

Time to first repro: a few days. Estimate is based on a paper-only reproduction flow.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches:

models: arxiv:2603.28845 OneComp One-Line
datasets: arxiv:2603.28845 OneComp dataset Quantization benchmark
spaces: arxiv:2603.28845 OneComp demo Quantization gradio

Research Context