OpenTrain AI
Maintained implementation available · PyTorch · Pretrained models available

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer

May 23, 2023 · arXiv: 2305.14314
2 repos · 8,118 stars · ~a few hours to reproduce

Abstract

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the...
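
To make the mechanism in the abstract concrete, the following is a minimal pure-Python sketch of a single linear layer whose base weights are frozen as 4-bit codes while gradients flow only into a low-rank adapter pair. It is an illustration under stated assumptions, not the authors' implementation: the names `quantize_4bit`, `dequantize`, and `QLoRALinear` are hypothetical, the simple per-row absmax rounding is a stand-in for whatever quantization scheme the paper and its CUDA kernels actually use, and the squared-error loss is chosen only to keep the gradients easy to write by hand.

```python
import random


def quantize_4bit(row):
    """Absmax-quantize floats to integer codes in [-7, 7] plus one scale.

    Assumption: uniform rounding, used here only as a stand-in scheme.
    """
    scale = max(abs(w) for w in row) / 7 or 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in row]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class QLoRALinear:
    """y = dequant(W_q) x + (alpha / rank) * B (A x); only A and B are trained."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        # Frozen base weights: stored as 4-bit codes and never updated.
        self.q_rows = [quantize_4bit(row) for row in W]
        # Adapter: A starts small and random, B starts at zero.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        # Dequantize on the fly; the stored codes stay untouched.
        W_hat = [dequantize(codes, scale) for codes, scale in self.q_rows]
        base = matvec(W_hat, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * l for b, l in zip(base, low_rank)]

    def sgd_step(self, x, target, lr=0.1):
        """One gradient step on 0.5 * ||y - target||^2; returns the pre-step loss."""
        y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]  # dL/dy
        Ax = matvec(self.A, x)
        rank = len(self.A)
        # B^T err, computed before B changes, is needed for the gradient of A.
        Bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(rank)]
        for i in range(len(self.B)):
            for k in range(rank):
                self.B[i][k] -= lr * self.scaling * err[i] * Ax[k]
        for k in range(rank):
            for j in range(len(x)):
                self.A[k][j] -= lr * self.scaling * Bt_err[k] * x[j]
        return 0.5 * sum(e * e for e in err)


if __name__ == "__main__":
    layer = QLoRALinear([[0.5, -0.25, 1.0], [0.1, 0.2, -0.3]], rank=2)
    for step in range(5):
        print(step, layer.sgd_step([1.0, -2.0, 0.5], [1.0, 0.0]))
```

The point of the sketch is the split of responsibilities: the quantized codes in `q_rows` are read on every forward pass but never written, so the only trainable state is the small `A` and `B` matrices.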

Results & Benchmarks

Benchmark data is not yet available for this paper.

Hardware Requirements

  • We release all of our models and code, including CUDA kernels for 4-bit training.

Best Implementation

Accessible large language models via k-bit quantization for PyTorch.

8.1k 840 Apr 2026 MIT
License ✓
CI ✓
Deps ✓
Docker –
  • Selected timdettmers/bitsandbytes as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. Start with timdettmers/bitsandbytes and validate the setup instructions in its README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
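
The logging step above can be sketched with the Python standard library alone. The helper name `snapshot_environment` is hypothetical, and the default package list is an assumption about what a QLoRA setup typically installs (only bitsandbytes and PyTorch are named on this page); adjust it to match the environment actually in use.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages=("torch", "bitsandbytes", "transformers", "peft")):
    """Collect interpreter, OS, and installed package versions into a dict.

    Packages that are not installed are recorded as None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Write the snapshot next to the run's outputs so it travels with the results.
    print(json.dumps(snapshot_environment(), indent=2, sort_keys=True))
```

Saving this JSON alongside each run's results makes it possible to tell later whether a changed number came from a changed hyperparameter or from a changed library version.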

Time to first repro: a few hours

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.