Q: What is the best open-source implementation of "Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective"?

The best-maintained implementation is NVIDIA/kvpress, with 1,115 stars on GitHub. Confidence: high. Reproducibility: strong.

Q: Are there pretrained models available for "Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective"?

No models are linked directly to the paper, but 3 related Hugging Face models were found. The top-ranked result is deepseek-ai/deepseek-llm-7b-chat, with 48,245 downloads.

Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective

Q: How reproducible is "Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective"?

Estimated time to first reproduction: a few hours. No repository-level red flags were identified; the only flag raised for NVIDIA/kvpress is the missing Docker setup. Start with NVIDIA/kvpress and validate the setup instructions in its README.

Q: What framework is used to implement "Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective"?

The primary implementation uses PyTorch.

Published: Feb 1, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 1

Top repo stars: 1,115

Domain fit is based on core AI workload signals detected in the paper context and in implementation and artifact evidence.

Framework: PyTorch

Time to first repro: a few hours

No repository-level risk flags

arXiv PDF: arXiv:2502.03805

Technical details

Canonical key: arxiv-2502.03805


Generated at: Jun 20, 2026, 11:53 AM

Benchmarks: thin evidence

Results & Benchmarks


Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset or method (as extracted)	Setting	Value	Source	Evidence refs
Question answering	Retr.KV (5 turns)	60% Cache Budget	52.60	paper-derived	No explicit refs
Natural language processing	PyramidKV	Llama 40% Cache	44.20	paper-derived	No explicit refs
Natural language processing	DuoAttention	Llama 40% Cache	48.17	paper-derived	No explicit refs
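The settings in the table are expressed as cache budgets, i.e. the share of KV entries kept. As a minimal sketch, assuming a budget means the percentage of cached tokens retained uniformly per head, the helpers below convert a budget into a retained-token count, into a compression ratio (assumed here to follow kvpress's convention of fraction removed; confirm against the kvpress README), and into an approximate KV cache size. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) is an assumed Llama-3.1-8B-like configuration, not the setup behind the table's numbers, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVShape:
    """Hypothetical model dimensions, used only for a size estimate."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # fp16 / bf16


def compression_ratio(budget_pct: int) -> float:
    """Fraction of KV pairs removed for a budget given as percent kept."""
    if not 0 < budget_pct <= 100:
        raise ValueError("budget_pct must be in (0, 100]")
    return (100 - budget_pct) / 100


def retained_tokens(seq_len: int, budget_pct: int) -> int:
    """Cached tokens kept per head under a uniform budget (rounded down)."""
    return seq_len * budget_pct // 100


def kv_cache_bytes(shape: KVShape, tokens: int) -> int:
    """Keys and values (factor 2) across all layers and KV heads."""
    per_token = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.bytes_per_value
    return per_token * tokens


if __name__ == "__main__":
    llama_like = KVShape(num_layers=32, num_kv_heads=8, head_dim=128)  # assumed shape
    seq_len = 32_000
    for budget_pct in (100, 60, 40):
        kept = retained_tokens(seq_len, budget_pct)
        mib = kv_cache_bytes(llama_like, kept) / 2**20
        print(f"budget={budget_pct}% compression_ratio={compression_ratio(budget_pct):.2f} "
              f"kept={kept} tokens, about {mib:.0f} MiB")
```

Under these assumptions each token costs 128 KiB of cache, so a 32,000-token context takes about 4,000 MiB at full budget, 2,400 MiB at a 60% budget, and 1,600 MiB at a 40% budget.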

The paper's primary contribution is the approach named in its title: identifying which entries in the LLM key-value (KV) cache are critical by how much dropping them would perturb the output.
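To make the output-perturbation framing concrete, here is a toy, pure-Python sketch; it is not the authors' algorithm and not kvpress code. For one query, the attention output is o = Σ_i a_i (v_i W_O), where a_i are softmax weights, v_i are cached value vectors, and W_O is an output projection. Keeping only a subset of entries (with weights renormalized over that subset) changes o, and the sketch measures that change as an L1 distance. It compares two hypothetical selection rules: keep the largest attention weights, or keep the largest a_i · ||v_i W_O||_1, a heuristic that also accounts for value states and the projection matrix. All data and function names are made up for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over one query's key scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(v: Sequence[float], w_o: Matrix) -> Vector:
    """Row vector times matrix: v @ W_O."""
    return [sum(v[k] * w_o[k][j] for k in range(len(v))) for j in range(len(w_o[0]))]


def attention_output(weights: Sequence[float], projected: Sequence[Vector], keep: Sequence[int]) -> Vector:
    """Output from the kept entries only, with weights renormalized over them."""
    z = sum(weights[i] for i in keep)
    dim = len(projected[0])
    return [sum(weights[i] / z * projected[i][j] for i in keep) for j in range(dim)]


def perturbation(weights: Sequence[float], projected: Sequence[Vector], keep: Sequence[int]) -> float:
    """L1 distance between the full-cache output and the compressed-cache output."""
    full = attention_output(weights, projected, range(len(weights)))
    kept = attention_output(weights, projected, keep)
    return sum(abs(a - b) for a, b in zip(full, kept))


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def select_by_attention(weights: Sequence[float], projected: Sequence[Vector], k: int) -> List[int]:
    """Baseline rule: keep the entries with the largest attention weights."""
    return top_k(weights, k)


def select_by_weighted_value_norm(weights: Sequence[float], projected: Sequence[Vector], k: int) -> List[int]:
    """Toy rule: score each entry by a_i * ||v_i W_O||_1."""
    return top_k([a * sum(abs(x) for x in p) for a, p in zip(weights, projected)], k)


if __name__ == "__main__":
    # Made-up data: 4 cached entries, 2-dim values, keep 2 of 4 (a 50% budget).
    scores = [math.log(4), math.log(3), math.log(2), 0.0]  # softmax ~ [0.4, 0.3, 0.2, 0.1]
    values = [[0.1, 0.0], [0.0, 0.2], [5.0, 0.0], [0.0, 0.1]]
    w_o = [[2.0, 0.0], [0.0, 0.5]]

    weights = softmax(scores)
    projected = [project(v, w_o) for v in values]
    for rule in (select_by_attention, select_by_weighted_value_norm):
        keep = rule(weights, projected, 2)
        print(f"{rule.__name__}: keep={sorted(keep)} "
              f"perturbation={perturbation(weights, projected, keep):.3f}")
```

With these made-up numbers, the attention-only rule keeps entries 0 and 1 (perturbation about 1.974), while the weighted-value-norm rule keeps entries 0 and 2 (about 1.422): entry 2 has a modest attention weight but a large projected value, so dropping it moves the output the most.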

Use This Implementation Because…

Confidence: high

NVIDIA/kvpress ranks as the best-maintained implementation on this page's ranking signals: CI workflows are present and a license is declared (Apache-2.0).


Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.
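Because preprocessing and hyperparameter details may be under-specified, it can help to record every setting of a reproduction run and compare runs by a fingerprint. The sketch below is a generic pattern, not something from the paper or kvpress; the field names (`press_name`, `cache_budget_pct`, and so on) and placeholder values are hypothetical. It serializes the settings as key-sorted JSON and hashes them with SHA-256, so runs with identical settings get identical fingerprints.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical fields; extend with whatever the paper or repo leaves implicit.
    model_name: str
    press_name: str
    cache_budget_pct: int
    max_context_tokens: int
    seed: int
    dataset: str


def fingerprint(cfg: RunConfig) -> str:
    """Stable SHA-256 over the settings (JSON keys sorted, compact separators)."""
    payload = json.dumps(asdict(cfg), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    base = RunConfig(
        model_name="example/model",
        press_name="example-press",
        cache_budget_pct=40,
        max_context_tokens=32000,
        seed=0,
        dataset="example-dataset",
    )
    variant = replace(base, cache_budget_pct=60)  # change one setting only
    print("base   ", fingerprint(base)[:12])
    print("variant", fingerprint(variant)[:12])
```

Storing the fingerprint next to each reported number makes it easy to spot when two results that look comparable were produced under different settings.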

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 2 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

NVIDIA/kvpress

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,115
Last push: Jun 17, 2026 (3d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

FFY0/AdaKV

alternative (listed twice, as ffy0/adakv and FFY0/AdaKV; GitHub repository names are case-insensitive, so both entries refer to the same repository)

Maintenance: Stale risk

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Community adoption signal (134 stars)

Stars: 134
Last push: Nov 26, 2025 (206d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

NVIDIA/kvpress

Confidence: High

Reproducibility: Strong

LLM KV cache compression made easy

Stars: 1,115

Forks: 155

Last push: Jun 17, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Partial overlap with paper title keywords

Community adoption signal (1,115 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected NVIDIA/kvpress as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.
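The relative ages on this page ("3d ago", "206d ago") and the 24-month activity window can be checked with plain date arithmetic. A minimal sketch, assuming "within the last 24 months" means calendar months counted back from the check date (the page does not define the rule precisely); the function names are hypothetical.

```python
import calendar
from datetime import date


def days_between(earlier: date, later: date) -> int:
    """Whole days from `earlier` to `later`."""
    return (later - earlier).days


def months_before(d: date, months: int) -> date:
    """Same calendar day `months` months earlier, clamped to that month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def active_within(last_push: date, checked: date, months: int = 24) -> bool:
    """Assumed activity rule: the last push falls inside the window."""
    return last_push >= months_before(checked, months)


if __name__ == "__main__":
    checked = date(2026, 6, 20)  # "Last checked" date on this page
    pushes = (("NVIDIA/kvpress", date(2026, 6, 17)), ("FFY0/AdaKV", date(2025, 11, 26)))
    for repo, pushed in pushes:
        print(f"{repo}: {days_between(pushed, checked)}d ago, "
              f"active={active_within(pushed, checked)}")
```

For the two repositories on this page this gives 3 and 206 days, matching the "3d ago" and "206d ago" labels; both pushes fall inside the 24-month window.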

Reproduction readiness

Status: ready to reproduce

Time to first repro: a few hours

Last checked: Jun 20, 2026

· Clone NVIDIA/kvpress and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 3 days ago.


Quick start

git clone https://github.com/NVIDIA/kvpress.git
cd kvpress
pip install -e .
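After the editable install, a quick sanity check is to confirm that the package and its main dependencies resolve in the active environment. This sketch uses only the standard library's `importlib.metadata`; it assumes the distribution is named `kvpress` (check `pyproject.toml` if it is not), and treating `torch` and `transformers` as the dependencies to check is an assumption, not a list taken from the repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the repository's pyproject.toml.
    for name in ("kvpress", "torch", "transformers"):
        version = installed_version(name)
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it from the same environment used for `pip install -e .`; a `NOT INSTALLED` line usually means the install went into a different interpreter.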

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified match (1)

This repository had low-confidence matching signals and is hidden by default.

FFY0/AdaKV (also matched as ffy0/adakv, the same repository)

Confidence: Low

Stars: 134

Hugging Face artifacts

No artifacts linked directly to the paper were found. The models below are curated related artifacts, shown for faster exploration; they are not checkpoints released with the paper.

Models

deepseek-ai/deepseek-llm-7b-chat

Curated Related

Downloads: 48,245

Likes: 223

deepseek-ai/deepseek-llm-7b-base

Curated Related

Downloads: 41,574

Likes: 145

llm-jp/llm-jp-4-8b-thinking

Curated Related

Downloads: 58,780

Likes: 40
Likes: 40