What is the best open-source implementation of "Entropy Law: The Story Behind Data Compression and LLM Performance"?

The best-maintained implementation is ustc-starteam/zip, with 28 stars on GitHub. Confidence: high. Reproducibility: Limited.

How reproducible is "Entropy Law: The Story Behind Data Compression and LLM Performance"?

Estimated time to first reproduction: a few days. Risk flags: no CI workflows detected; dependency manifest is missing. Start with ustc-starteam/zip and validate the setup instructions in its README.

What framework is used to implement "Entropy Law: The Story Behind Data Compression and LLM Performance"?

The primary implementation uses PyTorch.

Entropy Law: The Story Behind Data Compression and LLM Performance

Published: Jul 1, 2024

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 28

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few days

2 risk flags

Technical details

Canonical key: arxiv-2407.06645

Generated at: Jun 18, 2026, 5:13 AM

Artifact coverage: direct

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Natural language processing · MT-Bench · Perplexity: 970 · Split: Avg.length · Source: paper fulltext

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	MT-Bench	Perplexity	970	paper-derived	No explicit refs

The paper's primary contribution is the entropy law, which relates LLM performance to the compression ratio of the training data, and ZIP, an entropy-law data selection method for efficient LLM alignment.
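
The data-compression idea in the paper's title can be made concrete with a small measurement. The following is a minimal sketch, not the authors' implementation: it assumes the compression ratio is raw byte size divided by compressed byte size and uses zlib as a stand-in compressor. The function name and the two sample lists are hypothetical; check the paper and the repository for the exact definition and compressor used.

```python
import zlib


def compression_ratio(texts):
    """Return raw_bytes / compressed_bytes for the concatenated texts.

    Assumption: this ratio definition and the zlib compressor are
    illustrative choices, not taken from the paper or the repository.
    """
    raw = "\n".join(texts).encode("utf-8")
    if not raw:
        raise ValueError("need at least one non-empty text")
    compressed = zlib.compress(raw, level=9)
    return len(raw) / len(compressed)


# Hypothetical samples: one highly repetitive, one more varied.
redundant = ["the cat sat on the mat"] * 50
varied = [
    f"sample {i}: {i * 7919 % 1000} tokens about topic {i % 13}"
    for i in range(50)
]

print("redundant:", round(compression_ratio(redundant), 2))
print("varied:", round(compression_ratio(varied), 2))
```

Under this definition, a higher ratio means the compressor found more redundancy, so the repetitive sample scores higher than the varied one.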

Use This Implementation Because…

Confidence: high

ustc-starteam/zip is the strongest maintained implementation based on ranking signals. License is declared (MIT).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing
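
These flags can be re-checked against a local clone before committing time to a reproduction. The sketch below is an assumption about what such a check might look like, not the heuristic this page actually uses: it only looks for GitHub Actions workflow files, the manifest names listed elsewhere on this page, and a top-level Dockerfile. The function name `risk_flags` and the constant `MANIFESTS` are hypothetical.

```python
from pathlib import Path

# Manifest file names to look for (assumed list, top level only).
MANIFESTS = ("requirements.txt", "environment.yml", "pyproject.toml")


def risk_flags(repo_dir):
    """Return a list of risk-flag strings for a local repository checkout."""
    root = Path(repo_dir)
    flags = []

    # CI check: only GitHub Actions workflows are considered here.
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and any(
        p.suffix in {".yml", ".yaml"} for p in workflows.iterdir()
    )
    if not has_ci:
        flags.append("No CI workflows detected")

    if not any((root / name).is_file() for name in MANIFESTS):
        flags.append("Dependency manifest is missing")

    if not (root / "Dockerfile").is_file():
        flags.append("No Docker setup")

    return flags
```

Calling `risk_flags("path/to/clone")` returns an empty list when none of the three checks fire. Other CI systems or manifests stored in subdirectories would need extra checks.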

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Comparison

Top 2 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

ustc-starteam/zip

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 28
Last push: Jun 10, 2026 (8d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

USTC-StarTeam/ZIP

alternative

Maintenance: Active

Confidence: Medium

Reproducibility: Limited

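Matched via arXiv identifier search · Strong overlap with paper title keywords · Same GitHub repository as ustc-starteam/zip (GitHub owner and repository names are case-insensitive)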

Stars: 28
Last push: Jun 10, 2026 (8d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

ustc-starteam/zip

Confidence: High

Reproducibility: Limited

arXiv 2024 | ZIP: entropy-law data selection for efficient LLM alignment.

Stars: 28

Forks: 2

Last push: Jun 10, 2026

License: MIT

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (28 stars)

License: declared (MIT) · CI: not detected · Dependency manifest: not detected · Docker: not detected

Selected ustc-starteam/zip as the strongest maintained implementation for new work.
The last push was on Jun 10, 2026, 8 days before the last check.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: Jun 18, 2026

No dependency manifest — manual reconstruction required

· ustc-starteam/zip has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.

Additional implementations

Official

No additional official repositories detected.

Community

USTC-StarTeam/ZIP

Confidence: Medium

arXiv 2024 | ZIP: entropy-law data selection for efficient LLM alignment.

Stars: 28

Last push: Jun 10, 2026

License: MIT

Hugging Face artifacts

No artifacts linked directly to the paper were found. The strongest curated related artifacts are shown instead.

Models

No trustworthy model matches right now.

Datasets

neuralmagic/LLM_compression_calibration

Curated Related

Downloads: 1,713

Likes: 16

Updated: Jun 27, 2024

hkust-nlp/llm-compression

Curated Related

Downloads: 1,769

Likes: 8

Updated: Apr 16, 2024