How reproducible is "GeneZip: Region-Aware Compression for Long Context DNA Modeling"?

Estimated time to first reproduction: a few days. Risk flags: no repository-level reproducibility signals are currently available, and the estimate is based on a paper-only reproduction flow. No directly maintained implementation was found; use the paper PDF and citation graph to design a baseline reproduction.

Are there pretrained models available for "GeneZip: Region-Aware Compression for Long Context DNA Modeling"?

Not for GeneZip itself. One related Hugging Face model was found, but it is not linked to this paper: liufanfanlff/C3-Context-Cascade-Compression (116 downloads).

GeneZip: Region-Aware Compression for Long Context DNA Modeling

Jianan Zhao, Xixian Liu, Zhihao Zhan, Xinyu Yuan, Hongyu Guo, Jian Tang

Published: Feb 19, 2026

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-adjacent

Verified repos: 0

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Time to first repro: a few days

2 risk flags

arXiv PDF

Genomic sequences span billions of base pairs (bp), posing a fundamental challenge for genome-scale foundation models. Existing approaches largely sidestep this barrier by either scaling relatively small models to long contexts or relying on heavy multi-GPU parallelism. Here we introduce GeneZip, a DNA compression model that leverages a key biological prior: genomic information is highly imbalanced. Coding regions comprise only a small fraction (about 2 percent) yet are information-dense, whereas most non-coding sequence is comparatively information-sparse. GeneZip couples HNet-style dynamic routing with a region-aware compression-ratio objective, enabling adaptive allocation of representation budget across genomic regions. As a result, GeneZip learns region-aware compression and achieves 137.6x compression with only 0.31 perplexity increase. On downstream long-context benchmarks, GeneZip achieves comparable or better performance on contact map prediction, expression quantitative trait loci prediction, and enhancer-target gene prediction. By reducing effective sequence length, GeneZip unlocks simultaneous scaling of context and capacity: compared to the prior state-of-the-art model JanusDNA, it enables training models 82.6x larger at 1M-bp context, supporting a 636M-parameter GeneZip model at 1M-bp context. All experiments in this paper can be trained on a single A100 80GB GPU.
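The abstract names the mechanism (HNet-style dynamic routing plus a region-aware compression-ratio objective) but gives no formulas. The Python sketch below is one plausible reading under stated assumptions, not the authors' code: the boundary probability p_t = 0.5(1 - cos(h_t, h_{t-1})) and the ratio loss follow the H-Net dynamic-chunking formulation, simplified to compare raw hidden states instead of learned query/key projections; `region_aware_ratio_loss` (one target ratio per region label) is a hypothetical illustration, not GeneZip's published objective; and `effective_length` assumes "137.6x compression" means base pairs per compressed position. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def boundary_probs(hidden: List[Sequence[float]]) -> List[float]:
    """H-Net-style routing: p_t = 0.5 * (1 - cos(h_t, h_{t-1})), p_0 = 1.

    Simplification (assumption): H-Net applies learned query/key projections
    before the cosine; this sketch compares raw hidden states directly.
    """
    probs = [1.0]  # the first position always opens a chunk
    for t in range(1, len(hidden)):
        probs.append(0.5 * (1.0 - cosine(hidden[t], hidden[t - 1])))
    return probs


def ratio_loss(probs: Sequence[float], target_ratio: float) -> float:
    """H-Net ratio loss; equals 1.0 when F = G = 1 / target_ratio.

    F: fraction of hard boundaries (p >= 0.5); G: mean boundary probability.
    target_ratio must be > 1 (it is the desired positions-per-chunk).
    """
    n = target_ratio
    f = sum(1 for p in probs if p >= 0.5) / len(probs)
    g = sum(probs) / len(probs)
    return n / (n - 1) * ((n - 1) * f * g + (1 - f) * (1 - g))


def region_aware_ratio_loss(
    probs: Sequence[float], regions: Sequence[str], targets: Dict[str, float]
) -> float:
    """HYPOTHETICAL region-aware variant, not GeneZip's published objective:
    apply ratio_loss per region label with its own target ratio and average
    the results weighted by how many positions each region covers."""
    total, covered = 0.0, 0
    for region, target in targets.items():
        idx = [i for i, r in enumerate(regions) if r == region]
        if idx:
            total += len(idx) * ratio_loss([probs[i] for i in idx], target)
            covered += len(idx)
    return total / covered if covered else 0.0


def effective_length(num_bp: int, compression_ratio: float) -> int:
    """Compressed positions for num_bp bases, assuming ratio = bp per position."""
    return math.ceil(num_bp / compression_ratio)
```

Under that reading, a 1M-bp input shrinks to about 7,268 positions (1,000,000 / 137.6, rounded up), which is the sense in which reducing effective sequence length frees memory for larger models. Giving coding regions a lower target ratio than non-coding regions would spend more of the representation budget where the abstract says information is dense.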

Technical details

Canonical key: arxiv-2602.17739


Generated at: Jun 20, 2026, 11:54 AM

Artifact coverage: curated_related


Benchmarks: thin evidence

Results & Benchmarks


Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Model	Metric	Value	Source	Evidence refs
Region-aware Compression Long Context DNA Modeling	Expert Model	AUROC	0.926	paper-derived	No explicit refs
Region-aware Compression Long Context DNA Modeling	CNN	AUROC	0.797	paper-derived	No explicit refs


Implementation Evidence Summary

Confidence: low

The available evidence is too limited to recommend a maintained repository. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

All experiments in this paper can be trained on a single A100 80GB GPU.
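As a rough sanity check on the single-GPU claim, the sketch below estimates persistent training state for the 636M-parameter model mentioned in the abstract. It assumes mixed-precision Adam at about 16 bytes per parameter (the accounting used in the ZeRO paper); activations, which grow with context length, are excluded, so this is a lower bound and not the paper's measured footprint. The function name is hypothetical.

```python
def training_state_gb(num_params: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + Adam moments in GB (1 GB = 1e9 bytes).

    Assumption: mixed-precision Adam, 2 (fp16 weights) + 2 (fp16 grads)
    + 4 (fp32 master weights) + 4 (momentum) + 4 (variance) = 16 bytes/param.
    Activations are NOT included.
    """
    return num_params * bytes_per_param / 1e9


state = training_state_gb(636e6)
print(f"{state:.1f} GB of 80 GB")  # 10.2 GB of 80 GB
```

Under this assumption roughly 10 GB of the 80 GB holds model and optimizer state, leaving the rest for activations at 1M-bp context, which is where a shorter effective sequence length matters most.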

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 95/100, grounding 78/100, status high.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

Design a baseline reproduction from the paper PDF and its citation graph.
Track assumptions and missing details in an experiment log before coding.
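One lightweight way to keep such a log is an append-only JSON Lines file with one record per assumption. The Python sketch below is a hypothetical format: the `LogEntry` fields and helper names are illustrative and not part of any tool referenced on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LogEntry:
    """One hypothetical experiment-log record for a paper-only reproduction."""
    step: str                    # pipeline stage, e.g. "ratio objective"
    assumption: str              # what was assumed where the paper is silent
    source: str = "paper"        # where the partial detail came from
    open_questions: List[str] = field(default_factory=list)


def to_jsonl(entries: List[LogEntry]) -> str:
    """Serialize entries as JSON Lines (one object per line, diff-friendly)."""
    return "".join(json.dumps(asdict(e)) + "\n" for e in entries)


def from_jsonl(text: str) -> List[LogEntry]:
    """Parse JSON Lines back into LogEntry records, skipping blank lines."""
    return [LogEntry(**json.loads(line)) for line in text.splitlines() if line.strip()]


def append_to_log(path: str, entry: LogEntry) -> None:
    """Append a single record to the log file on disk."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl([entry]))
```

Keeping the log append-only means every assumption made before coding stays visible when results are later compared against the paper.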


Best available related artifact (not linked to this paper): liufanfanlff/C3-Context-Cascade-Compression