Official implementation from Papers with Code · Strong overlap with paper title keywords
- Stars: 1,191
- Last push: Sep 30, 2025 (158d ago)
Risk flags
- No Docker setup
- Dependency manifest missing
Core AI workload signals detected from paper context and implementation/artifact evidence.
Researcher verdict
This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on microsoft/MInference. Use it as an implementation baseline, then validate benchmark parity before adapting it.
Why this page is still worth reading
Benchmark trust
Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.
Use this page as
Start here when you need the most practical implementation path quickly.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention presents a dynamic sparse attention method that speeds up the pre-filling stage of long-context transformer LLMs.
microsoft/MInference is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (MIT).
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Community adoption signal (5891 stars)
Community adoption signal (147 stars)
AI-generated summary grounded in paper metadata and artifact signals.
MInference 1.0 introduces a dynamic sparse attention mechanism to accelerate pre-filling for long-context transformer-based large language models while maintaining accuracy close to full attention. This page includes benchmark evidence for Language modeling on PG-19. Reproduction guidance focuses on implementation viability and concrete risk controls.
Use microsoft/MInference first because deterministic ranking and extracted evidence align on implementation viability. Start with the repo setup path, then validate benchmark reproduction before adaptation.
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, MInference computes attention approximately using dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
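To make the idea of sparse attention concrete, here is a minimal pure-Python sketch for a single query vector. It is not MInference's algorithm: the top-k selection rule and the function names `full_attention` and `topk_sparse_attention` are illustrative assumptions, chosen only to show how attending to a subset of keys can approximate full attention. This toy still scores every key in order to pick the top k, so it illustrates the accuracy trade-off rather than any speedup.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def full_attention(query, keys, values):
    """Standard scaled dot-product attention for one query."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def topk_sparse_attention(query, keys, values, k):
    """Illustrative sparse variant: attend only to the k highest-scoring keys.

    Assumption: top-k selection is a stand-in for whatever sparsity pattern
    a real method would choose; it is not taken from the MInference paper.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
```

With `k` equal to the number of keys, the sparse variant reduces to full attention. Smaller `k` drops low-scoring keys, so the output stays close to full attention only when the attention weights are concentrated on a few positions.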
Follow the direct implementation path
Start with microsoft/MInference and validate setup instructions in README.
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
Log exact dependency versions and runtime environment for reproducibility.
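The logging step above can be sketched with the Python standard library alone. The helper name `snapshot_environment` and the tracked package list are hypothetical; substitute whatever packages the repository's setup instructions actually install.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than skipping the entry.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; adjust to match the actual install.
    tracked = ["torch", "transformers", "minference"]
    print(json.dumps(snapshot_environment(tracked), indent=2, sort_keys=True))
```

Saving this JSON next to each benchmark run makes it easier to tell a real regression from an environment difference. It does not capture GPU driver or CUDA versions, which would need to be recorded separately.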
No additional verified repositories beyond the primary recommendation.
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy model matches right now.
Tasks: None detected
Methods: Transformer
Domains: None detected