Are there pretrained models available for "NPG-Muse: Scaling Long Chain-of-Thought Reasoning with NP-Hard Graph Problems"?

Yes, 3 Hugging Face models were found, but they are related artifacts rather than models linked directly to the paper. The top result is Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, with 588,732 downloads.

NPG-Muse: Scaling Long Chain-of-Thought Reasoning with NP-Hard Graph Problems

Q: How reproducible is "NPG-Muse: Scaling Long Chain-of-Thought Reasoning with NP-Hard Graph Problems"?

Estimated time to first reproduction: a few days. Risk flags: (1) no repository-level reproducibility signals are currently available; (2) the estimate is based on a paper-only reproduction flow. No direct maintained implementation was found, so use the paper PDF and citation graph to design a baseline reproduction.

Yuyao Wang, Bowen Liu, Jianheng Tang, Nuo Chen, Yuhan Li, Qifan Zhang, Chenyi Zi, Chen Zhang, Jia Li

Published: Aug 28, 2025

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-core

Verified repos: 0

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few days

2 risk flags

arXiv PDF

Reasoning Large Language Models (RLLMs) have recently achieved remarkable progress on complex reasoning tasks, largely enabled by their long chain-of-thought (Long CoT) capabilities. However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplored. In this work, we introduce NP-hard (NPH) graph problems as a novel synthetic training corpus, as they inherently require deep reasoning, extensive exploration, and reflective strategies, which are the core characteristics of Long CoT reasoning. Building on this insight, we develop a two-stage post-training framework: (i) Long-CoT Supervised Fine-Tuning (SFT) on rejection-sampled NPH graph instances, which substantially enhances reasoning depth, and (ii) Reinforcement Learning (RL) with a fine-grained reward design, which sharpens reasoning efficiency. The resulting NPG-Muse-series models exhibit substantially enhanced Long CoT reasoning capabilities, achieving consistent gains across mathematics, coding, logical, and graph reasoning benchmarks. NPG-Muse-7B even surpasses QwQ-32B on NPH graph problems in both accuracy and reasoning efficiency. These results position NPH graph problems as an effective and scalable resource for advancing Long CoT reasoning in LLM post-training. Our implementation is available at https://github.com/littlewyy/NPG-Muse.
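The two stages in the abstract can be illustrated with a small sketch. It is not taken from the paper or its repository: the choice of maximum independent set as the example problem, the brute-force solver, and the function names keep_for_sft and graded_reward are all assumptions made for illustration. It shows how a verifiable graph task allows a candidate answer to be accepted or rejected automatically (stage i) and scored with partial credit (stage ii).

```python
import itertools
import random


def random_graph(n, p, seed):
    """Return (node count, edge set) for an Erdos-Renyi style random graph."""
    rng = random.Random(seed)
    edges = {(u, v) for u, v in itertools.combinations(range(n), 2)
             if rng.random() < p}
    return n, edges


def is_independent_set(edges, nodes):
    """True if no edge joins two of the chosen nodes."""
    chosen = set(nodes)
    return not any(u in chosen and v in chosen for u, v in edges)


def is_valid_candidate(n, edges, candidate):
    """Check node ids are in range, unique, and pairwise non-adjacent."""
    in_range = all(0 <= v < n for v in candidate)
    unique = len(set(candidate)) == len(candidate)
    return in_range and unique and is_independent_set(edges, candidate)


def max_independent_set_size(n, edges):
    """Brute-force optimum; exponential, so only usable for small n."""
    for k in range(n, 0, -1):
        for subset in itertools.combinations(range(n), k):
            if is_independent_set(edges, subset):
                return k
    return 0


def keep_for_sft(n, edges, candidate):
    """Hypothetical rejection-sampling filter: keep valid, optimal answers."""
    return (is_valid_candidate(n, edges, candidate)
            and len(candidate) == max_independent_set_size(n, edges))


def graded_reward(n, edges, candidate):
    """Hypothetical fine-grained reward: 0 if infeasible, else size / optimum."""
    if not is_valid_candidate(n, edges, candidate):
        return 0.0
    optimum = max_independent_set_size(n, edges)
    return len(candidate) / optimum if optimum else 0.0
```

On the path graph 0-1-2-3, the answer [0, 2] passes the filter, while [0] is feasible but suboptimal and would receive half credit under this assumed reward. The actual reward design and task set should be taken from the paper and the linked repository.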

Technical details

Canonical key: arxiv-2508.20373

Cache status: Fresh

Generated at: Apr 17, 2026, 7:26 AM

Artifact coverage: curated_related

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Reinforcement learning, GSM8K, Accuracy (source: paper fulltext)

Reinforcement learning, MATH, Accuracy (source: paper fulltext)

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Reinforcement learning	GSM8K	Accuracy	24	paper-derived	No explicit refs
Reinforcement learning	MATH	Accuracy	24	paper-derived	No explicit refs
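One way to carry out the audit suggested above is sketched below. It parses the tab-separated findings table and flags rows that lack evidence refs or that share a value with another row. Both heuristics, and the function name audit_findings, are assumptions for illustration and are not part of the page's own tooling.

```python
import csv
import io

# The findings table from this page, tab-separated.
TABLE = """Task\tDataset\tMetric\tValue\tSource\tEvidence refs
Reinforcement learning\tGSM8K\tAccuracy\t24\tpaper-derived\tNo explicit refs
Reinforcement learning\tMATH\tAccuracy\t24\tpaper-derived\tNo explicit refs
"""


def audit_findings(tsv_text):
    """Return (dataset, reasons) pairs for rows that deserve a manual check."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # Count how often each reported value appears across rows.
    value_counts = {}
    for row in rows:
        value_counts[row["Value"]] = value_counts.get(row["Value"], 0) + 1
    flagged = []
    for row in rows:
        reasons = []
        if row["Evidence refs"].strip().lower() == "no explicit refs":
            reasons.append("no evidence refs")
        if value_counts[row["Value"]] > 1:
            reasons.append("value repeated across rows")
        if reasons:
            flagged.append((row["Dataset"], reasons))
    return flagged


for dataset, reasons in audit_findings(TABLE):
    print(dataset, "->", ", ".join(reasons))
```

A flagged row is only a prompt to check the number against the paper itself before relying on it.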

Implementation Evidence Summary

Confidence: low

Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 95/100, grounding 78/100, status high.

Implementation Comparison

Top path

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

Graph-Reasoner/Graph-R1

alternative

Maintenance: Stale risk

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search

Stars: 10
Last push: Aug 29, 2025 (231d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
Start from this likely method family: Reinforcement learning.
Track assumptions and missing details in an experiment log before coding.
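The experiment log mentioned in the plan above could be as simple as an append-only JSON Lines file. The sketch below is one assumed layout; the field names, the status values "open" and "resolved", and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def log_assumption(path, topic, assumption, status="open"):
    """Append one assumption or missing detail to a JSON Lines log."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "topic": topic,
        "assumption": assumption,
        "status": status,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def open_items(path):
    """Return the entries that are still unresolved."""
    with open(path, encoding="utf-8") as fh:
        return [r for r in map(json.loads, fh) if r["status"] == "open"]
```

Reviewing open_items before each run keeps unresolved guesses visible while the baseline is being built.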

Best available artifact: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled