What is the best open-source implementation of "VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models"?

The best maintained implementation is WeiboAI/VibeThinker with 1,044 stars on GitHub. Confidence: medium. Reproducibility: Limited.

Are there pretrained models available for "VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models"?

Yes, one Hugging Face model was found. The top result is WeiboAI/VibeThinker-3B with 0 downloads.

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

How reproducible is "VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models"?

Estimated time to first reproduction: a few days. Risk flags: no CI workflows detected; dependency manifest is missing. Start with WeiboAI/VibeThinker and validate the setup instructions in its README.

Sen Xu, Shixi Liu, Wei Wang, Jixin Min, Yingwei Dai, Zhibin Yin, Yirong Chen, Xin Zhou, Junlin Zhang

Published: Jun 15, 2026

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 1

Top repo stars: 1,044

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few days

2 risk flags

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-3B achieves frontier-level performance on highly demanding verifiable tasks. Specifically, it attains a score of 94.3 on AIME26 (improving to 97.1 with claim-level test-time scaling), an 80.2 Pass@1 on LiveCodeBench v6, and exhibits strong out-of-distribution generalization with a 96.1% acceptance rate on recent unseen LeetCode contests. This effectively places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro. Furthermore, a score of 93.4 on IFEval confirms that this extreme reasoning enhancement does not compromise strict instruction controllability. Extending our previous 1.5B work, these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios. This perspective suggests that compact models are not merely deployment-efficient substitutes, but a complementary path toward frontier-level performance in parameter-dense capability regimes.

Technical details

Canonical key: arxiv-2606.16140

Generated at: Jun 19, 2026, 8:39 AM

Artifact coverage: direct

Implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.

Use This Implementation Because…

Confidence: medium

WeiboAI/VibeThinker is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. License is declared (MIT).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing
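
The two flags above can be re-checked against a local clone. The following is a minimal sketch, not the method this page uses: the function name `audit_repo` is hypothetical, CI is assumed to mean GitHub Actions workflow files under `.github/workflows`, and the manifest names are the four this page reports looking for.

```python
from pathlib import Path

# Manifest file names this page reports checking for; extend as needed.
MANIFESTS = ("requirements.txt", "environment.yml", "pyproject.toml", "Dockerfile")


def audit_repo(repo_dir):
    """Return the risk flags that apply to a local clone at repo_dir."""
    root = Path(repo_dir)
    flags = []

    # Assumption: "CI" means at least one GitHub Actions workflow file.
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and any(
        p.suffix in (".yml", ".yaml") for p in workflows.iterdir()
    )
    if not has_ci:
        flags.append("No CI workflows detected")

    # Only the repository root is inspected for manifests.
    if not any((root / name).is_file() for name in MANIFESTS):
        flags.append("Dependency manifest is missing")

    return flags
```

Calling `audit_repo("path/to/clone")` returns an empty list when both checks pass. Other CI systems and manifests kept in subdirectories are not covered by this sketch.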

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 100/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

WeiboAI/VibeThinker

best maintained

Maintenance: Active

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Partial overlap with paper title keywords

Stars: 1,044
Last push: Jun 17, 2026

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

NickDee96/ASR-TTS-paper-daily

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Strong

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 3
Last push: Jun 19, 2026

CI · Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match

alphaXiv/vibethinker-eb715d63

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search

Stars: 0
Last push: Jun 16, 2026

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

WeiboAI/VibeThinker

Confidence: Medium

Reproducibility: Limited

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Stars: 1,044

Forks: 77

Last push: Jun 17, 2026

License: MIT

Matched via arXiv identifier search

Partial overlap with paper title keywords

Community adoption signal (1,044 stars)

License ✓

CI –

Deps –

Docker –

Selected WeiboAI/VibeThinker as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: Jun 19, 2026

No dependency manifest — manual reconstruction required

· WeiboAI/VibeThinker has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.