What is the best open-source implementation of "LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"?

The best-maintained implementation is HKUDS/LightReasoner, which has 599 stars on GitHub. Confidence: medium. Reproducibility: Moderate.

LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

How reproducible is "LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"?

Estimated time to first reproduction: a few hours. Risk flags: No CI workflows detected. Start with HKUDS/LightReasoner and validate the setup instructions in its README.

Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang

Published: Oct 9, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 1

Top repo stars: 599

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few hours

1 risk flag

arXiv PDF

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens, even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter's unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert's advantage through expert-amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning. Code is available at: https://github.com/HKUDS/LightReasoner
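The sampling stage described in the abstract can be illustrated with a small toy sketch. The abstract only says that the method uses "behavioral divergence" between the expert and the amateur; measuring that divergence as a KL divergence over next-token distributions and keeping steps above a fixed threshold are assumptions made here for illustration. The function names (kl_divergence, select_critical_steps), the threshold value, and the toy distributions are hypothetical and are not taken from the HKUDS/LightReasoner code.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_critical_steps(expert_dists, amateur_dists, threshold):
    """Return indices of steps where the expert diverges from the amateur.

    Assumption: a step counts as "critical" when KL(expert || amateur)
    exceeds a fixed threshold. The paper's actual criterion may differ.
    """
    selected = []
    for step, (p, q) in enumerate(zip(expert_dists, amateur_dists)):
        if kl_divergence(p, q) > threshold:
            selected.append(step)
    return selected


# Toy next-token distributions over a 3-token vocabulary at three steps.
expert = [[0.34, 0.33, 0.33], [0.90, 0.05, 0.05], [0.50, 0.30, 0.20]]
amateur = [[0.33, 0.34, 0.33], [0.20, 0.40, 0.40], [0.50, 0.30, 0.20]]

# Only the middle step shows a large expert-amateur gap.
critical = select_critical_steps(expert, amateur, threshold=0.1)
print(critical)
```

In this toy setup, only the selected steps would feed the fine-tuning stage. That matches the abstract's point that only a fraction of tokens carry meaningful learning value, but the real pipeline works on model logits rather than hand-written lists.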

Technical details

Canonical key: arxiv-2510.07962

Cache status: Fresh

Generated at: Jun 18, 2026, 3:35 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Time to repro: a few hours

1 risk flag

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.


Use This Implementation Because…

Confidence: medium

HKUDS/LightReasoner is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. License is declared (MIT). Dependency/environment manifests are present.

Open HKUDS/LightReasoner

Reproduction Risks

No CI workflows detected

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

HKUDS/LightReasoner

best maintained

Maintenance: Active

Confidence: Medium

Reproducibility: Moderate

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 599
Last push: May 22, 2026 (27d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

edward-lcl/feedback-distillation

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Partial overlap with paper title keywords

Stars: 1
Last push: Jun 18, 2026 (0d ago)

CI · Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match

tanzil7890/lightR_change

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Limited

Matched via arXiv identifier search

Stars: 0
Last push: May 3, 2026 (45d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

HKUDS/LightReasoner

Confidence: Medium

Reproducibility: Moderate

[ACL 2026 Oral] "LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"

Stars: 599

Forks: 33

Last push: May 22, 2026

License: MIT

Matched via arXiv identifier search

Strong overlap with paper title keywords

Community adoption signal (599 stars)

License ✓

CI –

Deps ✓

Docker –

Selected HKUDS/LightReasoner as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction readiness

Setup Required

Time to first repro: a few hours

Last checked: Jun 18, 2026

Dependencies pinned, manual setup needed

· HKUDS/LightReasoner has requirements.txt but requires manual environment setup.
· No Dockerfile — you will set up the environment manually.
· No CI pipeline — test coverage is unknown.


Quick start

git clone https://github.com/HKUDS/LightReasoner.git
cd LightReasoner
pip install -r requirements.txt
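Because the repository has no CI or Docker setup, it may help to confirm that the install step actually succeeded before running anything. The following is a minimal sketch that assumes requirements.txt uses ordinary "name==version" style lines. The helper names (requirement_names, missing_packages) are hypothetical, and the real file contents have not been checked here.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract bare package names from requirements-style lines.

    Comments, blank lines, and option lines such as "-r other.txt" are
    skipped. URL-style requirements are not handled in this sketch.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Run from the repository root; does nothing if the file is absent.
req_file = Path("requirements.txt")
if req_file.exists():
    names = requirement_names(req_file.read_text().splitlines())
    print("Missing packages:", missing_packages(names))
```

An empty "Missing packages" list only shows that the distributions are installed. It does not check version pins or GPU and driver compatibility.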

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (2)

These repositories had low-confidence matching signals and are hidden by default.

edward-lcl/feedback-distillation

Confidence: Low

Stars: 1

tanzil7890/lightR_change

Confidence: Low

Stars: 0

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2510.07962 LightReasoner Natural Language Processing

Datasets

arxiv:2510.07962 LightReasoner dataset

Spaces

arxiv:2510.07962 LightReasoner demo

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.


Research context

Tasks

Natural language processing

Methods

Transformer

Domains

Natural Language Processing, Large Language Models

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.

Open in HFEPX

Explore Similar Papers

Jump to Paper2Code search queries derived from this paper's research context.

Natural language processing · Transformer · Natural Language Processing · Large Language Models
