Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Q: How reproducible is "Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization"?

Estimated time to first reproduction: a few days. Risk flags: Adjacent implementations are not paper-verified. No maintained paper-verified implementation was found; start with the closest related repositories below.

Zhenpeng Su, Leiyu Pan, Xue Bai, Dening Liu, Guanting Dong, Jiaming Huang, Minxuan Lv, Wenping Hu, Fuzheng Zhang, Kun Gai, Guorui Zhou

Published: Aug 11, 2025

No direct implementation yet

Evidence: Adjacent

Domain fit: AI-adjacent

Verified repos: 1

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

arXiv PDF

We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. Although there are already many excellent works related to inference models in the current community, there are still many problems with reproducing high-performance inference models due to incomplete disclosure of training details. This report provides an in-depth analysis of the reasoning model, covering the entire post-training workflow from data preparation and long Chain-of-Thought supervised fine-tuning (long CoT SFT) to reinforcement learning (RL), along with detailed ablation studies for each experimental component. For SFT data, our experiments show that a small number of high-quality data sources are more effective than a large number of diverse data sources, and that difficult samples can achieve better results without accuracy filtering. In addition, we investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose Gradient-Preserving clipping Policy Optimization (GPPO) that gently backpropagates gradients from clipped tokens. GPPO not only enhances the model's exploration capacity but also improves its efficiency in learning from negative samples. Klear-Reasoner exhibits exceptional reasoning abilities in mathematics and programming, scoring 90.5% on AIME 2024, 83.2% on AIME 2025, 66.0% on LiveCodeBench V5 and 58.1% on LiveCodeBench V6.
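The abstract's claim that GPPO "gently backpropagates gradients from clipped tokens" can be illustrated with a per-token toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the clipped bound is rescaled as bound / stop_gradient(ratio) * ratio, so the forward value stays at the clipped value while the derivative with respect to the ratio becomes (bound / ratio) * advantage. The function names and the default epsilons (0.2) are hypothetical, and derivatives are written out by hand so no autograd library is needed. Check the exact objective against the paper's equations before relying on it.

```python
# Illustrative sketch only. Function names and default epsilons are
# assumptions; they are not taken from the paper or any listed repository.

def ppo_clip_token(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Standard clipped surrogate for one token.

    Returns (value, d_value/d_ratio), with the derivative computed by hand.
    """
    bound = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    unclipped = ratio * advantage
    clipped = bound * advantage
    if unclipped <= clipped:
        # min() selects the unclipped term, so the gradient flows normally.
        return unclipped, advantage
    # min() selects the clipped constant: this token contributes no gradient.
    return clipped, 0.0


def gradient_preserving_token(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Assumed gradient-preserving variant for one token.

    Clipped tokens use bound / stop_gradient(ratio) * ratio instead of the
    constant bound. The forward value is unchanged, but the derivative
    becomes (bound / ratio) * advantage rather than zero.
    """
    bound = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    unclipped = ratio * advantage
    clipped = bound * advantage
    if unclipped <= clipped:
        return unclipped, advantage
    return clipped, (bound / ratio) * advantage


# Compare both variants on a clipped positive sample, a clipped negative
# sample, and an in-range sample.
for ratio, advantage in [(1.5, 1.0), (0.5, -1.0), (1.1, 2.0)]:
    print(ratio, advantage,
          ppo_clip_token(ratio, advantage),
          gradient_preserving_token(ratio, advantage))
```

In this sketch the two variants agree whenever the token is not clipped; they differ only in that clipped tokens keep a bounded, non-zero gradient, which is one way to read the abstract's points about exploration signals and learning from negative samples.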

Technical details

Canonical key: arxiv-2508.07629

Cache status: Stale (SWR served)

Generated at: May 30, 2026, 11:09 PM

Artifact coverage: sparse

HF provider: ok (token)

PWC source used: No

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

AIME 2024 (Reinforcement learning): Top-1 Accuracy 40.83. Source: paper fulltext.

AIME 2025 (Reinforcement learning): Top-1 Accuracy 36.04. Source: paper fulltext.

LiveCodeBench V5 (Reinforcement learning): Top-1 Accuracy 26.52. Source: paper fulltext.

Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Reinforcement learning	AIME 2024	Top-1 Accuracy	40.83	paper-derived	No explicit refs
Reinforcement learning	AIME 2025	Top-1 Accuracy	36.04	paper-derived	No explicit refs
Reinforcement learning	LiveCodeBench V5	Top-1 Accuracy	26.52	paper-derived	No explicit refs

Implementation Evidence Summary

Confidence: medium

TsinghuaC3I/Awesome-RL-for-LRMs is the closest maintained adjacent implementation (Matches contextual method/domain keyword: reinforcement learning). It is not paper-verified; validate algorithm and evaluation setup against the paper before trusting reported metrics. Community adoption signal: 2459 GitHub stars.

Reproduction Risks

Adjacent implementations are not paper-verified
Recommended repository is adjacent and not paper-verified.

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

suu990901/KlearReasoner

alternative

Maintenance: Recently updated

Confidence: Medium

Reproducibility: Moderate

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 82
Last push: Dec 25, 2025 (177d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Kwai-Klear/ERC

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search

Stars: 4
Last push: Jan 5, 2026 (166d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

cameraMeasurementTech/66-training

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search

Stars: 0
Last push: Apr 6, 2026 (75d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No maintained paper-verified implementation was found; start with the closest related repositories below.
Compare repo methods against the paper equations/algorithm before trusting metrics.
Create a minimal baseline implementation from the paper and use adjacent repos as references.

Reproduction readiness

No Repo

Last checked: May 30, 2026

No verified implementation available

No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.

Closest related implementations

These are not paper-verified. Use them as reference points when no direct implementation is available.

TsinghuaC3I/Awesome-RL-for-LRMs

Adjacent

Confidence: Medium

Stars: 2,459

Matches contextual method/domain keyword: reinforcement learning

Additional implementations

Official

No additional official repositories detected.

Community

suu990901/KlearReasoner
Confidence: Medium

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Stars: 82

Last push: Dec 25, 2025

License: Apache-2.0

Possible but unverified matches (2)

These repositories had low-confidence matching signals and are hidden by default.

Kwai-Klear/ERC

Confidence: Low

Stars: 4

cameraMeasurementTech/66-training

Confidence: Low

Stars: 0

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2508.07629 Klear-Reasoner Gradient-Preserving

Datasets

arxiv:2508.07629 Klear-Reasoner dataset Reinforcement learning benchmark

Spaces

arxiv:2508.07629 Klear-Reasoner demo Reinforcement learning gradio

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
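The suggested searches can be turned into browser links with a few lines of Python. This is a rough sketch: the query strings are copied from the suggestions above, but the URL layout (`https://huggingface.co/<section>?search=<query>`) and the helper name `build_search_urls` are assumptions for illustration, not a documented API. Adjust the pattern if the site uses a different query parameter.

```python
from urllib.parse import urlencode

# Query strings copied from the search suggestions on this page.
# Insertion order follows the tip: models first, then datasets and spaces.
QUERIES = {
    "models": "arxiv:2508.07629 Klear-Reasoner Gradient-Preserving",
    "datasets": "arxiv:2508.07629 Klear-Reasoner dataset Reinforcement learning benchmark",
    "spaces": "arxiv:2508.07629 Klear-Reasoner demo Reinforcement learning gradio",
}


def build_search_urls(queries, base="https://huggingface.co"):
    """Build one search URL per section (hypothetical helper).

    ASSUMPTION: the site accepts "<base>/<section>?search=<query>".
    """
    return {
        section: f"{base}/{section}?{urlencode({'search': query})}"
        for section, query in queries.items()
    }


for section, url in build_search_urls(QUERIES).items():
    print(f"{section}: {url}")
```

Only the standard library is used, so nothing is fetched; the script just prints the links to open by hand.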

Research context

Tasks

Reinforcement learning

Methods

Reinforcement learning

Domains

None detected

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.

Explore Similar Papers

Jump to Paper2Code search queries derived from this paper's research context.

Reinforcement learning
