implementation starting point
Benchmarks: thin evidence
Time to repro: a few hours

Results & Benchmarks

Freshness tier: hot
Direct + Inferred Evidence
Reasoning / puzzle solving · GRPO · AIME24: 0.7250 (source: paper fulltext)
Reasoning / puzzle solving · REINFORCE++ · AIME24: 0.7664 (source: paper fulltext)

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task | Method | Benchmark | Value | Source | Evidence refs
Reasoning / puzzle solving | GRPO | AIME24 | 0.7250 | paper-derived | No explicit refs
Reasoning / puzzle solving | REINFORCE++ | AIME24 | 0.7664 | paper-derived | No explicit refs
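
When auditing the two findings above, it can help to state the gap between them explicitly. The sketch below only does arithmetic on the two AIME24 values reported on this page; the function name `score_gap` is a hypothetical helper, and nothing here says what metric the values represent or how many samples they were averaged over.

```python
# Scores copied from the benchmark findings above (AIME24, paper-derived).
scores = {"GRPO": 0.7250, "REINFORCE++": 0.7664}


def score_gap(baseline, candidate):
    """Return the (absolute, relative) difference of candidate over baseline."""
    absolute = candidate - baseline
    relative = absolute / baseline
    return absolute, relative


absolute, relative = score_gap(scores["GRPO"], scores["REINFORCE++"])
print(f"absolute gap: {absolute:.4f}")
print(f"relative gap: {relative:.2%}")
```

On these two numbers the gap is about 0.0414 absolute, or roughly 5.7% relative to the GRPO value. With no explicit evidence refs, treat it as a pointer for what to verify in the paper, not a confirmed result.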

Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs).
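
For orientation, GRPO (one of the methods in the findings above) is commonly described as scoring each sampled response against the other responses drawn for the same prompt. The sketch below shows that group-relative normalization in isolation. It is an illustrative assumption, not code from complex-reasoning/RPG and not the RPG objective; `group_relative_advantages` and `eps` are hypothetical names.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps); eps keeps
    the division finite when every response received the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt: two judged correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative. A group with identical rewards yields all-zero advantages, so it contributes no policy-gradient signal.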

Use This Implementation Because…

Confidence: medium

complex-reasoning/RPG is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. CI workflows are present. License is declared (MIT).

Reproduction Risks

  • No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 2 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

complex-reasoning/RPG
best maintained
Maintenance: Active
Confidence: Medium
Reproducibility: Strong

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars
65
Last push
Mar 30, 2026 (16d ago)
CI · Dependencies

Risk flags

  • No tagged releases
  • No Docker setup

Maintenance: Recently updated
Confidence: Low
Reproducibility: Limited

Matched via arXiv identifier search

Stars
11
Last push
Jan 6, 2026 (99d ago)

Risk flags

  • No CI pipeline detected
  • No tagged releases
  • No Docker setup

Best implementation now

complex-reasoning/RPG
Confidence: Medium
Reproducibility: Strong

[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)

Stars: 65
Forks: 3
Last push: Mar 30, 2026
License: MIT
Matched via arXiv identifier search
Strong overlap with paper title keywords
Community adoption signal (65 stars)
License ✓
CI ✓
Deps ✓
Docker –
  • Selected complex-reasoning/RPG as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction readiness

Ready to Run
Time to first repro: hours
Last checked: Apr 15, 2026

Ready to reproduce

  • Clone complex-reasoning/RPG and install dependencies from pyproject.toml.
  • CI pipeline detected, which suggests automated checks are in place.
  • Last updated Mar 30, 2026 (16 days before the last check).

Quick start

git clone https://github.com/complex-reasoning/RPG.git
cd RPG
pip install -e .
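
The quick start can also be scripted so that each command runs in the right directory. This is a minimal sketch, assuming `git` is on the PATH and that the clone lands in a directory named `RPG` (git's default for this URL). The helpers `quick_start_steps` and `run_quick_start` are hypothetical and not part of the repository. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/complex-reasoning/RPG.git"


def quick_start_steps(workdir):
    """Return (command, working directory) pairs for the quick start."""
    workdir = Path(workdir)
    repo_dir = workdir / "RPG"  # assumed default directory created by git clone
    return [
        (["git", "clone", REPO_URL], workdir),
        # Use the current interpreter's pip so the editable install lands
        # in the active environment.
        ([sys.executable, "-m", "pip", "install", "-e", "."], repo_dir),
    ]


def run_quick_start(workdir, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for command, cwd in quick_start_steps(workdir):
        print(f"[{cwd}] {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_quick_start(Path.cwd(), dry_run=True)
```

Pass `dry_run=False` to actually clone and install. With `check=True` the script stops at the first failing command instead of continuing with a broken checkout.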

Additional implementations

No additional verified repositories beyond the primary recommendation.

Other candidate repositories had low-confidence matching signals and are hidden by default.

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Research context

Tasks

Reasoning / puzzle solving

Methods

Transformer

Domains

Natural Language Processing, Large Language Models

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.