What is the best open-source implementation of "SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward"?

The best maintained implementation is hiyouga/easyr1 with 5,011 stars on GitHub. Confidence: high. Reproducibility: Strong.

How reproducible is "SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with hiyouga/easyr1 and validate the setup instructions in its README.

What framework is used to implement "SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward"?

The primary implementation uses PyTorch.

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Published: May 1, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-adjacent

Verified repos: 3

Top repo stars: 5,011

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: PyTorch

Time to first repro: a few hours

No risk flags

Technical details

Canonical key: arxiv-2505.17018

Generated at: Jun 15, 2026, 2:11 PM

Artifact coverage: direct

PWC source used: Yes

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Benchmark evidence drill-down

4 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Model or dataset	Metric	Value	Source	Evidence refs
Reinforcing MLLMs Reasoning Thinking Reward	Qwen2.5-VL-3B-Instruct	Overall accuracy	43.1	paper-derived	No explicit refs
Reinforcing MLLMs Reasoning Thinking Reward	GPT-4o-mini	Macro accuracy	44.8	paper-derived	No explicit refs
Reinforcing MLLMs Reasoning Thinking Reward	Qwen2-VL-72B	Macro accuracy	43.0	paper-derived	No explicit refs
Reinforcing MLLMs Reasoning Thinking Reward	MathVista	Accuracy	71.3	paper-derived	No explicit refs

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward is the primary contribution described in this paper.

Use This Implementation Because…

Confidence: high

hiyouga/easyr1 is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

hiyouga/easyr1

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 5,011
Last push: Apr 6, 2026 (73d ago)

CI · Dockerfile · Releases · Dependencies

Risk flags

No obvious maintenance or reproducibility risks detected.

kxfan2002/sophiavl-r1

historical official

Maintenance: Stale risk

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 95
Last push: Aug 8, 2025 (314d ago)

CI · Dockerfile · Dependencies

Risk flags

No tagged releases

kxfan2002/SophiaVL-R1

alternative

Maintenance: Stale risk

Confidence: Medium

Reproducibility: Strong

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 95
Last push: Aug 8, 2025 (314d ago)

CI · Dockerfile · Dependencies

Risk flags

No tagged releases
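
The "Nd ago" ages in the cards above are date differences counted from each repository's last push. A minimal sketch of that arithmetic follows, assuming GNU `date` (the `-d` flag is not available in BSD/macOS `date`). The function name `days_between` is illustrative. The reference date of Jun 18, 2026 is an assumption, chosen because it reproduces both listed ages; the page itself states only a Jun 15, 2026 generation time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole days between two ISO dates (start, end).
# Assumes GNU date; UTC avoids daylight-saving offsets.
days_between() {
    local start end
    start="$(date -u -d "$1" +%s)" || return 1
    end="$(date -u -d "$2" +%s)" || return 1
    echo $(( (end - start) / 86400 ))
}

days_between 2026-04-06 2026-06-18   # hiyouga/easyr1
days_between 2025-08-08 2026-06-18   # kxfan2002/sophiavl-r1
```

With that reference date the two calls print 73 and 314, matching the ages shown for the two repositories.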

Best implementation now

hiyouga/easyr1

Confidence: High

Reproducibility: Strong

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Stars: 5,011

Forks: 373

Last push: Apr 6, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Community adoption signal (5,011 stars)

License ✓

CI ✓

Deps ✓

Docker ✓

Selected hiyouga/easyr1 as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

kxfan2002/sophiavl-r1

Stars: 95

Last push: Aug 8, 2025

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Jun 15, 2026

Ready to reproduce

· Clone hiyouga/easyr1 and install dependencies from pyproject.toml.
· Dockerfile available for containerized reproduction.
· CI pipeline detected, which suggests automated checks are in place.
· Last updated 73 days ago.

Quick start

git clone https://github.com/hiyouga/easyr1.git
cd easyr1
pip install -e .
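
The quick start can also be wrapped in a small helper so that the install always runs inside the cloned directory and inside an isolated environment. This is a minimal sketch, assuming `git` and `python3` are on the PATH. The function names `clone_dir_name` and `quick_start` and the `.venv` directory are illustrative and not part of EasyR1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not part of EasyR1.

# Derive the directory that `git clone` creates from a repository URL.
clone_dir_name() {
    basename "$1" .git
}

# Clone the repository, create an isolated environment, and install it.
quick_start() {
    local url="https://github.com/hiyouga/easyr1.git"
    local dir
    dir="$(clone_dir_name "$url")"

    git clone "$url" || return 1
    cd "$dir" || return 1

    # Assumption: a local virtual environment is wanted.
    python3 -m venv .venv || return 1
    . .venv/bin/activate
    pip install -e . || return 1

    # The page reports PyTorch as the framework; check that it imports.
    python -c "import torch; print(torch.__version__)"
}
```

Sourcing the file only defines the functions; calling `quick_start` from an empty working directory runs the steps. The final check assumes the editable install pulls in PyTorch. If it does not, follow the install instructions in the repository README.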

Additional implementations

Official

No additional official repositories detected.

Community

kxfan2002/SophiaVL-R1
Confidence: Medium

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Stars: 95

Last push: Aug 8, 2025

License: Apache-2.0