VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning

Q: What is the best open-source implementation of "VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning"?

The best maintained implementation is hiyouga/easyr1 with 4,896 stars on GitHub. Confidence: high. Reproducibility: Strong.

Q: How reproducible is "VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with hiyouga/easyr1 and validate the setup instructions in its README.

Q: What framework is used to implement "VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning"?

The primary implementation uses PyTorch.

Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia

Published: May 17, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 4

Top repo stars: 4,896

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: pytorch

Time to first repro: a few hours

No risk flags

Large vision-language models exhibit inherent capabilities to handle diverse visual perception tasks. In this paper, we introduce VisionReasoner, a unified framework capable of reasoning and solving multiple visual perception tasks within a shared model. Specifically, by designing a unified reward mechanism and multi-object cognitive learning strategies, VisionReasoner enhances its reasoning capabilities to analyze visual inputs, and addresses diverse perception tasks within a unified model. VisionReasoner generates a structured reasoning process before delivering the desired outputs responding to user queries. Human evaluation reveals the reasoning process of VisionReasoner is faithful and reliable even without annotated reasoning train data. To rigorously assess unified visual perception capabilities, we evaluate VisionReasoner on ten diverse tasks spanning three critical domains: detection, segmentation, and counting. Experimental results show that VisionReasoner achieves superior performance as a unified model, outperforming the baseline Qwen2.5VL by relative margins of 29.1% on COCO (detection), 22.1% on ReasonSeg (segmentation), and 13.2% on CountBench (counting).
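The margins quoted in the abstract are relative improvements over the baseline, not absolute point gains. A minimal sketch of that arithmetic follows; the function name `relative_margin` and the example scores are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def relative_margin(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline`, in percent of baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores, used only to illustrate the formula.
baseline_score = 50.0
improved_score = 60.0
print(f"{relative_margin(baseline_score, improved_score):.1f}% relative improvement")
```

A 10-point absolute gain on a baseline of 50 is a 20% relative gain, so relative margins should be read alongside the absolute scores reported in the paper.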

Technical details

Canonical key: arxiv-2505.12081

Cache status: Stale (SWR served)

Generated at: Apr 29, 2026, 12:06 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Classification

Qwen2.5-1.5B

Accuracy.

46.3

Source: paper fulltext

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Classification	Qwen2.5-1.5B	Accuracy.	46.3	paper-derived	No explicit refs

Use This Implementation Because…

Confidence: high

hiyouga/easyr1 is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

hiyouga/easyr1

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 4,896
Last push: Apr 6, 2026 (24d ago)

CI · Dockerfile · Releases · Dependencies

Risk flags

No obvious maintenance or reproducibility risks detected.

dvlab-research/VisionReasoner

historical official

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 336
Last push: Feb 9, 2026 (80d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

dvlab-research/Seg-Zero

alternative

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Strong overlap with paper title keywords

Stars: 624
Last push: Jan 17, 2026 (103d ago)

Dockerfile · Dependencies

Risk flags

No CI pipeline detected
No tagged releases
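The signals listed for the three paths can be compared side by side with a few lines of code. The sketch below copies stars, days since last push, and risk-flag counts from the listing; the `RepoSignals` type and the ordering rule in `rank` are hypothetical and are not the ranking method this page uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoSignals:
    name: str
    stars: int
    days_since_push: int
    risk_flags: int


# Values copied from the comparison listing.
CANDIDATES = [
    RepoSignals("dvlab-research/Seg-Zero", 624, 103, 2),
    RepoSignals("dvlab-research/VisionReasoner", 336, 80, 3),
    RepoSignals("hiyouga/easyr1", 4896, 24, 0),
]


def rank(repos):
    """Assumed rule: fewest risk flags, then most stars, then most recent push."""
    return sorted(repos, key=lambda r: (r.risk_flags, -r.stars, r.days_since_push))


for repo in rank(CANDIDATES):
    print(f"{repo.name}: {repo.risk_flags} flags, {repo.stars} stars, "
          f"{repo.days_since_push}d since push")
```

Under this assumed rule hiyouga/easyr1 sorts first. A different weighting, for example one that favors the repository published by the paper's authors, could reorder the other two.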

Best implementation now

hiyouga/easyr1

Confidence: High

Reproducibility: Strong

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Stars: 4,896

Forks: 371

Last push: Apr 6, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Community adoption signal (4,896 stars)

License ✓

CI ✓

Deps ✓

Docker ✓

Selected hiyouga/easyr1 as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

dvlab-research/VisionReasoner

Stars: 336

Last push: Feb 9, 2026

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Apr 29, 2026

Ready to reproduce

· Clone hiyouga/easyr1 and install dependencies from pyproject.toml.
· Dockerfile available for containerized reproduction.
· CI pipeline detected — automated tests are in place.
· Last updated 24 days ago.

Quick start

git clone https://github.com/hiyouga/easyr1.git
cd easyr1
pip install -e .
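After installing, it can help to confirm that the expected packages are visible to the active Python environment before launching anything heavier. The sketch below uses only the standard library. The helper name `check_modules` is hypothetical, and `torch` is checked only because this page reports PyTorch as the framework; the repository's actual dependency set is not confirmed here.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be located (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# "torch" is an assumption based on the framework reported on this page.
status = check_modules(["torch"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a module is reported missing, re-running the install step inside the intended virtual environment is a reasonable first thing to try.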

Additional implementations

Official

dvlab-research/Seg-Zero
Confidence: High

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

Stars: 624

Forks: 30

Last push: Jan 17, 2026

License: Apache-2.0

Community

JIA-Lab-research/Seg-Zero
Confidence: Medium

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

Stars: 624

Last push: Jan 17, 2026

License: Apache-2.0