Maintained implementation availablepytorchPretrained Models Available

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Wenxuan Huang, Bohan Jia, Zijie Zhai, Shaosheng Cao, Zheyu Ye +5 more

March 9, 2025arXiv: 2503.06749

1 repo1,131 stars~a few hours to reproduce

Abstract

DeepSeek-R1-Zero has successfully demonstrated the emergence of reasoning capabilities in LLMs purely through Reinforcement Learning (RL). Inspired by this breakthrough, we explore how RL can be utilized to enhance the reasoning capability of MLLMs. However, direct training with RL struggles to activate complex reasoning capabilities such as questioning and reflection in MLLMs, due to the absence of substantial high-...

Best Implementation

osilly/vision-r1

[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.

1.1k 25 Mar 2026

License –

CI –

Deps ✓

Docker –

Selected osilly/vision-r1 as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction Path

1
Start with osilly/vision-r1 and validate setup instructions in README.
2
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3
Log exact dependency versions and runtime environment for reproducibility.

Time to first repro: a few hoursLicense metadata missingNo CI workflows detected

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.

Curated Related

microsoft/Phi-4-reasoning-vision-15B
27.6k 158

Research Context