What is the best open-source implementation of "DeepEyesV2: Toward Agentic Multimodal Model"?

The best-maintained implementation is Visual-Agent/DeepEyesV2, with 594 stars on GitHub. Confidence: medium. Reproducibility: Limited.

How reproducible is "DeepEyesV2: Toward Agentic Multimodal Model"?

Estimated time to first reproduction: a few days. Risk flags: no CI workflows detected; dependency manifest is missing. Start with Visual-Agent/DeepEyesV2 and validate the setup instructions in its README.

Are there pretrained models available for "DeepEyesV2: Toward Agentic Multimodal Model"?

Yes, 3 Hugging Face models found. The top result is microsoft/Phi-4-multimodal-instruct with 391,628 downloads.

DeepEyesV2: Toward Agentic Multimodal Model

Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, Xing Yu

Published: Nov 7, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 1

Top repo stars: 594

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few days

2 risk flags

Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operations into reasoning. In this work, we introduce DeepEyesV2 and explore how to build an agentic multimodal model from the perspectives of data construction, training methods, and model evaluation. We observe that direct reinforcement learning alone fails to induce robust tool-use behavior. This phenomenon motivates a two-stage training pipeline: a cold-start stage to establish tool-use patterns, and reinforcement learning stage to further refine tool invocation. We curate a diverse, moderately challenging training dataset, specifically including examples where tool use is beneficial. We further introduce RealX-Bench, a comprehensive benchmark designed to evaluate real-world multimodal reasoning, which inherently requires the integration of multiple capabilities, including perception, search, and reasoning. We evaluate DeepEyesV2 on RealX-Bench and other representative benchmarks, demonstrating its effectiveness across real-world understanding, mathematical reasoning, and search-intensive tasks. Moreover, DeepEyesV2 exhibits task-adaptive tool invocation, tending to use image operations for perception tasks and numerical computations for reasoning tasks. Reinforcement learning further enables complex tool combinations and allows model to selectively invoke tools based on context. We hope our study can provide guidance for community in developing agentic multimodal models.

Technical details

Canonical key: arxiv-2511.05271

Cache status: Fresh

Generated at: May 12, 2026, 3:29 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: No

LLM status: ready

LLM model: openai/gpt-5.1-20251113

LLM generated: May 7, 2026, 5:26 AM

LLM content type: sparse_repro_blueprint

HF policy: hf-relevance-v27

LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_13], evidencePack.paperSections[id=paper_17], researcherSummary.reproductionRisks, guidance.riskFlags, paper.title, summary.hasReliableImplementation

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.

Use This Implementation Because…

Confidence: medium

Visual-Agent/DeepEyesV2 is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. License is declared (Apache-2.0).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing

Hardware Notes

Based on current guidance, expect multi-day setup and compute for a meaningful reproduction.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 65/100, grounding 85/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

Visual-Agent/DeepEyes

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Strong

Matched via arXiv identifier search · Community adoption signal (1209 stars)

Stars: 1,209
Last push: Nov 20, 2025 (173d ago)

CI · Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match

Visual-Agent/DeepEyesV2

best maintained

Maintenance: Recently updated

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Partial overlap with paper title keywords

Stars: 594
Last push: Feb 26, 2026 (75d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

jiangyigithub/deepeyes-main

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Strong

Matched via arXiv identifier search

Stars: 0
Last push: Apr 13, 2026 (29d ago)

CI · Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match
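The "d ago" figures on the three cards above can be cross-checked against the page's "Last checked" date of May 12, 2026. The sketch below is only an arithmetic check using the dates listed on this page; the variable names are illustrative and are not part of any tool referenced here.

```python
from datetime import date

# Date the listing was last checked, as stated on this page.
LAST_CHECKED = date(2026, 5, 12)

# Last-push dates copied from the three implementation cards.
last_push = {
    "Visual-Agent/DeepEyes": date(2025, 11, 20),
    "Visual-Agent/DeepEyesV2": date(2026, 2, 26),
    "jiangyigithub/deepeyes-main": date(2026, 4, 13),
}

# Whole days elapsed between each last push and the check date.
days_since_push = {
    repo: (LAST_CHECKED - pushed).days for repo, pushed in last_push.items()
}

for repo, days in days_since_push.items():
    print(f"{repo}: {days}d ago")
```

Re-running the same subtraction with today's date gives a current staleness figure for each repository.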

Best implementation now

Visual-Agent/DeepEyesV2

Confidence: Medium

Reproducibility: Limited

Stars: 594

Forks: 55

Last push: Feb 26, 2026

License: Apache-2.0

Matched via arXiv identifier search

Partial overlap with paper title keywords

Community adoption signal (594 stars)

License ✓

CI –

Deps –

Docker –

Selected Visual-Agent/DeepEyesV2 as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: May 12, 2026

No dependency manifest — manual reconstruction required

· Visual-Agent/DeepEyesV2 has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
