Q: What is the best open-source implementation of "Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution"?

The best maintained implementation is qwenlm/qwen2-vl with 19,425 stars on GitHub. Confidence: high. Reproducibility: Limited.

Q: Are there pretrained models available for "Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution"?

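Yes, one Hugging Face model was found. The top result is Qwen/Qwen2.5-VL-7B-Instruct with 6,884,518 downloads. Its name indicates a Qwen2.5-VL checkpoint, a later release in the same family, so confirm it matches the paper before using it as a Qwen2-VL baseline.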

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Q: How reproducible is "Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution"?

Estimated time to first reproduction: a few days. Risk flags: no CI workflows detected; dependency manifest is missing. Start with qwenlm/qwen2-vl and validate the setup instructions in its README.

Q: What framework is used to implement "Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution"?

The primary implementation uses PyTorch.

Published: Sep 18, 2024

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 19,425

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few days

2 risk flags

Technical details

Canonical key: arxiv-2409.12191
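
The canonical key appears to be the arXiv identifier with an `arxiv-` prefix. As a minimal sketch, assuming that prefix convention (it is inferred from this page, not from any documented schema), the key can be split into the arXiv ID, the submission year and month encoded in the ID, and the abstract URL. The function name `parse_canonical_key` is hypothetical.

```python
import re


def parse_canonical_key(key: str) -> dict:
    """Split a key such as 'arxiv-2409.12191' into its parts.

    Assumes the 'arxiv-' prefix convention seen on this page and a
    new-style arXiv identifier (YYMM.NNNN or YYMM.NNNNN).
    """
    match = re.fullmatch(r"arxiv-(\d{2})(\d{2})\.(\d{4,5})", key)
    if match is None:
        raise ValueError(f"unrecognized canonical key: {key!r}")
    yy, mm, number = match.groups()
    if not 1 <= int(mm) <= 12:
        raise ValueError(f"invalid month in canonical key: {key!r}")
    arxiv_id = f"{yy}{mm}.{number}"
    return {
        "arxiv_id": arxiv_id,
        "year": 2000 + int(yy),   # new-style IDs started in 2007
        "month": int(mm),
        "abs_url": f"https://arxiv.org/abs/{arxiv_id}",
    }


info = parse_canonical_key("arxiv-2409.12191")
print(info["arxiv_id"], info["year"], info["month"], info["abs_url"])
```

The year and month recovered this way give only the submission month; the ID does not encode the day.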

Cache status: Fresh

Generated at: Jun 20, 2026, 12:41 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Benchmark evidence drill-down

5 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Benchmark	System	Score	Source	Evidence refs
Computer vision	MMMU val	GPT-4o	69.1	paper-derived	No explicit refs
Computer vision	Japanese	GPT-4o	88.3	paper-derived	No explicit refs
Computer vision	Japanese	Qwen2-VL-72B	93.4	paper-derived	No explicit refs
Computer vision	MVBench	Previous SoTA	69.6	paper-derived	No explicit refs
Computer vision	PerceptionTest test	Previous SoTA	66.9	paper-derived	No explicit refs
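
One way to audit these findings is to check which of them allow a like-for-like comparison. The sketch below reads each row as (benchmark, system, score), which is an assumption about the intended columns, and reports only benchmarks where two or more systems are listed. The helper names `group_by_benchmark`, `comparable`, and `margin` are hypothetical.

```python
# Rows copied from the table above, read as (benchmark, system, score).
ROWS = [
    ("MMMU val", "GPT-4o", 69.1),
    ("Japanese", "GPT-4o", 88.3),
    ("Japanese", "Qwen2-VL-72B", 93.4),
    ("MVBench", "Previous SoTA", 69.6),
    ("PerceptionTest test", "Previous SoTA", 66.9),
]


def group_by_benchmark(rows):
    """Map each benchmark to a {system: score} dictionary."""
    groups = {}
    for benchmark, system, score in rows:
        groups.setdefault(benchmark, {})[system] = score
    return groups


def comparable(groups):
    """Keep only benchmarks that list at least two systems."""
    return {b: s for b, s in groups.items() if len(s) >= 2}


def margin(groups, benchmark, system, baseline):
    """Score difference (system minus baseline), rounded to one decimal."""
    scores = groups[benchmark]
    return round(scores[system] - scores[baseline], 1)


groups = group_by_benchmark(ROWS)
pairs = comparable(groups)
print(sorted(pairs))
print(margin(groups, "Japanese", "Qwen2-VL-72B", "GPT-4o"))
```

Only the Japanese rows pair Qwen2-VL-72B with a baseline, where the margin is 5.1 points. The other three rows list a baseline score alone, so they cannot support a comparison without the corresponding Qwen2-VL numbers from the paper.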

The primary contribution described in this paper is Qwen2-VL itself, the vision-language model series named in the title.

Use This Implementation Because…

Confidence: high

qwenlm/qwen2-vl is the strongest maintained implementation based on ranking signals. License is declared (Apache-2.0).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing
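
Both flags can be rechecked against a local clone. This is a minimal sketch, assuming that "CI workflows" means GitHub Actions files under `.github/workflows` and that a manifest would sit at the repository root; the page does not state how its own detection works, and the function name `risk_flags` is hypothetical.

```python
from pathlib import Path

# Root-level manifest names checked here (an assumed, non-exhaustive list).
MANIFESTS = ("requirements.txt", "environment.yml", "pyproject.toml")


def risk_flags(repo_root):
    """Return the risk flags that apply to a local clone at repo_root."""
    root = Path(repo_root)
    flags = []

    # CI: look for at least one YAML workflow file for GitHub Actions.
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and any(
        p.suffix in {".yml", ".yaml"} for p in workflows.iterdir()
    )
    if not has_ci:
        flags.append("No CI workflows detected")

    # Dependencies: look for any known manifest at the repository root.
    if not any((root / name).is_file() for name in MANIFESTS):
        flags.append("Dependency manifest is missing")

    return flags


if __name__ == "__main__":
    # Run from the root of a clone to list the flags that still apply.
    print(risk_flags("."))
```

A manifest stored in a subdirectory or under a different file name would be missed by this check, so an empty result is stronger evidence than a non-empty one.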

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 100/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

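Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline. The three entries below report near-identical star counts (19,425 and 19,424) and the same last-push date, which suggests they may resolve to a single repository under different names; confirm this before treating them as independent options.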

qwenlm/qwen2-vl

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 19,425
Last push: Jan 30, 2026 (142d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

qwenlm/qwen2.5-vl

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Limited

Strong overlap with paper title keywords · Community adoption signal (19,424 stars)

Stars: 19,424
Last push: Jan 30, 2026 (142d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

QwenLM/Qwen3-VL

alternative

Maintenance: Recently updated

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 19,424
Last push: Jan 30, 2026 (142d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

qwenlm/qwen2-vl

Confidence: High

Reproducibility: Limited

Repository description: Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Stars: 19,425

Forks: 1,796

Last push: Jan 30, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (19,425 stars)

License ✓

CI –

Deps –

Docker –

Selected qwenlm/qwen2-vl as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.
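
The recency claim can be checked with date arithmetic on the two dates this page reports (last push Jan 30, 2026; last checked Jun 20, 2026). The sketch below counts whole calendar months, which is one reasonable reading of "within the last 24 months"; the page does not define its exact rule, and the helper name `full_months_between` is hypothetical.

```python
from datetime import date


def full_months_between(start: date, end: date) -> int:
    """Count whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


last_push = date(2026, 1, 30)   # "Last push" as reported on this page
checked = date(2026, 6, 20)     # "Last checked" as reported on this page

age_days = (checked - last_push).days
age_months = full_months_between(last_push, checked)
is_recent = age_months < 24

print(age_days, age_months, is_recent)  # 141 4 True
```

Calendar-date subtraction gives 141 days, one fewer than the "142d ago" shown in the comparison entries. The difference could come from time-of-day or timezone handling, but that is a guess; either figure is well inside the 24-month window.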

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: Jun 20, 2026

No dependency manifest — manual reconstruction required

· qwenlm/qwen2-vl has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.

Additional implementations

Official

No additional official repositories detected.

Community

QwenLM/Qwen3-VL
Confidence: Medium

Repository description: Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Stars: 19,424

Last push: Jan 30, 2026

License: Apache-2.0