What is the best open-source implementation of "VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs"?

The best-maintained implementation is the official repository, damo-nlp-sg/videollama2, with 1,300 stars on GitHub. Confidence: high. Reproducibility: moderate.

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Q: How reproducible is "VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs"?

Estimated time to first reproduction: a few hours. Risk flag: no CI workflows detected. Start with damo-nlp-sg/videollama2 and validate the setup instructions in its README.

Q: What framework is used to implement "VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs"?

The primary implementation uses PyTorch.

Published: Jun 1, 2024

Best maintained implementation now

Evidence: Direct

Domain fit: AI-adjacent

Verified repos: 2

Top repo stars: 1,300

The paper appears to be method- or tooling-adjacent to AI workflows, with partial ecosystem coverage.

Framework: PyTorch

Time to first repro: a few hours

1 risk flag

Technical details

Canonical key: arxiv-2406.07476

Cache status: Stale (SWR served)

Generated at: Jun 15, 2026, 9:15 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: missing

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.

The paper's primary contribution is VideoLLaMA 2, a Video-LLM that advances spatial-temporal modeling and audio understanding.

Use This Implementation Because…

Confidence: high

Based on ranking signals, damo-nlp-sg/videollama2 is the strongest maintained implementation. It declares a license (Apache-2.0) and includes dependency/environment manifests.

Reproduction Risks

No CI workflows detected

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 55/100, grounding 75/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

damo-nlp-sg/videollama2

best maintained

Maintenance: Stale

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,300
Last push: Jan 23, 2025 (511d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

damo-nlp-sg/videollama3

alternative

Maintenance: Stale risk

Confidence: Low

Reproducibility: Moderate

Strong overlap with paper title keywords · Community adoption signal (1,166 stars)

Stars: 1,166
Last push: Aug 14, 2025 (308d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

damo-nlp-sg/inf-clip

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Community adoption signal (286 stars)

Stars: 286
Last push: Jan 16, 2025 (518d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

Best implementation now

damo-nlp-sg/videollama2

Confidence: High

Reproducibility: Moderate

Stars: 1,300

Forks: 89

Last push: Jan 23, 2025

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (1,300 stars)

License ✓ · CI – · Deps ✓ · Docker –

Selected damo-nlp-sg/videollama2 as the strongest maintained implementation for new work.
The repository includes dependency/environment manifests.
Repository activity is within the last 24 months.

Reproduction readiness

Setup Required

Time to first repro: a few hours

Last checked: Jun 15, 2026

Dependencies pinned, manual setup needed

· damo-nlp-sg/videollama2 has a pyproject.toml but requires manual environment setup.
· The last push was 511 days ago, so expect possible dependency version conflicts.
· There is no Dockerfile, so you will need to set up the environment manually.
· There is no CI pipeline, so test coverage is unknown.

Quick start

git clone https://github.com/damo-nlp-sg/videollama2.git
cd videollama2
pip install -e .

No benchmark numbers could be verified for this page, so there are no verified published numbers here to check a reproduction against.

Additional implementations

Official

No additional official repositories detected.

Community

DAMO-NLP-SG/VideoLLaMA2 (same repository as damo-nlp-sg/videollama2; GitHub repository names are case-insensitive)
Confidence: Medium

Stars: 1,300

Last push: Jan 23, 2025

License: Apache-2.0

Possible but unverified matches (6)

These repositories had low-confidence matching signals.

damo-nlp-sg/videollama3

Confidence: Low

Stars: 1,166

damo-nlp-sg/inf-clip

Confidence: Low

Stars: 286

DAMO-NLP-SG/Inf-CLIP (same repository as damo-nlp-sg/inf-clip; GitHub repository names are case-insensitive)

Confidence: Low

Stars: 286

YALYAshley/VideoQA-TA

Confidence: Low

Stars: 8

bigjoo99/VideoLLaMA3_LoRA

Confidence: Low

Stars: 0

he2720/DialogueMMT

Confidence: Low

Stars: 2

Hugging Face artifacts

No direct paper-linked artifacts were found. The strongest curated related artifacts are listed instead.

Models

No trustworthy model matches right now.

Datasets

microsoft/TemporalBench

Curated Related

Downloads: 498

Likes: 18

Updated: Nov 7, 2024

minkyuchoi/Temporal-Logic-Video-Dataset

Curated Related

Downloads: 1,686

Likes: 1

Updated: Jul 14, 2024