DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Q: What is the best open-source implementation of "DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment"?

The best-maintained implementation is kehanlu/desta2.5-audio, with 136 stars on GitHub. Confidence: high. Reproducibility: limited.

Q: How reproducible is "DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment"?

Estimated time to first reproduction: a few days. Risk flags: license metadata missing, no CI workflows detected, dependency manifest missing. Start with kehanlu/desta2.5-audio and validate the setup instructions in its README.

Q: Are there pretrained models available for "DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment"?

Yes, one Hugging Face model was found: DeSTA-ntu/DeSTA2.5-Audio-Llama-3.1-8B, with 446 downloads.
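As a starting point, the checkpoint files can be fetched with the huggingface_hub library. The snippet below is a minimal sketch, not the authors' loading code: it only downloads the repository files, and the helper name download_checkpoint is made up for illustration. How the files are then loaded into a model is defined by the kehanlu/desta2.5-audio code and its README, which this page does not reproduce.

```python
from typing import Optional

# Repo id exactly as listed on this page.
MODEL_ID = "DeSTA-ntu/DeSTA2.5-Audio-Llama-3.1-8B"


def download_checkpoint(repo_id: str = MODEL_ID, local_dir: Optional[str] = None) -> str:
    """Download every file of a Hugging Face model repo and return the local folder.

    Hypothetical helper for illustration; it does not load or run the model.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling download_checkpoint() with no arguments stores the files in the default Hugging Face cache; passing local_dir places them in a folder of your choice. The download size is not stated on this page, so check the model card before running it on a metered connection.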

Q: What framework is used to implement "DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment"?

The primary implementation uses PyTorch.

Published: Jul 1, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 136

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few days

3 risk flags


Technical details

Canonical key: arxiv-2507.02768

Cache status: Stale (SWR served)

Generated at: Apr 16, 2026, 12:59 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence


Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Classification · Random · Emotion: 20.0 (source: paper fulltext)

Classification · Instruction-based QA · Emotion: 62.2 (source: paper fulltext)

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Classification	Random	Emotion	20.0	paper-derived	No explicit refs
Classification	Instruction-based QA	Emotion	62.2	paper-derived	No explicit refs

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment is the primary contribution described in this paper.

Use This Implementation Because…

Confidence: high

kehanlu/desta2.5-audio is the strongest maintained implementation based on ranking signals.

Open kehanlu/desta2.5-audio

Reproduction Risks

License metadata missing
No CI workflows detected
Dependency manifest is missing
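The three flags above can be re-checked against a local clone. The sketch below is an approximation under stated assumptions: this page does not document how its flags are computed, so the rules here (a top-level LICENSE or COPYING file, a YAML file under .github/workflows, and one of the manifest file names mentioned on this page) are guesses, and the function names are hypothetical.

```python
from pathlib import Path, PurePosixPath

# Manifest file names mentioned on this page; treating them as the full list is an assumption.
MANIFESTS = {"requirements.txt", "environment.yml", "pyproject.toml", "Dockerfile"}


def risk_flags(files):
    """Return risk-flag strings for a list of repo-relative file paths."""
    paths = [PurePosixPath(f) for f in files]
    top_level = {p.name for p in paths if len(p.parts) == 1}
    flags = []
    # Assumed rule: a license is any top-level file starting with LICENSE or COPYING.
    if not any(n.upper().startswith(("LICENSE", "COPYING")) for n in top_level):
        flags.append("License metadata missing")
    # Assumed rule: CI means at least one YAML file under .github/workflows.
    has_ci = any(
        p.parts[:2] == (".github", "workflows") and p.suffix in {".yml", ".yaml"}
        for p in paths
    )
    if not has_ci:
        flags.append("No CI workflows detected")
    if not (MANIFESTS & top_level):
        flags.append("Dependency manifest is missing")
    return flags


def list_files(repo_root):
    """List repo-relative paths of all files in a local clone, skipping .git."""
    root = Path(repo_root)
    return [
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    ]
```

After cloning, risk_flags(list_files("path/to/clone")) returns whichever flags still apply, which gives a quick way to see whether the repository has changed since this page was generated.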

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 100/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

kehanlu/desta2.5-audio

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 136
Last push: Feb 4, 2026 (73d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

kehanlu/DeSTA2

alternative

Maintenance: Stale risk

Confidence: Low

Reproducibility: Limited

Partial overlap with paper title keywords · Community adoption signal (125 stars)

Stars: 125
Last push: Jul 15, 2025 (276d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

linxid/ai-paper-daily

alternative

Maintenance: Recently updated

Confidence: Low

Reproducibility: Strong

Matched via arXiv identifier search

Stars: 4
Last push: Nov 28, 2025 (141d ago)

CI · Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match

Best implementation now

kehanlu/desta2.5-audio

Confidence: High

Reproducibility: Limited

Code for DeSTA2.5-Audio, general-purpose LALM

Stars: 136

Forks: 7

Last push: Feb 4, 2026

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Partial overlap with paper title keywords

Community adoption signal (136 stars)

License: not detected

CI: not detected

Deps: not detected

Docker: not detected

Selected kehanlu/desta2.5-audio as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: Apr 16, 2026

No dependency manifest: manual reconstruction required

· kehanlu/desta2.5-audio has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
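One way to do that reconstruction is to parse every Python file in a local clone and collect the imported top-level modules. The sketch below uses only the standard library; the function names are hypothetical, it requires Python 3.10 or later for sys.stdlib_module_names, and it treats any directory or file name inside the repository as a local module, which is a simplifying assumption.

```python
import ast
import sys
from pathlib import Path


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported by one Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import utils".
            names.add(node.module.split(".")[0])
    return names


def third_party_imports(repo_root: str) -> list:
    """Scan all .py files under repo_root and list imports that are neither stdlib nor local."""
    root = Path(repo_root)
    found, local = set(), set()
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        local.add(path.stem)          # modules defined by the repo itself
        local.update(rel.parts[:-1])  # package directories inside the repo
        try:
            found |= top_level_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
    stdlib = set(sys.stdlib_module_names)  # available from Python 3.10
    return sorted(found - stdlib - local)
```

The result is a list of import names, not installable package names or versions. Some import names differ from their PyPI distribution (for example, yaml is provided by PyYAML), and imports that only appear in notebooks or shell scripts are missed, so treat the output as a first draft of a requirements file.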
