OpenTrain AI
Maintained implementation available | PyTorch | Pretrained models available

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

May 1, 2025 | arXiv: 2505.02625
1 repo | 268 stars | ~a few hours to reproduce
arXiv PDF

Abstract

Results & Benchmarks

Task                     Dataset             Metric   Value
Representation learning  LLaMA-Omni2-7B      ASR-WER  3.26
Representation learning  w/o Gate Fusion     ASR-WER  4.89
Representation learning  w/o Text Embedding  ASR-WER  6.83
Language modeling        Streaming TTS       ASR-WER  3.26
Language modeling        Offline TTS         ASR-WER  3.51
Language modeling        Text Pretrained     ASR-WER  10.34
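
The ASR-WER metric in the table is a word error rate. As a rough illustration of how such a number is typically computed, the sketch below implements word-level edit distance divided by reference length. It is not the paper's evaluation script: the function name `word_error_rate`, the lowercase and whitespace normalization, and the choice to report the value as a percentage are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate (in percent) of hypothesis against reference.

    Illustrative only: real evaluations usually apply a text normalizer
    (punctuation, numerals, casing) before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))
```

Under this definition, lower is better, which matches how the ablation rows (for example, without Gate Fusion) compare against the full model.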

Best Implementation

ictnlp/LLaMA-Omni2

268 27 May 2025
License
CI
Deps
Docker
  • Selected ictnlp/LLaMA-Omni2 as the strongest maintained implementation for new work.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. Start with ictnlp/LLaMA-Omni2 and validate the setup instructions in its README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
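
The last step above can be sketched with the Python standard library alone. This is a minimal example, not something provided by the repository: the function name `snapshot_environment` and the output file name `environment_snapshot.json` are hypothetical, and the package names in the usage section are only guesses at what a run might depend on.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=None):
    """Collect interpreter, OS, and installed package versions as a dict.

    If `packages` is None, every installed distribution is recorded;
    otherwise only the named packages are looked up (None if not installed).
    """
    if packages is None:
        versions = {}
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:  # skip distributions with broken metadata
                versions[name] = dist.version
    else:
        versions = {}
        for name in packages:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None

    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(versions.items())),
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with the project's actual dependencies.
    snapshot = snapshot_environment(["torch", "transformers", "numpy"])
    with open("environment_snapshot.json", "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    print(json.dumps(snapshot, indent=2))
```

Saving this file next to each run's outputs makes it easier to tell whether a difference in results comes from the code or from the environment.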

Time to first repro: a few hours | License metadata missing | No CI workflows detected

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.

Research Context