OpenTrain AI
Maintained implementation available · PyTorch · Pretrained models available

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

September 1, 2024 · arXiv: 2409.06666
2 repos · 3,137 stars · ~a few hours to reproduce
arXiv PDF

Abstract

Results & Benchmarks

Task                         Dataset           Metric  Value
Natural language processing  InstructS2S-Eval  WER     2.98
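
For readers unfamiliar with the metric above: word error rate (WER) is conventionally the word-level edit distance between a hypothesis transcript and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the paper's evaluation script, and the function name `wer` and the whitespace tokenization are assumptions made for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

The function returns a fraction; multiply by 100 to express it as a percentage. Real evaluations typically normalize case and punctuation before scoring, which this sketch does not do.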

Best Implementation

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Stars: 3.1k · Forks: 223 · Last push: May 2025 · License: Apache-2.0
  • Selected ictnlp/llama-omni as the strongest maintained implementation for new work.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. Start with ictnlp/llama-omni and validate setup instructions in README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
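
The logging step above can be sketched with the Python standard library alone. This is a minimal example, not part of the repository: the helper name `snapshot_environment` and the package list passed to it are assumptions, and the list should be adjusted to match the dependency manifest the repository actually ships.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed package versions.

    Hypothetical helper; `packages` is a list of distribution names.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list for illustration; replace with the real dependencies.
    print(json.dumps(snapshot_environment(["torch", "transformers"]), indent=2))
```

Saving this JSON next to each run's results makes it easier to compare a reproduction attempt against a later one when numbers drift.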

Time to first repro: a few hours · No CI workflows detected

Additional Implementations

Official

No additional official repositories detected.

Community

  • ictnlp/LLaMA-Omni · Confidence: low

    LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

    Stars: 3.1k · Forks: 223 · Last push: May 2025 · License: Apache-2.0

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.

Curated Related