Maintained implementation available | PyTorch | Pretrained models available

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

September 1, 2024 | arXiv: 2409.06666

2 repos | 3,137 stars | ~a few hours to reproduce

Abstract

Results & Benchmarks

Task	Dataset	Metric	Value
Natural language processing	InstructS2S-Eval	WER	2.98
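
The WER column above presumably refers to word error rate. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help when checking a reproduced number. The function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption; the evaluation behind the reported 2.98 may normalize text differently, and this page does not state whether that value is a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes simple whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c d" gives one substitution over four reference words, so the function returns 0.25.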

Best Implementation

ictnlp/llama-omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Stars: 3.1k | Forks: 223 | Last push: May 2025 | License: Apache-2.0

License ✓

CI –

Deps ✓

Docker –

Selected ictnlp/llama-omni as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction Path

1. Start with ictnlp/llama-omni and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
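
The logging step can be sketched with the Python standard library alone. This is an illustrative helper, not part of ictnlp/llama-omni: the names `capture_environment` and `write_snapshot` and the output file name are assumptions made for this example. It records only the Python interpreter, the platform, and installed package versions, so GPU driver and CUDA details would need to be captured separately.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Collect interpreter, platform, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name] = dist.metadata.get("Version")
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": packages,
    }


def write_snapshot(path: str = "environment_snapshot.json") -> None:
    """Write the snapshot as sorted, indented JSON (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(capture_environment(), fh, indent=2, sort_keys=True)
```

Calling `write_snapshot()` once before the baseline run and again after any dependency change leaves two JSON files that can be compared to see exactly which versions moved.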

Time to first repro: a few hours | No CI workflows detected

Additional Implementations

Official

No additional official repositories detected.

Community

ictnlp/LLaMA-Omni | Confidence: low
Stars: 3.1k | Forks: 223 | Last push: May 2025 | License: Apache-2.0

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.

Curated Related

ICTNLP/Llama-3.1-8B-Omni
63 418

Research Context