Maintained implementation available · paddle

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

May 1, 2022 · arXiv: 2205.12007

1 repo · 12,582 stars · ~a few days to reproduce

Abstract

Results & Benchmarks

Task	Dataset	Metric	Value
Easy-to-use All-in-one Speech Toolkit	PANNs-CNN14	Accuracy	95.00
Easy-to-use All-in-one Speech Toolkit	PANNs-CNN10	Accuracy	89.75

Hardware Requirements

Based on current guidance, expect setup and compute to take multiple days for a meaningful reproduction.

Best Implementation

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

12.6k 2.0k Apr 2026 Apache-2.0

License ✓

CI –

Deps –

Docker –

Selected PaddlePaddle/PaddleSpeech as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction Path

1. Start with PaddlePaddle/PaddleSpeech and validate setup instructions in README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.
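
The logging step above can be sketched with the Python standard library alone. This is a minimal sketch, not something taken from the PaddleSpeech repository: the helper name `snapshot_environment` is hypothetical, and the package names passed to it are assumptions that should be replaced with whatever the README actually asks you to install.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed package versions as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names below are assumptions; adjust them to your actual install.
    env = snapshot_environment(["paddlepaddle", "paddlespeech", "numpy"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving the printed JSON next to each run's results makes it easier to tell later whether a change in accuracy came from a hyperparameter or from a dependency upgrade.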

Time to first repro: a few days · No CI workflows detected · Dependency manifest is missing

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy Hugging Face artifacts, either directly linked or curated, have been found yet.

Continue with targeted Hugging Face searches:

models: arxiv:2205.12007 PaddleSpeech Easy-to-Use
datasets: arxiv:2205.12007 PaddleSpeech dataset
spaces: arxiv:2205.12007 PaddleSpeech demo
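
The search strings above can be turned into browser links with a few lines of Python. This is a sketch under one stated assumption: that the Hub's listing pages accept a `search` query parameter in the form `https://huggingface.co/<section>?search=<query>`. The helper name `search_url` is hypothetical, and the code only builds URLs; it does not contact the Hub.

```python
from urllib.parse import quote_plus

ARXIV_ID = "2205.12007"

# Search strings copied from the list above, keyed by Hub section.
QUERIES = {
    "models": f"arxiv:{ARXIV_ID} PaddleSpeech Easy-to-Use",
    "datasets": f"arxiv:{ARXIV_ID} PaddleSpeech dataset",
    "spaces": f"arxiv:{ARXIV_ID} PaddleSpeech demo",
}


def search_url(section, query):
    """Build a search link for one Hub section (URL pattern is an assumption)."""
    return f"https://huggingface.co/{section}?search={quote_plus(query)}"


if __name__ == "__main__":
    for section, query in QUERIES.items():
        print(search_url(section, query))
```

Any result found this way still needs a manual check that it really corresponds to the paper before it is treated as a trustworthy artifact.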

Research Context

Tasks

Easy-to-use All-in-one Speech Toolkit