Results & Benchmarks
| Task | Dataset | Metric | Value |
|---|---|---|---|
| Easy-to-use All-in-one Speech Toolkit | PANNs-CNN14 | Accuracy | 95.00 |
| Easy-to-use All-in-one Speech Toolkit | PANNs-CNN10 | Accuracy | 89.75 |
Hardware Requirements
- Based on current guidance, expect setup and compute to take multiple days for a meaningful reproduction.
Best Implementation
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Stars: 12.6k · Forks: 2.0k · Last updated: Apr 2026 · License: Apache-2.0
- License: ✓
- CI: –
- Deps: –
- Docker: –
- Selected PaddlePaddle/PaddleSpeech as the strongest maintained implementation for new work.
- Repository activity is within the last 24 months.
Reproduction Path
1. Start with PaddlePaddle/PaddleSpeech and validate setup instructions in README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.
- Time to first repro: a few days
- No CI workflows detected
- Dependency manifest is missing
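Because no dependency manifest was detected, the logging step in the reproduction path may need to be done by hand. The following is a minimal sketch using only the Python standard library; the helper name `capture_environment` and the package names passed to it (`paddlepaddle`, `paddlespeech`, `numpy`) are illustrative assumptions, not something the repository prescribes. Adjust the list to whatever is actually installed.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions.

    `packages` is a list of distribution names as they appear on PyPI.
    Packages that are not installed are recorded as None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package names; replace with the ones your setup uses.
    env = capture_environment(["paddlepaddle", "paddlespeech", "numpy"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving the printed JSON next to each run's results makes it possible to compare environments later if a reproduced number drifts from the reported one.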
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches: