Matched via arXiv identifier search
- Stars: 3
- Last push: Jun 16, 2026 (2d ago)
Risk flags
- No tagged releases
- No Docker setup
- Low confidence match
Youssef Saidi, Haroun Elleuch, Fethi Bougares
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
End-to-end speech Named Entity Recognition (NER) aims to directly extract entities from speech. Prior work has shown that end-to-end (E2E) approaches can outperform cascaded pipelines for English, French, and Chinese, but Arabic remains under-explored due to its morphological complexity, the absence of short vowels, and limited annotated resources. We introduce CV-18 NER, the first publicly available dataset for NER from Arabic speech, created by augmenting the Arabic Common Voice 18 corpus with manual NER annotations following the fine-grained Wojood schema (21 entity types). We benchmark both pipeline systems (ASR + text NER) and E2E models based on Whisper and AraBEST-RQ. E2E systems substantially outperform the best pipeline configuration on the test set, reaching 37.0% CoER (AraBEST-RQ 300M) and 38.0% CVER (Whisper-medium). Further analysis shows that Arabic-specific self-supervised pretraining yields strong ASR performance, while multilingual weak supervision transfers more effectively to joint speech-to-entity learning, and that larger models may be harder to adapt in this low-resource setting. Our dataset and models are publicly released, providing the first open benchmark for end-to-end named entity recognition from Arabic speech: https://huggingface.co/datasets/Elyadata/CV18-NER
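Since the abstract points to a Hugging Face dataset repository, a first inspection step might look like the sketch below. It assumes the `datasets` package is installed and that the repository loads with default settings; whether it needs a configuration name, authentication, or accepted access terms is not confirmed on this page. The helper name `load_cv18_ner` is hypothetical.

```python
# Repo id taken from the dataset URL quoted in the abstract.
DATASET_ID = "Elyadata/CV18-NER"


def load_cv18_ner():
    """Fetch the dataset from the Hugging Face Hub and return it.

    Assumption: the repo loads without extra arguments. Split and column
    names are not stated on this page, so none are hard-coded here.
    """
    # Imported lazily so this module can be loaded without the package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `print(load_cv18_ner())` should list whatever splits and columns the repository actually exposes, which is a reasonable check before planning a reproduction.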
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Augmented Common Voice Named Entity Recognition | CV-18 NER | WER | 81.06 | paper-derived | No explicit refs |
| Augmented Common Voice Named Entity Recognition | AraBEST-RQ 300M 6k | WER | 16.0 | paper-derived | No explicit refs |
| Augmented Common Voice Named Entity Recognition | AraBEST-RQ 600M 6k | WER | 99.4 | paper-derived | No explicit refs |
| Augmented Common Voice Named Entity Recognition | AraBEST-RQ 600M 14k | WER | 99.6 | paper-derived | No explicit refs |
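The WER column in the table above reports word error rate. When auditing those values, it can help to recall how the metric is commonly computed: word-level edit distance divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring script; whitespace tokenization and the absence of any text normalization (for example, diacritic handling) are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, the value is not capped at 100: a one-word reference scored against a three-word hypothesis with no matching words gives 300.0 under this definition.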
No direct maintained repository implementation was found, but paper-linked Hugging Face artifacts are available.
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence graph: 2 refs, 1 link.
Utility signals: depth 95/100, grounding 68/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
Hardware requirements
No verified implementation available
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy model matches right now.
No trustworthy demo spaces right now.
Tasks
Augmented Common Voice Named Entity Recognition
Methods
None detected
Domains
None detected