CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech

Q: How reproducible is "CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech"?

Estimated time to first reproduction: a few days. Risk flags: (1) no repository-level reproducibility signals are currently available; (2) the estimate assumes artifact-level reproduction, and full training reproduction may require additional paper details. Use the paper-linked Hugging Face release as the starting artifact, then reconstruct training and evaluation settings from the paper.

Youssef Saidi, Haroun Elleuch, Fethi Bougares

Published: Apr 2, 2026

No direct implementation yet

Evidence: Inferred

Domain fit: AI-adjacent

Verified repos: 0

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Time to first repro: a few days

2 risk flags

arXiv PDF

End-to-end speech Named Entity Recognition (NER) aims to directly extract entities from speech. Prior work has shown that end-to-end (E2E) approaches can outperform cascaded pipelines for English, French, and Chinese, but Arabic remains under-explored due to its morphological complexity, the absence of short vowels, and limited annotated resources. We introduce CV-18 NER, the first publicly available dataset for NER from Arabic speech, created by augmenting the Arabic Common Voice 18 corpus with manual NER annotations following the fine-grained Wojood schema (21 entity types). We benchmark both pipeline systems (ASR + text NER) and E2E models based on Whisper and AraBEST-RQ. E2E systems substantially outperform the best pipeline configuration on the test set, reaching 37.0% CoER (AraBEST-RQ 300M) and 38.0% CVER (Whisper-medium). Further analysis shows that Arabic-specific self-supervised pretraining yields strong ASR performance, while multilingual weak supervision transfers more effectively to joint speech-to-entity learning, and that larger models may be harder to adapt in this low-resource setting. Our dataset and models are publicly released, providing the first open benchmark for end-to-end named entity recognition from Arabic speech (https://huggingface.co/datasets/Elyadata/CV18-NER).
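As a starting point for artifact-level reproduction, the released dataset can be inspected with the Hugging Face `datasets` library. The following is a minimal sketch, not the authors' loading code: the helper names `load_cv18_ner` and `dataset_url` are hypothetical, the split and column names are not stated on this page, and the repository may require accepting access terms or logging in.

```python
# Minimal sketch, assuming the `datasets` library is installed
# (pip install datasets) and the repository is accessible.
DATASET_ID = "Elyadata/CV18-NER"  # repository id taken from the abstract's URL


def dataset_url() -> str:
    """Return the Hugging Face page for the dataset."""
    return f"https://huggingface.co/datasets/{DATASET_ID}"


def load_cv18_ner(split=None, streaming=True):
    """Load the dataset (hypothetical helper, not part of any library).

    split=None returns every available split, which is useful here because
    the split names are an unknown on this page. streaming=True avoids
    downloading all audio before the first example can be inspected.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)
```

Calling `load_cv18_ner()` and printing the result should list the available splits; the column names can then be checked on the first example before writing any training or scoring code.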

Technical details

Canonical key: arxiv-2604.02209

Cache status: Stale (SWR served)

Generated at: Jun 16, 2026, 10:41 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Task	Dataset	Metric	Value	Source
Augmented Common Voice Named Entity Recognition	CV-18 NER	WER	81.06	paper fulltext
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 300M 6k	WER	16.0	paper fulltext
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 600M 6k	WER	99.4	paper fulltext
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 600M 14k	WER	99.6	paper fulltext

Benchmark evidence drill-down

4 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Augmented Common Voice Named Entity Recognition	CV-18 NER	WER	81.06	paper-derived	No explicit refs
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 300M 6k	WER	16.0	paper-derived	No explicit refs
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 600M 6k	WER	99.4	paper-derived	No explicit refs
Augmented Common Voice Named Entity Recognition	AraBEST-RQ 600M 14k	WER	99.6	paper-derived	No explicit refs
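When auditing the WER values above, it helps to recompute the metric independently. The sketch below shows the standard definition of word error rate (word-level edit distance divided by the number of reference words). It is not the paper's scoring script: the text normalization applied before scoring (for example diacritic or punctuation handling) is not described on this page, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Returns a fraction, not a percentage. Values above 1.0 are possible
    when the hypothesis contains many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" gives one substitution and one deletion over four reference words, so the function returns 0.5. If the table reports percentages (an assumption), multiply the result by 100 before comparing.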

Implementation Evidence Summary

Confidence: low

No direct maintained repository implementation was found, but paper-linked Hugging Face artifacts are available.

Reproduction Risks

Estimate assumes artifact-level reproduction; full training reproduction may require additional paper details.

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 2 refs, 1 link.

Utility signals: depth 95/100, grounding 68/100, status medium.

Implementation Comparison

Top 1 path

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

NickDee96/ASR-TTS-paper-daily

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Strong

Matched via arXiv identifier search

Stars: 3
Last push: Jun 16, 2026 (2d ago)

CI, Dependencies

Risk flags

No tagged releases
No Docker setup
Low confidence match

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

Use the paper-linked Hugging Face release as the starting artifact, then reconstruct training and evaluation settings from the paper.
No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
Track assumptions and missing details in an experiment log before coding.
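The last step of the plan, tracking assumptions and missing details, can be as simple as an append-only JSON Lines file. The sketch below is one possible layout under stated assumptions: the field names, the two entry kinds, and the helpers `append_entry` and `read_entries` are all hypothetical and not prescribed by the paper or by this page.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical entry kinds, mirroring "assumptions and missing details".
ALLOWED_KINDS = {"assumption", "missing_detail"}


@dataclass
class LogEntry:
    kind: str       # one of ALLOWED_KINDS
    topic: str      # short label, e.g. the setting being reconstructed
    note: str       # what was assumed or what could not be found
    timestamp: str  # UTC time in ISO 8601 format


def append_entry(log_path, kind: str, topic: str, note: str) -> LogEntry:
    """Append one entry to the log as a single JSON line."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {sorted(ALLOWED_KINDS)}")
    entry = LogEntry(kind, topic, note, datetime.now(timezone.utc).isoformat())
    with Path(log_path).open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Arabic text readable in the file.
        handle.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")
    return entry


def read_entries(log_path) -> list:
    """Read all entries back as dictionaries, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

An append-only file keeps the order in which decisions were made, which makes it easier to revisit an assumption later if a reproduced number does not match the paper.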
