Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation
Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar · Sep 22, 2025 · Citations: 0
How to use this page
Low trustUse this as background context only. Do not make protocol decisions from this page alone.
Best use
Background context only
What to verify
Validate the evaluation procedure and quality controls in the full paper before operational use.
Evidence quality
Low
Derived from extracted protocol signals and abstract evidence.
Abstract
Simultaneous speech-to-text translation systems must balance translation quality with latency. Although quality evaluation is well established, latency measurement remains a challenge. Existing metrics produce inconsistent results, especially in short-form settings with artificial presegmentation. We present the first comprehensive meta-evaluation of latency metrics across language pairs and systems. We uncover a structural bias in current metrics related to segmentation. We introduce YAAL (Yet Another Average Lagging) for a more accurate short-form evaluation and LongYAAL for unsegmented audio. We propose SoftSegmenter, a resegmentation tool based on soft word-level alignment. We show that YAAL and LongYAAL, together with SoftSegmenter, outperform popular latency metrics, enabling more reliable assessments of short- and long-form simultaneous speech translation systems. We implement all artifacts within the OmniSTEval toolkit: https://github.com/pe-trik/OmniSTEval.