- Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation
Yonathan Ron, Shiri Gilboa, Tammuz Dubnov · Feb 21, 2026 · Citations: 0
Multi Agent
We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions without retraining.
- Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov · Feb 21, 2026 · Citations: 0
Pairwise Preference
This protocol incorporates context-sensitive interpretation and community-informed guidelines and is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages.
- MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
Dongwei Wang, Jinhee Kim, Seokho Han, Denis Gudovskiy, Yohei Nakata · Feb 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning
Abhinaba Basu · Feb 21, 2026 · Citations: 0
Personal AI agents incur substantial cost via repeated LLM calls.
- DeepInnovator: Triggering the Innovative Capabilities of LLMs
Tianyu Fan, Fengji Zhang, Yuxiang Zheng, Bei Chen, Xinyao Niu · Feb 21, 2026 · Citations: 0
The application of Large Language Models (LLMs) in accelerating scientific discovery has garnered increasing attention, with a key focus on constructing research agents endowed with innovative capability, i.e., the ability to autonomously…
- AAVGen: Precision Engineering of Adeno-associated Viral Capsids for Renal Selective Targeting
Mohammadreza Ghaffarzadeh-Esfahani, Yousof Gheisari · Feb 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
Yujiao Yang · Feb 21, 2026 · Citations: 0
Extensive experiments across multiple reasoning benchmarks demonstrate that the proposed framework provides multi-level, verifiable explanations, including executable reasoning structures for individual instances, feasible-region…
- [b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic
Kwanghee Choi, Eunjung Yeo, Cheol Jun Cho, David Harwath, David R. Mortensen · Feb 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Hyperbolic Busemann Neural Networks
Ziheng Chen, Bernhard Schölkopf, Nicu Sebe · Feb 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation
Adam Dejl, Jonathan Pearson · Feb 21, 2026 · Citations: 0
Robust and comprehensive evaluation of large language models (LLMs) is essential for identifying effective LLM system configurations and mitigating risks associated with deploying LLMs in sensitive domains.
- Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
Abraham Paul Elenjical, Vivek Hruday Kavuri, Vasudeva Varma · Feb 21, 2026 · Citations: 0
Pairwise Preference
We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and study its integration within a lightweight…
- BURMESE-SAN: Burmese NLP Benchmark for Evaluating Large Language Models
Thura Aung, Jann Railey Montalan, Jian Gang Ngui, Peerat Limkonchotiwat · Feb 21, 2026 · Citations: 0
We introduce BURMESE-SAN, the first holistic benchmark that systematically evaluates large language models (LLMs) for Burmese across three core NLP competencies: understanding (NLU), reasoning (NLR), and generation (NLG).
- MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav, Ava Cai, Kevin Zhu · Feb 21, 2026 · Citations: 0
Red Team
We propose MANATEE, an inference-time defense that uses density estimation over a benign representation manifold.
- ArabicNumBench: Evaluating Arabic Number Reading in Large Language Models
Anas Alhumud, Abdulaziz Alhammadi, Muhammad Badruddin Khan · Feb 21, 2026 · Citations: 0
We present ArabicNumBench, a comprehensive benchmark for evaluating large language models on Arabic number reading tasks across Eastern Arabic-Indic numerals (0-9 in Arabic script) and Western Arabic numerals (0-9).
- The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol
Andreas Schlapbach · Feb 21, 2026 · Citations: 0
This paper establishes a fundamental convergence: Schema-Guided Dialogue (SGD) and the Model Context Protocol (MCP) represent two manifestations of a unified paradigm for deterministic, auditable LLM-agent interaction.
- Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem
Lichang Song, Ting Long, Yi Chang · Feb 21, 2026 · Citations: 0
Multi Agent
To overcome this limitation, we reformulate RAG as a cooperative multi-agent decision-making problem and propose Cooperative Retrieval-Augmented Generation (CoRAG), a framework in which the reranker and the generator act as peer…
- ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models
Zefang Liu, Chenyang Zhu, Sangwoo Cho, Shi-Xiong Zhang · Feb 21, 2026 · Citations: 0
Experimental results across diverse benchmarks demonstrate that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling baselines.
- Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse
Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu · Feb 21, 2026 · Citations: 0
- Watermarking LLM Agent Trajectories
Wenlong Meng, Chen Gong, Terry Yue Zhuo, Fan Zhang, Kecen Li · Feb 21, 2026 · Citations: 0
Long Horizon
LLM agents rely heavily on high-quality trajectory data to guide their problem-solving behaviors, yet producing such data requires substantial task design, high-capacity model generation, and manual filtering.
- Semantic Substrate Theory: An Operator-Theoretic Framework for Geometric Semantic Drift
Stephen Russell · Feb 21, 2026 · Citations: 0
Long Horizon
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Contradiction to Consensus: Dual Perspective, Multi Source Retrieval Based Claim Verification with Source Level Disagreement using LLM
Md Badsha Biswas, Ozlem Uzuner · Feb 21, 2026 · Citations: 0
Through extensive evaluation on four benchmark datasets with five LLMs, we show that knowledge aggregation not only improves claim verification but also reveals differences in source-specific reasoning.
- From Trial by Fire To Sleep Like a Baby: A Lexicon of Anxiety Associations for 20k English Multiword Expressions
Saif M. Mohammad · Feb 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Spilled Energy in Large Language Models
Adrian Robert Minut, Hazem Dewidar, Iacopo Masi · Feb 21, 2026 · Citations: 0
Evaluated on nine benchmarks across state-of-the-art LLMs (including LLaMA, Mistral, and Gemma) and on synthetic algebraic operations (Qwen3), our approach demonstrates robust, competitive hallucination detection and cross-task…