We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions without retraining.
The pipeline intercepts Whisper's initial transcript, applies specialized LLM agents for domain context identification, named entity recognition, and jargon detection, and generates compact prompts that guide Whisper's decoder.
Wenlong Meng, Chen Gong, Terry Yue Zhuo, Fan Zhang, Kecen Li, Zheng Liu · Feb 21, 2026 · Citations: 0
Automatic MetricsMathCoding
LLM agents rely heavily on high-quality trajectory data to guide their problem-solving behaviors, yet producing such data requires substantial task design, high-capacity model generation, and manual filtering.
Despite the high cost of creating these datasets, existing literature has overlooked copyright protection for LLM agent trajectories.
Zhiqin Qian, Ryan Diaz, Sangwon Seo, Vaibhav Unhelkar · Feb 20, 2026 · Citations: 0
Pairwise PreferenceAutomatic MetricsCoding
When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed.
As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifications becomes critical for responsible AI deployment.
Jiamin Yao, Eren Gultepe · Feb 20, 2026 · Citations: 0
Automatic MetricsSimulation EnvMathCoding
Applied to LLaMA-2-7B, SPQ achieves up to 75% memory reduction while maintaining or improving perplexity (e.g., WikiText-2 5.47 to 4.91) and preserving accuracy on downstream benchmarks such as C4, TruthfulQA, and GSM8K.
Generative AI is reshaping knowledge work, yet existing research focuses predominantly on software engineering and the natural sciences, with limited methodological exploration for the humanities and social sciences.
Positioned as a "methodological experiment," this study proposes an AI Agent-based collaborative research workflow (Agentic Workflow) for humanities and social science research.
Serin Kim, Sangam Lee, Dongha Lee · Feb 19, 2026 · Citations: 0
Pairwise PreferenceAutomatic MetricsCoding
Large language models have advanced web agents, yet current agents lack personalization capabilities.
Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts.
Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani, Babak Khalaj · Feb 18, 2026 · Citations: 0
Simulation EnvCoding
MALLVI presents a Multi Agent Large Language and Vision framework that enables closed-loop feedback driven robotic manipulation.
Rather than using a single model, MALLVI coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high level planning.
Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied.
We introduce \textbf{Indic Jailbreak Robustness (IJR)}, a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 Billion speakers), covering 45216 prompts in JSON (contract-bound) and Free (naturalistic)
Jeffrey T. H. Wong, Zixi Zhang, Junyi Liu, Yiren Zhao · Feb 18, 2026 · Citations: 0
Expert VerificationAutomatic MetricsCoding
Existing Multi-Agent Systems (MAS) typically rely on static, homogeneous model configurations, limiting their ability to exploit the distinct strengths of differently post-trained models.
To address this, we introduce Team-of-Thoughts, a novel MAS architecture that leverages the complementary capabilities of heterogeneous agents via an orchestrator-tool paradigm.
GLM-5-Team, :, Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du · Feb 17, 2026 · Citations: 0
Automatic MetricsCoding
We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering.
Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity.
In practice, users iteratively refine visualizations through multi-turn interactions that require maintaining common ground, tracking prior edits, and adapting to evolving preferences.
We introduce ChartEditBench, a benchmark for incremental, visually grounded chart editing via code, comprising 5,000 difficulty-controlled modification chains and a rigorously human-verified subset.
Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He, Feijie Wu · Feb 17, 2026 · Citations: 0
Pairwise PreferenceAutomatic MetricsCoding
Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and informati
By introducing a Universal Visual Codec, we map heterogeneous reasoning traces into a shared continuous latent space and inject them directly into the receiver's visual pathway, effectively treating the vision encoder as a universal port fo
Victor De Lima, Jiqun Liu, Grace Hui Yang · Feb 17, 2026 · Citations: 0
Human EvalSimulation EnvCoding
Within this framework, we construct framing-sensitive agent personas by fine-tuning language models with framing-conditioned loss attenuation, inducing targeted biases while preserving overall task competence.
Human evaluation further confirms that FrameRef's generated framings measurably affect human judgment.
Qi Liu, Ruochen Hao, Can Li, Wanjing Ma · Feb 14, 2026 · Citations: 0
Simulation EnvCoding
We present OR-Agent, a configurable multi-agent research framework designed for automated exploration in rich experimental environments.
OR-Agent organizes research as a structured tree-based workflow that explicitly models branching hypothesis generation and systematic backtracking, enabling controlled management of research trajectories beyond simple mutation-crossover loo