- Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
Siyuan Liu, Jiahui Xu, Feng Jiang, Kuang Wang, Zefeng Zhao · Feb 26, 2026
Automatic Metrics General
Achieving human-like responsiveness is a critical yet challenging goal for cascaded spoken dialogue systems.
- Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
Qianben Chen, Tianrui Qin, King Zhu, Qiexiang Wang, Chengjun Yu · Feb 26, 2026
Automatic Metrics General
Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios.
- Ruyi2 Technical Report
Huan Song, Shuyu Tian, Junyi Hao, Minxiu Xu, Hongjun An · Feb 26, 2026
Automatic Metrics General
Large Language Models (LLMs) face significant challenges regarding deployment costs and latency, necessitating adaptive computing strategies.
- LiCQA : A Lightweight Complex Question Answering System
Sourav Saha, Dwaipayan Roy, Mandar Mitra · Feb 25, 2026
Automatic Metrics General
The results of our experiments show that LiCQA significantly outperforms these two state-of-the-art systems on benchmark data with noteworthy reduction in latency.
- Generative Pseudo-Labeling for Pre-Ranking with LLMs
Junyu Bi, Xinting Niu, Daixuan Cheng, Kun Yuan, Tao Wang · Feb 24, 2026
Automatic Metrics General
Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking.
- HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG
Yuqi Huang, Ning Liao, Kai Yang, Anning Hu, Shengchao Hu · Feb 24, 2026
Automatic Metrics General
Extensive experiments demonstrate that HELP achieves competitive performance across multiple simple and multi-hop QA benchmarks and up to a 28.8$\times$ speedup over leading Graph-based RAG baselines.
- Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations
Dongming Jiang, Yi Li, Songtao Wei, Jinxin Yang, Ayushi Kishore · Feb 22, 2026
Automatic Metrics General
Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows.
- TabAgent: A Framework for Replacing Agentic Generative Components with Tabular-Textual Classifiers
Ido Levy, Eilam Shapira, Yinon Goldshtein, Avi Yaeli, Nir Mashkif · Feb 18, 2026
Automatic Metrics General
Agentic systems, AI architectures that autonomously execute multi-step workflows to achieve complex goals, are often built using repeated large language model (LLM) calls for closed-set decision tasks such as routing, shortlisting, gating,
- Discrete Stochastic Localization for Non-autoregressive Generation
Yunshu Wu, Jiayi Cheng, Partha Thakuria, Rob Brekelmans, Evangelos E. Papalexakis · Feb 18, 2026
Automatic Metrics General
On OpenWebText, \textsc{DSL} fine-tuning yields large MAUVE gains at low step budgets, surpassing the MDLM+ReMDM baseline with \(\sim\)4$\times$ fewer denoiser evaluations, and matches autoregressive quality at high budgets.
- Overthinking Loops in Agents: A Structural Risk via MCP Tools
Yohan Lee, Jisoo Jang, Seoyeon Choi, Sangyeop Kim, Seungtaek Choi · Feb 16, 2026
Automatic Metrics General
Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages.
- Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring
Dongxu Zhang, Yiding Sun, Cheng Tan, Wenbiao Yan, Ning Yang · Jan 20, 2026
Automatic Metrics General
While Chain-of-Thought (CoT) reasoning significantly enhances the performance of Multimodal Large Language Models (MLLMs), its autoregressive nature incurs prohibitive latency constraints.
- Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz · Jan 14, 2026
Simulation Env General
Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective to align manipulation trajectories that transfers both linguistic and visual planning capabilities for embodie
- Towards Efficient Agents: A Co-Design of Inference Architecture and System
Weizhe Lin, Hui-Ling Zhen, Shuai Yang, Xian Wang, Renxi Liu · Dec 20, 2025
Automatic Metrics General
The rapid development of large language model (LLM)-based agents has unlocked new possibilities for autonomous multi-turn reasoning and tool-augmented decision-making.
- Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya · Nov 11, 2025
Automatic Metrics General
Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure.
- RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim · Oct 23, 2025
Automatic Metrics General
A Head Agent provides guidance that leads retrieval, while an Iteration Agent selects and expands HSeq via structure-respecting actions (e.g., parent/child hops, table row/column neighbors, KG relations); Finally the head agent composes can
- FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution
Syed Rifat Raiyan, Md Farhan Ishmam, Abdullah Al Imran, Mohammad Ali Moni · Oct 18, 2025
Automatic Metrics General
Human communication heavily relies on laconism and inferential pragmatics, allowing listeners to successfully reconstruct rich meaning from sparse, telegraphic speech.
- On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack
Chung-ju Huang, Ziqi Zhang, Yinggui Wang, Binghui Wang, Tao Wei · Jul 3, 2025
Automatic MetricsSimulation Env General
Vertical Federated Learning (VFL) is an emerging distributed learning paradigm for cross-silo collaboration without accessing participants' data.
- vCache: Verified Semantic Prompt Caching
Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu · Feb 6, 2025
Automatic Metrics General
We release the vCache implementation and four benchmarks to support future research.
- Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji · Mar 22, 2024
Automatic Metrics General
To validate EAS, we apply it to a recently proposed MLLM called LaVIN and a classic VL pre-trained model called METER, and conduct extensive experiments on a set of benchmarks.