- VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents
Yuhao Chen, Yi Xu, Xinyun Ding, Xiang Fang, Shuochen Liu · Mar 25, 2026 · Citations: 0
Pairwise Preference Simulation Env Tool Use
With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple assistants to long-term companions.
- The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle
Lara Russell-Lasalandra, Hudson Golino, Luis Eduardo Garrido, Alexander P. Christensen · Mar 30, 2026 · Citations: 0
Critique Edit Tool Use
Psychological scale development has traditionally required extensive expert involvement, iterative revision, and large-scale pilot testing before psychometric evaluation can begin.
- AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents
Tianyi Li, Zixuan Wang, Guidong Lei, Xiaodong Li, Hui Li · Mar 23, 2026 · Citations: 0
Pairwise Preference Tool Use
To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under…
- Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents
Xuan Qi · Apr 2, 2026 · Citations: 0
Automatic Metrics Tool Use
Chain-of-thought (CoT) reasoning is widely assumed to improve agent performance, but the relationship between reasoning length and accuracy in structured tool-use settings remains poorly understood.
- Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
Guan-Ting Lin, Chen Chen, Zhehuai Chen, Hung-yi Lee · Apr 6, 2026 · Citations: 0
Automatic Metrics Tool Use
We introduce Full-Duplex-Bench-v3 (FDB-v3), a benchmark for evaluating spoken language models under naturalistic speech conditions and multi-step tool use.
- FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang · Mar 26, 2026 · Citations: 0
Automatic Metrics Tool Use
This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols.
- Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang · Apr 9, 2026 · Citations: 0
Automatic Metrics Tool Use
The advent of agentic multimodal models has empowered systems to actively interact with external environments.
- Cognitive Friction: A Decision-Theoretic Framework for Bounded Deliberation in Tool-Using Agents
Davide Di Gioia · Mar 31, 2026 · Citations: 0
Automatic Metrics Tool Use
Autonomous tool-using agents in networked environments must decide which information source to query and when to stop querying and act.
- The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration
Haoyuan Xu, Chang Li, Xinyan Ma, Xianhao Ou, Zihan Zhang · Mar 24, 2026 · Citations: 0
Automatic Metrics Tool Use
As agent systems evolve, however, the central problem has shifted from isolated invocation to multi-tool orchestration over long trajectories with intermediate state, execution feedback, changing environments, and practical constraints such…
- The Detection-Extraction Gap: Models Know the Answer Before They Can Say It
Hanyang Wang, Mingxuan Zhu · Apr 8, 2026 · Citations: 0
Automatic Metrics Tool Use
Across five model configurations, two families, and three benchmarks, we find that 52--88% of chain-of-thought tokens are produced after the answer is recoverable from a partial prefix.
- AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning
Yuanfu Sun, Kang Li, Dongzhe Fan, Jiajin Liu, Qiaoyu Tan · Apr 7, 2026 · Citations: 0
Automatic Metrics Tool Use
To bridge this gap, we introduce Agentic Graph Learning (AGL), a paradigm that reframes graph learning as an interleaved process of topology-aware navigation and LLM-based inference.
- HippoCamp: Benchmarking Contextual Agents on Personal Computers
Zhe Yang, Shulin Tian, Kairui Hu, Shuai Liu, Hoang-Nhat Nguyen · Apr 1, 2026 · Citations: 0
Automatic Metrics Tool Use
We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management.
- Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale
Chinmay Soni, Shivam Chourasia, Gaurav Kumar, Hitesh Kapoor · Mar 25, 2026 · Citations: 0
Automatic Metrics Tool Use
We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to Dream11, India's largest fantasy sports platform with over 250 million users, that answers user queries about cricket…
- Dynamic Context Evolution for Scalable Synthetic Data Generation
Ryan Lingo, Rajeev Chhajer · Apr 8, 2026 · Citations: 0
Tool Use
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Agentic Tool Use in Large Language Models
Jinchao Hu, Meizhi Zhong, Kehai Chen, Xuefeng Bai, Min Zhang · Apr 1, 2026 · Citations: 0
Tool Use
Large language models are increasingly being deployed as autonomous agents yet their real world effectiveness depends on reliable tools for information retrieval, computation and external action.
- PaperVoyager : Building Interactive Web with Visual Language Models
Dasen Dai, Biao Wu, Meng Fang, Wenhao Wang · Mar 24, 2026 · Citations: 0
Tool Use
In this work, we propose a Paper-to-Interactive-System Agent that converts research papers into executable interactive web systems.