Tool Retrieval Bridge: Aligning Vague Instructions with Retriever Preferences via Bridge Model
Kunfeng Chen, Luyao Zhuang, Fei Liao, Juhua Liu, Jian Wang, Bo Du · Apr 9, 2026 · Citations: 0
How to use this paper page
Coverage: RecentUse this page to decide whether the paper is strong enough to influence an eval design. It summarizes the abstract plus available structured metadata. If the signal is thin, use it as background context and compare it against stronger hub pages before making protocol choices.
Best use
Background context only
Metadata: RecentTrust level
Provisional
Signals: RecentWhat still needs checking
Structured extraction is still processing; current fields are metadata-first.
Signal confidence unavailable
Abstract
Tool learning has emerged as a promising paradigm for large language models (LLMs) to address real-world challenges. Due to the extensive and irregularly updated number of tools, tool retrieval for selecting the desired tool subset is essential. However, current tool retrieval methods are usually based on academic benchmarks containing overly detailed instructions (e.g., specific API names and parameters), while real-world instructions are more vague. Such a discrepancy would hinder the tool retrieval in real-world applications. In this paper, we first construct a new benchmark, VGToolBench, to simulate human vague instructions. Based on this, we conduct a series of preliminary analyses and find that vague instructions indeed damage the performance of tool retrieval. To this end, we propose a simple-yet-effective Tool Retrieval Bridge (TRB) approach to boost the performance of tool retrieval for vague instructions. The principle of TRB is to introduce a bridge model to rewrite the vague instructions into more specific ones and alleviate the gap between vague instructions and retriever preferences.We conduct extensive experiments under multiple commonly used retrieval settings, and the results show that TRB effectively mitigates the ambiguity of vague instructions while delivering consistent and substantial improvements across all baseline retrievers. For example, with the help of TRB, BM25 achieves a relative improvement of up to 111.51%, i.e., increasing the average NDCG score from 9.73 to 19.59. The source code and models are publicly available at https://github.com/kfchenhn/TRB.