The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Junlong Li, Wenshuo Zhao, Jian Zhao, Weihao Zeng, Haoze Wu, Xiaochen Wang · Oct 29, 2025
Citations: 0
Simulation Env · Long Horizon · General
- Real-world language agents must handle complex, multi-step workflows across diverse applications.
- For instance, an agent may manage emails by coordinating with calendars and file systems, or monitor a production database to detect anomalies and generate reports following an operating manual.