- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
Rui Yang, Qianhui Wu, Zhaoyang Wang, Hanyang Chen, Ke Yang · Feb 25, 2026 · Citations: 0
Automatic Metrics Long Horizon
Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks.
- Hierarchical LLM-Based Multi-Agent Framework with Prompt Optimization for Multi-Robot Task Planning
Tomoya Kawabe, Rin Takano · Feb 25, 2026 · Citations: 0
Automatic Metrics Long Horizon
We present a hierarchical multi-agent LLM-based planner with prompt optimization: an upper layer decomposes tasks and assigns them to lower-layer agents, which generate PDDL problems solved by a classical planner.
- Training Generalizable Collaborative Agents via Strategic Risk Aversion
Chengrui Qu, Yizhou Zhang, Nicholas Lanzetti, Eric Mazumdar · Feb 25, 2026 · Citations: 0
Automatic Metrics Multi Agent
Many emerging agentic paradigms require agents to collaborate with one another (or people) to achieve shared goals.
- The Headless Firm: How AI Reshapes Enterprise Boundaries
Tassilo Klein, Sebastian Wieczorek · Feb 24, 2026 · Citations: 0
Automatic Metrics Multi Agent
We argue that agentic AI induces a structural change in how coordination costs scale: in prior modular systems, integration cost grew with interaction topology (O(n^2) in the number of components); in protocol-mediated agentic systems, inte
- A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
Dmitrii Pantiukhin, Ivan Kuznetsov, Boris Shapkin, Antonia Anna Jost, Thomas Jung · Feb 24, 2026 · Citations: 0
Automatic Metrics Long Horizon
Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis.
- Efficient Hierarchical Any-Angle Path Planning on Multi-Resolution 3D Grids
Victor Reijgwart, Cesar Cadena, Roland Siegwart, Lionel Ott · Feb 24, 2026 · Citations: 0
Simulation Env Long Horizon
Hierarchical, multi-resolution volumetric mapping approaches are widely used to represent large and complex environments as they can efficiently capture their occupancy and connectivity information.
- A Benchmark for Deep Information Synthesis
Debjit Paul, Daniel Murphy, Milan Gritta, Ronald Cardenas, Victor Prokhorov · Feb 24, 2026 · Citations: 0
Human EvalAutomatic Metrics Tool Use
Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool use, such as web browsing, code execution, and data analysis.
- SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery
David Anugraha, Vishakh Padmakumar, Diyi Yang · Feb 24, 2026 · Citations: 0
Expert Verification Automatic Metrics Multi Agent
Based on this formulation, we introduce SparkMe, a multi-agent LLM interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility.
- Cooperative-Competitive Team Play of Real-World Craft Robots
Rui Zhao, Xihui Li, Yizheng Zhang, Yuzhen Liu, Zhong Zhang · Feb 24, 2026 · Citations: 0
Simulation Env Multi Agent
Multi-agent deep Reinforcement Learning (RL) has made significant progress in developing intelligent game-playing agents in recent years.
- Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence
ChengYou Li, XiaoDong Liu, XiangBao Meng, XinYu Zhao · Feb 24, 2026 · Citations: 0
Simulation Env Multi Agent
The paradigm of Large Language Models is undergoing a fundamental transition from static inference engines to dynamic autonomous cognitive systems.While current research primarily focuses on scaling context windows or optimizing prompt engi
- SoK: Agentic Skills -- Beyond Tool Use in LLM Agents
Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang · Feb 24, 2026 · Citations: 0
Simulation Env Tool Use
Agentic systems increasingly rely on reusable procedural capabilities, \textit{a.k.a., agentic skills}, to execute long-horizon workflows reliably.
- Onboard-Targeted Segmentation of Straylight in Space Camera Sensors
Riccardo Gallon, Fabian Schiemenz, Alessandra Menicucci, Eberhard Gill · Feb 24, 2026 · Citations: 0
Automatic Metrics Web Browsing
This study details an artificial intelligence (AI)-based methodology for the semantic segmentation of space camera faults.
- Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Rakshit Trivedi, Kartik Sharma, David C Parkes · Feb 24, 2026 · Citations: 0
Demonstrations Automatic Metrics Multi Agent
Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts.
- Contextual Safety Reasoning and Grounding for Open-World Robots
Zachary Ravichandran, David Snyder, Alexander Robey, Hamed Hassani, Vijay Kumar · Feb 23, 2026 · Citations: 0
Simulation Env Web Browsing
Traditional safety approaches enforce fixed constraints in user-specified contexts, limiting their ability to handle the open-ended contextual variability of real-world deployment.
- SAMAS: A Spectrum-Guided Multi-Agent System for Achieving Style Fidelity in Literary Translation
Jingzhuo Wu, Jiajun Zhang, Keyan Jin, Dehua Ma, Junbo Wang · Feb 23, 2026 · Citations: 0
Automatic Metrics Multi Agent
This limitation stems from the inability of current single-model and static multi-agent systems to perceive and adapt to stylistic variations.
- Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation
Yonathan Ron, Shiri Gilboa, Tammuz Dubnov · Feb 21, 2026 · Citations: 0
Automatic Metrics Multi Agent
We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions without retraining.
- Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem
Lichang Song, Ting Long, Yi Chang · Feb 21, 2026 · Citations: 0
Automatic Metrics Multi Agent
To overcome this limitation, we reformulate RAG as a cooperative multi-agent decision-making problem and propose Cooperative Retrieval-Augmented Generation (CoRAG), a framework in which the reranker and the generator act as peer decision-ma
- Mind the Style: Impact of Communication Style on Human-Chatbot Interaction
Erik Derner, Dalibor Kučera, Aditya Gulati, Ayoub Bagheri, Nuria Oliver · Feb 19, 2026 · Citations: 0
Automatic Metrics Web Browsing
Conversational agents increasingly mediate everyday digital interactions, yet the effects of their communication style on user experience and task success remain unclear.
- Modeling Distinct Human Interaction in Web Agents
Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou · Feb 19, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Web Browsing
Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold.
- Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability
Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar · Feb 19, 2026 · Citations: 0
Automatic Metrics Multi Agent
In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other.
- The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI
Dusan Bosnjakovic · Feb 19, 2026 · Citations: 0
Automatic Metrics Multi Agent
As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatur
- MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani · Feb 18, 2026 · Citations: 0
Simulation Env Multi Agent
MALLVI presents a Multi Agent Large Language and Vision framework that enables closed-loop feedback driven robotic manipulation.
- Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling
Jeffrey T. H. Wong, Zixi Zhang, Junyi Liu, Yiren Zhao · Feb 18, 2026 · Citations: 0
Expert Verification Automatic Metrics Multi Agent
Existing Multi-Agent Systems (MAS) typically rely on static, homogeneous model configurations, limiting their ability to exploit the distinct strengths of differently post-trained models.
- MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks
Zexue He, Yu Wang, Churan Zhi, Yuanzhe Hu, Tzu-Ping Chen · Feb 18, 2026 · Citations: 0
Pairwise Preference Simulation Env Web Browsing
Existing evaluations of agents with memory typically assess memorization and action in isolation.
- World-Model-Augmented Web Agents with Action Correction
Zhouzhou Shen, Xueyu Hu, Xiyun Li, Tianqing Fang, Juncheng Li · Feb 17, 2026 · Citations: 0
Llm As JudgeSimulation Env Multi Agent
Web agents based on large language models have demonstrated promising capability in automating web tasks.
- The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He · Feb 17, 2026 · Citations: 0
Pairwise Preference Automatic Metrics Multi Agent
Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and informati
- Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
Mason Nakamura, Abhinav Kumar, Saswat Das, Sahar Abdelnabi, Saaduddin Mahmud · Feb 16, 2026 · Citations: 0
Simulation Env Multi Agent
Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex cooperative tasks.
- Multi-Agent Comedy Club: Investigating Community Discussion Effects on LLM Humor Generation
Shiwei Hong, Lingyao Li, Ethan Z. Rong, Chenxinran Shen, Zhicong Lu · Feb 16, 2026 · Citations: 0
Pairwise PreferenceRubric Rating Human Eval Multi Agent
Prior work has explored multi-turn interaction and feedback for LLM writing, but evaluations still largely center on prompts and localized feedback, leaving persistent public reception in online communities underexamined.
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
Ming Li, Xirui Li, Tianyi Zhou · Feb 15, 2026 · Citations: 0
Simulation Env Multi Agent
As large language model agents increasingly populate networked environments, a fundamental question arises: do artificial intelligence (AI) agent societies undergo convergence dynamics similar to human social systems?
- OR-Agent: Bridging Evolutionary Search and Structured Research for Automated Algorithm Discovery
Qi Liu, Ruochen Hao, Can Li, Wanjing Ma · Feb 14, 2026 · Citations: 0
Simulation Env Multi Agent
We present OR-Agent, a configurable multi-agent research framework designed for automated exploration in rich experimental environments.
- BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
Huanyao Zhang, Jiepeng Zhou, Bo Li, Bowen Zhou, Yanzhe Shan · Feb 13, 2026 · Citations: 0
Automatic MetricsSimulation Env Web Browsing
Multimodal large language models (MLLMs), equipped with increasingly advanced planning and tool-use capabilities, are evolving into autonomous agents capable of performing multimodal web browsing and deep search in open-world environments.
- UI-Venus-1.5 Technical Report
Venus Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu · Feb 9, 2026 · Citations: 0
Simulation Env Long Horizon
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging.
- VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration
Jaeyoon Jung, Yejun Yoon, Kunwoo Park · Feb 4, 2026 · Citations: 0
Automatic Metrics Multi Agent
This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration.
- INSURE-Dial: A Phase-Aware Conversational Dataset & Benchmark for Compliance Verification and Phase Detection
Shubham Kulkarni, Alexander Lyzhov, Preetam Joshi, Shiva Chaitanya · Jan 28, 2026 · Citations: 0
Automatic Metrics Web Browsing
We introduce INSURE-Dial, to our knowledge the first public benchmark for developing and assessing compliance-aware voice agents for phase-aware call auditing with span-based compliance verification.
- Between Search and Platform: ChatGPT Under the DSA
Toni Lorente, Kathrin Gardhouse · Jan 22, 2026 · Citations: 0
Automatic Metrics Web Browsing
This article examines the applicability of the Digital Services Act (DSA) to ChatGPT, arguing that it should be classified as a hybrid of the two types of hosting services: online search engines and platforms.
- Multimodal Multi-Agent Empowered Legal Judgment Prediction
Zhaolu Kang, Junhao Gong, Qingxi Chen, Hao Zhang, Jiaxin Liu · Jan 19, 2026 · Citations: 0
Simulation Env Multi Agent
Furthermore, we build JurisMM, a large dataset with over 100,000 recent Chinese judicial records, including both text and multimodal video-text data, enabling comprehensive evaluation.
- Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
Huilin Xu, Zhuoyang Liu, Yixiang Luomei, Feng Xu · Dec 9, 2025 · Citations: 0
Simulation Env Long Horizon
Extensive experiments on the AerialVLN and OpenFly benchmark validate the effectiveness of our method.
- From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems
Brendan Gho, Suman Muppavarapu, Afnan Shaik, Tyson Tsay, Atharva Mohan · Nov 18, 2025 · Citations: 0
Automatic Metrics Multi Agent
As foundation models are increasingly deployed as interacting agents in multi-agent systems, their collective behavior raises new challenges for trustworthiness, transparency, and accountability.
- From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
Tianxi Wan, Jiaming Luo, Siyuan Chen, Kunyao Lan, Jianhua Chen · Oct 29, 2025 · Citations: 0
Automatic Metrics Multi Agent
To address this, we develop a novel approach integrating synthetic patient electronic medical record (EMR) construction and multi-agent diagnostic dialogue generation.
- MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Chengshu Li, Mengdi Xu, Arpit Bahety, Hang Yin, Yunfan Jiang · Oct 21, 2025 · Citations: 0
Demonstrations Simulation Env Long Horizon
Imitation learning from large-scale, diverse human demonstrations has been shown to be effective for training robots, but collecting such data is costly and time-consuming.
- SPACeR: Self-Play Anchoring with Centralized Reference Models
Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka · Oct 20, 2025 · Citations: 0
Demonstrations Simulation Env Multi Agent
Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable.
- EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis
Mohammad Hossein Samaei, Faryad Darabi Sahneh, Lee W. Cohnstaedt, Caterina Scoglio · Sep 24, 2025 · Citations: 0
Expert Verification Llm As JudgeSimulation Env Multi Agent
We introduce EpidemIQs, a novel multi-agent LLM framework that integrates user inputs and autonomously conducts literature review, analytical derivation, network modeling, mechanistic modeling, stochastic simulations, data visualization and
- Collaborative Document Editing with Multiple Users and AI Agents
Florian Lehmann, Krystsina Shauchenka, Daniel Buschek · Sep 15, 2025 · Citations: 0
Simulation Env Multi Agent
We propose integrating AI agents directly into collaborative writing environments.
- CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI
Hasin Jawad Ali, Ilhamul Azam, Ajwad Abrar, Md. Kamrul Hasan, Hasan Mahmud · Sep 14, 2025 · Citations: 0
Automatic Metrics Multi Agent
The challenge of aligning artificial intelligence (AI) with human values persists due to the abstract and often conflicting nature of moral principles and the opacity of existing approaches.
- Your AI Bosses Are Still Prejudiced: The Emergence of Stereotypes in LLM-Based Multi-Agent Systems
Jingyu Guo, Yingying Xu · Aug 27, 2025 · Citations: 0
Automatic Metrics Multi Agent
While stereotypes are well-documented in human social interactions, AI systems are often presumed to be less susceptible to such biases.
- CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
Punya Syon Pandey, Yongjin Yang, Jiarui Liu, Zhijing Jin · Aug 16, 2025 · Citations: 0
Pairwise Preference Automatic Metrics Multi Agent
Game-theoretic interactions between agents with Large Language Models (LLMs) have revealed many emergent capabilities, yet the linguistic diversity of these interactions has not been sufficiently quantified.
- 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap · Aug 11, 2025 · Citations: 0
Automatic Metrics Multi Agent
We introduce a multi-agent framework that decomposes privacy reasoning into specialized subtasks (extraction, classification), reducing the information load on any single agent while enabling iterative validation and more reliable adherence
- CoAct-1: Computer-using Multi-Agent System with Coding Actions
Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi · Aug 5, 2025 · Citations: 0
Automatic Metrics Long Horizon
Autonomous agents that operate computers via Graphical User Interfaces (GUIs) often struggle with efficiency and reliability on complex, long-horizon tasks.
- GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Jie Peng, Jiarui Ji, Runlin Lei, Zhewei Wei, Yongchao Liu · Jul 4, 2025 · Citations: 0
Automatic Metrics Multi Agent
Additionally, prior work mainly focuses on discriminative tasks on DyTAGs, resulting in a lack of standardized task formulations and evaluation protocols tailored for DyTAG generation.
- "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
Amin Seffo, Aladin Djuhera, Masataro Asai, Holger Boche · Jun 4, 2025 · Citations: 0
Simulation Env Web Browsing
Recent advancements in large language models (LLMs) have spurred interest in robotic navigation that incorporates complex spatial, mathematical, and conditional constraints from natural language into the planning problem.
- Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs
Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo · Jun 3, 2025 · Citations: 0
Automatic Metrics Web Browsing
Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces.
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Zeyi Liao, Jaylen Jones, Linxi Jiang, Yuting Ning, Eric Fosler-Lussier · May 28, 2025 · Citations: 0
Red Team Simulation Env Web Browsing
Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection.
- Visual Planning: Let's Think Only with Images
Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang · May 16, 2025 · Citations: 0
Automatic Metrics Web Browsing
In this paradigm, planning is executed via sequences of images that encode step-by-step inference in the visual domain, akin to how humans sketch or visualize future actions.
- Reshaping MOFs text mining with a dynamic multi-agents framework of large language model
Zuhong Lin, Daoyuan Ren, Kai Ran, Jing Sun, Songlin Yu · Apr 26, 2025 · Citations: 0
Automatic Metrics Multi Agent
Accurately identifying the synthesis conditions of metal-organic frameworks (MOFs) is essential for guiding experimental design, yet remains challenging because relevant information in the literature is often scattered, inconsistent, and di
- Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition
Zheng Hui, Xiaokai Wei, Yexi Jiang, Kevin Gao, Chen Wang · Apr 26, 2025 · Citations: 0
Pairwise Preference Automatic Metrics Multi Agent
These domains typically involve fixed content and passive consumption, where user preferences can be matched by genre or theme.
- Can Multimodal LLMs Perform Time Series Anomaly Detection?
Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S. Yu, Yue Zhao · Feb 25, 2025 · Citations: 0
Automatic Metrics Multi Agent
One natural way for humans to detect time series anomalies is through visualization and textual description.
- Oracular Programming: A Modular Foundation for Building LLM-Enabled Software
Jonathan Laurent, André Platzer · Feb 7, 2025 · Citations: 0
Demonstrations Automatic Metrics Web Browsing
Large Language Models can solve a wide range of tasks from just a few examples, but they remain difficult to steer and lack a capability essential for building reliable software at scale: the modular composition of computations under enforc
- Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
M. Saifullah, K. G. Papakonstantinou, A. Bhattacharya, S. M. Stoffels, C. P. Andriotis · Jan 23, 2024 · Citations: 0
Simulation Env Multi Agent
To tackle the high dimensionality of state and action spaces, we propose DDMAC-CTDE, a Deep Decentralized Multi-Agent Actor-Critic (DDMAC) reinforcement learning architecture with Centralized Training and Decentralized Execution (CTDE).
- Usability Study of Security Features in Programmable Logic Controllers
Karen Li, Kopo M. Ramokapane, Awais Rashid · Aug 4, 2022 · Citations: 0
Automatic Metrics Web Browsing
Our results uncover various misperceptions about the security controls and how design constraints, e.g., safety and lack of regular updates due to the long-term nature of such systems, provide significant challenges to the realization of mo