- SODIUM: From Open Web Data to Queryable Databases
Chuxuan Hu, Philip Li, Maxwell Yang, Daniel Kang · Mar 19, 2026 · Citations: 0
Expert Verification Automatic Metrics Multi Agent
Existing systems struggle with SODIUM tasks: we evaluate 6 advanced AI agents on SODIUM-Bench, with the strongest baseline achieving only 46.5% accuracy.
- Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah, Omkar Patil, Thao Nguyen · Mar 19, 2026 · Citations: 0
Demonstrations Simulation Env Multi Agent
To address this limitation, we propose MAPG (Multi-Agent Probabilistic Grounding), an agentic framework that decomposes language queries into structured subcomponents and queries a VLM to ground each component.
- I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems
Vedanta S P, Ponnurangam Kumaraguru · Mar 19, 2026 · Citations: 0
Rubric Rating Simulation Env Multi Agent
Large language models are increasingly proposed as autonomous agents for high-stakes public workflows, yet we lack systematic evidence about whether they would follow institutional rules when granted authority.
- Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright · Mar 3, 2026 · Citations: 0
Pairwise Preference Rubric Rating Llm As Judge Simulation Env Long Horizon
Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly…
- WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference
Zixun Xiong, Gaoyi Wu, Lingfeng Yao, Miao Pan, Xiaojiang Du · Mar 11, 2026 · Citations: 0
Red Team Automatic Metrics Multi Agent
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied.
- Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives
Changgeon Ko, Jisu Shin, Hoyun Song, Huije Lee, Eui Jun Hwang · Apr 7, 2026 · Citations: 0
Automatic Metrics Simulation Env Multi Agent
Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision.
- Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications
Che Chen, Lanhua Li, Shimin Gong, Yu Zhao, Yuming Fang · Mar 23, 2026 · Citations: 0
Simulation Env Long Horizon
To maximize the overall throughput, we first propose a delay-tolerant multi-agent deep reinforcement learning (MADRL) algorithm that integrates a delay-penalized reward to encourage information sharing among UAVs, while jointly optimizing…
- OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation
Xiaomeng Hu, Yinger Zhang, Fei Huang, Jianhong Tu, Yang Su · Apr 13, 2026 · Citations: 0
Simulation Env Multi Agent
We introduce OccuBench, a benchmark covering 100 real-world professional task scenarios across 10 industry categories and 65 specialized domains, enabled by Language Environment Simulators (LESs) that simulate domain-specific environments…
- ActionParty: Multi-Subject Action Binding in Generative Video Games
Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov · Apr 2, 2026 · Citations: 0
Automatic Metrics Simulation Env Multi Agent
However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene.
- Multi-Agent Dialectical Refinement for Enhanced Argument Classification
Jakub Bąba, Jarosław A. Chudziak · Mar 29, 2026 · Citations: 0
Llm As Judge Automatic Metrics Multi Agent
We introduce MAD-ACC (Multi-Agent Debate for Argument Component Classification), a framework that leverages dialectical refinement to resolve classification uncertainty.
- Diff-KD: Diffusion-based Knowledge Distillation for Collaborative Perception under Corruptions
Pengcheng Lyu, Chaokun Zhang, Gong Chen, Tao Tang, Zhaoxiang Luo · Apr 2, 2026 · Citations: 0
Automatic Metrics Multi Agent
Multi-agent collaborative perception enables autonomous systems to overcome individual sensing limits through collective intelligence.
- Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs
Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi · Mar 11, 2026 · Citations: 0
Rlaif Or Synthetic Feedback Multi Agent
The alignment of large language models (LLMs) has progressed substantially in single-agent settings through paradigms such as RLHF and Constitutional AI, with recent work exploring scalable alternatives such as RLAIF and evolving alignment…
- S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home
Janani Rangila, Akila Siriweera, Incheon Paik, Keitaro Naruse, Isuru Jayanada · Mar 5, 2026 · Citations: 0
Pairwise Preference Multi Agent
The smart home is a key application domain within the Society 5.0 vision for a human-centered society.
- QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate
Jihao Zhao, Daixuan Li, Pengfei Li, Shuaishuai Zu, Biao Qin · Mar 12, 2026 · Citations: 0
Automatic Metrics Multi Agent
Drawing inspiration from Hal Gregersen's "Questions Are the Answer" theory, we design a multi-agent debate framework comprising four specialized components: a question outline generator, text segmenter, integrity reviewer, and knowledge…
- Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
Jakub Masłowski, Jarosław A. Chudziak · Mar 28, 2026 · Citations: 0
Simulation Env Multi Agent
Large Language Models (LLMs) are being increasingly used as autonomous agents in complex reasoning tasks, opening the niche for dialectical interactions.
- GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents
Yunzhe Wang, Runhui Xu, Kexin Zheng, Tianyi Zhang, Jayavibhav Niranjan Kogundi · Mar 25, 2026 · Citations: 0
Simulation Env Multi Agent
Multimodal LLMs are increasingly deployed as perceptual backbones for autonomous agents in 3D environments, from robotics to virtual worlds.
- Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts
Hongbo Bo, Jingyu Hu, Weiru Liu · Mar 10, 2026 · Citations: 0
Simulation Env Multi Agent
Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems.
- Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems
Hiroki Fukui · Mar 5, 2026 · Citations: 0
Simulation Env Multi Agent
We report four preregistered studies (1,584 multi-agent simulations across 16 languages and three model families) demonstrating that alignment interventions in large language models produce a structurally analogous phenomenon: surface…
- Learning to Interrupt in Language-based Multi-agent Communication
Danqing Wang, Da Yin, Ruta Desai, Lei Li, Asli Celikyilmaz · Apr 7, 2026 · Citations: 0
Automatic Metrics Multi Agent
Motivated by this, we propose an interruptible communication framework that allows the agent who is listening to interrupt the current speaker.
- Towards Automated Community Notes Generation with Large Vision Language Models for Combating Contextual Deception
Jin Ma, Jingwen Yan, Mohammed Aldeen, Ethan Anderson, Taran Kavuru · Mar 23, 2026 · Citations: 0
Automatic Metrics Multi Agent
However, its reliance on human contributors limits both the timeliness and scalability.
- Governed Memory: A Production Architecture for Multi-Agent Workflows
Hamed Taheri · Mar 18, 2026 · Citations: 0
Automatic Metrics Long Horizon
Enterprise AI deploys dozens of autonomous agent nodes across workflows, each acting on the same entities with no shared memory and no common governance.
- Semantic Invariance in Agentic AI
I. de Zarzà, J. de Curtò, Jordi Cabot, Pietro Manzoni, Carlos T. Calafate · Mar 13, 2026 · Citations: 0
Automatic Metrics Long Horizon
Standard benchmark evaluations, which assess accuracy on fixed, canonical problem formulations, fail to capture this critical reliability dimension.
- From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts
Sunil Prakash · Mar 12, 2026 · Citations: 0
Automatic Metrics Multi Agent
Multi-agent LLM systems increasingly tackle complex reasoning, yet their interaction patterns remain limited to voting, unstructured debate, or pipeline orchestration.
- Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents
Naman Gupta, Vaibhav Singh, Arun Iyer, Kirankumar Shiragur, Pratham Grover · Mar 10, 2026 · Citations: 0
Automatic Metrics Multi Agent
Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded…
- LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models
Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Tri Nguyen, Vasudev Lal · Mar 6, 2026 · Citations: 0
Automatic Metrics Multi Agent
Large Language Models (LLMs) exhibit impressive general-purpose capabilities but also introduce serious safety risks, particularly the potential for deception as models acquire increased agency and human oversight diminishes.