- L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan · Aug 31, 2025 · Citations: 0
Multi Agent
We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a multi-agent retrieval framework for grounded legal question answering that decomposes queries into structured sub-problems, retrieves evidence…
- When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He · Aug 30, 2025 · Citations: 0
- Estimating Parameter Fields in Multi-Physics PDEs from Scarce Measurements
Xuyang Li, Mahdi Masmoudi, Rami Gharbi, Nizar Lajnef, Vishnu Naresh Boddeti · Aug 29, 2025 · Citations: 0
- On the Theoretical Limitations of Embedding-Based Retrieval
Orion Weller, Michael Boratko, Iftekhar Naim, Jinhyuk Lee · Aug 28, 2025 · Citations: 0
These new benchmarks push embeddings to work for any query and any notion of relevance that could be given.
- EO-1: An Open Unified Embodied Foundation Model for General Robot Control
Delin Qu, Haoming Song, Qizhi Chen, Zhaoqing Chen, Xianqiang Gao · Aug 28, 2025 · Citations: 0
Long Horizon
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general purpose embodied intelligent systems.
- AVIATOR: Towards AI-Agentic Vulnerability Injection Workflow for High-Fidelity, Large-Scale Code Security Dataset
Amine Lbath, Massih-Reza Amini, Aurelien Delaitre, Vadim Okun · Aug 28, 2025 · Citations: 0
In this paper, we present AVIATOR, the first AI-agentic vulnerability injection framework.
- From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
Jessica M. Lundin, Usman Nasir Nakakana, Guillaume Chabot-Couture · Aug 28, 2025 · Citations: 0
Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive, contamination-resistant, and maintainable.
- Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
Lorenz Hufe, Constantin Venhoff, Erblina Purelku, Maximilian Dreyer, Sebastian Lapuschkin · Aug 28, 2025 · Citations: 0
Red Team
These models serve as suitable drop-in replacements for a broad range of safety-critical applications, where the risks of text-based manipulation outweigh the utility of text recognition.
- NPG-Muse: Scaling Long Chain-of-Thought Reasoning with NP-Hard Graph Problems
Yuyao Wang, Bowen Liu, Jianheng Tang, Nuo Chen, Yuhan Li · Aug 28, 2025 · Citations: 0
However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplored.
- AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios
Lisa Alazraki, Lihu Chen, Ana Brassard, Joe Stacey, Hossein A. Rahmani · Aug 27, 2025 · Citations: 0
In this work, we introduce an Agentic Commonsense and Math benchmark (AgentCoMa), where each compositional task requires a commonsense reasoning step and a math reasoning step.
- Diffusion Language Models Know the Answer Before Decoding
Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan · Aug 27, 2025 · Citations: 0
Empirical evaluations of LLaDA-8B and Dream-7B across multiple tasks show that Prophet reduces the number of decoding steps by up to 3.4x while preserving high generation quality.
- Your AI Bosses Are Still Prejudiced: The Emergence of Stereotypes in LLM-Based Multi-Agent Systems
Jingyu Guo, Yingying Xu · Aug 27, 2025 · Citations: 0
Multi Agent
While stereotypes are well-documented in human social interactions, AI systems are often presumed to be less susceptible to such biases.
- The Information Dynamics of Generative Diffusion
Dejan Stancevic, Luca Ambrogioni · Aug 27, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Language and Experience: A Computational Model of Social Learning in Complex Tasks
Cédric Colas, Tracey Mills, Ben Prystawski, Michael Henry Tessler, Noah Goodman · Aug 26, 2025 · Citations: 0
The ability to combine linguistic guidance from others with direct experience is central to human development, enabling safe and rapid learning in new environments.
- Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
Dayoon Ko, Jihyuk Kim, Haeju Park, Sohyeon Kim, Dahyun Lee · Aug 26, 2025 · Citations: 0
Long Horizon
Large reasoning models (LRMs) combined with retrieval-augmented generation (RAG) have enabled deep research agents capable of multi-step reasoning with external knowledge retrieval.
- LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination
Ziming Zhu, Chenglong Wang, Haosong Xv, Shunjie Xing, Yifu Huo · Aug 26, 2025 · Citations: 0
Demonstrations Multi Agent
In this paper, we introduce LaTeXTrans, a collaborative multi-agent system designed to address this challenge.
- VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft
Honghao Fu, Junlong Ren, Qi Chai, Deheng Ye, Yujun Cai · Aug 26, 2025 · Citations: 0
- Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning
Jungsuk Oh, Jay-Yoon Lee · Aug 25, 2025 · Citations: 0
- Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
Rishikesh Devanathan, Varun Nathan, Ayush Kumar · Aug 25, 2025 · Citations: 0
In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages.
- Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering
Julius Gun, Timo Oksanen · Aug 25, 2025 · Citations: 0
Our benchmark is built on a user manual for an agricultural machine, available in English, French, and German.
- How Quantization Shapes Bias in Large Language Models
Federico Marcuzzi, Xuefei Ning, Roy Schwartz, Iryna Gurevych · Aug 25, 2025 · Citations: 0
This work presents a comprehensive evaluation of how quantization affects model bias, with particular attention to its impact on individual demographic subgroups.