- RLShield: Practical Multi-Agent RL for Financial Cyber Defense with Attack-Surface MDPs and Real-Time Response Orchestration
Srikumar Nayak · Feb 26, 2026 · Citations: 0
Multi Agent
This paper proposes RLShield, a practical multi-agent RL pipeline for financial cyber defense.
- France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions
Sasha Boguraev, Qing Yao, Kyle Mahowald · Feb 26, 2026 · Citations: 0
- Humans and LLMs Diverge on Probabilistic Inferences
Gaurav Kamath, Sreenath Madathil, Sebastian Schuster, Marie-Catherine de Marneffe, Siva Reddy · Feb 26, 2026 · Citations: 0
- TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
Tugrul Gorgulu, Atakan Dag, M. Esat Kalfaoglu, Halil Ibrahim Kuru, Baris Can Cam · Feb 26, 2026 · Citations: 0
In addition, many real datasets struggle to evaluate their models, especially for planning tasks, since they lack a proper closed-loop evaluation setup.
- IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation
Md Mofijul Islam, Md Sirajus Salekin, Joe King, Priyashree Roy, Vamsi Thilak Gudi · Feb 26, 2026 · Citations: 0
Demonstrations
We present IDP (Intelligent Document Processing) Accelerator, a framework enabling agentic AI for end-to-end document intelligence with four key components: (1) DocSplit, a novel benchmark dataset and multimodal classifier using BIO tagging…
- FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
Michael Frew, Nishit Bheda, Bryan Tripp · Feb 26, 2026 · Citations: 0
Expert Verification
In this work, we introduce FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world clinical data.
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
Zhengqing Yuan, Kaiwen Shi, Zheyuan Zhang, Lichao Sun, Nitesh V. Chawla · Feb 26, 2026 · Citations: 0
Multi Agent
Meanwhile, rapidly growing reference lists make manual verification impractical, and existing automated tools remain fragile to noisy and heterogeneous citation formats and lack standardized evaluation.
- Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
Chris Samarinas, Haw-Shiuan Chang, Hamed Zamani · Feb 26, 2026 · Citations: 0
Long Horizon
Second, dense, decomposed process rewards separately evaluate reasoning quality, query quality, and answer correctness on a ternary scale via an LLM judge, providing richer supervision than binary outcome signals or heuristic step-level…
- Model Agreement via Anchoring
Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth · Feb 26, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu · Feb 26, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
Simon Roschmann, Paul Krzakala, Sonia Mazelet, Quentin Bouniot, Zeynep Akata · Feb 26, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- EvoX: Meta-Evolution for Automated Discovery
Shu Liu, Shubham Agarwal, Monishwaran Maheswaran, Mert Cemri, Zhifei Li · Feb 26, 2026 · Citations: 0
- Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
Amita Kamath, Jack Hessel, Khyathi Chandu, Jena D. Hwang, Kai-Wei Chang · Feb 26, 2026 · Citations: 0
With a set of curated benchmarks, we demonstrate that: (i) VLMs perform poorly on the aforementioned types of reasoning suppressed in the training data by reporting bias; (ii) contrary to popular belief, scaling data size, model size, and…
- FlashOptim: Optimizers for Memory Efficient Training
Jose Javier Gonzalez Ortiz, Abhay Gupta, Chris Renard, Davis Blalock · Feb 26, 2026 · Citations: 0
- Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang · Feb 26, 2026 · Citations: 0
- Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Yuhao Liu, Salim Ullah, Akash Kumar · Feb 26, 2026 · Citations: 0
- Utilizing LLMs for Industrial Process Automation
Salim Fares · Feb 26, 2026 · Citations: 0
- Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
Kunihiro Miyazaki, Takanobu Kawahara, Stephen Roberts, Stefan Zohren · Feb 26, 2026 · Citations: 0
Pairwise Preference Multi Agent
While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and…
- LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros · Feb 26, 2026 · Citations: 0
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources.
- Generalized Rapid Action Value Estimation in Memory-Constrained Environments
Aloïs Rautureau, Tristan Cazenave, Éric Piette · Feb 26, 2026 · Citations: 0
- Invariant Transformation and Resampling based Epistemic-Uncertainty Reduction
Sha Hu · Feb 26, 2026 · Citations: 0
- Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura · Feb 26, 2026 · Citations: 0
- The logic of KM belief update is contained in the logic of AGM belief revision
Giacomo Bonanno · Feb 26, 2026 · Citations: 0
Critique Edit
Denoting the latter by $\mathcal{L}_{AGM}$ and the former by $\mathcal{L}_{KM}$, we show that every axiom of $\mathcal{L}_{KM}$ is a theorem of $\mathcal{L}_{AGM}$.
- A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
Soumya Dutta, Smruthi Balaji, Sriram Ganapathy · Feb 26, 2026 · Citations: 0
Experiments on three benchmark datasets (IEMOCAP, MELD, and MOSI) show that our proposal achieves 70.9%, 69.5%, and 87.9% weighted F1-scores, respectively, outperforming several baseline speech-text ERC systems.
- Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity
Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku · Feb 26, 2026 · Citations: 0
- SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Sungho Park, Jueun Kim, Wook-Shin Han · Feb 26, 2026 · Citations: 0
We present SPARTA, an end-to-end construction framework that automatically generates large-scale Table-Text QA benchmarks with lightweight human validation, requiring only one quarter of the annotation time of HybridQA.
- ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
Haohui Jia, Zheng Chen, Lingwei Zhu, Rikuto Kotoge, Jathurshan Pradeepkumar · Feb 26, 2026 · Citations: 0
- CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays
Hyungyung Lee, Hangyul Yoon, Edward Choi · Feb 26, 2026 · Citations: 0
- Evaluating Stochasticity in Deep Research Agents
Haotian Zhai, Elias Stengel-Eskin, Pratik Patil, Liu Leqi · Feb 26, 2026 · Citations: 0
- Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
Siyuan Liu, Jiahui Xu, Feng Jiang, Kuang Wang, Zefeng Zhao · Feb 26, 2026 · Citations: 0
Achieving human-like responsiveness is a critical yet challenging goal for cascaded spoken dialogue systems.
- Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
Jiangxin Sun, Feng Xue, Teng Long, Chang Liu, Jian-Fang Hu · Feb 26, 2026 · Citations: 0
Demonstrations
Practically, RaWMPC leverages a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation.
- AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
Yutong Wang, Siyuan Xiong, Xuebo Liu, Wenkang Zhou, Liang Ding · Feb 26, 2026 · Citations: 0
Multi Agent
We propose AgentDropoutV2, a test-time rectify-or-reject pruning framework designed to dynamically optimize MAS information flow without retraining.
- Mitigating Legibility Tax with Decoupled Prover-Verifier Games
Yegon Kim, Juho Lee · Feb 26, 2026 · Citations: 0
- A Model-Free Universal AI
Yegon Kim, Juho Lee · Feb 26, 2026 · Citations: 0
- Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Radha Sarma · Feb 26, 2026 · Citations: 0
This paper demonstrates that assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF).
- Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
Zhou Xu, Bowen Zhou, Qi Wang, Shuwen Feng, Jingyu Xiao · Feb 26, 2026 · Citations: 0
Web Browsing
Pure-vision GUI agents provide universal interaction capabilities but suffer from severe efficiency bottlenecks due to the massive spatiotemporal redundancy inherent in high-resolution screenshots and historical trajectories.
- Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad · Feb 26, 2026 · Citations: 0
- ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays
Aishik Sanyal · Feb 26, 2026 · Citations: 0
Pairwise Preference
Inspired by Humphrey's ipsundrum hypothesis, we implement ReCoN-Ipsundrum, an inspectable agent that extends a ReCoN state machine with a recurrent persistence loop over sensory salience Ns and an optional affect proxy reporting…
- MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Yizhi Li, Xiaohan Chen, Miao Jiang, Wentao Tang, Gaoang Wang · Feb 26, 2026 · Citations: 0
- Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
Pengxiang Li, Dilxat Muhtar, Tianlong Chen, Lu Yin, Shiwei Liu · Feb 26, 2026 · Citations: 0
Across math reasoning benchmarks, NAP yields stronger performance under parallel decoding than DLMs trained on standard long CoT data, with gains growing as parallelism increases.
- ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang · Feb 26, 2026 · Citations: 0
- InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross · Feb 26, 2026 · Citations: 0
Our evaluation experiments on Llama models show that InnerQ maintains a few-shot GSM8K performance comparable to non-quantized KV caches and surpasses prior KV cache quantization methods.
- SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Jiahao Zhao, Feng Jiang, Shaowei Qin, Zhonghui Zhang, Junhao Liu · Feb 26, 2026 · Citations: 0
- Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Chungpa Lee, Jy-yong Sohn, Kangwook Lee · Feb 26, 2026 · Citations: 0
Demonstrations
We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning.
- ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
Elzo Brito dos Santos Filho · Feb 26, 2026 · Citations: 0
- MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations
Sara Rosenthal, Yannis Katsis, Vraj Shah, Lihong He, Lucian Popa · Feb 26, 2026 · Citations: 0
We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models.
- Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo · Feb 26, 2026 · Citations: 0
- A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall · Feb 26, 2026 · Citations: 0
Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred…
- PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering
Junkai Lu, Peng Chen, Xingjian Wu, Yang Shu, Chenjuan Guo · Feb 26, 2026 · Citations: 0
- Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Yiming Wang · Feb 26, 2026 · Citations: 0
- The Trinity of Consistency as a Defining Principle for General World Models
Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang · Feb 26, 2026 · Citations: 0
Long Horizon
To complement this conceptual framework, we introduce CoW-Bench, a benchmark centered on multi-frame reasoning and generation scenarios.
- On Sample-Efficient Generalized Planning via Learned Transition Models
Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava · Feb 26, 2026 · Citations: 0
- Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
Jayadev Billa · Feb 26, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- DyGnROLE: Modeling Asymmetry in Dynamic Graphs with Node-Role-Oriented Latent Encoding
Tyler Bonnet, Marek Rei · Feb 26, 2026 · Citations: 0
- SvfEye: A Semantic-Visual Fusion Framework with Multi-Scale Visual Context for Multimodal Reasoning
Yuxiang Shen, Hailong Huang, Zhenkun Gao, Xueheng Li, Man Zhou · Feb 26, 2026 · Citations: 0
- Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection
Keito Inoshita · Feb 26, 2026 · Citations: 0
- Automated Vulnerability Detection in Source Code Using Deep Representation Learning
C. Seas, G. Fitzpatrick, J. A. Hamilton, M. C. Carlisle · Feb 26, 2026 · Citations: 0
- Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
Xiaosen Wang, Zhijin Ge, Bohan Liu, Zheng Fang, Fengfan Zhou · Feb 26, 2026 · Citations: 0
- Three AI-agents walk into a bar . . . . 'Lord of the Flies' tribalism emerges among smart AI-Agents
Dhwanil M. Mori, Neil F. Johnson · Feb 26, 2026 · Citations: 0
Near-future infrastructure systems may be controlled by autonomous AI agents that repeatedly request access to limited resources such as energy, bandwidth, or computing power.
- Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
Zhuoliang Xie, Fei Liu, Zhenkun Wang, Qingfu Zhang · Feb 26, 2026 · Citations: 0