- Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang · Apr 6, 2026 · Citations: 0
Existing works mainly follow binary or ternary classification settings, which at best distinguish purely human-written text, purely LLM-generated text, and collaborative text.
- Early Stopping for Large Reasoning Models via Confidence Dynamics
Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi · Apr 6, 2026 · Citations: 0
We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models.
- Your Pre-trained Diffusion Model Secretly Knows Restoration
Sudarshan Rajagopalan, Vishal M. Patel · Apr 6, 2026 · Citations: 0
Long Horizon
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Stratifying Reinforcement Learning with Signal Temporal Logic
Justin Curry, Alberto Speranzon · Apr 6, 2026 · Citations: 0
- TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu · Apr 6, 2026 · Citations: 0
Pairwise Preference
Via the trigonometric series, we use the distance preference characterized by these centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation.
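The snippet sketches a two-signal importance score for KV-cache compression. A minimal toy version, assuming an illustrative cosine kernel around a set of centers and a Q/K norm product (the actual TriAttention formula, centers, and weighting are not given in the abstract):

```python
import numpy as np

def key_importance(positions, centers, q_norms, k_norms, beta=0.5):
    """Toy importance score for selecting which keys to keep.

    Inspired only by the abstract: a trigonometric distance preference
    around a set of centers scores each key position, and Q/K norms add a
    second importance signal. The cosine bump, the norm product, and beta
    are illustrative assumptions, not TriAttention's actual formula.
    """
    pos = np.asarray(positions, float)[:, None]
    ctr = np.asarray(centers, float)[None, :]
    # distance to the nearest center, mapped through a cosine bump in [0, 1]
    d = np.min(np.abs(pos - ctr), axis=1)
    trig = 0.5 * (1 + np.cos(np.pi * np.clip(d / d.max(), 0, 1)))
    # normalized Q/K-norm signal as a second cue for importance
    norm_signal = np.asarray(q_norms, float) * np.asarray(k_norms, float)
    norm_signal = norm_signal / norm_signal.max()
    return (1 - beta) * trig + beta * norm_signal

scores = key_importance(positions=np.arange(8),
                        centers=[0, 7],
                        q_norms=np.ones(8),
                        k_norms=np.linspace(1.0, 2.0, 8))
keep = np.argsort(scores)[-4:]   # retain the top-4 keys, evict the rest
```

Keys near a preference center and with large Q/K norms score highest, so compression keeps them while evicting mid-sequence, low-norm keys.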
- PINNs in PDE Constrained Optimal Control Problems: Direct vs Indirect Methods
Zhen Zhang, Shanqing Liu, Alessandro Alla, Jerome Darbon, George Em Karniadakis · Apr 6, 2026 · Citations: 0
- Vero: An Open RL Recipe for General Visual Reasoning
Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen · Apr 6, 2026 · Citations: 0
Vero achieves state-of-the-art performance, improving over four base models by 3.6-5.3 points on average across VeroEval, our suite of 30 challenging benchmarks.
- Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning
Xuyang Shen, Zijie Pan, Diego Cerrai, Xinxuan Zhang, Christopher Colorio · Apr 6, 2026 · Citations: 0
- Analyzing Symbolic Properties for DRL Agents in Systems and Networking
Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba · Apr 6, 2026 · Citations: 0
For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice.
- HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
Vadim Vashkelis, Natalia Trukhina · Apr 6, 2026 · Citations: 0
- How AI Aggregation Affects Knowledge
Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar, James Siderius · Apr 6, 2026 · Citations: 0
To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents.
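The classic DeGroot model updates beliefs by neighbor averaging, x(t+1) = W x(t) for a row-stochastic trust matrix W. A minimal sketch of the described extension, assuming a placeholder aggregator (population mean) and a mixing weight alpha, neither of which is specified in the abstract:

```python
import numpy as np

def degroot_with_aggregator(W, x0, alpha=0.3, steps=50):
    """DeGroot belief updating with a stylized AI aggregator.

    The aggregator is a placeholder: it "trains" on current beliefs by
    taking their population mean and feeds that synthesized signal back
    to every agent with weight alpha. The mean aggregator and the linear
    mixing rule are illustrative assumptions, not the paper's model.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        social = W @ x                        # standard DeGroot averaging
        signal = np.full_like(x, x.mean())    # AI-synthesized population signal
        x = (1 - alpha) * social + alpha * signal
    return x

# Row-stochastic trust matrix over 3 agents with heterogeneous priors
W = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
beliefs = degroot_with_aggregator(W, [0.0, 0.5, 1.0])
```

Because the aggregator broadcasts one shared signal, it speeds convergence to consensus; a natural experiment is to vary alpha and watch how much the limit point shifts relative to plain DeGroot.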
- Are Latent Reasoning Models Easily Interpretable?
Connor Dilgren, Sarah Wiegreffe · Apr 6, 2026 · Citations: 0
- FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang · Apr 6, 2026 · Citations: 0
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty…
- QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching · Apr 6, 2026 · Citations: 0
Rubric Rating
To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.
- Agentic Federated Learning: The Future of Distributed Training Orchestration
Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas, Allan M. de Souza · Apr 6, 2026 · Citations: 0
Multi Agent
In this work, we propose a paradigm shift towards Agentic-FL, a framework where Language Model-based Agents (LMagents) assume autonomous orchestration roles.
- Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation
Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Data Attribution in Adaptive Learning
Amit Kiran Rege · Apr 6, 2026 · Citations: 0
- Muon Dynamics as a Spectral Wasserstein Flow
Gabriel Peyré · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices
Alexis Burgon, Berkman Sahiner, Nicholas A Petrick, Gene Pennello, Ravi K Samala · Apr 6, 2026 · Citations: 0
This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance assessment.
- Incompleteness of AI Safety Verification via Kolmogorov Complexity
Munawar Hasan · Apr 6, 2026 · Citations: 0
Ensuring that artificial intelligence (AI) systems satisfy formal safety and policy constraints is a central challenge in safety-critical domains.
- DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing
Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng · Apr 6, 2026 · Citations: 0
Multi Agent
Simulating a professional production pipeline, our hierarchical multi-agent framework decomposes the challenge into three cascaded levels: the Screenwriter for source-aware global structural anchoring, the Director for instantiating adaptive…
- Synthetic Sandbox for Training Machine Learning Engineering Agents
Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan · Apr 6, 2026 · Citations: 0
Long Horizon
Based on this insight, we introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments from a small number of seed tasks, preserving the structural and technical complexity of real-world problems…
- Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning
Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj · Apr 6, 2026 · Citations: 0
- Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
James Hu, Mahdi Ghelichi · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- FairLogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models
Nick Souligne, Vignesh Subbian · Apr 6, 2026 · Citations: 0
- The Role of Generator Access in Autoregressive Post-Training
Amit Kiran Rege · Apr 6, 2026 · Citations: 0
- MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong · Apr 6, 2026 · Citations: 0
Long Horizon
Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over…
- Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework
Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala, Jouni Isoaho · Apr 6, 2026 · Citations: 0
However, its reliability in security-sensitive analytical tasks remains insufficiently examined, particularly under structured human evaluation.
- Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
Guan-Ting Lin, Chen Chen, Zhehuai Chen, Hung-yi Lee · Apr 6, 2026 · Citations: 0
Tool Use
We introduce Full-Duplex-Bench-v3 (FDB-v3), a benchmark for evaluating spoken language models under naturalistic speech conditions and multi-step tool use.
- InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
Yude Zou, Junji Gong, Xing Gao, Zixuan Li, Tianxing Chen · Apr 6, 2026 · Citations: 0
Human-object-scene interactions (HOSI) generation has broad applications in embodied AI, simulation, and animation.
- Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling
Qingyang Xu, Yaling Shen, Stephanie Fong, Zimu Wang, Yiwen Jiang · Apr 6, 2026 · Citations: 0
Red Team
The increasing use of large language models (LLMs) in mental healthcare raises safety concerns in high-stakes therapeutic interactions.
- MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation
Zhixiang Lu, Chong Zhang, Chenyu Xue, Angelos Stefanidis, Chong Li · Apr 6, 2026 · Citations: 0
We introduce Multilingual Expert-Reward Informed Tuning (MERIT), a unified translation framework that transforms the traditional English-centric ALT benchmark into a Chinese-centric evaluation suite for five Southeast Asian low-resource…
- A Robust SINDy Autoencoder for Noisy Dynamical System Identification
Kairui Ding · Apr 6, 2026 · Citations: 0
- Hybrid Fourier Neural Operator for Surrogate Modeling of Laser Processing with a Quantum-Circuit Mixer
Mateusz Papierz, Asel Sagingalieva, Alix Benoit, Toni Ivas, Elia Iseli · Apr 6, 2026 · Citations: 0
- Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
Sercan Karakaş · Apr 6, 2026 · Citations: 0
Pairwise Preference
Large language models achieve strong performance on many language tasks, yet it remains unclear whether they integrate world knowledge with syntactic structure in a human-like, structure-sensitive way during ambiguity resolution.
- ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
Xu Mingze · Apr 6, 2026 · Citations: 0
Long Horizon
AI agents, as autonomous digital actors, need agent-native protocols. Existing approaches such as GUI automation and MCP-based skills suffer from high token consumption, fragmented interaction, and inadequate security, owing to the lack of a unified…
- LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection
Cheng Xu, Changhong Jin, Yingjie Niu, Nan Yan, Yuke Mei · Apr 6, 2026 · Citations: 0
To address this, we introduce LiveFact, a continuously updated benchmark that simulates the real-world "fog of war" in misinformation detection.
- Selecting Decision-Relevant Concepts in Reinforcement Learning
Naveen Raman, Stephanie Milani, Fei Fang · Apr 6, 2026 · Citations: 0
Expert Verification
Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions.
- SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang · Apr 6, 2026 · Citations: 0
Long Horizon
Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited…
- Partially deterministic sampling for compressed sensing with denoising guarantees
Yaniv Plan, Matthew S. Scott, Ozgur Yilmaz · Apr 6, 2026 · Citations: 0
- Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
Houzhe Wang, Xiaojie Zhu, Chi Chen · Apr 6, 2026 · Citations: 0
- Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving
Mayank Mayank, Bharanidhar Duraisamy, Florian Geiß, Abhinav Valada · Apr 6, 2026 · Citations: 0
- How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling
Yuhang Liu, Heyan Huang, Yizhe Yang, Hongyan Zhao, Zhizhuo Zeng · Apr 6, 2026 · Citations: 0
Large language models (LLMs) have achieved strong performance on reasoning benchmarks, yet their ability to solve real-world problems requiring end-to-end workflows remains unclear.
- HUKUKBERT: Domain-Specific Language Model for Turkish Law
Mehmet Utku Öztürk, Tansu Türkoğlu, Buse Buz-Yalug · Apr 6, 2026 · Citations: 0
Evaluated on a novel Legal Cloze Test benchmark -- a masked legal term prediction task designed for Turkish court decisions -- HukukBERT achieves state-of-the-art performance with 84.40% Top-1 accuracy, substantially outperforming existing…
- A Quantum Search Approach to Magic Square Constraint Problems with Classical Benchmarking
Rituparna R, Harsha Varthini, Aswani Kumar Cherukuri · Apr 6, 2026 · Citations: 0
Rather than integrating classical and quantum solvers in an iterative loop, this work uses the classical component for structured initialisation and the quantum component for search, and benchmarks the quantum approach against classical…
- MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Bin Wang, Tianyao He, Linke Ouyang, Fan Wu, Zhiyuan Zhao · Apr 6, 2026 · Citations: 0
At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while mitigating distribution shift;…
- Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
Justin Chih-Yao Chen, Archiki Prasad, Zaid Khan, Joykirat Singh, Runchu Tian · Apr 6, 2026 · Citations: 0
Across 2 models and 6 reasoning benchmarks, our method consistently outperforms standard GRPO and strong guided-exploration baselines.
- Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu · Apr 6, 2026 · Citations: 0
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem.
- Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange
Vinod Vaikuntanathan, Or Zamir · Apr 6, 2026 · Citations: 0
AI agents are increasingly deployed to interact with other agents on behalf of users and organizations.
- Darkness Visible: Reading the Exception Handler of a Language Model
Peter Balogh · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments
Eranga Bandara, Asanga Gunaratna, Ross Gore, Abdul Rahman, Ravi Mukkamala · Apr 6, 2026 · Citations: 0
Multi Agent
The accelerating adoption of large language models, retrieval-augmented generation pipelines, and multi-agent AI workflows has created a structural governance crisis.
- Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations
Kalyan Cherukuri, Lav R. Varshney · Apr 6, 2026 · Citations: 0
Using autoregressive hidden-state trajectories across multiple open-source models and benchmarks, we find that separability is strongly task-dependent rather than universal: factoid settings can show clearer basin separation, whereas…
- Artificial Intelligence and Cost Reduction in Public Higher Education: A Scoping Review of Emerging Evidence
Diamanto Tzanoulinou, Loukas Triantafyllopoulos, George Vorvilas, Evgenia Paxinou, Nikolaos Karousos · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
Zhenhang Shang, Kani Chen · Apr 6, 2026 · Citations: 0
- Sampling Parallelism for Fast and Efficient Bayesian Learning
Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz · Apr 6, 2026 · Citations: 0
By distributing sample evaluations across multiple GPUs, our method reduces memory pressure and training time without requiring architectural changes or extensive hyperparameter tuning.
- Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity
Zhu Li, Jiaming Qu, Yuan Chang · Apr 6, 2026 · Citations: 0
Large language models (LLMs) are increasingly acting as collaborative writing partners, raising questions about their impact on human agency.
- Discovering Failure Modes in Vision-Language Models using RL
Kanishk Jain, Qian Yang, Shravan Nayak, Parisa Kordjamshidi, Nishanth Anand · Apr 6, 2026 · Citations: 0
Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint…
- Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs
Yuan Chang, Jiaming Qu, Zhu Li · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Neuromorphic Computing for Low-Power Artificial Intelligence
Keshava Katti, Pratik Chaudhari, Deep Jariwala · Apr 6, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models
Xiao Liang, Shuang Li · Apr 6, 2026 · Citations: 0