- Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang · Apr 9, 2026 · Citations: 0
Tool Use
The advent of agentic multimodal models has empowered systems to actively interact with external environments.
- SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou · Apr 9, 2026 · Citations: 0
- Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang · Apr 9, 2026 · Citations: 0
Expert Verification
Experiments on three multimodal MoE models across six benchmarks demonstrate consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks.
- OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng · Apr 9, 2026 · Citations: 0
Long Horizon
Extensive evaluations across 18 diverse benchmarks demonstrate its superior performance over strong open-source and leading proprietary frontier models.
- AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing · Apr 9, 2026 · Citations: 0
We introduce AVGen-Bench, a task-driven benchmark for T2AV generation featuring high-quality prompts across 11 real-world categories.
- RewardFlow: Generate Images by Optimizing What You Reward
Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah · Apr 9, 2026 · Citations: 0
- PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents
Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes · Apr 9, 2026 · Citations: 0
- Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models
Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths · Apr 9, 2026 · Citations: 0
Pairwise Preference
Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning.
- What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ClawBench: Can AI Agents Complete Everyday Online Tasks?
Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao · Apr 9, 2026 · Citations: 0
Long Horizon
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life?
- Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Jiayuan Ye, Vitaly Feldman, Kunal Talwar · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- What do Language Models Learn and When? The Implicit Curriculum Hypothesis
Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Differentially Private Language Generation and Identification in the Limit
Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
Kabilan Elangovan, Daniel Ting · Apr 9, 2026 · Citations: 0
- sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing
Sergey V Samsonau · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- PIArena: A Platform for Prompt Injection Evaluation
Runpeng Geng, Chenlong Yin, Yanting Wang, Ying Chen, Jinyuan Jia · Apr 9, 2026 · Citations: 0
While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation.
- What They Saw, Not Just Where They Looked: Semantic Scanpath Similarity via VLMs and NLP metric
Mohamed Amine Kerkouri, Marouane Tliba, Bin Wang, Aladine Chetouani, Ulas Bagci · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Formalizing building-up constructions of self-dual codes through isotropic lines in Lean
Jae-Hyun Baek, Jon-Lark Kim · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- AI generates well-liked but templatic empathic responses
Emma Gueorguieva, Hongli Zhan, Jina Suh, Javier Hernandez, Tatiana Lau · Apr 9, 2026 · Citations: 0
Recent research shows that greater numbers of people are turning to Large Language Models (LLMs) for emotional support, and that people rate LLM responses as more empathic than human-written responses.
- SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh, Hritik Bansal, Saadia Gabriel · Apr 9, 2026 · Citations: 0
- Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization
Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu · Apr 9, 2026 · Citations: 0
- TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis
Sikai Bai, Haoxi Li, Jie Zhang, Yongjiang Liu, Song Guo · Apr 9, 2026 · Citations: 0
- From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis
Juergen Dietrich · Apr 9, 2026 · Citations: 0
- OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
Haoxi Zeng, Qiankun Liu, Yi Bin, Haiyue Zhang, Yujuan Ding · Apr 9, 2026 · Citations: 0
- A Machine Learning Framework for Turbofan Health Estimation via Inverse Problem Formulation
Milad Leyli-Abadi, Lucas Thil, Sebastien Razakarivony, Guillaume Doquet, Jesse Read · Apr 9, 2026 · Citations: 0
- CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
Rui Gan, Junyi Ma, Pei Li, Xingyou Yang, Kai Chen · Apr 9, 2026 · Citations: 0
- Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models
Marcel Gröpl, Jaewoo Jung, Seungryong Kim, Marc Pollefeys, Sunghwan Hong · Apr 9, 2026 · Citations: 0
Experiments on seven benchmarks across four VLM architectures demonstrate consistent improvements over existing methods, with the largest gains on detail-critical and high-resolution settings, while also producing more interpretable…
- KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
Tongbo Chen, Zhengxi Lu, Zhan Xu, Guocheng Shao, Shaohan Zhao · Apr 9, 2026 · Citations: 0
- AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages
Lilian Wanzare, Cynthia Amol, Ezekiel Maina, Nelson Odhiambo, Hope Kerubo · Apr 9, 2026 · Citations: 0
Quality assurance operated at multiple layers, encompassing automated signal-to-noise ratio validation prior to recording and human review for content accuracy.
- HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment
Changdao Chen · Apr 9, 2026 · Citations: 0
- Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules
Luca Nogueira Calçado, Sergei K. Turitsyn, Egor Manuylovich · Apr 9, 2026 · Citations: 0
- KV Cache Offloading for Context-Intensive Tasks
Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov · Apr 9, 2026 · Citations: 0
Prior evaluations have largely focused on tasks that do not require extracting large amounts of information from the context.
- Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM
Samay U. Shetty, Tharindu Cyril Weerasooriya, Deepak Pandita, Christopher M. Homan · Apr 9, 2026 · Citations: 0
When humans label subjective content, they disagree, and that disagreement is not noise.
- On-board Telemetry Monitoring in Autonomous Satellites: Challenges and Opportunities
Lorenzo Capelli, Leandro de Souza Rosa, Maurizio De Tommasi, Livia Manovi, Andriy Enttsel · Apr 9, 2026 · Citations: 0
- Synthetic Data for any Differentiable Target
Tristan Thrush, Sung Min Park, Herman Brunborg, Luke Bailey, Marcel Roed · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Exploring Temporal Representation in Neural Processes for Multimodal Action Prediction
Marco Gabriele Fedozzi, Yukie Nagai, Francesco Rea, Alessandra Sciutti · Apr 9, 2026 · Citations: 0
- Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi · Apr 9, 2026 · Citations: 0
- Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
Wenhao Yuan, Chenchen Lin, Jian Chen, Jinfeng Xu, Xuehe Wang · Apr 9, 2026 · Citations: 0
Long Horizon
In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory.
- Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks
Mayuka Jayawardhana, Nihal Sharma, Kazem Meidani, Bayan Bruss, Tom Goldstein · Apr 9, 2026 · Citations: 0
- ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification
Paul Quinlan, Qingguo Li, Xiaodan Zhu · Apr 9, 2026 · Citations: 0
- Phantasia: Context-Adaptive Backdoors in Vision Language Models
Nam Duong Tran, Phi Le Nguyen · Apr 9, 2026 · Citations: 0
- Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
Jui-Hui Chung, Hongzhou Lin, Lai Jiang, Shange Tang, Chi Jin · Apr 9, 2026 · Citations: 0
- TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng, Chenghao Wang, Yi Yang, Lirong Qian, Junjie Li · Apr 9, 2026 · Citations: 0
- A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection
Wenxian Wang, Xiaohu Luo, Junfeng Hao, Xiaoming Gu, Xingshu Chen · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Ziyu Ma, Shidong Yang, Yuxiang Ji, Xucong Wang, Yong Wang · Apr 9, 2026 · Citations: 0
Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment.
- Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
Khushal Sethi · Apr 9, 2026 · Citations: 0
Long Horizon
We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement.
- SOLAR: Communication-Efficient Model Adaptation via Subspace-Oriented Latent Adapter Reparametrization
Seyed Mahmoud Sajjadi Mohammadabadi, Xiaolong Ma, Lei Yang, Feng Yan, Junshan Zhang · Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems
Tolga Dimlioglu, Nadine Chang, Maying Shen, Rafid Mahmood, Jose M. Alvarez · Apr 9, 2026 · Citations: 0
- Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang · Apr 9, 2026 · Citations: 0
Long Horizon
However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior.
- Scalable Neural Decoders for Practical Fault-Tolerant Quantum Computation
Andi Gu, J. Pablo Bonilla Ataides, Mikhail D. Lukin, Susanne F. Yelin · Apr 9, 2026 · Citations: 0
- ASPECT: Analogical Semantic Policy Execution via Language Conditioned Transfer
Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana · Apr 9, 2026 · Citations: 0
- Human-AI Collaboration Reconfigures Group Regulation from Socially Shared to Hybrid Co-Regulation
Yujing Zhang, Xianghui Meng, Shihui Feng, Jionghao Lin · Apr 9, 2026 · Citations: 0
- PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
Ruizhi Zhang, Ye Huang, Yuangang Pan, Chuanfu Shen, Zhilin Liu · Apr 9, 2026 · Citations: 0
- InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
Ashutosh Kumar, Rajat Saini, Jingjing Pan, Mustafa Erdogan, Mingfang Zhang · Apr 9, 2026 · Citations: 0
- Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee · Apr 9, 2026 · Citations: 0
- Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
Xun Zhu, Fanbin Mo, Xi Chen, Kaili Zheng, Shaoshuai Yang · Apr 9, 2026 · Citations: 0
- ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection
He Geng, Yangmin Huang, Lixian Lai, Qianyun Du, Hui Chu · Apr 9, 2026 · Citations: 0
- Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization
Benjamin Léger, Kazem Meidani, Christian Gagné · Apr 9, 2026 · Citations: 0
- HistDiT: A Structure-Aware Latent Conditional Diffusion Model for High-Fidelity Virtual Staining in Histopathology
Aasim Bin Saleem, Amr Ahmed, Ardhendu Behera, Hafeezullah Amin, Iman Yi Liao · Apr 9, 2026 · Citations: 0