- Moving On, Even When You're Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and Task
Gilberto G. Briscoe-Martinez, Yaashia Gautam, Rahul Shetty, Anuj Pasricha, Marco M. Nicotra · Feb 2, 2026 · Citations: 0
- WAXAL: A Large-Scale Multilingual African Language Speech Corpus
Abdoulaye Diack, Perry Nelson, Kwaku Agbesi, Angela Nakalembe, MohamedElfatih MohamedKhair · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- From Sycophancy to Sensemaking: Premise Governance for Human-AI Decision Making
Raunak Jain · Feb 2, 2026 · Citations: 0
We argue reliable human-AI partnership requires a shift from answer generation to collaborative premise governance over a knowledge substrate, negotiating only what is decision-critical.
- Proof-RM: A Scalable and Generalizable Reward Model for Math Proof
Haotong Yang, Zitong Wang, Shijia Kang, Siqi Yang, Wenkai Yu · Feb 2, 2026 · Citations: 0
In this work, we design a *scalable* data-construction pipeline that, with minimal human effort, leverages LLMs to generate a large quantity of high-quality "question-proof-check" triplet data.
- VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Fatemeh Zargarbashi, Dhruv Agrawal, Jakob Buhmann, Martin Guay, Stelian Coros · Feb 2, 2026 · Citations: 0
Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model.
- Language Steering for Multilingual In-Context Learning
Neeraja Kirtane, Kuan-Hao Huang · Feb 2, 2026 · Citations: 0
Demonstrations
We propose language vectors, computed as the mean activation difference between parallel source and target language examples at a particular layer, and added as an offset to hidden states at inference time to shift the model's internal…
- Hallucination or Creativity: How to Evaluate AI-Generated Scientific Stories?
Alex Argese, Pasquale Lisena, Raphaël Troncy · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
Yu Zeng, Wenxuan Huang, Zhen Fang, Shuang Chen, Yufan Shen · Feb 2, 2026 · Citations: 0
Expert Verification · Web Browsing
However, evaluating these visual and textual search abilities is still difficult, and existing benchmarks have two major limitations.
- DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen · Feb 2, 2026 · Citations: 0
However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an evolving DC.
- Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts
Wenhao Li, Daohai Yu, Gen Luo, Yuxin Zhang, Fei Chao · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- LEC-KG: An LLM-Embedding Collaborative Framework for Domain-Specific Knowledge Graph Construction -- A Case Study on SDGs
Yikai Zeng, Yingchao Piao, Changhua Pei, Jianhui Li · Feb 2, 2026 · Citations: 0
- CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
Weining Fu, Kai Shu, Kui Xu, Qiangfeng Cliff Zhang · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li · Feb 2, 2026 · Citations: 0
- Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
Zhanghao Hu, Qinglin Zhu, Hanqi Yan, Yulan He, Lin Gui · Feb 2, 2026 · Citations: 0
Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting.
- Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy
Yuxin He, Ruihao Zhang, Tianao Shen, Cheng Liu, Qiang Nie · Feb 2, 2026 · Citations: 0
- Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation
Zeping Li, Guancheng Wan, Keyang Chen, Yu Chen, Yiwen Zhao · Feb 2, 2026 · Citations: 0
- Read As Human: Compressing Context via Parallelizable Close Reading and Skimming
Jiwei Tang, Shilei Liu, Zhicheng Zhang, Qingsong Lv, Runsong Zhao · Feb 2, 2026 · Citations: 0
- AXE: Low-Cost Cross-Domain Web Structured Information Extraction
Abdelrahman Mansour, Khaled W. Alshaer, Moataz Elsaban · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Controlling Exploration-Exploitation in GFlowNets via Markov Chain Perspectives
Lin Chen, Samuel Drapeau, Fanghao Shao, Xuekai Zhu, Bo Xue · Feb 2, 2026 · Citations: 0
Across various benchmarks, including Set, Bit Sequence, and Molecule Generation, α-GFN objectives consistently outperform previous GFlowNet objectives, achieving up to a 10× increase in the number of discovered modes.
- COMI: Coarse-to-fine Context Compression via Marginal Information Gain
Jiwei Tang, Shilei Liu, Zhicheng Zhang, Yujin Yuan, Libin Zheng · Feb 2, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Mechanistic Indicators of Steering Effectiveness in Large Language Models
Mehdi Jafari, Hao Xue, Flora Salim · Feb 2, 2026 · Citations: 0
Despite its widespread use, the mechanistic factors that govern when steering succeeds or fails remain poorly understood, as prior work has relied primarily on black-box outputs or LLM-based judges.
- Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models
Wenhui Tan, Fiorenzo Parascandolo, Enver Sangineto, Jianzhong Ju, Zhenbo Luo · Feb 2, 2026 · Citations: 0
Without additional training or parameters, LED consistently improves pass@1 and pass@16 accuracy by 0.61 and 1.03 percentage points across multiple reasoning benchmarks and models.
- Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
Hieu Trung Nguyen, Bao Nguyen, Wenao Ma, Yuzhi Zhao, Ruifeng She · Feb 2, 2026 · Citations: 0
Empirical results show that VIP consistently improves sampling efficiency and achieves higher performance than uniform or heuristic allocation strategies in multiple benchmarks.
- Argument Rarity-based Originality Assessment for AI-Assisted Writing
Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji · Feb 2, 2026 · Citations: 0
Experiments using 1,375 human essays and 1,000 AI-generated essays on two argumentative topics revealed three key findings.
- InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs
Lv Tang, Tianyi Zheng, Bo Li, Xingyu Li · Feb 2, 2026 · Citations: 0
- Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
Qian Wang, Xuandong Zhao, Zirui Zhang, Zhanzhi Lou, Nuo Chen · Feb 2, 2026 · Citations: 0