- Designing Explainable Conversational Agentic Systems for Guaraní Speakers
Samantha Adorno, Akshata Kishore Moharir, Ratna Kandala · Mar 5, 2026 · Citations: 0
Multi Agent
We propose an oral-first multi-agent architecture as an alternative to the standard "text-to-speech" pipeline.
- Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis
Hazem Amamou, Stéphane Gagnon, Alan Davoust, Anderson R. Avila · Mar 5, 2026 · Citations: 0
The Retrieval-Augmented Generation Benchmark (RGB) was introduced to evaluate the robustness of RAG systems under such conditions.
- RACAS: Controlling Diverse Robots With a Single Agentic System
Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet · Mar 5, 2026 · Citations: 0
We introduce RACAS (Robot-Agnostic Control via Agentic Systems), a cooperative agentic architecture in which three LLM/VLM-based modules (Monitors, a Controller, and a Memory Curator) communicate exclusively through natural language to…
- RoboPocket: Improve Robot Policies Instantly with Your Phone
Junjie Fang, Wendi Chen, Han Xue, Fangyuan Zhou, Tian Le · Mar 5, 2026 · Citations: 0
Demonstrations Long Horizon
To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone.
- POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu · Mar 5, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu · Mar 5, 2026 · Citations: 0
- Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
Helena Casademunt, Bartosz Cywiński, Khoi Tran, Arya Jakkli, Samuel Marks · Mar 5, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
Siddharth Boppana, Annabel Ma, Max Loeffler, Raphael Sarfati, Eric Bigelow · Mar 5, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval
Artem Vazhentsev, Maria Marina, Daniil Moskovskiy, Sergey Pletenev, Mikhail Seleznyov · Mar 5, 2026 · Citations: 0
- NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance
Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim · Mar 5, 2026 · Citations: 0
- DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates
Klaywert Danillo Ferreira de Souza, David Eduardo Pereira, Cláudio E. C. Campelo, Larissa Lucena Vasconcelos · Mar 5, 2026 · Citations: 0
- FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
Ted Zadouri, Markus Hoehnerbach, Jay Shah, Timmy Liu, Vijay Thakkar · Mar 5, 2026 · Citations: 0
- Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry
Yifan Zhu, Mariah Bradford, Kenneth Lai, Timothy Obiso, Videep Venkatesha · Mar 5, 2026 · Citations: 0
- Ensembling Language Models with Sequential Monte Carlo
Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly · Mar 5, 2026 · Citations: 0
- Emergent Introspection in AI is Content-Agnostic
Harvey Lederman, Kyle Mahowald · Mar 5, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs
Deshan Sumanathilaka, Nicholas Micallef, Julian Hough · Mar 5, 2026 · Citations: 0
- Progressive Residual Warmup for Language Model Pretraining
Tianhao Chen, Xin Xu, Lu Yin, Hao Chen, Yang Wang · Mar 5, 2026 · Citations: 0
- DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning
Mohammad Mahdi Moradi, Sudhir Mudur · Mar 5, 2026 · Citations: 0
- Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR
Carlos Carvalho, Francisco Teixeira, Thomas Rolland, Alberto Abad · Mar 5, 2026 · Citations: 0
- A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes
Stefan Bott, Verena Riegler, Horacio Saggion, Almudena Rascón Alcaina, Nouran Khallaf · Mar 5, 2026 · Citations: 0
- Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned
Nghi D. Q. Bui · Mar 5, 2026 · Citations: 0
- PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration
Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery · Mar 5, 2026 · Citations: 0
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis
Yen-Shan Chen, Shih-Yu Lai, Ying-Jung Tsou, Yi-Cheng Lin, Bing-Yu Chen · Mar 5, 2026 · Citations: 0
Extensive evaluations demonstrate robust zero-shot transferability to unseen neural codecs, achieving state-of-the-art resilience against traditional DSP attacks while preserving perceptual imperceptibility.
- Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution
Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong · Mar 5, 2026 · Citations: 0
- WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation
Luca Della Libera, Cem Subakan, Mirco Ravanelli · Mar 5, 2026 · Citations: 0
- Knowledge Divergence and the Value of Debate for Scalable Oversight
Robin Young · Mar 5, 2026 · Citations: 0
RLAIF or Synthetic Feedback
AI safety via debate and reinforcement learning from AI feedback (RLAIF) are both proposed methods for scalable oversight of advanced AI systems, yet no formal framework relates them or characterizes when debate offers an advantage.
- SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning
Zhu Li, Yongjian Chen, Huiyuan Lai, Xiyuan Gao, Shekhar Nayak · Mar 5, 2026 · Citations: 0
- Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh
Mohammad Mamun Or Rashid · Mar 5, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- VietJobs: A Vietnamese Job Advertisement Dataset
Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj · Mar 5, 2026 · Citations: 0
- The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology
Alper Yıldırım · Mar 5, 2026 · Citations: 0
- Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding
Ofir Ben Shoham · Mar 5, 2026 · Citations: 0
- Core-based Hierarchies for Efficient GraphRAG
Jakir Hossain, Ahmet Erdem Sarıyüce · Mar 5, 2026 · Citations: 0
- Distilling Formal Logic into Neural Spaces: A Kernel Alignment Approach for Signal Temporal Logic
Sara Candussio, Gabriele Sarti, Gaia Saveri, Luca Bortolussi · Mar 5, 2026 · Citations: 0
- Diffusion LLMs can think EoS-by-EoS
Sarah Breckner, Sebastian Schuster · Mar 5, 2026 · Citations: 0
- Transducing Language Models
Vésteinn Snæbjarnarson, Samuel Kiegeland, Tianyu Liu, Reda Boumasmoud, Ryan Cotterell · Mar 5, 2026 · Citations: 0
- Guidelines for the Annotation and Visualization of Legal Argumentation Structures in Chinese Judicial Decisions
Kun Chen, Xianglei Liao, Kaixue Fei, Yi Xing, Xinrui Li · Mar 5, 2026 · Citations: 0
- Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao · Mar 5, 2026 · Citations: 0
- C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning
Avni Mittal, Rauno Arike · Mar 5, 2026 · Citations: 0
- Feature Resemblance: On the Theoretical Understanding of Analogical Reasoning in Transformers
Ruichen Xu, Wenjing Yan, Ying-Jun Angela Zhang · Mar 5, 2026 · Citations: 0
- Representation Fidelity: Auditing Algorithmic Decisions About Humans Using Self-Descriptions
Theresa Elstner, Martin Potthast · Mar 5, 2026 · Citations: 0
- LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting
Yewen Li, Zhiyi Lyu, Peng Jiang, Qingpeng Cai, Fei Pan · Mar 5, 2026 · Citations: 0
- Measuring the Redundancy of Decoder Layers in SpeechLLMs
Adel Moumen, Guangzhi Sun, Philip C Woodland · Mar 5, 2026 · Citations: 0
- ARC-TGI: Human-Validated Task Generators with Reasoning Chain Templates for ARC-AGI
Jens Lehmann, Syeda Khushbakht, Nikoo Salehfard, Nur A Zarin Nishat, Dhananjay Bhandiwad · Mar 5, 2026 · Citations: 0
- Aura: Universal Multi-dimensional Exogenous Integration for Aviation Time Series
Jiafeng Lin, Mengren Zheng, Simeng Ye, Yuxuan Wang, Huan Zhang · Mar 5, 2026 · Citations: 0
Our findings highlight Aura's potential as a general-purpose enhancement for aviation safety and reliability.
- MUTEX: Leveraging Multilingual Transformers and Conditional Random Fields for Enhanced Urdu Toxic Span Detection
Inayat Arshad, Fajar Saleem, Ijaz Hussain · Mar 5, 2026 · Citations: 0
- NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension
Rongzhi Li, Hitomi Yanaka · Mar 5, 2026 · Citations: 0
- Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
Yida Lu, Jianwei Fang, Xuyang Shao, Zixuan Chen, Shiyao Cui · Mar 5, 2026 · Citations: 0
- S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home
Janani Rangila, Akila Siriweera, Incheon Paik, Keitaro Naruse, Isuru Jayanada · Mar 5, 2026 · Citations: 0
Pairwise Preference Multi Agent
The smart home is a key application domain within the Society 5.0 vision for a human-centered society.
- HiFlow: Hierarchical Feedback-Driven Optimization for Constrained Long-Form Text Generation
Yifan Zhu, Guanting Chen, Bing Wei, Haoran Luo · Mar 5, 2026 · Citations: 0
- Towards Efficient and Stable Ocean State Forecasting: A Continuous-Time Koopman Approach
Rares Grozavescu, Pengyu Zhang, Mark Girolami, Etienne Meunier · Mar 5, 2026 · Citations: 0
- ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul, Pakhapoom Sarapat · Mar 5, 2026 · Citations: 0
Using ThaiSafetyBench, we evaluate 24 LLMs, with GPT-4.1 and Gemini-2.5-Pro serving as LLM-as-a-judge evaluators.
- VRM: Teaching Reward Models to Understand Authentic Human Preferences
Biao Liu, Ning Xu, Junming Yang, Hao Xu, Xin Geng · Mar 5, 2026 · Citations: 0
Pairwise Preference
Large Language Models (LLMs) have achieved remarkable success across diverse natural language tasks, yet the reward models employed for aligning LLMs often encounter challenges of reward hacking, where the approaches predominantly rely on…
- Functionality-Oriented LLM Merging on the Fisher–Rao Manifold
Jiayu Wang, Zuojun Ye, Wenpeng Yin · Mar 5, 2026 · Citations: 0
Across various benchmarks and collapse diagnostics, our method remains stable as the number and heterogeneity of merged models increase, consistently outperforming prior baselines.
- Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
Yilong Chen, Naibin Gu, Junyuan Shang, Zhenyu Zhang, Yuchen Feng · Mar 5, 2026 · Citations: 0
Long Horizon
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- MPCEval: A Benchmark for Multi-Party Conversation Generation
Minxing Zhang, Yi Yang, Zhuofan Jia, Xuan Yang, Jian Pei · Mar 5, 2026 · Citations: 0
Multi-party conversation generation, such as smart reply and collaborative assistants, is an increasingly important capability of generative AI, yet its evaluation remains a critical bottleneck.
- When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Amirabbas Afzali, Myeongho Jeon, Maria Brbic · Mar 5, 2026 · Citations: 0
Pairwise Preference
Building on this insight, we propose Confidence-Weighted Preference Optimization (CW-PO), a general framework that re-weights training samples by a weak LLM's confidence and can be applied across different preference optimization…
- Replaying pre-training data improves fine-tuning
Suhas Kotha, Percy Liang · Mar 5, 2026 · Citations: 0
Web Browsing
We demonstrate the success of replay in practice for fine-tuning 8B parameter models, improving agentic web navigation success by 4.5% and Basque question-answering accuracy by 2%.
- VisionPangu: A Compact and Fine-Grained Multimodal Assistant with 1.7B Parameters
Jiaxin Fan, Wenpo Song · Mar 5, 2026 · Citations: 0
By incorporating dense human-authored descriptions from the DOCCI dataset, VisionPangu improves semantic coherence and descriptive richness without relying on aggressive model scaling.
- Retrieval-Augmented Generation with Covariate Time Series
Kenny Ye Liang, Zhongyi Pei, Huan Zhang, Yuhui Liu, Shaoxu Song · Mar 5, 2026 · Citations: 0
- TimeWarp: Evaluating Web Agents by Revisiting the Past
Md Farhan Ishmam, Kenneth Marino · Mar 5, 2026 · Citations: 0
Demonstrations Web Browsing
The improvement of web agents on current benchmarks raises the question: Do today's agents perform just as well when the web changes?