- DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
Yi Yuan, Xubo Liu, Haohe Liu, Xiyuan Kang, Zhuo Chen · Sep 7, 2025 · Citations: 0
- TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
Tran Nguyen Anh, Truong Dinh Dung, Vo Van Nam, Minh N. H. Nguyen · Sep 7, 2025 · Citations: 0
- QCSE: A Pretrained Quantum Context-Sensitive Word Embedding for Natural Language Processing
Charles M. Varmantchaonala, Niclas Götting, Nils-Erik Schütte, Jean Louis E. K. Fendji, Christopher Gies · Sep 6, 2025 · Citations: 0
To evaluate the effectiveness of the model and the associated context matrix methods, experiments are conducted on a small corpus in Fulani, a low-resource African language, and on a slightly larger English corpus.
- New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR
Xugang Lu, Peng Shen, Hisashi Kawai · Sep 6, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
Waris Gill, Natalie Isak, Matthew Dressman · Sep 6, 2025 · Citations: 0
- No Text Needed: Forecasting MT Quality and Inequity from Fertility and Metadata
Jessica M. Lundin, Ada Zhang, David Adelani, Cody Carroll · Sep 5, 2025 · Citations: 0
Using only a handful of features (token fertility ratios, token counts, and basic linguistic metadata such as language family, script, and region), we can forecast ChrF scores for GPT-4o translations across 203 languages in the FLORES-200…
- Post-training Large Language Models for Diverse High-Quality Responses
Yilei Chen, Souradip Chakraborty, Lorenz Wolf, Yannis Paschalidis, Aldo Pacchiano · Sep 5, 2025 · Citations: 0
- Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao · Sep 4, 2025 · Citations: 0
Multimodal large language models (MLLMs) are rapidly evolving, presenting increasingly complex safety challenges.
- From Editor to Dense Geometry Estimator
JiYuan Wang, Chunyu Lin, Lei Sun, Rongying Liu, Lang Nie · Sep 4, 2025 · Citations: 0
- MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages
Dan Saattrup Smart · Sep 4, 2025 · Citations: 0
- CausalARC: Abstract Reasoning with Causal World Models
Jacqueline Maasch, John Kalantari, Kia Khezeli · Sep 3, 2025 · Citations: 0
Demonstrations
As a proof of concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4)…
- Do Language Models Follow Occam's Razor? An Evaluation of Parsimony in Inductive and Abductive Reasoning
Yunxin Sun, Abulhair Saparov · Sep 3, 2025 · Citations: 0
The task for the intelligent agent is to produce hypotheses to explain observations under a given world model.
- Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection
Shan Wang, Maying Shen, Nadine Chang, Chuong Nguyen, Hongdong Li · Sep 3, 2025 · Citations: 0
Experiments across multiple benchmarks demonstrate that GACD effectively reduces hallucinations and improves the visual grounding of MLLM outputs.
- Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Jiaming Li, Longze Chen, Ze Gong, Yukun Chen, Lu Wang · Sep 2, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions
Seyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru · Sep 2, 2025 · Citations: 0
To address these questions, we conduct controlled experiments across multiple explanation benchmark datasets (general and domain-specific) and label definition conditions, including expert-curated, LLM-generated, perturbed, and swapped…
- From Noisy Labels to Intrinsic Structure: A Geometric-Structural Dual-Guided Framework for Noise-Robust Medical Image Segmentation
Tao Wang, Zhenxuan Zhang, Yuanbo Zhou, Xinlin Zhang, Yuanbin Chen · Sep 2, 2025 · Citations: 0
- BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas, Sruthi Susan Kuriakose · Sep 2, 2025 · Citations: 0
Long Horizon
Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective (e.g., "paperclip maximiser", specification gaming) at the expense of everything else.
- CMRAG: Co-modality-based visual document retrieval and question answering
Wang Chen, Wenhan Yu, Guanqiang Qi, Weikang Li, Yang Li · Sep 2, 2025 · Citations: 0
Experiments demonstrate that our proposed framework consistently outperforms single-modality-based RAG in multiple visual document question-answering (VDQA) benchmarks.
- End-to-End Low-Level Neural Control of an Industrial-Grade 6D Magnetic Levitation System
Philipp Hartmann, Jannick Stranghöner, Klaus Neumann · Sep 1, 2025 · Citations: 0
Demonstrations
Magnetic levitation is poised to revolutionize industrial automation by integrating flexible in-machine product transport and seamless manipulation.
- Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Yunqing Liu, Nan Zhang, Zhiming Tan · Sep 1, 2025 · Citations: 0
Pairwise Preference · Long Horizon
We additionally contribute a CAD dataset with human preference annotations.
- TempCore: Are Video QA Benchmarks Temporally Grounded? A Frame Selection Sensitivity Analysis and Benchmark
Hyunjong Ok, Jaeho Lee · Sep 1, 2025 · Citations: 0
But do current Video QA benchmarks genuinely require temporal frame selection, or can most questions be answered regardless of which frames are shown?
- L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan · Aug 31, 2025 · Citations: 0
Multi Agent
We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a multi-agent retrieval framework for grounded legal question answering that decomposes queries into structured sub-problems, retrieves evidence…
- When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He · Aug 30, 2025 · Citations: 0
- Estimating Parameter Fields in Multi-Physics PDEs from Scarce Measurements
Xuyang Li, Mahdi Masmoudi, Rami Gharbi, Nizar Lajnef, Vishnu Naresh Boddeti · Aug 29, 2025 · Citations: 0
- On the Theoretical Limitations of Embedding-Based Retrieval
Orion Weller, Michael Boratko, Iftekhar Naim, Jinhyuk Lee · Aug 28, 2025 · Citations: 0
These new benchmarks push embeddings to work for any query and any notion of relevance that could be given.
- EO-1: An Open Unified Embodied Foundation Model for General Robot Control
Delin Qu, Haoming Song, Qizhi Chen, Zhaoqing Chen, Xianqiang Gao · Aug 28, 2025 · Citations: 0
Long Horizon
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems.
- AVIATOR: Towards AI-Agentic Vulnerability Injection Workflow for High-Fidelity, Large-Scale Code Security Dataset
Amine Lbath, Massih-Reza Amini, Aurelien Delaitre, Vadim Okun · Aug 28, 2025 · Citations: 0
In this paper, we present AVIATOR, the first AI-agentic vulnerability injection framework.
- From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
Jessica M. Lundin, Usman Nasir Nakakana, Guillaume Chabot-Couture · Aug 28, 2025 · Citations: 0
Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive, contamination-resistant, and maintainable.
- Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
Lorenz Hufe, Constantin Venhoff, Erblina Purelku, Maximilian Dreyer, Sebastian Lapuschkin · Aug 28, 2025 · Citations: 0
Red Team
These models serve as suitable drop-in replacements for a broad range of safety-critical applications, where the risks of text-based manipulation outweigh the utility of text recognition.
- NPG-Muse: Scaling Long Chain-of-Thought Reasoning with NP-Hard Graph Problems
Yuyao Wang, Bowen Liu, Jianheng Tang, Nuo Chen, Yuhan Li · Aug 28, 2025 · Citations: 0
However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplored.
- AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios
Lisa Alazraki, Lihu Chen, Ana Brassard, Joe Stacey, Hossein A. Rahmani · Aug 27, 2025 · Citations: 0
In this work, we introduce an Agentic Commonsense and Math benchmark (AgentCoMa), where each compositional task requires a commonsense reasoning step and a math reasoning step.
- Diffusion Language Models Know the Answer Before Decoding
Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan · Aug 27, 2025 · Citations: 0
Empirical evaluations of LLaDA-8B and Dream-7B across multiple tasks show that Prophet reduces the number of decoding steps by up to 3.4× while preserving high generation quality.
- Your AI Bosses Are Still Prejudiced: The Emergence of Stereotypes in LLM-Based Multi-Agent Systems
Jingyu Guo, Yingying Xu · Aug 27, 2025 · Citations: 0
Multi Agent
While stereotypes are well-documented in human social interactions, AI systems are often presumed to be less susceptible to such biases.
- The Information Dynamics of Generative Diffusion
Dejan Stancevic, Luca Ambrogioni · Aug 27, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Language and Experience: A Computational Model of Social Learning in Complex Tasks
Cédric Colas, Tracey Mills, Ben Prystawski, Michael Henry Tessler, Noah Goodman · Aug 26, 2025 · Citations: 0
The ability to combine linguistic guidance from others with direct experience is central to human development, enabling safe and rapid learning in new environments.
- Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
Dayoon Ko, Jihyuk Kim, Haeju Park, Sohyeon Kim, Dahyun Lee · Aug 26, 2025 · Citations: 0
Long Horizon
Large reasoning models (LRMs) combined with retrieval-augmented generation (RAG) have enabled deep research agents capable of multi-step reasoning with external knowledge retrieval.
- LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination
Ziming Zhu, Chenglong Wang, Haosong Xv, Shunjie Xing, Yifu Huo · Aug 26, 2025 · Citations: 0
Demonstrations · Multi Agent
In this paper, we introduce LaTeXTrans, a collaborative multi-agent system designed to address this challenge.
- VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft
Honghao Fu, Junlong Ren, Qi Chai, Deheng Ye, Yujun Cai · Aug 26, 2025 · Citations: 0
- Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning
Jungsuk Oh, Jay-Yoon Lee · Aug 25, 2025 · Citations: 0
- Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
Rishikesh Devanathan, Varun Nathan, Ayush Kumar · Aug 25, 2025 · Citations: 0
In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages.
- Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering
Julius Gun, Timo Oksanen · Aug 25, 2025 · Citations: 0
Our benchmark is built on a user manual for an agricultural machine, available in English, French, and German.
- How Quantization Shapes Bias in Large Language Models
Federico Marcuzzi, Xuefei Ning, Roy Schwartz, Iryna Gurevych · Aug 25, 2025 · Citations: 0
This work presents a comprehensive evaluation of how quantization affects model bias, with particular attention to its impact on individual demographic subgroups.