- From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation
Niranjan Chebrolu, Kokil Jaidka, Gerard Christopher Yeo · Nov 16, 2025 · Citations: 0
Evaluated on emotional dialogue and negotiation in both single- and multi-turn settings, localized injection consistently outperforms global steering and instruction priming; human evaluation confirms that gains reflect genuine improvements…
- Co-Layout: LLM-driven Co-optimization for Interior Layout
Chucheng Xiang, Ruchao Bao, Biyin Feng, Wenzheng Wu, Zhongyuan Liu · Nov 16, 2025 · Citations: 0
Pairwise Preference
Given a textual prompt, the LLM-driven agent workflow extracts structured design constraints related to room configurations and furniture arrangements.
- Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation
Yuxiang Zhou, Jichang Li, Yanhao Zhang, Haonan Lu, Guanbin Li · Nov 15, 2025 · Citations: 0
- PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection
Bingbing Wang, Zhixin Bai, Zhengda Jin, Zihan Wang, Xintong Song · Nov 15, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- MediRound: Multi-Round Entity-Level Reasoning Segmentation in Medical Images
Qinyue Tong, Ziqian Lu, Jun Liu, Rui Zuo, Zheming Lu · Nov 15, 2025 · Citations: 0
- EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
Jiahe Shi, Zhengqi Gao, Ching-Yun Ko, Duane Boning · Nov 15, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support
Eric Hua Qing Zhang, Julia Ive · Nov 14, 2025 · Citations: 0
Results demonstrated substantial improvements through RLs over baseline GPT-2 across multiple evaluation metrics: BLEU (0.0111), ROUGE-1 (0.1397), ROUGE-2 (0.0213), ROUGE-L (0.1317), and METEOR (0.0581).
- MedPT: A Massive Medical Question Answering Dataset for Brazilian-Portuguese Speakers
Fernanda Bufon Färber, Iago Alves Brito, Julia Soares Dollis, Pedro Schindler Freire Brasil Ribeiro, Rafael Teixeira Sousa · Nov 14, 2025 · Citations: 0
To validate MedPT's utility, we benchmark it in a medical specialty classification task: fine-tuning a 1.7B parameter model achieves an outstanding 94% F1-score on a 20-class setup.
- Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani · Nov 14, 2025 · Citations: 0
We propose a novel strategy where we combine multiple LLM models with varying cost/accuracy tradeoffs in an agentic manner, where models and tools are run in sequence as determined by an orchestration model to minimize cost subject to a…
- From Synthetic Scenes to Real Performance: Enhancing Spatial Reasoning in VLMs
Massimo Rizzoli, Simone Alghisi, Seyed Mahed Mousavi, Giuseppe Riccardi · Nov 14, 2025 · Citations: 0
We conduct exhaustive evaluations on both synthetic and real-world benchmarks.
- D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Frequency and Pixel Spaces
Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo · Nov 14, 2025 · Citations: 0
- CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
Crystal Min Hui Poon, Pai Chet Ng, Xiaoxiao Miao, Immanuel Jun Kai Loh, Bowen Zhang · Nov 14, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions
Mengze Hong, Di Jiang, Weiwei Zhao, Yawen Li, Yihang Wang · Nov 14, 2025 · Citations: 0
Critique Edit
Experimental results highlight the effectiveness of the proposed system in generating more comprehensive and useful reviews aligned with expert standards, surpassing ablated baselines and advancing transparent, human-centered scholarly…
- From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models
Chao Wu, Baoheng Li, Mingchen Gao, Yu Tian, Zhenyi Wang · Nov 13, 2025 · Citations: 0
Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence.
- Mastering Olympiad-Level Physics with Artificial Intelligence
Dong-Shan Jian, Xiang Li, Chen-Xu Yan, Hui-Wen Zheng, Zhi-Zhang Bian · Nov 13, 2025 · Citations: 0
Olympiad-level physics problem-solving significantly challenges both humans and artificial intelligence (AI), as it requires integrating appropriate modeling, application of physical principles, and precise calculation within long reasoning…
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu · Nov 13, 2025 · Citations: 0
KPPO introduces three key innovations: 1) a knowledge gap filling mechanism for knowledge gap identification and targeted remediation; 2) a batch-wise candidate evaluation approach that considers both performance improvement and…
- Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics
Xin Sun, Daniel Ståhl, Kristian Sandahl, Christoph Kessler · Nov 13, 2025 · Citations: 0
- RadHiera: Semantic Hierarchical Reinforcement Learning for Medical Report Generation
Bodong Du, Honglong Yang, Xiaomeng Li · Nov 13, 2025 · Citations: 0
Experiments on three public chest X-ray benchmarks show that RadHiera consistently improves diagnostic accuracy and inter-section consistency over state-of-the-art methods, while also demonstrating good adaptability to report generation in…
- LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
Huimin Ren, Yan Liang, Baiqiao Su, Chaobo Sun, Hengtong Lu · Nov 13, 2025 · Citations: 0
Current methods either rely on subjective and costly human evaluation or on automated LLM-as-a-judge systems, which suffer from inherent biases and unreliability.
- Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism
Jinhong Jeong, Sunghyun Lee, Jaeyoung Lee, Seonah Han, Youngjae Yu · Nov 13, 2025 · Citations: 0
We suggest that this can be a compelling probe into how Multimodal Large Language Models (MLLMs) interpret auditory information in human languages.
- Chain of Summaries: Summarization Through Iterative Questioning
William Brach, Kristián Košťál, Lukas Galke Poech · Nov 12, 2025 · Citations: 0
CoS thus resembles an appealing option for website maintainers to make their content more accessible for LLMs, while retaining possibilities for human oversight.
- What We Don't C: Manifold Disentanglement for Structured Discovery
Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft, Oliver N. F. King · Nov 12, 2025 · Citations: 0
- Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune · Nov 12, 2025 · Citations: 0
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Xuanchen Li, Chenrui Cui, Tianrui Wang, Meng Ge, Zikang Huang · Nov 12, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Controllable protein design with particle-based Feynman-Kac steering
Erik Hartman, Jonas Wallin, Johan Malmström, Jimmy Olsson · Nov 12, 2025 · Citations: 0
- Human or LLM as Standardized Patients? A Comparative Study for Medical Education
Bingquan Zhang, Xiaoxiao Liu, Yuchi Wang, Lei Zhou, Qianqian Xie · Nov 12, 2025 · Citations: 0
Multi Agent
Although large language model (LLM)-based virtual standardized patients (VSPs) have been proposed as an alternative, their behavior remains unstable and lacks rigorous comparison with human standardized patients.
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu · Nov 12, 2025 · Citations: 0
We implement SPI as a plugin for both FAISS and Qdrant backends and evaluate it across multiple RAG tasks including MS MARCO, Natural Questions, and multimodal retrieval benchmarks.
- π-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu · Nov 12, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Does Scientific Writing Converge to U.S. English? Evidence from Generative AI-Assisted Publications
Dragan Filimonovic, Christian Rutzer, Jeffrey Macher, Rolf Weder · Nov 12, 2025 · Citations: 0
…benchmark corpus using SciBERT text embeddings, and estimate dynamic changes using an event-study difference-in-differences design with repeated cross-sections centered on the late-2022 release of ChatGPT.
- CastMind: An Interaction-Driven Agentic Reasoning Framework for Cognition-Inspired Time Series Forecasting
Xiaohan Zhang, Tian Gao, Mingyue Cheng, Bokai Pan, Ze Guo · Nov 12, 2025 · Citations: 0
- TransactionGPT
Yingtong Dou, Zhimeng Jiang, Tianyi Zhang, Mingzhi Hu, Zhichao Xu · Nov 12, 2025 · Citations: 0
We conduct extensive empirical evaluations utilizing a diverse collection of company transaction datasets spanning multiple downstream tasks, thereby enabling a thorough assessment of TGPT's effectiveness and efficiency in comparison to…
- iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao · Nov 12, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
Davi Bastos Costa, Felippe Alves, Renato Vicente · Nov 11, 2025 · Citations: 0
- AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Zhaojian Yu, Kaiyue Feng, Yilun Zhao, Shilin He, Xiao-Ping Zhang · Nov 11, 2025 · Citations: 0
In this paper, we present AlphaResearch, an autonomous research agent designed to discover new algorithms on open-ended problems by iteratively running the following steps: (1) propose new ideas (2) program to verify (3) optimize the…
- Automatic Paper Reviewing with Heterogeneous Graph Reasoning over LLM-Simulated Reviewer-Author Debates
Shuaimin Li, Liyang Fan, Yufang Lin, Zeyang Li, Xian Wei · Nov 11, 2025 · Citations: 0
Multi Agent
In our approach, reviewer-author exchanges are simulated through LLM-based multi-agent collaboration.
- Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback
Yishan Du, Conrad Borchers, Mutlu Cukurova · Nov 11, 2025 · Citations: 0
As teachers increasingly turn to GenAI in their educational practice, we need robust methods to benchmark large language models (LLMs) for pedagogical purposes.
- Quantification and object perception in Multimodal Large Language Models and human linguistic cognition
Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini · Nov 11, 2025 · Citations: 0
This paper looks at three key features of human quantification shared cross-linguistically that have remained so far unexplored in the (M)LLM literature: the ordering of quantifiers into scales, the ranges of use and prototypicality, and…
- A robust methodology for long-term sustainability evaluation of Machine Learning models
Jorge Paz-Ruza, João Gama, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas · Nov 11, 2025 · Citations: 0
- Multimodal LLMs Do Not Compose Skills Optimally Across Modalities
Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune · Nov 11, 2025 · Citations: 0
- Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression
Cheng Yuan, Jiawei Shao, Xuelong Li · Nov 11, 2025 · Citations: 0
A distinctive feature of information capacity is its incorporation of tokenizer efficiency, which affects inference costs but is often neglected in LLM evaluations.
- State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Taja Kuzman Pungeršek, Peter Rupnik, Ivan Porupski, Vuk Dinić, Nikola Ljubešić · Nov 11, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker
Matthias De Lange, Jens-Joris Decorte, Jeroen Van Hautte · Nov 11, 2025 · Citations: 0
These constraints have led to isolated, task-specific developments in the field, with models and benchmarks focused on single prediction tasks.
- Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya · Nov 11, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ViPRA: Video Prediction for Robot Actions
Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak · Nov 11, 2025 · Citations: 0
Demonstrations
Videos, including those of humans or teleoperated robots, capture rich physical interactions.
- Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury · Nov 10, 2025 · Citations: 0
Long Horizon
On the Episodic Memory Benchmark (EpBench) (Huet et al., 2025), comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to 20%.
- SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations
Manon Berriche, Célia Nouri, Chloée Clavel, Jean-Philippe Cointet · Nov 10, 2025 · Citations: 0
- Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Sai Zou · Nov 10, 2025 · Citations: 0
Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale.
- Categorical Emotions or Appraisals - Which Emotion Model Explains Argument Convincingness Better?
Lynn Greschner, Meike Bauer, Sabine Weber, Roman Klinger · Nov 10, 2025 · Citations: 0
- More Agents Improve Math Problem Solving but Adversarial Robustness Gap Persists
Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi · Nov 10, 2025 · Citations: 0
These perturbations include punctuation noise with three intensities (10%, 30%, 50%), plus real-world and human-like typos (WikiTypo, R2ATA).
- RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation
Haofeng Wang, Yu Zhang · Nov 10, 2025 · Citations: 0
Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks.
- QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations
Zhixiong Zhao, Haomin Li, Fangxin Liu, Yuncheng Lu, Zongwu Wang · Nov 10, 2025 · Citations: 0
- Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation
Keunhyeung Park, Seunguk Yu, Youngbin Kim · Nov 10, 2025 · Citations: 0
Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source copying over authentic dialect translation.
- How AI Fails: An Interactive Pedagogical Tool for Demonstrating Dialectal Bias in Automated Toxicity Models
Subhojit Ghimire · Nov 10, 2025 · Citations: 0
First, I conduct a quantitative benchmark of a widely used toxicity model (unitary/toxic-bert) to measure performance disparity between text in African-American English (AAE) and Standard American English (SAE).
- Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare
Saeedeh Javadi, Sara Mirabi, Manan Gangar, Bahadorreza Ofoghi · Nov 10, 2025 · Citations: 0
- Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop
Lifeng Han, David Lindevelt, Sander Puts, Erik van Mulligen, Suzan Verberne · Nov 9, 2025 · Citations: 0
Expert Verification
With a human-in-the-loop setup, we verify the extracted metaphors and compile the outputs into a corpus named HealthQuote.NL.
- HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
Irina Proskurina, Marc-Antoine Carpentier, Julien Velcin · Nov 9, 2025 · Citations: 0
Optimization of offensive content moderation models for different types of hateful messages is typically achieved through continued pre-training or fine-tuning on new hate speech benchmarks.
- IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction
Ankan Mullick, Sukannya Purkayastha, Saransh Sharma, Pawan Goyal, Niloy Ganguly · Nov 8, 2025 · Citations: 0
In this paper, we introduce IDALC (Intent Detection and Active Learning based Correction), a semi-supervised framework designed to detect user intents and rectify system-rejected utterances while minimizing the need for human annotation.
- Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci · Nov 8, 2025 · Citations: 0
Here, we propose the first principled attack evaluation on LLM factual memory under prompt injection via Xmera, our novel, theory-grounded MitM framework.
- Q²: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization
Zhaoyang Wang, Dong Wang · Nov 8, 2025 · Citations: 0
Long Horizon
Quantization-aware training (QAT) has achieved remarkable success in low-bit (≤4-bit) quantization for classification networks.
- VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models
Manav Kulshrestha, S. Talha Bukhari, Damon Conover, Aniket Bera · Nov 8, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.