- EFT-CoT: A Multi-Agent Chain-of-Thought Framework for Emotion-Focused Therapy
Lanqing Du, Yunong Li, YuJie Long, Shihong Chen · Jan 25, 2026 · Citations: 0
Multi Agent
To address this gap, we propose EFT-CoT, a multi-agent chain-of-thought framework grounded in Emotion-Focused Therapy (EFT).
- Generation-Step-Aware Framework for Cross-Modal Representation and Control in Multilingual Speech-Text Models
Toshiki Nakai, Varsha Suresh, Vera Demberg · Jan 24, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization
Jingyi Xu, Xingyu Ren, Zhoupeng Shou, Yumeng Zhang, Zhiqiang You · Jan 24, 2026 · Citations: 0
Pairwise Preference Long Horizon
To address this, we propose Goal-Oriented Preference Optimization (GOPO), a hierarchical reinforcement learning framework that decouples strategy planning from response generation via an Expert Agent and a Customer Service Agent.
- Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
Ha Na Cho, Sairam Sutari, Alexander Lopez, Hansen Bow, Kai Zheng · Jan 24, 2026 · Citations: 0
Such behavior poses substantial risks for real-world deployment, where overconfident or temporally invalid predictions can disrupt clinical workflows and compromise patient safety.
- IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR
Karun Sharma, Vidushee Vats, Shengzhi Li, Yuxiang Wang, Zhongtian Sun · Jan 23, 2026 · Citations: 0
Pairwise Preference Expert Verification
Peer review relies on substantive, evidence-based questions, yet current LLMs generate surface-level queries that perform worse than human reviewer questions in expert evaluation.
- Large Language Models as Automatic Annotators and Annotation Adjudicators for Fine-Grained Opinion Analysis
Gaurav Negi, MA Waskow, John McCrae, Paul Buitelaar · Jan 23, 2026 · Citations: 0
Although this level of detail is sound, it requires considerable human effort and substantial cost to annotate opinions in datasets for training models, especially across diverse domains and real-world applications.
- The Mouth is Not the Brain: Bridging Energy-Based World Models and Language Generation
Junichiro Niimi · Jan 23, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Jacobian Scopes: token-level causal attributions in LLMs
Toni J. B. Liu, Baran Zadeoğlu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls · Jan 23, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- PhysE-Inv: A Physics-Encoded Inverse Modeling approach for Arctic Snow Depth Prediction
Akila Sampath, Vandana Janeja, Jianwu Wang · Jan 23, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments
Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle · Jan 22, 2026 · Citations: 0
Pairwise Preference
To this end, we construct a new dataset by leveraging human preferences for importance implicit in football game highlight reels, without any additional annotation costs.
- A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War
Dikshya Mohanty, Taisiia Sabadyn, Jelwin Rodrigues, Chenlu Wang, Abhishek Kalugade · Jan 22, 2026 · Citations: 0
The corpus features comprehensive metadata and human-evaluated annotations for stance, sentiment, and topical framing, enabling systematic analysis of competing geopolitical narratives.
- Computer Environments Elicit General Agentic Intelligence in LLMs
Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen · Jan 22, 2026 · Citations: 0
Agentic intelligence in large language models (LLMs) requires not only model intrinsic capabilities but also interactions with external environments.
- Between Search and Platform: ChatGPT Under the DSA
Toni Lorente, Kathrin Gardhouse · Jan 22, 2026 · Citations: 0
Web Browsing
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models
Shir Ashury-Tahan, Yifan Mai, Elron Bandel, Michal Shmueli-Scheuer, Leshem Choshen · Jan 22, 2026 · Citations: 0
Large Language Model (LLM) benchmarks tell us when models fail, but not why they fail.
- RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Zhitao He, Zongwei Lyu, Yi R Fung · Jan 22, 2026 · Citations: 0
Pairwise Preference Critique Edit
In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM), operationalized through a ToM-Strategy-Response (TSR) framework that models reviewer mental state, formulates persuasion…
- What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking
Raymond Xiong, Furong Jia, Lionel Wong, Monica Agrawal · Jan 22, 2026 · Citations: 0
However, benchmarking efforts in LLMs for question answering often focus on medical exam questions, which differ significantly in style and content from the questions patients actually raise in real life.
- The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Zanlin Ni, Shenzhi Wang, Yang Yue, Tianyu Yu, Weilin Zhao · Jan 21, 2026 · Citations: 0
Long Horizon
We demonstrate that effective reasoning can be better elicited by intentionally forgoing arbitrary order and applying standard Group Relative Policy Optimization (GRPO) instead.
- Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
Yuval Kansal, Niraj K. Jha · Jan 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models
Injin Kong, Hyoungjoon Lee, Yohan Jo · Jan 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks
Zixuan Ke, Yifei Ming, Austin Xu, Ryan Chin, Xuan-Phi Nguyen · Jan 21, 2026 · Citations: 0
Multi Agent
While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design under-deliver.
- Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis
James Brock, Ce Zhang, Nantheera Anantrasirichai · Jan 21, 2026 · Citations: 0
This paper introduces Forest-Chat, an LLM-driven agent for forest change analysis, enabling natural language querying across multiple RSICI tasks, including change detection and captioning, object counting, deforestation characterisation,…
- From Toil to Thought: Designing for Strategic Exploration and Responsible AI in Systematic Literature Reviews
Runlong Ye, Naaz Sibia, Angela Zavaleta Bernuy, Tingting Zhu, Carolina Nobre · Jan 21, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across Indian and American STEM Education
Amogh Gupta, Niharika Patil, Sourojit Ghosh, SnehalKumar S Gaikwad · Jan 20, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration
Saeed Khaki, Ashudeep Singh, Nima Safaei, Kamal Ginotra · Jan 20, 2026 · Citations: 0
First, we introduce VisTIRA (Vision and Tool-Integrated Reasoning Agent), a tool-integrated reasoning framework that enables structured problem solving by iteratively decomposing a given math problem (as an image) into natural language…
- APEX-Agents
Bertie Vidgen, Austin Mann, Abby Fennelly, John Wright Stanly, Lucas Rothman · Jan 20, 2026 · Citations: 0
Rubric Rating Expert Verification Long Horizon
We introduce the AI Productivity Index for Agents (APEX-Agents), a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate…
- Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum
Víctor Yeste, Paolo Rosso · Jan 20, 2026 · Citations: 0
We study sentence-level detection of the 19 human values in the refined Schwartz continuum in about 74k English sentences from news and political manifestos (ValueEval'24 corpus).
- Agentic SPARQL: Evaluating SPARQL-MCP-powered Intelligent Agents on the Federated KGQA Benchmark
Daniel Dobriy, Frederik Bauer, Amr Azzam, Debayan Banerjee, Axel Polleres · Jan 20, 2026 · Citations: 0
- Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring
Dongxu Zhang, Yiding Sun, Cheng Tan, Wenbiao Yan, Ning Yang · Jan 20, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
Yushen Chen, Junzhe Liu, Yujie Tu, Zhikang Niu, Yuzhe Liang · Jan 20, 2026 · Citations: 0
Long Horizon
Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark.
- Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM
YuanLab.ai, Shawn Wu, Jiangang Luo, Darcy Chen · Jan 20, 2026 · Citations: 0
- Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
Xinlei Yin, Xiulian Peng, Xiao Li, Zhiwei Xiong, Yan Lu · Jan 20, 2026 · Citations: 0
- Vulnerability of LLMs' Stated Beliefs? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions
Fan Huang, Haewoon Kwak, Jisun An · Jan 20, 2026 · Citations: 0
We present a systematic evaluation of LLM susceptibility to persuasion under the Source–Message–Channel–Receiver (SMCR) communication framework.
- RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems
Haoyu Tian, Yingchaojie Feng, Zhen Wen, Haoxuan Li, Minfeng Zhu · Jan 19, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation
Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta, Iryna Gurevych · Jan 19, 2026 · Citations: 0
Preliminary human results (limited sample size) indicate a 20.2-point accuracy drop.
- A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits
Siguang Chen, Chunli Lv, Miao Xie · Jan 19, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SciCoQA: Quality Assurance for Scientific Paper–Code Alignment
Tim Baumgärtner, Iryna Gurevych · Jan 19, 2026 · Citations: 0
Our evaluation of 22 LLMs demonstrates the difficulty of SciCoQA, particularly for instances involving omitted paper details, long-context inputs, and data outside the models' pre-training corpus.
- YOLO26: An Analysis of NMS-Free End to End Framework for Real-Time Object Detection
Sudip Chakrabarty · Jan 19, 2026 · Citations: 0
To contextualize its performance, this article reviews exhaustive benchmark data from the COCO val2017 leaderboard.
- When LLMs Imagine People: A Human-Centered Persona Brainstorm Audit for Bias and Fairness in Creative Applications
Hongliu Cao, Eoin Thomas, Rodrigo Acuna Agost · Jan 19, 2026 · Citations: 0
Existing methods rely on constrained tasks and fixed benchmarks, leaving open-ended creative outputs unexamined.
- Multimodal Multi-Agent Empowered Legal Judgment Prediction
Zhaolu Kang, Junhao Gong, Qingxi Chen, Hao Zhang, Jiaxin Liu · Jan 19, 2026 · Citations: 0
Multi Agent
Furthermore, we build JurisMM, a large dataset with over 100,000 recent Chinese judicial records, including both text and multimodal video-text data, enabling comprehensive evaluation.
- Empowering All-in-Loop Health Management of Spacecraft Power System in the Mega-Constellation Era via Human-AI Collaboration
Yi Di, Zhibin Zhao, Fujin Wang, Xue Liu, Jiafeng Tang · Jan 19, 2026 · Citations: 0