- Support-Contra Asymmetry in LLM Explanations
Avinash Patil · Oct 23, 2025 · Citations: 0
Across three benchmark datasets (WIKIONTOLOGY, AG NEWS, and IMDB), we observe a consistent empirical pattern that we term support-contra asymmetry.
- Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Yuhan Liu, Lianhui Qin, Shengjie Wang · Oct 23, 2025 · Citations: 0
- Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum · Oct 23, 2025 · Citations: 0
Drawing on insights from human cognition, we develop methods to evaluate and enhance agentic information-seeking.
- Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems
Xi He, Sirui Lu, Bei Zeng · Oct 23, 2025 · Citations: 0
Multi Agent
We address this gap by extending TeXRA with an independent Lean 4 verification layer, turning it into a human-guided multi-agent platform for exact scientific discovery.
- Transferable Graph Learning for Transmission Congestion Management via Busbar Splitting
Ali Rajaei, Peter Palensky, Jochen L. Cremer · Oct 23, 2025 · Citations: 0
- Automated Coding of Communication Data Using ChatGPT: Consistency Across Subgroups
Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi · Oct 23, 2025 · Citations: 0
Rubric Rating
Prior research has established that ChatGPT can be directly instructed with coding rubrics to code the communication data and achieves accuracy comparable to human raters.
- GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning
Jinchang Luo, Mingquan Cheng, Fan Wan, Ni Li, Xiaoling Xia · Oct 23, 2025 · Citations: 0
Long Horizon
Extensive experiments on both in-domain and out-of-domain benchmarks demonstrate that GlobalRAG significantly outperforms strong baselines while using only 8k training data (42% of the training data used by strong baselines), achieving…
- Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset
Paul Lerner, François Yvon · Oct 23, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim · Oct 23, 2025 · Citations: 0
Long Horizon
A Head Agent provides guidance that leads retrieval, while an Iteration Agent selects and expands HSeq via structure-respecting actions (e.g., parent/child hops, table row/column neighbors, KG relations); finally, the Head Agent composes…
- Robust Preference Alignment via Directional Neighborhood Consensus
Ruochen Mao, Yuling Shi, Xiaodong Gu, Jiaheng Wei · Oct 23, 2025 · Citations: 0
Pairwise Preference
To address this challenge, we introduce Robust Preference Selection (RPS), a post-hoc, training-free method that leverages directional neighborhood consensus.
- Steering Evaluation-Aware Language Models to Act Like They Are Deployed
Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda · Oct 23, 2025 · Citations: 0
- Evaluating Latent Knowledge of Public Tabular Datasets in Large Language Models
Matteo Silvestri, Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei · Oct 23, 2025 · Citations: 0
In contrast, we propose a framework for assessing contamination in tabular datasets by generating controlled queries and performing comparative evaluation.
- Citation Failure: Definition, Analysis and Efficient Mitigation
Jan Buchmann, Iryna Gurevych · Oct 23, 2025 · Citations: 0
- CreativityPrism: A Holistic Evaluation Framework for Large Language Model Creativity
Zhaoyi Joey Hou, Bowei Alvin Zhang, Yining Lu, Bhiman Kumar Baghel, Anneliese Brei · Oct 23, 2025 · Citations: 0
Creativity is often seen as a hallmark of human intelligence.