- Improving Attributed Long-form Question Answering with Intent Awareness
Xinran Zhao, Aakanksha Naik, Jay DeYoung, Joseph Chee Chang, Jena D. Hwang · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams
Isaac Llorente-Saguer · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
Jakub Masłowski, Jarosław A. Chudziak · Mar 28, 2026 · Citations: 0
Multi Agent
Large Language Models (LLMs) are being increasingly used as autonomous agents in complex reasoning tasks, opening the niche for dialectical interactions.
- Not Worth Mentioning? A Pilot Study on Salient Proposition Annotation
Amir Zeldes, Katherine Conhaim, Lauren Levine · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Culturally Adaptive Explainable LLM Assessment for Multilingual Information Disorder: A Human-in-the-Loop Approach
Maziar Kianimoghadam Jouneghani · Mar 28, 2026 · Citations: 0
To address this gap, this ongoing study proposes a Hybrid Intelligence Loop, a human-in-the-loop (HITL) framework that grounds model assessment in human-written rationales from native-speaking annotators.
- LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications
Alexandre Cristovão Maiorano · Mar 28, 2026 · Citations: 0
We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow.
- Inference-Time Structural Reasoning for Compositional Vision-Language Understanding
Amartya Bhattacharya · Mar 28, 2026 · Citations: 0
We present a unified evaluation and augmentation framework benchmarking four architecturally diverse VLMs (CLIP, BLIP, LLaVA, and Qwen3-VL-8B-Thinking) on the Winoground benchmark under plain and scene-graph-augmented regimes.
- ASTRA: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering
Joonhyung Bae · Mar 28, 2026 · Citations: 0
- PubMed Reasoner: Dynamic Reasoning-based Retrieval for Evidence-Grounded Biomedical Question Answering
Yiqing Zhang, Xiaozhong Liu, Fabricio Murai · Mar 28, 2026 · Citations: 0
Expert Verification
In this context, we introduce PubMed Reasoner, a biomedical QA agent composed of three stages: self-critic query refinement evaluates MeSH terms for coverage, alignment, and redundancy to enhance PubMed queries based on partial (metadata)…
- SACRED: A Faithful Annotated Multimedia Multimodal Multilingual Dataset for Classifying Connectedness Types in Online Spirituality
Qinghao Guan, Yuchen Pan, Donghao Li, Zishi Zhang, Yiyang Chen · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Self-evolving AI agents for protein discovery and directed evolution
Yang Tan, Lingrong Zhang, Mingchen Li, Yuanxi Yu, Bozitao Zhong · Mar 28, 2026 · Citations: 0
Multi Agent
Protein scientific discovery is bottlenecked by the manual orchestration of information and algorithms, while general agents are insufficient in complex domain projects.
- Mitigating Hallucination on Hallucination in RAG via Ensemble Voting
Zequn Xie, Zhengyang Sun · Mar 28, 2026 · Citations: 0
Multi Agent
VOTE-RAG includes: (1) Retrieval Voting, where multiple agents generate diverse queries in parallel and aggregate all retrieved documents; (2) Response Voting, where multiple agents independently generate answers based on the aggregated…
- SCOPE: Tree-based Self-Correcting Online Log Parsing via Syntactic-Semantic Collaboration
Dongyi Fan, Suqiong Zhang, Lili He, Ming Liu, Yifan Huo · Mar 28, 2026 · Citations: 0
Extensive evaluations on diverse benchmark datasets show that SCOPE outperforms state-of-the-art methods in both accuracy and efficiency.
- Structural Stress and Learned Helplessness in Afghanistan: A Multi-Layer Analysis of the AFSTRESS Dari Corpus
Jawid Ahmad Baktash, Mursal Dawodi, Nadira Ahmadi · Mar 28, 2026 · Citations: 0
We introduce AFSTRESS, the first multi-label corpus of self-reported stress narratives in Dari (Eastern Persian), comprising 737 responses collected from Afghan individuals during an ongoing humanitarian crisis.
- Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning
Maximilian Mordig, Andreas Opedal, Weiyang Liu, Bernhard Schölkopf · Mar 28, 2026 · Citations: 0
We present a systematic empirical study of CL for post-training of LLMs, using synthetic arithmetic and logical benchmarks where difficulty is characterized by reasoning complexity rather than surface-level proxies.
- LightMover: Generative Light Movement with Color and Intensity Controls
Gengze Zhou, Tianyu Wang, Soo Ye Kim, Zhixin Shu, Xin Yu · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- daVinci-LLM: Towards the Science of Pretraining
Yiwei Qin, Yixiu Liu, Tiantian Mi, Muhang Xie, Zhen Huang · Mar 28, 2026 · Citations: 0
Through 200+ controlled ablations, we establish that: processing depth systematically enhances capabilities, establishing it as a critical dimension alongside volume scaling; different domains exhibit distinct saturation dynamics,…
- Weakly Convex Ridge Regularization for 3D Non-Cartesian MRI Reconstruction
German Shâma Wache, Chaithya G R, Asma Tanabene, Sebastian Neumayer · Mar 28, 2026 · Citations: 0
- Learning to Predict Future-Aligned Research Proposals with Language Models
Heng Wang, Pengcheng Jiang, Jiashuo Sun, Zhiyi Shi, Haofei Yu · Mar 28, 2026 · Citations: 0
Across Llama-3.1 and Qwen2.5 models, future-aligned tuning improves future alignment over unaligned baselines (up to +10.6% overall FAS), and domain-expert human evaluation corroborates improved proposal quality.
- Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models
Junhyeok Lee, Kyu Sung Choi · Mar 28, 2026 · Citations: 0
Pairwise Preference
FARE reveals that routing-level preference shifts are either unachievable (Mixtral, Qwen1.5, Qwen3), statistically non-robust (DeepSeekMoE), or accompanied by substantial utility cost (OLMoE, -4.4%p CrowS-Pairs at -6.3%p TQA).
- Story2Proposal: A Scaffold for Structured Scientific Paper Writing
Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo · Mar 28, 2026 · Citations: 0
Multi Agent
We introduce Story2Proposal, a contract-governed multi-agent framework that converts a research story into a structured manuscript through coordinated agents operating under a persistent shared visual contract.
- ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Ben Wiesel · Mar 28, 2026 · Citations: 0
To capture the full spectrum of chart comprehension, ChartNet additionally includes specialized subsets encompassing human annotated data, real-world data, safety, and grounding.
- Debiasing Large Language Models toward Social Factors in Online Behavior Analytics through Prompt Knowledge Tuning
Hossein Salemi, Jitin Krishnan, Hemant Purohit · Mar 28, 2026 · Citations: 0
Large Language Models (LLMs), trained on human-generated corpora, may implicitly mimic this social attribution process in social contexts.
- Text Data Integration
Md Ataur Rahman, Dimitris Sacharidis, Oscar Romero, Sergi Nadal · Mar 28, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.