- NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
Lingfeng Ren, Weihao Yu, Runpeng Yu, Xinchao Wang · Feb 25, 2026
Automatic Metrics Coding
Object hallucination is a critical issue in Large Vision-Language Models (LVLMs), where outputs include objects that do not appear in the input image.
- DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
Yanbin Wei, Jiangyue Yan, Chun Kang, Yang Chen, Hua Liu · Feb 25, 2026
Automatic Metrics General
This ``one-size-fits-all'' strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or over-lengthy responses to graph-related queries.
- SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video
Guanyi Qin, Xiaozhen Wang, Zhu Zhuo, Chang Han Low, Yuancan Xiao · Feb 25, 2026
Automatic Metrics MedicineCoding
Existing AI systems offer binary safety verification or static detection, ignoring the phase-dependent nature of intraoperative reasoning.
- Virtual Biopsy for Intracranial Tumors Diagnosis on MRI
Xinzhe Luo, Shuai Shao, Yan Wang, Jiangtao Wang, Yutong Bai · Feb 25, 2026
Automatic Metrics Medicine
To address these challenges, we construct the ICT-MRI dataset - the first public biopsy-verified benchmark with 249 cases across four categories.
- XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence
Sepehr Salem Ghahfarokhi, M. Moein Esfahani, Raj Sunderraman, Vince Calhoun, Mohammed Alser · Feb 24, 2026
Automatic Metrics MedicineCoding
Deep learning has significantly advanced automated brain tumor diagnosis, yet clinical adoption remains limited by interpretability and computational constraints.
- OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation
Tian Lan, Lei Xu, Zimu Yuan, Shanggui Liu, Jiajun Liu · Feb 24, 2026
Automatic Metrics Medicine
Our evaluation demonstrates that OrthoDiffusion achieves excellent performance in the segmentation of 11 knee structures and the detection of 8 knee abnormalities.
- MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation
Taha Koleilat, Hojat Asgariandehkordi, Omid Nejati Manzari, Berardino Barile, Yiming Xiao · Feb 23, 2026
Automatic Metrics Medicine
Medical image segmentation remains challenging due to limited annotations for training, ambiguous anatomical features, and domain shifts.
- When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
Krzysztof Adamkiewicz, Brian Moser, Stanislav Frolov, Tobias Christian Nauen, Federico Raue · Feb 23, 2026
Automatic Metrics General
Recent text-to-image (T2I) diffusion models produce visually stunning images and demonstrate excellent prompt following.
- Classroom Final Exam: An Instructor-Tested Reasoning Benchmark
Chongyang Gao, Diji Yang, Shuyan Zhou, Xichen Yan, Luchuan Song · Feb 23, 2026
Automatic Metrics Coding
We introduce \CFE{} (\textbf{C}lassroom \textbf{F}inal \textbf{E}xam), a multimodal benchmark for evaluating the reasoning capabilities of large language models across more than 20 STEM domains.
- Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition
Minxue Tang, Yangyang Yu, Aolin Ding, Maziyar Baran Pouyan, Taha Belkhouja Yujia Bao · Feb 22, 2026
Automatic Metrics General
Recognizing implicit visual and textual patterns is essential in many real-world applications of modern AI.
- VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval
Diogo Glória-Silva, David Semedo, João Maglhães · Feb 22, 2026
Automatic Metrics General
Our evaluation shows that VIGiA outperforms existing state-of-the-art models on all tasks in a conversational plan guidance setting, reaching over 90\% accuracy on plan-aware VQA.
- Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
Tao Xu · Feb 15, 2026
Automatic Metrics Coding
16.1\% (+14.5pp); on CircuitVQA, a public benchmark (9,315 questions), retrieval ImgR@3 achieves 31.2\% vs.
- Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring
Dongxu Zhang, Yiding Sun, Cheng Tan, Wenbiao Yan, Ning Yang · Jan 20, 2026
Automatic Metrics General
While Chain-of-Thought (CoT) reasoning significantly enhances the performance of Multimodal Large Language Models (MLLMs), its autoregressive nature incurs prohibitive latency constraints.
- KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification
Erfan Nourbakhsh, Nasrin Sanjari, Ali Nourbakhsh · Dec 9, 2025
Automatic Metrics MedicineCoding
Age-related macular degeneration (AMD) and choroidal neovascularization (CNV)-related conditions are leading causes of vision loss worldwide, with optical coherence tomography (OCT) serving as a cornerstone for early detection and managemen
- BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen · Oct 31, 2025
Automatic MetricsSimulation Env General
Recent advances in Vision-Language Models (VLMs) have propelled embodied agents by enabling direct perception, reasoning, and planning task-oriented actions from visual inputs.
- Uncovering Grounding IDs: How External Cues Shape Multimodal Binding
Hosein Hasani, Amirmohammad Izadi, Fatemeh Askari, Mobin Bagherian, Sadegh Mohammadian · Sep 28, 2025
Automatic Metrics General
Large vision-language models (LVLMs) show strong performance across multimodal benchmarks but remain limited in structured reasoning and precise grounding.
- MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification
Patrick Wienholt, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn · Sep 9, 2025
Automatic Metrics MedicineCoding
Deep neural networks excel in radiological image classification but frequently suffer from poor interpretability, limiting clinical acceptance.
- HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning
Chuhao Zhou, Jianfei Yang · May 23, 2025
Automatic MetricsSimulation Env Coding
Embodied agents operating in smart homes must understand human behavior through diverse sensory inputs and communicate via natural language.