Exploiting Vision Encoder Vulnerabilities for Universal Adversarial Perturbations on Large Vision-Language Models
Hee-Seon Kim, Minbeom Kim, Seokil Ham, Changick Kim · Dec 11, 2024 · Citations: 0
How to use this page
Low trustUse this as background context only. Do not make protocol decisions from this page alone.
Best use
Background context only
What to verify
Read the full paper before copying any benchmark, metric, or protocol choices.
Evidence quality
Low
Derived from extracted protocol signals and abstract evidence.
Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable performance on multimodal tasks but remain highly vulnerable to small adversarial perturbations in input images. Existing attacks typically target the vision encoder's final output embeddings, implicitly treating the encoder as a uniform attack surface, while a systematic analysis of which internal components are most vulnerable has remained largely unexplored. We show such analysis is essential, as adversarial vulnerability in LVLM vision encoders is structurally concentrated rather than uniformly distributed. Building on this, we propose Vision Encoder Vulnerable-Component-Targeted Universal Adversarial Perturbation (VEV-UAP), a task-agnostic and cost-efficient attack framework. Through a component- and layer-wise analysis of attention mechanisms, we identify the value components in middle layers as critical vulnerabilities that strongly influence downstream language model behavior. VEV-UAP selectively targets these components to generate a single universal perturbation shared across images, without involving textual inputs or the language model during optimization. Experiments across multiple LVLMs and tasks show VEV-UAP achieves state-of-the-art attack success rates with reduced computational overhead. Moreover, a single VEV-UAP transfers across LVLMs sharing the same vision encoder, even when paired with different language models, making it a practical framework for scalable robustness evaluation.