Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound
Nicholas Dietrich, David McShannon
Abstract
Introduction: Deep learning-based segmentation models are increasingly integrated into clinical imaging workflows, yet their robustness to adversarial perturbations remains incompletely characterized, particularly for ultrasound images. We evaluated adversarial attacks and inference-time defenses for thyroid nodule segmentation in B-mode ultrasound. Methods: Two black-box adversarial attacks were developed: (1) Structured Speckle Amplification Attack (SSAA), which injects boundary-targeted noise, and (2) Frequency-Domain Ultrasound Attack (FDUA), which applies bandpass-filtered phase perturbations in the Fourier domain. Three inference-time mitigations were evaluated on adversarial images: randomized preprocessing with test-time augmentation, deterministic input denoising, and stochastic ensemble inference with consistency-aware aggregation. Experiments were conducted on a U-Net segmentation model trained on cine-clips from a database of 192 thyroid nodules. Results: The baseline model achieved a mean Dice similarity coefficient (DSC) of 0.76 (SD 0.20) on unperturbed images. SSAA reduced DSC by 0.29 (SD 0.20) while maintaining high visual similarity (SSIM = 0.94). FDUA resulted in a smaller DSC reduction of 0.11 (SD 0.09) with lower visual fidelity (SSIM = 0.82). Against SSAA, all three defenses significantly improved DSC after correction, with deterministic denoising showing the largest recovery (+0.10, p < 0.001), followed by randomized preprocessing (+0.09, p < 0.001), and stochastic ensemble inference (+0.08, p = 0.002). No defense achieved statistically significant improvement against FDUA. Conclusion: Spatial-domain adversarial perturbations in ultrasound segmentation showed partial mitigation with input preprocessing, whereas frequency-domain perturbations were not mitigated by the defenses, highlighting modality-specific challenges in adversarial robustness evaluation.
Full analysis loading… Code implementations, benchmark data, and reproduction guides are being assembled. Please check back shortly.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.