Robust Image Classifiers Fail under Shifted Adversarial Perturbations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Non-robustness of image classifiers to subtle adversarial perturbations is a well-known failure mode. Defenses against such attacks are typically evaluated by measuring the error rate on perturbed versions of the natural test set, quantifying the worst-case performance within a specified perturbation budget. However, these evaluations often isolate specific perturbation types, underestimating the adaptability of real-world adversaries who can modify or compose attacks in unforeseen ways. In this work, we show that models considered robust to strong attacks, such as AutoAttack, can be compromised by a simple modification of the weaker FGSM attack, in which the adversarial perturbation is slightly transformed prior to being added to the input. Despite the attack's simplicity, robust models that perform well against standard FGSM become vulnerable to this variant. These findings suggest that current defenses may generalize poorly beyond their assumed threat models and can achieve inflated robustness scores under narrowly defined evaluation settings.
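To make the attack described in the abstract concrete, the following is a minimal sketch of the idea: compute a standard FGSM perturbation, then apply a small transformation to the perturbation itself before adding it to the input. It assumes a PyTorch image classifier with inputs in [0, 1]; the one-pixel spatial roll used here and the function name shifted_fgsm are illustrative assumptions, since the record does not specify the paper's exact transformation.

import torch
import torch.nn.functional as F

def shifted_fgsm(model, x, y, epsilon=8 / 255, shift=(1, 0)):
    """FGSM whose perturbation is spatially shifted before it is applied.

    The one-pixel roll below is an assumed stand-in for the "slight
    transformation" mentioned in the abstract; the paper's actual
    transform is not given in this record.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    delta = epsilon * x.grad.sign()                         # standard FGSM perturbation
    delta = torch.roll(delta, shifts=shift, dims=(-2, -1))  # transform before adding
    return (x + delta).clamp(0.0, 1.0).detach()             # assumes images in [0, 1]

Evaluating a defended model on shifted_fgsm(model, x, y) rather than on plain FGSM corresponds to the evaluation gap the abstract describes: robustness measured under one fixed perturbation type need not carry over to a trivially transformed variant.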

Original language: English
Title of host publication: DocEng 2025 - Proceedings of the 2025 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400713514
DOIs
Publication status: Published - 27 Aug 2025
Event: 25th ACM Symposium on Document Engineering, DocEng 2025 - Nottingham, United Kingdom
Duration: 2 Sep 2025 – 5 Sep 2025

Publication series

Name: DocEng 2025 - Proceedings of the 2025 ACM Symposium on Document Engineering

Conference

Conference: 25th ACM Symposium on Document Engineering, DocEng 2025
Country/Territory: United Kingdom
City: Nottingham
Period: 2/09/25 – 5/09/25

Keywords

  • Adversarial Attacks
  • Adversarial purification
  • Diffusion models
  • Image Classification
  • Security
