TY - GEN
T1 - Robust Image Classifiers Fail under Shifted Adversarial Perturbations
AU - Amerehi, Fatemeh
AU - Healy, Patrick
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/08/27
AB - Non-robustness of image classifiers to subtle adversarial perturbations is a well-known failure mode. Defenses against such attacks are typically evaluated by measuring the error rate on perturbed versions of the natural test set, quantifying the worst-case performance within a specified perturbation budget. However, these evaluations often isolate specific perturbation types, underestimating the adaptability of real-world adversaries who can modify or compose attacks in unforeseen ways. In this work, we show that models considered robust to strong attacks, such as AutoAttack, can be compromised by a simple modification of the weaker FGSM attack, where the adversarial perturbation is slightly transformed prior to being added to the input. Despite the attack's simplicity, robust models that perform well against standard FGSM become vulnerable to this variant. These findings suggest that current defenses may generalize poorly beyond their assumed threat models and can achieve inflated robustness scores under narrowly defined evaluation settings.
KW - Adversarial Attacks
KW - Adversarial purification
KW - Diffusion models
KW - Image Classification
KW - Security
UR - https://www.scopus.com/pages/publications/105015769540
DO - 10.1145/3704268.3742694
M3 - Conference contribution
AN - SCOPUS:105015769540
T3 - DocEng 2025 - Proceedings of the 2025 ACM Symposium on Document Engineering
BT - DocEng 2025 - Proceedings of the 2025 ACM Symposium on Document Engineering
PB - Association for Computing Machinery, Inc
T2 - 25th ACM Symposium on Document Engineering, DocEng 2025
Y2 - 2 September 2025 through 5 September 2025
ER -