Metadata only
Date
2022
Type
Conference Paper
ETH Bibliography
yes
Abstract
Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when they exist, making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks and, thus, weak defense evaluations. Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We hope that attack unit tests such as ours will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism.
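The core idea sketched in the abstract, modifying a network so that an adversarial example is guaranteed to exist for a given sample, can be illustrated in a few lines of PyTorch. Everything below (the `GuaranteedVulnerableModel` wrapper, the trigger direction `delta`, the threshold `tau`, and the smooth gate) is a hypothetical construction for illustration only; it is not the authors' actual test.

```python
import torch
import torch.nn as nn

class GuaranteedVulnerableModel(nn.Module):
    """Hypothetical sketch (not the paper's construction): given a clean
    sample x0 and an L_inf budget eps, wrap a classifier so that moving
    from x0 along a fixed direction delta (with ||delta||_inf = eps)
    smoothly forces a chosen wrong class. Because x0 + delta lies inside
    the threat model and is misclassified by construction, an adversarial
    example provably exists, so any correct attack must break this model."""

    def __init__(self, model: nn.Module, x0: torch.Tensor, eps: float,
                 wrong_class: int, boost: float = 1e3):
        super().__init__()
        self.model = model
        self.register_buffer("x0", x0)
        # Random sign pattern scaled to the budget: ||delta||_inf = eps.
        self.register_buffer("delta", eps * torch.sign(torch.randn_like(x0)))
        # Trigger threshold: halfway to the alignment reached at x0 + delta.
        self.tau = 0.5 * float(self.delta.pow(2).sum())
        self.wrong_class = wrong_class
        self.boost = boost

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.model(x)
        # Alignment of the perturbation (x - x0) with the planted direction.
        score = ((x - self.x0) * self.delta).flatten(1).sum(dim=1, keepdim=True)
        # Smooth gate so gradient-based attacks receive a usable signal
        # (a hard threshold here would itself mask gradients).
        gate = torch.sigmoid((score - self.tau) / (0.1 * self.tau))
        bump = torch.zeros_like(logits)
        bump[:, self.wrong_class] = 1.0
        return logits + self.boost * gate * bump
```

Under this construction, `x0 + delta` stays within the threat model and is misclassified as `wrong_class` (the gate saturates and the boosted logit dominates), while the clean `x0` is essentially unaffected. Running an evaluation's attack against the wrapped model then serves as the unit test: an attack that fails to find any adversarial example here is too weak to support a robustness claim.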
Publication status
published
Book title
Advances in Neural Information Processing Systems 35
Publisher
Curran
Subject
Machine Learning; Adversarial Examples
Organisational unit
09764 - Tramèr, Florian / Tramèr, Florian