
Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Deep learning has seen tremendous growth, largely fueled by more powerful computers, the availability of ever-larger datasets, and advances in software infrastructure, with deep neural networks setting a new state of the art in virtually every task considered in machine learning. The empirical success of deep neural networks is undisputed, but there is still a big gap in our understanding of why these models work (when they do) and there are surprising ways in which they can fail miserably (when tampered with). This thesis investigates why powerful deep neural networks sometimes fail and what can be done to prevent this.
While deep neural networks are known to be robust to random noise, it has been shown that their accuracy can dramatically deteriorate in the face of so-called adversarial examples, i.e. small, specifically crafted perturbations of the input signal, often imperceptible to humans, that are sufficient to induce large changes in the model output. This apparent vulnerability is worrisome as deep neural networks start to proliferate in the real world, including in safety-critical deployments.
The most direct and popular strategy of robustification, called adversarial training, uses adversarial examples as data augmentation during training. We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks. Specifically, we prove that $\ell_p$-norm constrained projected gradient ascent based adversarial training with an $\ell_q$-norm loss on the logits of clean and perturbed inputs is equivalent to data-dependent (p, q) operator norm regularization. This fundamental connection confirms the long-standing argument that a network's sensitivity to adversarial examples is tied to its spectral properties and hints at novel ways to robustify and defend against adversarial attacks.
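To give a heuristic, first-order sense of this connection (a linearization sketch, not the precise statement proved in the thesis): writing $f$ for the logit map and $J_f(x)$ for its Jacobian at a clean input $x$,
\[
\max_{\|\delta\|_p \le \epsilon} \big\| f(x+\delta) - f(x) \big\|_q \;\approx\; \max_{\|\delta\|_p \le \epsilon} \big\| J_f(x)\,\delta \big\|_q \;=\; \epsilon\, \big\| J_f(x) \big\|_{p \to q},
\]
so penalizing the worst-case logit change under $\ell_p$-bounded perturbations acts like penalizing the data-dependent $(p, q)$ operator norm of the Jacobian.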
We also propose a detection method that exploits certain anomalies that adversarial attacks introduce. Specifically, the method measures how feature representations and log-odds change under noise: if the input is adversarially perturbed, the noise-induced feature variation tends to have a characteristic direction, whereas it tends not to have any specific direction if the input is natural. We evaluate our method against strong iterative attacks and show that even an adversary aware of the defense cannot evade our detector.
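As an illustration of the underlying idea, a minimal PyTorch-style sketch is given below; the model interface, noise scale and sample count are assumptions made for the example, not the exact procedure of the thesis.

import torch

def log_odds(logits, y):
    # Log-odds of class y versus all other classes.
    probs = logits.softmax(dim=1)
    p_y = probs.gather(1, y.unsqueeze(1)).squeeze(1)
    return torch.log(p_y) - torch.log1p(-p_y)

def noise_induced_logodds_shift(model, x, sigma=0.05, n_samples=32):
    # Average shift of the predicted-class log-odds under Gaussian input noise;
    # adversarially perturbed inputs tend to exhibit a large systematic shift,
    # natural inputs do not. sigma and n_samples are illustrative choices.
    with torch.no_grad():
        logits = model(x)
        y_hat = logits.argmax(dim=1)
        base = log_odds(logits, y_hat)
        shifts = [log_odds(model(x + sigma * torch.randn_like(x)), y_hat) - base
                  for _ in range(n_samples)]
    return torch.stack(shifts).mean(dim=0)  # threshold this statistic to flag inputs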
The go-to strategy to quantify adversarial vulnerability is to evaluate the model against specific attack algorithms. This approach, however, is inherently limited, as it says little about the robustness of the model against more powerful attacks not included in the evaluation. We develop a unified mathematical framework to describe relaxation-based robustness certification methods, which go beyond adversary-specific robustness evaluation and instead provide provable robustness guarantees against attacks by any adversary.
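For concreteness, one of the simplest such relaxations is interval bound propagation; the PyTorch sketch below is purely illustrative (it is not the unified framework itself) and propagates elementwise bounds through an affine layer and a ReLU.

import torch

def affine_bounds(W, b, lower, upper):
    # Propagate elementwise interval bounds through the affine map x -> x W^T + b.
    center, radius = (upper + lower) / 2, (upper - lower) / 2
    new_center = center @ W.t() + b
    new_radius = radius @ W.abs().t()
    return new_center - new_radius, new_center + new_radius

def relu_bounds(lower, upper):
    # ReLU is monotone, so interval bounds pass through elementwise.
    return lower.clamp(min=0), upper.clamp(min=0)

Chaining such bounds from the input box $[x-\epsilon, x+\epsilon]$ to the logits certifies the prediction whenever the lower bound of the true-class logit exceeds the upper bounds of all other logits.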
We also propose a new regularization approach to stabilize the training of Generative Adversarial Networks (GANs). We show that training with added input noise, or equivalently convolving the underlying densities, amounts to gradient-based discriminator regularization, which gives rise to a smoother family of discriminants without the need to explicitly add noise. The resulting regularizer is a simple yet effective modification of the GAN objective with low computational cost that yields a stable GAN training procedure.
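A minimal sketch of such a gradient-based discriminator penalty is shown below (PyTorch; the uniform weighting and the value of gamma are simplifications for illustration, not the exact regularizer derived in the thesis). The penalty is simply added to the discriminator loss at each update.

import torch

def discriminator_gradient_penalty(discriminator, x, gamma=2.0):
    # Penalize the squared norm of the discriminator's gradient with respect
    # to its input -- the kind of regularizer that training with small input
    # noise induces. Uniform weighting and gamma are illustrative choices.
    x = x.clone().requires_grad_(True)
    d_out = discriminator(x)
    grad, = torch.autograd.grad(d_out.sum(), x, create_graph=True)
    return 0.5 * gamma * grad.pow(2).flatten(1).sum(dim=1).mean()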
We also study Bayesian neural networks (BNNs), which learn a distribution over model parameters, or equivalently sample an ensemble of likely models, instead of optimizing a single network. Despite the promise of better generalization performance (no overfitting) and principled uncertainty quantification (robust predictions), the adoption of Bayesian neural networks has remained limited.
We demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD. On the other hand, we demonstrate that Bayes predictive performance can be improved significantly through the use of "cold posteriors" that overcount evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristics in Bayesian deep learning. Our findings cast doubt on the current understanding of Bayesian deep learning and suggest that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
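For illustration, a cold posterior can be targeted by tempering an SG-MCMC sampler; the stochastic-gradient Langevin sketch below (PyTorch) uses illustrative values and omits the practical refinements a real sampler needs.

import torch

def tempered_sgld_step(params, grad_log_post, lr=1e-5, temperature=0.1):
    # One stochastic-gradient Langevin step targeting the tempered posterior
    # p(theta | D)^(1/T): temperature T = 1 is the Bayes posterior, T < 1 a
    # "cold" posterior that overcounts the evidence. A practical SG-MCMC
    # sampler would add preconditioning, minibatch-noise corrections, etc.
    with torch.no_grad():
        for p, g in zip(params, grad_log_post):
            noise = torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
            p.add_(lr * g + noise)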
Permanent link
https://doi.org/10.3929/ethz-b-000490303
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Deep Learning; Deep Neural Networks; Artificial Intelligence; Adversarial Examples; Generative Adversarial Networks (GANs); Bayesian Neural Networks; Generative models; Robustness; Regularization
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas