Advances in Bayesian Model Selection and Uncertainty Estimation for Deep Learning

Open access
Author
Date
2024
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
Deep learning has achieved remarkable success across various fields, such as computer vision, natural language processing, and scientific problems, enabled by the ability of deep neural networks to learn complex patterns from large amounts of data. Yet, despite these advances, several key limitations remain that can hinder the application of deep learning to real-world problems: the need for large amounts of labeled data, time- and cost-intensive model design and tuning, and overconfident predictions. This thesis explores how Bayesian methods and probabilistic principles can be leveraged to address these limitations by developing novel algorithms for deep learning that require less data and manual model tuning while also providing improved uncertainty estimates. The developed methods are assessed on applications, some of which are enabled only by these new algorithms.
A key focus of this thesis is Bayesian model selection, which provides a principled framework for automatically selecting hyperparameters and improving generalization to unseen examples. We introduce a scalable marginal likelihood estimation method for deep learning that enables the optimization of thousands of hyperparameters during training based on the training data alone. The method relies on the Laplace approximation, is significantly more efficient than traditional manual tuning, and scales to far more hyperparameters. After training, the marginal likelihood estimate further allows selecting between models, for example, models with different architectures. To further enhance scalability, we derive novel lower bounds on the Laplace approximation of the marginal likelihood that permit unbiased stochastic gradient estimation, paving the way for efficient hyperparameter optimization with stochastic gradient descent on large datasets and complex models. This is a potential step towards optimizing neural networks end-to-end, applying successful gradient-based optimization not only to the weights but also to the hyperparameters.
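For context, the Laplace approximation replaces the intractable integral over the weights with a Gaussian around a posterior mode; a standard formulation of the resulting log marginal likelihood estimate (our notation, not quoted from the thesis) is
\[
\log p(\mathcal{D} \mid \mathcal{M})
\approx \log p(\mathcal{D} \mid \theta_*, \mathcal{M})
+ \log p(\theta_* \mid \mathcal{M})
+ \frac{D}{2} \log 2\pi
- \frac{1}{2} \log \det \mathbf{H}_{\theta_*},
\]
where \(\theta_*\) is the MAP estimate of the \(D\) weights and \(\mathbf{H}_{\theta_*} = -\nabla^2_\theta \log p(\mathcal{D}, \theta \mid \mathcal{M})\big|_{\theta_*}\) is the Hessian of the negative log joint. Because this expression is differentiable in the hyperparameters (e.g., the prior precision), they can be optimized by gradient ascent alongside the weights during training.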
Building upon this foundation, we demonstrate that the reach of Bayesian model selection extends beyond traditional hyperparameter optimization. We show that differentiable Laplace approximations can be used to learn invariances in deep neural networks during training, directly from the training data and without any supervision or prior knowledge, akin to automatic data augmentation. Further, we find that discrete Bayesian model selection can be used to probe representations for linguistic tasks, overcoming limitations of existing methods and resolving counter-intuitive prior results. We also introduce PathFA, a probabilistic pathway-based multimodal factor analysis that leverages prior biological knowledge to integrate transcriptomics and proteomics data. Thanks to automatic model selection, PathFA is effective for the small sample cohorts common in biomedical studies.
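Discrete model selection of this kind compares candidate models by their marginal likelihoods; in its simplest textbook form (our illustration, not the thesis's notation), two models \(\mathcal{M}_1\) and \(\mathcal{M}_2\) are weighed by the posterior odds
\[
\frac{p(\mathcal{M}_1 \mid \mathcal{D})}{p(\mathcal{M}_2 \mid \mathcal{D})}
= \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}
\cdot \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)},
\]
where the first factor is the Bayes factor; under a uniform model prior, selection reduces to picking the model with the highest marginal likelihood.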
Complementing the work on Bayesian model selection, we explore methods for improving the predictive uncertainty estimates of deep learning models in terms of both epistemic and aleatoric uncertainty. First, we discuss how a linearized predictive distribution naturally arises in Bayesian neural networks and can greatly improve the performance of existing inference methods, for example, by alleviating previously observed stability issues of Laplace approximations. We further extend Laplace approximations to heteroscedastic regression with deep neural networks, allowing for flexible and automatic quantification of both aleatoric and epistemic uncertainty. Using the same natural parameterization, we study the problem of causal inference with heteroscedastic models, where we show identifiability and propose novel state-of-the-art estimators.
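Concretely, linearizing the network around the MAP estimate \(\theta_*\) turns the intractable predictive into a Gaussian (a standard derivation, in our own notation):
\[
f(x; \theta) \approx f(x; \theta_*) + J_*(x)\,(\theta - \theta_*),
\qquad J_*(x) = \nabla_\theta f(x; \theta)\big|_{\theta_*},
\]
so that a Gaussian posterior \(q(\theta) = \mathcal{N}(\theta_*, \Sigma)\), e.g. from a Laplace approximation, induces the closed-form predictive \(f(x) \sim \mathcal{N}\big(f(x; \theta_*),\, J_*(x)\, \Sigma\, J_*(x)^\top\big)\).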
The methods developed in this thesis are implemented and documented in laplace-torch and are thus ready for practitioners to use and for researchers to extend.
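As a pointer for practitioners, the sketch below shows a typical post-hoc workflow, assuming the laplace-torch API (Laplace, fit, optimize_prior_precision); the toy data and network are placeholders for a user's own MAP-trained model and data loader.

import torch
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace

# Toy 1-D regression data and a small network; placeholders for a
# user's own (already MAP-trained) model and data loader.
X = torch.linspace(-2, 2, 200).unsqueeze(-1)
y = X.sin() + 0.1 * torch.randn_like(X)
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)
model = torch.nn.Sequential(
    torch.nn.Linear(1, 50), torch.nn.Tanh(), torch.nn.Linear(50, 1)
)
# ... ordinary MAP training of `model` would happen here ...

# Post-hoc Laplace approximation over the last layer with a
# Kronecker-factored Hessian approximation.
la = Laplace(model, 'regression',
             subset_of_weights='last_layer',
             hessian_structure='kron')
la.fit(train_loader)

# Tune the prior precision by maximizing the Laplace marginal likelihood.
la.optimize_prior_precision(method='marglik')

# Posterior predictive mean and variance at the training inputs.
f_mean, f_var = la(X)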
Permanent link
https://doi.org/10.3929/ethz-b-000724968
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Organisational unit
09568 - Rätsch, Gunnar / Rätsch, Gunnar