Advances in Bayesian Model Selection and Uncertainty Estimation for Deep Learning
OPEN ACCESS
Date
2024
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Deep learning has achieved remarkable success across fields such as computer vision, natural language processing, and scientific problems, enabled by the ability of deep neural networks to learn complex patterns from large amounts of data. Yet, despite these advances, several key limitations remain that can hinder their application to real-world problems. These include the need for large amounts of labeled data, time- and cost-intensive model design and tuning, and overconfident predictions. This thesis explores how Bayesian methods and probabilistic principles can be leveraged to address these limitations by developing novel algorithms for deep learning that require less data and manual model tuning while also providing improved estimation of uncertainties. The developed methods are assessed on applications, some of which are enabled only by these new algorithms.
A key focus of this thesis is Bayesian model selection, which provides a principled framework for automatically selecting hyperparameters and improving generalization to unseen examples. We introduce a scalable marginal likelihood estimation method for deep learning that enables the optimization of thousands of hyperparameters during training based on the training data alone. The method relies on the Laplace approximation, is significantly more efficient than traditional manual tuning, and scales to more hyperparameters. After training, the marginal likelihood estimate further allows selecting between models, for example, with different architectures. To further enhance scalability, we derive novel lower bounds to the Laplace approximation of the marginal likelihood that permit unbiased stochastic gradient estimation, paving the way for efficient hyperparameter optimization with stochastic gradient descent for large datasets and complex models. This presents a potential step towards optimizing neural networks end-to-end by using successful gradient-based optimization not only for weights but also for hyperparameters.
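The Laplace approximation to the marginal likelihood described above can be illustrated on a toy Bayesian linear regression with a single weight, where the posterior is Gaussian and the approximation is exact, so it can be checked against the closed-form marginal likelihood. The model, prior and noise precisions, and data below are purely illustrative assumptions for this sketch, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 20, 2.0, 5.0          # prior precision alpha, noise precision beta (assumed)
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=beta ** -0.5, size=n)

def neg_log_joint(w):
    # Negative log joint -log p(y | w) - log p(w) for a scalar weight w
    nll = 0.5 * beta * np.sum((y - w * x) ** 2) - 0.5 * n * np.log(beta / (2 * np.pi))
    nlp = 0.5 * alpha * w ** 2 - 0.5 * np.log(alpha / (2 * np.pi))
    return nll + nlp

# MAP estimate and Hessian of the negative log joint (closed form for this model)
H = beta * np.sum(x ** 2) + alpha
w_map = beta * np.sum(x * y) / H

# Laplace estimate: log Z ~ log p(y, w_map) + (d/2) log(2 pi) - (1/2) log det H, with d = 1
log_marglik_laplace = -neg_log_joint(w_map) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

# Exact marginal likelihood: y ~ N(0, x x^T / alpha + I / beta), available because all is Gaussian
S = np.outer(x, x) / alpha + np.eye(n) / beta
_, logdet = np.linalg.slogdet(S)
log_marglik_exact = -0.5 * (y @ np.linalg.solve(S, y) + logdet + n * np.log(2 * np.pi))

print(abs(log_marglik_laplace - log_marglik_exact))  # agrees up to floating-point error
```

For deep networks the Hessian is intractable, which is where the scalable estimators and lower bounds developed in the thesis come in; the toy model only shows what quantity they approximate.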
Building upon this foundation, we demonstrate that the reach of Bayesian model selection extends beyond traditional hyperparameter optimization. We show that differentiable Laplace approximations can be used to learn invariances in deep neural networks directly from the training data during training, without any supervision or prior knowledge, akin to automatic data augmentation. Further, we find that discrete Bayesian model selection can be used to probe representations for linguistic tasks, overcome limitations of existing methods, and resolve counter-intuitive prior results. We also introduce PathFA, a probabilistic pathway-based multimodal factor analysis that leverages prior biological knowledge to integrate transcriptomics and proteomics data. Due to automatic model selection, PathFA is effective for the small sample cohorts common in biomedical studies.
Complementing the work on Bayesian model selection, we explore methods for improving the predictive uncertainty estimates of deep learning models in terms of both epistemic and aleatoric uncertainty. First, we discuss how a linearized predictive distribution naturally arises in Bayesian neural networks and can greatly improve the performance of existing inference methods, for example, by alleviating prior stability issues of Laplace approximations. We further extend Laplace approximations to heteroscedastic regression with deep neural networks, allowing for flexible and automatic quantification of both aleatoric and epistemic uncertainty. Using the same natural parameterization, we study the problem of causal inference in the case of heteroscedastic models, where we show identifiability and propose novel state-of-the-art estimators.
The methods developed in this thesis are implemented and documented in laplace-torch and are thus ready for practitioners to use and for researchers to extend.
Publication status
published
Publisher
ETH Zurich
Organisational unit
09568 - Rätsch, Gunnar