Inductive Bias of Neural Networks and Selected Applications
OPEN ACCESS
Date
2024
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
This thesis is concerned with the inductive bias of (deep) neural networks (NNs) NN_θ : X → Y. For commonly used loss functionals L, a large set of functions f : X → Y minimizes L equally well. Therefore, it is up to the learning method (e.g., a NN architecture with certain hyperparameters, including a certain training algorithm) to choose one function among all functions with sufficiently small loss. The preference underlying this choice is an inductive bias. Our main theorem derives a regularization functional P on function space that exactly mimics the preferences of ℓ2-regularized deep ReLU NNs with sufficiently many neurons per layer. Interestingly, we can prove that this preference structure over functions cannot be mimicked by any shallow Gaussian process (GP). This result contrasts with prominent results stating that other large-width limits of NNs are equivalent to GPs [Jacot et al., 2018, Neal, 1996].
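In standard notation (the symbols θ for the network parameters and λ for the regularization weight are assumed here, not taken from the thesis), the ℓ2-regularized training problem and its function-space counterpart established by the main theorem can be sketched as

\min_{\theta}\; L(\mathrm{NN}_\theta) + \lambda\,\lVert\theta\rVert_2^2
\qquad \text{versus} \qquad
\min_{f\colon X \to Y}\; L(f) + \lambda\, P(f),

where the theorem states that, for deep ReLU NNs with sufficiently many neurons per layer, the two problems prefer exactly the same functions.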
We analyze the differences between “deep inductive biases”, such as those we have characterized for deep ℓ2-regularized ReLU NNs (with various hyperparameters), and “shallow inductive biases”, such as those of GPs (including certain large-width limits of deep NNs). Both in the context of multi-task learning and in that of uncertainty quantification, we can pinpoint these differences very precisely.
From our main theory, we derive a lossless compression algorithm that provably reduces the number of neurons of an ℓ2-regularized ReLU NN without changing the function represented by the NN. In our numerical experiments, we reduce the size of trained NNs by a factor of up to 100, almost without changing their predictions.
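As a minimal, hedged illustration of lossless size reduction (the criterion and the function below are an elementary example, not the compression algorithm derived in the thesis), hidden ReLU neurons that provably contribute nothing to the represented function can always be removed:

import numpy as np

def prune_dead_relu_neurons(W1, b1, W2):
    # Prunes hidden neurons of the network x -> W2 @ relu(W1 @ x + b1) that
    # cannot affect its output: neurons whose outgoing weights are all zero,
    # and neurons whose incoming weights and bias are zero (output ReLU(0) = 0).
    # Illustrative sketch only, not the thesis's compression algorithm.
    out_zero = np.all(W2 == 0.0, axis=0)                  # column j of W2 vanishes
    in_zero = np.all(W1 == 0.0, axis=1) & (b1 == 0.0)     # row j of W1 and b1[j] vanish
    keep = ~(out_zero | in_zero)
    return W1[keep], b1[keep], W2[:, keep]

# Example: neuron 1 has zero incoming weights and bias, neuron 2 has zero
# outgoing weights, so both can be removed without changing the function.
W1 = np.array([[1.0, 0.0], [0.0, 0.0], [2.0, 3.0]])
b1 = np.array([0.5, 0.0, -1.0])
W2 = np.array([[1.0, 4.0, 0.0]])
W1p, b1p, W2p = prune_dead_relu_neurons(W1, b1, W2)       # only neuron 0 remains

This sketch only covers exactly dead neurons; it is meant to illustrate what "reducing the number of neurons without changing the represented function" means.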
We also extend the theory of a NN-based learning method (PD-NJ-ODE), which can forecast irregularly, incompletely, and noisily observed time series, and discuss its inductive bias.
Further, we develop multiple NN-based improvements for combinatorial auctions. To this end, we introduce a NN architecture with an inductive bias tailored to monotonic value functions. Additionally, we introduce an uncertainty quantification method for NNs and use it for exploration in the spirit of Bayesian optimization. Moreover, we develop a new training algorithm that can handle loss functionals that enforce global linear inequalities.
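As a hedged sketch of what an inductive bias tailored to monotonic value functions can look like (a generic construction, not claimed to be the architecture introduced in the thesis; all parameter names are hypothetical), one can restrict a network to non-negative effective weights and monotone activations, so that its output is non-decreasing in every input coordinate:

import numpy as np

def softplus(x):
    # Numerically stable softplus, log(1 + exp(x)) >= 0.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def monotone_mlp(x, V1, c1, V2, c2):
    # Generic monotone network sketch: the effective weights softplus(V1) and
    # softplus(V2) are non-negative and tanh is non-decreasing, so the output
    # is non-decreasing in every coordinate of x. V1, V2, c1, c2 are
    # unconstrained parameters (illustrative names only).
    h = np.tanh(softplus(V1) @ x + c1)
    return softplus(V2) @ h + c2

Monotonicity holds by construction for any parameter values; an architecture-level constraint of this kind is one way to encode such an inductive bias.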
Publication status
published
Contributors
Examiner: Teichmann, Josef
Examiner: Schmidt, Thorsten
Publisher
ETH Zurich
Subject
Deep Learning; inductive bias; Multi-task learning; Machine Learning; Machine learning (artificial intelligence); Machine Learning (stat.ML); neural networks; Deep ReLU neural networks; deep neural networks; Bayesian neural networks; artificial intelligence; regularization; generalization; Overparametrized Neural Networks; Large width limit; Infinite width limit; Uncertainty Quantification; uncertainty; Epistemic uncertainty; Model uncertainty; Size Reduction of Neural Networks; Market Design; auction; Auctions and market-based systems; optimization; Bayesian optimization; Black box optimisation; monotone regression; TIME SERIES ANALYSIS (MATHEMATICAL STATISTICS); time series; Neural ODE; stochastic filtering; forecasting and prediction; implicit regularization; gradient descent; spline; Deep learning architectures and techniques
Organisational unit
03845 - Teichmann, Josef / Teichmann, Josef