Inductive Bias of Neural Networks and Selected Applications


Date

2024

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

This thesis is concerned with the inductive bias of (deep) neural networks (NNs) NN_θ: X → Y. For commonly used loss functionals L, a large set of functions f: X → Y minimizes L equally well. It is therefore up to the learning method (e.g., an NN architecture with certain hyperparameters, including a training algorithm) to choose one function among all functions with sufficiently small loss. The preference underlying this choice is an inductive bias. Our main theorem derives a regularization functional P on function space that exactly mimics the preferences of ℓ2-regularized deep ReLU NNs with sufficiently many neurons per layer. Interestingly, we can prove that this preference structure over functions cannot be mimicked by any shallow Gaussian process (GP). This result contrasts with prominent results stating that other large-width limits of NNs are equivalent to GPs [Jacot et al., 2018; Neal, 1996]. We analyze the differences between “deep inductive biases”, such as those we characterize for deep ℓ2-regularized ReLU NNs (with various hyperparameters), and “shallow inductive biases”, such as those of GPs (including certain large-width limits of deep NNs). Both in the context of multi-task learning and in the context of uncertainty quantification, we can pinpoint these differences very precisely. From our main theory, we derive a lossless compression algorithm that provably reduces the number of neurons of an ℓ2-regularized ReLU NN without changing the function represented by the NN. In our numerical experiments, we reduce the size of trained NNs by a factor of up to 100 with almost no change in their predictions. We also extend the theory of an NN-based learning method (PD-NJ-ODE), which can forecast irregularly, incompletely, and noisily observed time series, and discuss its inductive bias. Further, we develop multiple NN-based improvements for combinatorial auctions. To this end, we introduce an NN architecture whose inductive bias is tailored to monotonic value functions. Additionally, we introduce an uncertainty quantification method for NNs and use it for exploration in the spirit of Bayesian optimization. Moreover, we develop a new training algorithm that can handle loss functionals enforcing global linear inequalities.
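The lossless-compression claim above rests on a general property of ReLU networks: ReLU is positively homogeneous, so hidden neurons whose incoming weights and biases are positive multiples of each other compute proportional outputs and can be folded into a single neuron without changing the represented function. The following minimal NumPy sketch illustrates this one reduction step only; it is not the compression algorithm developed in the thesis, and the function name merge_proportional_neurons, the tolerance handling, and the layer layout are our own illustrative assumptions.

import numpy as np

def merge_proportional_neurons(W_in, b_in, W_out, tol=1e-8):
    """Merge hidden ReLU neurons whose incoming weights/biases are positive
    multiples of an already-kept neuron (lossless for the represented function).
    W_in: (h, d) incoming weights, b_in: (h,) biases, W_out: (o, h) outgoing weights."""
    rows = np.concatenate([W_in, b_in[:, None]], axis=1)   # (h, d+1): incoming weights and bias per neuron
    keep, out_cols = [], []                                # kept neuron indices and their outgoing weight columns
    for i in range(rows.shape[0]):
        merged = False
        for k, j in enumerate(keep):
            ri, rj = rows[i], rows[j]
            denom = float(np.dot(rj, rj))
            if denom > tol:
                c = float(np.dot(ri, rj)) / denom          # best scale so that ri ≈ c * rj
                if c > tol and np.linalg.norm(ri - c * rj) <= tol * (1.0 + np.linalg.norm(ri)):
                    # ReLU(c * z) = c * ReLU(z) for c > 0, so neuron i's output is c times
                    # neuron j's output: fold it into neuron j's outgoing weights.
                    out_cols[k] = out_cols[k] + c * W_out[:, i]
                    merged = True
                    break
        if not merged:
            keep.append(i)
            out_cols.append(W_out[:, i].astype(float))
    return W_in[keep], b_in[keep], np.stack(out_cols, axis=1)

# Tiny check: duplicate a neuron up to a positive scale, merge, and verify that
# the reduced network computes exactly the same function with one neuron fewer.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2)); W1[2] = 2.0 * W1[0]          # neuron 2 is a positive multiple of neuron 0
b1 = rng.normal(size=3);      b1[2] = 2.0 * b1[0]
W2 = rng.normal(size=(1, 3))
x = rng.normal(size=(5, 2))
y_full = np.maximum(x @ W1.T + b1, 0.0) @ W2.T             # original two-layer ReLU network
W1r, b1r, W2r = merge_proportional_neurons(W1, b1, W2)
y_reduced = np.maximum(x @ W1r.T + b1r, 0.0) @ W2r.T
assert np.allclose(y_full, y_reduced)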

Publication status

published

Contributors

Examiner: Teichmann, Josef
Examiner: Schmidt, Thorsten

Publisher

ETH Zurich

Subject

Deep Learning; inductive bias; Multi-task learning; Machine Learning; Machine learning (artificial intelligence); Machine Learning (stat.ML); neural networks; Deep ReLU neural networks; deep neural networks; Bayesian neural networks; artificial intelligence; regularization; generalization; Overparametrized Neural Networks; Large width limit; Infinite width limit; Uncertainty Quantification; uncertainty; Epistemic uncertainty; Model uncertainty; Size Reduction of Neural Networks; Market Design; auction; Auctions and market-based systems; optimization; Bayesian optimization; Black box optimisation; monotone regression; TIME SERIES ANALYSIS (MATHEMATICAL STATISTICS); time series; Neural ODE; stochastic filtering; forecasting and prediction; implicit regularization; gradient descent; spline; Deep learning architectures and techniques

Organisational unit

03845 - Teichmann, Josef / Teichmann, Josef
