Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
One of the goals of artificial intelligence is to create machines that can think like humans. Deep learning has been at the core of the remarkable progress made towards this goal. Large artificial neural networks trained on massive datasets can master tasks across vastly different domains. Despite this progress on i.i.d. generalization --- i.e., when the training and test data are independently and identically distributed --- these models struggle when tested outside the support of the training data. But can we even expect to generalize beyond the training distribution? In certain contexts, yes: one of the hallmarks of human intelligence is using our causal understanding of the data-generating process to make correct inferences out of distribution (o.o.d.). This thesis investigates four different assumptions and techniques to support o.o.d. generalization with deep learning.
(i) o.o.d. generalization via composition. By relying on the assumption of independence of mechanisms from the literature on causality, we learn a set of modular, reusable neural networks via competition of experts. These modules specialize and can be applied sequentially to account for novel combinations of transformations at test time;
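To make the competition-of-experts idea concrete, here is a minimal sketch, assuming a toy setup in which each mechanism is an unknown additive shift and each expert is a bias-only module; for every sample, only the expert with the lowest loss is updated, so the modules specialize. The names, the toy task, and the bias-only experts are illustrative assumptions, not the thesis implementation.

```python
# Toy competition of experts: each expert tries to invert an observed
# transformation; the expert with the lowest loss "wins" the sample and is
# the only one that receives a gradient update, encouraging specialization.
import numpy as np

rng = np.random.default_rng(0)
n_experts, dim, lr = 4, 8, 0.1
experts = [rng.normal(0.0, 0.1, size=dim) for _ in range(n_experts)]      # bias-only "modules"
true_shifts = [rng.normal(0.0, 1.0, size=dim) for _ in range(n_experts)]  # unknown mechanisms

for step in range(2000):
    k = rng.integers(n_experts)          # sample which mechanism produced the data
    x = rng.normal(size=dim)             # clean sample
    x_obs = x + true_shifts[k]           # observed, transformed sample
    # Each expert proposes an inverse transformation; measure its error.
    losses = [np.mean((x_obs - b - x) ** 2) for b in experts]
    winner = int(np.argmin(losses))      # competition: lowest loss wins
    grad = -2.0 * (x_obs - experts[winner] - x) / dim
    experts[winner] -= lr * grad         # only the winning expert is updated
```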
(ii) o.o.d. generalization via invariances, where the training data has a mixture of invariant and spurious features, and only the invariances support generalization at test time.
We show that training neural networks with the arithmetic mean of gradients can cause memorization and spurious features to emerge, whereas the geometric mean of the gradients suppresses them in favor of invariances;
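A hedged sketch of this gradient combination, assuming two training environments and a plain element-wise geometric mean with a sign-consensus check; the function name and the toy gradients are illustrative, not the exact procedure used in the thesis.

```python
# Combine per-environment gradients with an element-wise geometric mean
# instead of the usual arithmetic mean: components whose signs disagree
# across environments are zeroed out (they behave like spurious,
# environment-specific directions), while consistent components survive.
import numpy as np

def geometric_mean_grad(grads):
    """grads: array of shape (n_envs, n_params)."""
    grads = np.asarray(grads, dtype=float)
    signs = np.sign(grads)
    agree = np.all(signs == signs[0], axis=0)            # sign consensus per parameter
    geo_mag = np.exp(np.mean(np.log(np.abs(grads) + 1e-12), axis=0))
    return np.where(agree, signs[0] * geo_mag, 0.0)      # suppress inconsistent components

g_env_a = np.array([ 0.9,  0.5, -0.3])
g_env_b = np.array([ 1.1, -0.4, -0.2])
print(geometric_mean_grad([g_env_a, g_env_b]))  # consistent dims kept, inconsistent dim zeroed
```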
(iii) o.o.d. generalization via symbolic expressions, where identifying the correct underlying symbolic equation, as commonly done in the sciences, allows making accurate predictions far from the training distribution.
We leverage large-scale pre-training to teach a neural network to predict symbolic equations from a set of input-output observations, vastly outperforming state-of-the-art hand-designed approaches;
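The following sketch illustrates one way such pre-training data could be generated, assuming a toy expression grammar and token sequence; it is not the thesis pipeline, only an illustration of pairing sampled equations with their input-output observations for a sequence-to-sequence model.

```python
# Generate one supervised training pair for neural symbolic regression:
# sample a random symbolic expression, evaluate it on random inputs, and use
# (observations -> token sequence of the equation) as the training example.
import numpy as np

rng = np.random.default_rng(0)
UNARY = {"sin": np.sin, "cos": np.cos, "exp": np.exp}
BINARY = {"add": np.add, "mul": np.multiply}

def sample_expr(depth=2):
    """Return (tokens, callable) for a random expression in x."""
    if depth == 0 or rng.random() < 0.3:
        return ["x"], lambda x: x
    if rng.random() < 0.5:
        name = rng.choice(list(UNARY))
        toks, f = sample_expr(depth - 1)
        return [name, "("] + toks + [")"], lambda x, f=f, g=UNARY[name]: g(f(x))
    name = rng.choice(list(BINARY))
    lt, lf = sample_expr(depth - 1)
    rt, rf = sample_expr(depth - 1)
    return ([name, "("] + lt + [","] + rt + [")"],
            lambda x, lf=lf, rf=rf, g=BINARY[name]: g(lf(x), rf(x)))

tokens, fn = sample_expr()
xs = rng.uniform(-1, 1, size=16)
ys = fn(xs)
print("decoder target tokens:", tokens)
print("encoder observations:", list(zip(xs[:3], ys[:3])))
```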
(iv) o.o.d. generalization via planning, a classic technique to reduce uncertainty by investing additional time and compute at test time to solve more complex instances of problems seen during training. We present a divide-and-conquer algorithm that builds on Monte Carlo Tree Search with neural policy and value functions. By recursively splitting the problem in half, horizons and their uncertainties become exponentially shorter as a function of planning depth, allowing the model to plan over much longer horizons.
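As a rough illustration of the divide-and-conquer idea, the sketch below recursively proposes a midpoint subgoal and plans each half, so the effective horizon halves at every level of recursion; the interpolation-based subgoal proposal is a stand-in assumption for the learned policy/value networks and MCTS described above.

```python
# Divide-and-conquer planning sketch: instead of rolling out the full
# horizon, propose a midpoint subgoal and recurse on each half, so the
# horizon shrinks exponentially with recursion depth.
import numpy as np

def propose_midpoint(start, goal):
    # Placeholder for a learned subgoal proposal (assumption, not thesis code).
    return (np.asarray(start) + np.asarray(goal)) / 2.0

def plan(start, goal, depth):
    """Return a list of waypoints leading from start to goal."""
    if depth == 0:
        return [np.asarray(goal)]
    mid = propose_midpoint(start, goal)
    left = plan(start, mid, depth - 1)    # first half of the trajectory
    right = plan(mid, goal, depth - 1)    # second half of the trajectory
    return left + right

waypoints = plan(start=[0.0, 0.0], goal=[8.0, 4.0], depth=3)
print(len(waypoints), "waypoints")  # 2**depth segments
```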
Permanent link
https://doi.org/10.3929/ethz-b-000517231
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Deep Learning; Generalization; Invariance; Planning; Reinforcement Learning
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas