Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
One of the goals of artificial intelligence is to create machines that can think like humans. Deep learning has been at the core of the remarkable progress made towards this goal. Large artificial neural networks trained on massive datasets can master tasks across vastly different domains. Despite this progress on i.i.d. generalization --- i.e., when the training and test data are independently and identically distributed --- these models struggle when tested outside the support of the training data. But can we even expect to generalize beyond the training distribution? In certain contexts, yes: one of the hallmarks of human intelligence is using our causal understanding of the data-generating process to make correct inferences out of distribution (o.o.d.). This thesis investigates four different assumptions and techniques to support o.o.d. generalization with deep learning.
(i) o.o.d. generalization via composition. By relying on the assumption of independence of mechanisms from the literature on causality, we learn a set of modular, reusable neural networks via competition of experts. These modules specialize and can be applied sequentially to account for novel combinations of transformations at test time;
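To make the competition-of-experts idea concrete, here is a minimal sketch, assuming a toy setup in which each mechanism is an unknown additive shift and each expert is a bias-only module; for every sample, only the expert with the lowest loss is updated, so the modules specialize. The names, the toy task, and the bias-only experts are illustrative assumptions, not the thesis implementation.

```python
# Toy competition of experts: each expert tries to invert an observed
# transformation; the expert with the lowest loss "wins" the sample and is
# the only one that receives a gradient update, encouraging specialization.
import numpy as np

rng = np.random.default_rng(0)
n_experts, dim, lr = 4, 8, 0.1
experts = [rng.normal(0.0, 0.1, size=dim) for _ in range(n_experts)]      # bias-only "modules"
true_shifts = [rng.normal(0.0, 1.0, size=dim) for _ in range(n_experts)]  # unknown mechanisms

for step in range(2000):
    k = rng.integers(n_experts)          # sample which mechanism produced the data
    x = rng.normal(size=dim)             # clean sample
    x_obs = x + true_shifts[k]           # observed, transformed sample
    # Each expert proposes an inverse transformation; measure its error.
    losses = [np.mean((x_obs - b - x) ** 2) for b in experts]
    winner = int(np.argmin(losses))      # competition: lowest loss wins
    grad = -2.0 * (x_obs - experts[winner] - x) / dim
    experts[winner] -= lr * grad         # only the winning expert is updated
```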
(ii) o.o.d. generalization via invariances, where the training data has a mixture of invariant and spurious features, and only the invariances support generalization at test time.
We show that training neural networks with the arithmetic mean of gradients can cause memorization and spurious features to emerge, whereas the geometric mean of the gradients suppresses them in favor of invariances;
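A hedged sketch of this gradient combination, assuming two training environments and a plain element-wise geometric mean with a sign-consensus check; the function name and the toy gradients are illustrative, not the exact procedure used in the thesis.

```python
# Combine per-environment gradients with an element-wise geometric mean
# instead of the usual arithmetic mean: components whose signs disagree
# across environments are zeroed out (they behave like spurious,
# environment-specific directions), while consistent components survive.
import numpy as np

def geometric_mean_grad(grads):
    """grads: array of shape (n_envs, n_params)."""
    grads = np.asarray(grads, dtype=float)
    signs = np.sign(grads)
    agree = np.all(signs == signs[0], axis=0)            # sign consensus per parameter
    geo_mag = np.exp(np.mean(np.log(np.abs(grads) + 1e-12), axis=0))
    return np.where(agree, signs[0] * geo_mag, 0.0)      # suppress inconsistent components

g_env_a = np.array([ 0.9,  0.5, -0.3])
g_env_b = np.array([ 1.1, -0.4, -0.2])
print(geometric_mean_grad([g_env_a, g_env_b]))  # consistent dims kept, inconsistent dim zeroed
```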
(iii) o.o.d. generalization via symbolic expressions, where identifying the correct underlying symbolic equation, as commonly done in the sciences, allows making accurate predictions far from the training distribution.
We leverage large-scale pre-training to teach a neural network to predict symbolic equations from a set of input-output observations, vastly outperforming state-of-the-art hand-designed approaches;
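The following sketch illustrates one way such pre-training data could be generated, assuming a toy expression grammar and token sequence; it is not the thesis pipeline, only an illustration of pairing sampled equations with their input-output observations for a sequence-to-sequence model.

```python
# Generate one supervised training pair for neural symbolic regression:
# sample a random symbolic expression, evaluate it on random inputs, and use
# (observations -> token sequence of the equation) as the training example.
import numpy as np

rng = np.random.default_rng(0)
UNARY = {"sin": np.sin, "cos": np.cos, "exp": np.exp}
BINARY = {"add": np.add, "mul": np.multiply}

def sample_expr(depth=2):
    """Return (tokens, callable) for a random expression in x."""
    if depth == 0 or rng.random() < 0.3:
        return ["x"], lambda x: x
    if rng.random() < 0.5:
        name = rng.choice(list(UNARY))
        toks, f = sample_expr(depth - 1)
        return [name, "("] + toks + [")"], lambda x, f=f, g=UNARY[name]: g(f(x))
    name = rng.choice(list(BINARY))
    lt, lf = sample_expr(depth - 1)
    rt, rf = sample_expr(depth - 1)
    return ([name, "("] + lt + [","] + rt + [")"],
            lambda x, lf=lf, rf=rf, g=BINARY[name]: g(lf(x), rf(x)))

tokens, fn = sample_expr()
xs = rng.uniform(-1, 1, size=16)
ys = fn(xs)
print("decoder target tokens:", tokens)
print("encoder observations:", list(zip(xs[:3], ys[:3])))
```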
(iv) o.o.d. generalization via planning, a classic technique to reduce uncertainty by investing additional time and compute at test time to solve more complex instances of problems seen during training. We present a divide-and-conquer algorithm that builds on Monte Carlo Tree Search with neural policy and value functions. By recursively splitting the problem in half, horizons and their uncertainties become exponentially shorter as a function of planning depth, allowing the model to plan over much longer horizons.
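As a rough illustration of the divide-and-conquer idea, the sketch below recursively proposes a midpoint subgoal and plans each half, so the effective horizon halves at every level of recursion; the interpolation-based subgoal proposal is a stand-in assumption for the learned policy/value networks and MCTS described above.

```python
# Divide-and-conquer planning sketch: instead of rolling out the full
# horizon, propose a midpoint subgoal and recurse on each half, so the
# horizon shrinks exponentially with recursion depth.
import numpy as np

def propose_midpoint(start, goal):
    # Placeholder for a learned subgoal proposal (assumption, not thesis code).
    return (np.asarray(start) + np.asarray(goal)) / 2.0

def plan(start, goal, depth):
    """Return a list of waypoints leading from start to goal."""
    if depth == 0:
        return [np.asarray(goal)]
    mid = propose_midpoint(start, goal)
    left = plan(start, mid, depth - 1)    # first half of the trajectory
    right = plan(mid, goal, depth - 1)    # second half of the trajectory
    return left + right

waypoints = plan(start=[0.0, 0.0], goal=[8.0, 4.0], depth=3)
print(len(waypoints), "waypoints")  # 2**depth segments
```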
Permanent link
https://doi.org/10.3929/ethz-b-000517231
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Deep Learning; Generalization; Invariance; Planning; Reinforcement Learning
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas