The Curious Case of Benign Memorization


METADATA ONLY
Loading...

Date

2023

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric
METADATA ONLY

Data

Rights / License

Abstract

Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete overfitting of the data even if the labels are randomized, i.e. networks can completely \textit{memorize} all given patterns. While such a memorization capacity seems worrisome, in this work we show that under training protocols that include \textit{data augmentation}, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing. We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers. As a result, only the very last layers are used for memorization, while preceding layers encode performant features which remain largely unaffected by the label noise. We explore the intricate role of the augmentations used for training and identify a memorization-generalization trade-off in terms of their diversity, marking a clear distinction to all previous works. Finally, we give a first explanation for the emergence of benign memorization by showing that \textit{malign} memorization under data augmentation is infeasible due to the insufficient capacity of the model for the increased sample size. As a consequence, the network is forced to leverage the correlated nature of the augmentations and as a result learns meaningful features. To complete the picture, a better theory of feature learning in deep neural networks is required to fully understand the origins of this phenomenon.

Publication status

published

Editor

Book title

The Eleventh International Conference on Learning Representations

Journal / series

Volume

Pages / Article No.

Publisher

OpenReview

Event

11th International Conference on Learning Representations (ICLR 2023)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

09462 - Hofmann, Thomas / Hofmann, Thomas check_circle

Notes

Funding

Related publications and datasets