Open access
Date
2020
Type
Conference Paper
ETH Bibliography
yes
Abstract
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling for a fixed budget of optimization steps. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the state-of-the-art. Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.
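The abstract describes batch augmentation only at a high level. The sketch below illustrates the core idea, assuming a PyTorch image pipeline; the helper augment_batch, the chosen transforms, and the multiplier M are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of batch augmentation: each sample in a batch is
# replicated M times, each copy passing through a different random
# augmentation, and the labels are repeated to match. Names and transform
# choices are assumptions, not the authors' code.
import torch
import torchvision.transforms as T

# A stochastic augmentation pipeline; each call yields a different view.
augment = T.Compose([
    T.RandomResizedCrop(32),
    T.RandomHorizontalFlip(),
])

def augment_batch(images, labels, M=4):
    """Return an (M * B)-sized batch of differently augmented copies."""
    views = [torch.stack([augment(img) for img in images]) for _ in range(M)]
    augmented = torch.cat(views, dim=0)   # shape: (M * B, C, H, W)
    repeated_labels = labels.repeat(M)    # shape: (M * B,)
    return augmented, repeated_labels

if __name__ == "__main__":
    x = torch.rand(8, 3, 32, 32)          # dummy batch of B=8 CIFAR-sized images
    y = torch.randint(0, 10, (8,))
    xb, yb = augment_batch(x, y, M=4)
    print(xb.shape, yb.shape)             # (32, 3, 32, 32) and (32,)
```

The enlarged batch is then fed to the optimizer in place of the original one, so the number of SGD updates per epoch is unchanged while each update sees more augmented views per sample.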
Permanent link
https://doi.org/10.3929/ethz-b-000462535
Publication status
published
External links
Book title
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pages / Article No.
Publisher
IEEE
Event
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten
Notes
Due to the Coronavirus (COVID-19), the conference was conducted virtually.