Open access
Author
Date
2023-05-02
Type
Master Thesis
ETH Bibliography
yes
Abstract
Neural networks (NNs) are growing deeper and more complex, to the point where training on a single accelerator is no longer an option. Training today's state-of-the-art NNs is done in parallel over thousands of GPUs. Preconditioning-based optimizers are also receiving increasing attention in distributed training. We conduct a literature review of existing distributed second-order methods for training NNs. We examine two well-known preconditioning methods, K-FAC and Shampoo, in detail, and describe approaches for distributing their additional computations across multiple GPUs. We implement distributed K-FAC (distr. K-FAC) and distributed Shampoo (distr. Shampoo) in PyTorch. Based on our analysis of the performance of both algorithms, we introduce 3D-Shampoo, an extension of Shampoo to training in 3D parallelism settings (i.e., a combination of data, operator, and pipeline parallelism). 3D-Shampoo combines 3D parallelism from the DeepSpeed library (Rasley et al., 2020) with a modified version of the Shampoo optimizer (Gupta et al., 2018), and is designed for very large language models that support operator parallelism, such as Megatron-LM's GPT-2 (Narayanan et al., 2021). The final part of this thesis describes the 3D-Shampoo algorithm, how it works, and its performance on Megatron-LM's GPT-2 at different levels of parallelism. 3D-Shampoo achieves throughput (tokens processed per second) competitive with the SGD optimizer under all forms of parallelism (data, operator, pipeline, and their combination) when training GPT-2-like Transformer models. The code used for our experiments is publicly available.
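For readers unfamiliar with Shampoo, the single-GPU version of the update it distributes can be sketched as follows. This is a minimal NumPy illustration of Shampoo's two-sided preconditioning for one matrix-shaped parameter (Gupta et al., 2018), not the thesis's distributed 3D-Shampoo implementation; the function names, the epsilon value, and the eigendecomposition-based inverse-root routine are assumptions made here for illustration.

```python
import numpy as np

def inv_fourth_root(M, eps=1e-6):
    # M is symmetric positive semi-definite; compute M^{-1/4}
    # via an eigendecomposition (one simple way to do it).
    w, Q = np.linalg.eigh(M)
    w = np.maximum(w, 0.0) + eps  # guard against tiny/negative eigenvalues
    return (Q * w ** -0.25) @ Q.T

def shampoo_step(W, G, L, R, lr=0.01):
    # Accumulate the two Kronecker-factored gradient statistics:
    # L (rows) and R (columns) of the m x n gradient G.
    L += G @ G.T
    R += G.T @ G
    # Precondition the gradient from both sides, then take an SGD-like step.
    precond_G = inv_fourth_root(L) @ G @ inv_fourth_root(R)
    return W - lr * precond_G, L, R
```

In a distributed setting, the expensive parts are accumulating L and R and computing their inverse roots; the thesis's distr. Shampoo and 3D-Shampoo spread exactly this work across GPUs alongside data, operator, and pipeline parallelism.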
Permanent link
https://doi.org/10.3929/ethz-b-000615331
Publication status
published
Publisher
ETH Zurich
Subject
Artificial intelligence (AI); Deep Learning; High Performance Computing; Mathematical Optimization; Distributed algorithms; GPU
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten
Related publications and datasets
Is supplemented by: https://github.com/noabauma/3d-shampoo