Show simple item record

dc.contributor.author: Li, Shigang
dc.contributor.author: Ben-Nun, Tal
dc.contributor.author: Nadiradze, Giorgi
dc.contributor.author: Di Girolamo, Salvatore
dc.contributor.author: Dryden, Nikoli
dc.contributor.author: Alistarh, Dan
dc.contributor.author: Hoefler, Torsten
dc.date.accessioned: 2021-03-09T09:33:54Z
dc.date.available: 2021-03-09T06:04:19Z
dc.date.available: 2021-03-09T09:33:54Z
dc.date.issued: 2021-07-01
dc.identifier.issn: 1045-9219
dc.identifier.issn: 1558-2183
dc.identifier.issn: 2161-9883
dc.identifier.other: 10.1109/TPDS.2020.3040606 (en_US)
dc.identifier.uri: http://hdl.handle.net/20.500.11850/473500
dc.description.abstract: Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates similar to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer). © 1990-2012 IEEE. (en_US)
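
To make the mechanism in the abstract concrete, the sketch below illustrates group model averaging: each rank takes a local SGD step, then averages its weights only within a small subgroup via a group allreduce, so no global barrier is needed. This is a minimal illustration under stated assumptions, not the paper's WAGMA-SGD implementation: the group size, the alternating contiguous/strided group schedule, the toy gradient, and the use of a blocking mpi4py Allreduce (the paper's group allreduce is wait-avoiding, i.e. it does not block on slow processes) are all simplifications made here.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    world = comm.Get_size()
    GROUP_SIZE = 4                         # assumed group size; must divide world here
    num_groups = max(world // GROUP_SIZE, 1)

    weights = np.full(1024, float(rank))   # toy model weights, distinct per rank
    rng = np.random.default_rng(rank)

    for step in range(10):
        # Local SGD step with a placeholder gradient.
        weights -= 0.01 * rng.standard_normal(weights.shape)

        # Alternate contiguous and strided groupings so weight information
        # propagates across all ranks over successive steps.
        color = rank // GROUP_SIZE if step % 2 == 0 else rank % num_groups
        group = comm.Split(color=color, key=rank)

        # Group allreduce: sum, then average, weights within the subgroup only.
        # A blocking Allreduce stands in for the paper's wait-avoiding collective.
        group.Allreduce(MPI.IN_PLACE, weights, op=MPI.SUM)
        weights /= group.Get_size()
        group.Free()

Run with, e.g., mpiexec -n 8 python wagma_sketch.py (the file name is hypothetical). Each iteration synchronizes only GROUP_SIZE ranks, which is what removes the per-step global communication barrier that Allreduce-SGD incurs.
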
dc.language.iso: en (en_US)
dc.publisher: Institute of Electrical and Electronics Engineers (en_US)
dc.subject: Stochastic gradient descent (en_US)
dc.subject: distributed deep learning (en_US)
dc.subject: decentralized optimization (en_US)
dc.title: Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging (en_US)
dc.type: Journal Article
dc.date.published: 2020-11-25
ethz.journal.title: IEEE Transactions on Parallel and Distributed Systems
ethz.journal.volume: 32 (en_US)
ethz.journal.issue: 7 (en_US)
ethz.journal.abbreviated: IEEE Trans. Parallel Distrib. Syst.
ethz.pages.start: 1725 (en_US)
ethz.pages.end: 1739 (en_US)
ethz.grant: DAPP: Data-Centric Parallel Programming (en_US)
ethz.grant: Exascale Programming Models for Heterogeneous Systems (en_US)
ethz.grant: Empowering Computational Science using Data-Centric Programming (en_US)
ethz.identifier.wos:
ethz.identifier.scopus:
ethz.publication.place: New York, NY (en_US)
ethz.publication.status: published (en_US)
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02666 - Institut für Hochleistungsrechnersysteme / Inst. f. High Performance Computing Syst::03950 - Hoefler, Torsten / Hoefler, Torsten
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02666 - Institut für Hochleistungsrechnersysteme / Inst. f. High Performance Computing Syst::03950 - Hoefler, Torsten / Hoefler, Torsten
ethz.grant.agreementno: 678880
ethz.grant.agreementno: 801039
ethz.grant.agreementno: 185778
ethz.grant.fundername: EC
ethz.grant.fundername: EC
ethz.grant.fundername: SNF
ethz.grant.funderDoi: 10.13039/501100000780
ethz.grant.funderDoi: 10.13039/501100000780
ethz.grant.funderDoi: 10.13039/501100001711
ethz.grant.program: H2020
ethz.grant.program: H2020
ethz.grant.program: Ambizione
ethz.date.deposited: 2021-03-09T06:04:40Z
ethz.source: SCOPUS
ethz.eth: yes (en_US)
ethz.availability: Metadata only (en_US)
ethz.rosetta.installDate: 2021-03-09T09:34:05Z
ethz.rosetta.lastUpdated: 2022-03-29T05:40:34Z
ethz.rosetta.versionExported: true
Files in this item

There are no files associated with this item.
