
dc.contributor.author: Tselepidis, Nikolaos
dc.contributor.author: Kohler, Jonas
dc.contributor.author: Orvieto, Antonio
dc.date.accessioned: 2021-03-04T13:46:04Z
dc.date.available: 2021-01-21T10:02:04Z
dc.date.available: 2021-03-04T13:46:04Z
dc.date.issued: 2020-12-11
dc.identifier.uri: http://hdl.handle.net/20.500.11850/464429
dc.description.abstract: In the context of deep learning, many optimization methods use gradient covariance information to accelerate the convergence of Stochastic Gradient Descent. In particular, starting with Adagrad, a seemingly endless line of research advocates the use of diagonal approximations of the so-called empirical Fisher matrix in stochastic gradient-based algorithms, the most prominent of which is arguably Adam. In recent years, however, several works have cast doubt on the theoretical basis of preconditioning with the empirical Fisher matrix, and it has been shown that more sophisticated approximations of the actual Fisher matrix more closely resemble the theoretically well-motivated Natural Gradient Descent. One particularly successful variant of such methods is the K-FAC optimizer, which uses a Kronecker-factored block-diagonal Fisher approximation as its preconditioner. In this work, drawing inspiration from two-level domain decomposition methods used as preconditioners in scientific computing, we extend K-FAC by enriching it with off-diagonal (i.e., global) curvature information in a computationally efficient way. We achieve this by adding a coarse-space correction term to the preconditioner, which captures the global Fisher information matrix at a coarser scale. We present a small set of experimental results suggesting improved convergence behaviour of our proposed method. [en_US]
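
As a quick illustration of the construction the abstract describes, here is a minimal sketch in standard two-level domain-decomposition notation. All symbols are assumptions for illustration (F is the Fisher matrix, A_{\ell-1} and G_\ell the per-layer Kronecker factors, R_0 a restriction onto a low-dimensional coarse space); the additive form below is the generic two-level preconditioner from domain decomposition, not necessarily the paper's exact correction term.

    % K-FAC: block-diagonal approximation of the Fisher matrix F, with one
    % Kronecker-factored block per layer \ell, so each block inverts cheaply.
    \[
      F \approx \operatorname{blockdiag}(F_1, \dots, F_L), \qquad
      F_\ell \approx A_{\ell-1} \otimes G_\ell, \qquad
      F_\ell^{-1} \approx A_{\ell-1}^{-1} \otimes G_\ell^{-1}.
    \]
    % Two-level extension: add a coarse-space correction that injects global
    % (off-diagonal) curvature information captured at a coarser scale.
    \[
      P_{\mathrm{2L}}^{-1} \;=\; P_{\mathrm{KFAC}}^{-1}
      \;+\; R_0^{\top} \bigl( R_0 F R_0^{\top} \bigr)^{-1} R_0 .
    \]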
dc.language.iso: en [en_US]
dc.publisher: OPT 2020 [en_US]
dc.title: Two-Level K-FAC Preconditioning for Deep Learning [en_US]
dc.type: Conference Paper
ethz.book.title: 12th Annual Workshop on Optimization for Machine Learning (OPT 2020). Accepted Papers [en_US]
ethz.pages.start: 63 [en_US]
ethz.size: 12 p. [en_US]
ethz.event: 12th Annual Workshop on Optimization for Machine Learning (OPT 2020) [en_US]
ethz.event.location: Online [en_US]
ethz.event.date: December 11, 2020 [en_US]
ethz.notes: Poster presented on December 11, 2020. Due to the Coronavirus (COVID-19) pandemic, the conference was conducted virtually. [en_US]
ethz.publication.place: s.l. [en_US]
ethz.publication.status: published [en_US]
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas [en_US]
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas [en_US]
ethz.identifier.url: https://opt-ml.org/papers.html
ethz.date.deposited: 2021-01-21T10:02:12Z
ethz.source: FORM
ethz.eth: yes [en_US]
ethz.availability: Metadata only [en_US]
ethz.rosetta.installDate: 2021-03-04T13:46:14Z
ethz.rosetta.lastUpdated: 2021-03-04T13:46:14Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Two-Level%20K-FAC%20Preconditioning%20for%20Deep%20Learning&rft.date=2020-12-11&rft.spage=63&rft.au=Tselepidis,%20Nikolaos&Kohler,%20Jonas&Orvieto,%20Antonio&rft.genre=proceeding&rft.btitle=12th%20Annual%20Workshop%20on%20Optimization%20for%20Machine%20Learning%20(OPT%202020).%20Accepted%20Papers

Files in this item

There are no files associated with this item.
