Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
dc.contributor.author
Eschenhagen, Runa
dc.contributor.author
Immer, Alexander
dc.contributor.author
Turner, Richard E.
dc.contributor.author
Schneider, Frank
dc.contributor.author
Hennig, Philipp
dc.contributor.editor
Oh, Alice
dc.contributor.editor
Naumann, Tristan
dc.contributor.editor
Globerson, Amir
dc.contributor.editor
Saenko, Kate
dc.contributor.editor
Hardt, Moritz
dc.contributor.editor
Levine, Sergey
dc.date.accessioned
2024-07-24T10:27:25Z
dc.date.available
2024-01-30T15:56:06Z
dc.date.available
2024-01-31T09:25:04Z
dc.date.available
2024-07-24T10:27:25Z
dc.date.issued
2024-07
dc.identifier.isbn
978-1-7138-9992-1
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/656652
dc.identifier.doi
10.3929/ethz-b-000656652
dc.description.abstract
The core components of many modern neural network architectures, such as transformers, convolutional neural networks, and graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers that motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
en_US
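To make the abstract's distinction between the two K-FAC flavours concrete, here is a minimal NumPy sketch. It is illustrative only, not the authors' implementation: the function names, shapes, and exact scaling constants (e.g. whether activations are averaged or gradients summed over the R weight-sharing locations) are assumptions that loosely follow the paper's high-level description.

import numpy as np

def kfac_factors_expand(a, g):
    # K-FAC-expand: treat each of the R weight-sharing locations as an
    # independent example when estimating the Kronecker factors.
    # a: layer inputs, shape (N, R, d_in); g: output gradients, (N, R, d_out).
    N, R, _ = a.shape
    A = np.einsum('nri,nrj->ij', a, a) / (N * R)  # input-side factor
    B = np.einsum('nri,nrj->ij', g, g) / (N * R)  # gradient-side factor
    return A, B

def kfac_factors_reduce(a, g):
    # K-FAC-reduce: aggregate over the R locations first, then form the
    # factors from one outer product per example -- the cheaper option,
    # consistent with the abstract's remark that reduce is generally faster.
    N = a.shape[0]
    a_bar = a.mean(axis=1)                        # (N, d_in)
    g_sum = g.sum(axis=1)                         # (N, d_out)
    A = a_bar.T @ a_bar / N
    B = g_sum.T @ g_sum / N
    return A, B

# In both cases the layer's curvature is approximated by the Kronecker
# product A (x) B, so the preconditioned gradient for a weight-gradient
# matrix G of shape (d_out, d_in) is B^{-1} @ G @ A^{-1} (damping omitted).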
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Curran
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Machine Learning (cs.LG)
en_US
dc.subject
Machine Learning (stat.ML)
en_US
dc.subject
FOS: Computer and information sciences
en_US
dc.subject
Deep learning
en_US
dc.subject
Second-order
en_US
dc.subject
Optimization
en_US
dc.subject
Natural gradient
en_US
dc.subject
Fisher
en_US
dc.subject
Gauss-Newton
en_US
dc.subject
K-FAC
en_US
dc.subject
Weight-sharing
en_US
dc.title
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
en_US
dc.type
Conference Paper
dc.rights.license
In Copyright - Non-Commercial Use Permitted
ethz.book.title
Advances in Neural Information Processing Systems 36
en_US
ethz.pages.start
33624
en_US
ethz.pages.end
33655
en_US
ethz.size
32 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.event
37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023)
en_US
ethz.event.location
New Orleans, LA, USA
en_US
ethz.event.date
December 10-16, 2023
en_US
ethz.notes
Poster presentation
en_US
ethz.publication.place
Red Hook, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
en_US
ethz.identifier.url
https://papers.nips.cc/paper_files/paper/2023/hash/6a6679e3d5b9f7d5f09cdb79a5fc3fd8-Abstract-Conference.html
ethz.relation.isNewVersionOf
10.48550/ARXIV.2311.00636
ethz.date.deposited
2024-01-30T15:56:06Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-01-31T09:25:09Z
ethz.rosetta.lastUpdated
2024-02-03T09:09:05Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true