dc.contributor.author: Singh, Sidak Pal
dc.contributor.author: Jaggi, Martin
dc.contributor.editor: Larochelle, Hugo
dc.contributor.editor: Ranzato, Marc'Aurelio
dc.contributor.editor: Hadsell, Raia
dc.contributor.editor: Balcan, Maria-Florina F.
dc.contributor.editor: Lin, Hsuan-Tien
dc.date.accessioned: 2021-07-21T07:39:00Z
dc.date.available: 2021-01-26T10:04:58Z
dc.date.available: 2021-01-26T11:55:30Z
dc.date.available: 2021-03-02T15:45:05Z
dc.date.available: 2021-07-21T07:39:00Z
dc.date.issued: 2021
dc.identifier.isbn: 978-1-7138-2954-6
dc.identifier.uri: http://hdl.handle.net/20.500.11850/465579
dc.description.abstract: Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by resource constraints: memory and computation costs grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We show that this can successfully yield "one-shot" knowledge transfer (i.e., without requiring any retraining) between neural networks trained on heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we show that our approach significantly outperforms vanilla averaging and, with moderate fine-tuning, can serve as an efficient replacement for the ensemble, for standard convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10, CIFAR100, and MNIST. Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression. The code is available at https://github.com/sidak/otfusion.
dc.language.iso: en
dc.publisher: Curran
dc.title: Model Fusion via Optimal Transport
dc.type: Conference Paper
dc.date.published: 2020
ethz.book.title: Advances in Neural Information Processing Systems 33
ethz.pages.start: 22045
ethz.pages.end: 22055
ethz.event: 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020)
ethz.event.location: Online
ethz.event.date: December 6-12, 2020
ethz.notes: Due to the coronavirus (COVID-19) pandemic, the conference was conducted virtually.
ethz.publication.place: Red Hook, NY
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.identifier.url: https://papers.nips.cc/paper/2020/hash/fb2697869f56484404c8ceee2985b01d-Abstract.html
ethz.date.deposited: 2021-01-26T10:05:06Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2021-03-02T15:45:15Z
ethz.rosetta.lastUpdated: 2022-03-29T10:33:38Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Model%20Fusion%20via%20Optimal%20Transport&rft.date=2021&rft.spage=22045&rft.epage=22055&rft.au=Singh,%20Sidak%20Pal&Jaggi,%20Martin&rft.isbn=978-1-7138-2954-6&rft.genre=proceeding&rft.btitle=Advances%20in%20Neural%20Information%20Processing%20Systems%2033
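
The abstract above outlines the core algorithm: soft-align neurons across the models layer by layer via optimal transport, then average the aligned parameters. The following is a minimal illustrative sketch of that idea for fully connected networks, using the POT library's exact EMD solver. It is not the authors' otfusion implementation (linked in the record above); the function fuse_layers and all variable names are hypothetical, and biases, activations, and convolutional layers are omitted.

import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def fuse_layers(weights_a, weights_b):
    """Fuse two equally deep MLPs given as lists of weight matrices
    (each of shape out_dim x in_dim; biases omitted for brevity).

    Neurons of model B are soft-aligned to model A layer by layer via
    an exact optimal transport plan, then the parameters are averaged.
    """
    fused = []
    P_prev = None  # soft permutation from the previous layer (B -> A order)
    last = len(weights_a) - 1
    for idx, (W_a, W_b) in enumerate(zip(weights_a, weights_b)):
        if P_prev is not None:
            # Re-index B's incoming edges to match A's neuron ordering.
            W_b = W_b @ P_prev.T
        if idx == last:
            # Output neurons are class-indexed, hence already aligned.
            fused.append(0.5 * (W_a + W_b))
            break
        n_a, n_b = W_a.shape[0], W_b.shape[0]
        mu = np.full(n_a, 1.0 / n_a)   # uniform mass over A's neurons
        nu = np.full(n_b, 1.0 / n_b)   # uniform mass over B's neurons
        M = ot.dist(W_a, W_b)          # pairwise squared Euclidean costs
        T = ot.emd(mu, nu, M)          # exact OT plan, shape (n_a, n_b)
        P = n_a * T                    # barycentric map; rows sum to 1
        fused.append(0.5 * (W_a + P @ W_b))
        P_prev = P
    return fused

# Toy usage: fuse two random 3-layer MLPs (8 -> 16 -> 16 -> 4).
rng = np.random.default_rng(0)
shapes = [(16, 8), (16, 16), (4, 16)]
model_a = [rng.normal(size=s) for s in shapes]
model_b = [rng.normal(size=s) for s in shapes]
fused = fuse_layers(model_a, model_b)

With uniform marginals the plan T is doubly stochastic, so n_a * T acts as a soft permutation; rounding it to a hard permutation (e.g., via the Hungarian algorithm) would recover a strict neuron matching. Because the fused network takes model A's widths, the same scheme extends to models of different widths, as the abstract notes.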