dc.contributor.author: Singh, Sidak Pal
dc.contributor.author: Jaggi, Martin
dc.contributor.editor: Larochelle, Hugo
dc.contributor.editor: Ranzato, Marc'Aurelio
dc.contributor.editor: Hadsell, Raia
dc.contributor.editor: Balcan, Maria-Florina F.
dc.contributor.editor: Lin, Hsuan-Tien
dc.date.accessioned: 2021-07-21T07:39:00Z
dc.date.available: 2021-01-26T10:04:58Z
dc.date.available: 2021-01-26T11:55:30Z
dc.date.available: 2021-03-02T15:45:05Z
dc.date.available: 2021-07-21T07:39:00Z
dc.date.issued: 2021
dc.identifier.isbn: 978-1-7138-2954-6
dc.identifier.uri: http://hdl.handle.net/20.500.11850/465579
dc.description.abstract: Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by resource constraints: memory and computation costs grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We show that this can successfully yield "one-shot" knowledge transfer (i.e., without requiring any retraining) between neural networks trained on heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we show that our approach significantly outperforms vanilla averaging and, with moderate fine-tuning, can serve as an efficient replacement for the ensemble, for standard convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10, CIFAR100, and MNIST. Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression. The code is available at https://github.com/sidak/otfusion.
dc.language.iso: en
dc.publisher: Curran
dc.title: Model Fusion via Optimal Transport
dc.type: Conference Paper
dc.date.published: 2020
ethz.book.title: Advances in Neural Information Processing Systems 33
ethz.pages.start: 22045
ethz.pages.end: 22055
ethz.event: 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020)
ethz.event.location: Online
ethz.event.date: December 6-12, 2020
ethz.notes: Due to the coronavirus (COVID-19) pandemic, the conference was conducted virtually.
ethz.publication.place: Red Hook, NY
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.identifier.url: https://papers.nips.cc/paper/2020/hash/fb2697869f56484404c8ceee2985b01d-Abstract.html
ethz.date.deposited: 2021-01-26T10:05:06Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2021-03-02T15:45:15Z
ethz.rosetta.lastUpdated: 2022-03-29T10:33:38Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Model%20Fusion%20via%20Optimal%20Transport&rft.date=2021&rft.spage=22045&rft.epage=22055&rft.au=Singh,%20Sidak%20Pal&Jaggi,%20Martin&rft.isbn=978-1-7138-2954-6&rft.genre=proceeding&rft.btitle=Advances%20in%20Neural%20Information%20Processing%20Systems%2033
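
The abstract above outlines the core algorithm: soft-align neurons across the models layer by layer via optimal transport, then average the aligned parameters. The following is a minimal illustrative sketch of that idea for fully connected networks, using the POT library's exact EMD solver. It is not the authors' otfusion implementation (linked in the record above); the function fuse_layers and all variable names are hypothetical, and biases, activations, and convolutional layers are omitted.

import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def fuse_layers(weights_a, weights_b):
    """Fuse two equally deep MLPs given as lists of weight matrices
    (each of shape out_dim x in_dim; biases omitted for brevity).

    Neurons of model B are soft-aligned to model A layer by layer via
    an exact optimal transport plan, then the parameters are averaged.
    """
    fused = []
    P_prev = None  # soft permutation from the previous layer (B -> A order)
    last = len(weights_a) - 1
    for idx, (W_a, W_b) in enumerate(zip(weights_a, weights_b)):
        if P_prev is not None:
            # Re-index B's incoming edges to match A's neuron ordering.
            W_b = W_b @ P_prev.T
        if idx == last:
            # Output neurons are class-indexed, hence already aligned.
            fused.append(0.5 * (W_a + W_b))
            break
        n_a, n_b = W_a.shape[0], W_b.shape[0]
        mu = np.full(n_a, 1.0 / n_a)   # uniform mass over A's neurons
        nu = np.full(n_b, 1.0 / n_b)   # uniform mass over B's neurons
        M = ot.dist(W_a, W_b)          # pairwise squared Euclidean costs
        T = ot.emd(mu, nu, M)          # exact OT plan, shape (n_a, n_b)
        P = n_a * T                    # barycentric map; rows sum to 1
        fused.append(0.5 * (W_a + P @ W_b))
        P_prev = P
    return fused

# Toy usage: fuse two random 3-layer MLPs (8 -> 16 -> 16 -> 4).
rng = np.random.default_rng(0)
shapes = [(16, 8), (16, 16), (4, 16)]
model_a = [rng.normal(size=s) for s in shapes]
model_b = [rng.normal(size=s) for s in shapes]
fused = fuse_layers(model_a, model_b)

With uniform marginals the plan T is doubly stochastic, so n_a * T acts as a soft permutation; rounding it to a hard permutation (e.g., via the Hungarian algorithm) would recover a strict neuron matching. Because the fused network takes model A's widths, the same scheme extends to models of different widths, as the abstract notes.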