Show simple item record

dc.contributor.author: Daunhawer, Imant
dc.contributor.author: Bizeul, Alice
dc.contributor.author: Palumbo, Emanuele
dc.contributor.author: Marx, Alexander
dc.contributor.author: Vogt, Julia E.
dc.date.accessioned: 2024-02-21T11:39:05Z
dc.date.available: 2024-01-04T15:12:19Z
dc.date.available: 2024-02-21T11:27:42Z
dc.date.available: 2024-02-21T11:39:05Z
dc.date.issued: 2023
dc.identifier.uri: http://hdl.handle.net/20.500.11850/650463
dc.description.abstract: Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
dc.language.iso: en
dc.publisher: OpenReview
dc.subject: multimodal learning
dc.subject: multi-view learning
dc.subject: contrastive learning
dc.subject: causal representation learning
dc.subject: nonlinear ICA
dc.subject: identifiability
dc.title: Identifiability Results for Multimodal Contrastive Learning
dc.type: Conference Paper
ethz.book.title: The Eleventh International Conference on Learning Representations (ICLR 2023)
ethz.size: 23 p.
ethz.event: 11th International Conference on Learning Representations (ICLR 2023)
ethz.event.location: Kigali, Rwanda
ethz.event.date: May 1-5, 2023
ethz.grant: Machine Learning Methods for Clinical Data Analysis and Precision Medicine
ethz.publication.place: s.l.
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09670 - Vogt, Julia / Vogt, Julia
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09670 - Vogt, Julia / Vogt, Julia
ethz.identifier.url: https://iclr.cc/virtual/2023/poster/11159
ethz.identifier.url: https://openreview.net/forum?id=U_2kuqoTcB
ethz.grant.agreementno: 188466
ethz.grant.fundername: SNF
ethz.grant.funderDoi: 10.13039/501100001711
ethz.grant.program: Projekte MINT
ethz.relation.isNewVersionOf: 10.48550/arXiv.2303.09166
ethz.date.deposited: 2024-01-04T15:12:19Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2024-02-21T11:27:52Z
ethz.rosetta.lastUpdated: 2024-02-21T11:27:52Z
ethz.rosetta.exportRequired: true
ethz.rosetta.versionExported: true

Files in this item

There are no files associated with this item.
