MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises
Date
2023
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
Publication status
published
Book title
The Eleventh International Conference on Learning Representations
Publisher
OpenReview
Event
11th International Conference on Learning Representations (ICLR 2023)
Subject
Multimodal Variational Autoencoder; Variational autoencoder; Multimodal Generative Learning
Organisational unit
09670 - Vogt, Julia / Vogt, Julia
02219 - ETH AI Center / ETH AI Center