MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises


Date

2023

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent-space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
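
The abstract describes a latent space split into a shared subspace (with a mixture-of-experts posterior across modalities) and one modality-specific subspace per modality. The following is a minimal, purely illustrative PyTorch sketch of that latent-space structure only; all names (MMVAEPlusSketch, enc_z, enc_w, dec) and dimensions are hypothetical assumptions, this is not the authors' implementation, and the paper's proposed training objective is omitted.

```python
import torch
import torch.nn as nn

class MMVAEPlusSketch(nn.Module):
    """Hypothetical sketch: a shared subspace z (one expert posterior per
    modality, combined as a mixture of experts) plus a private subspace
    w_m per modality. Not the authors' implementation."""

    def __init__(self, n_mod=2, x_dim=32, z_dim=8, w_dim=8):
        super().__init__()
        # One shared-latent encoder, private-latent encoder, and decoder
        # per modality; each encoder outputs (mean, log-variance).
        self.enc_z = nn.ModuleList([nn.Linear(x_dim, 2 * z_dim) for _ in range(n_mod)])
        self.enc_w = nn.ModuleList([nn.Linear(x_dim, 2 * w_dim) for _ in range(n_mod)])
        self.dec = nn.ModuleList([nn.Linear(z_dim + w_dim, x_dim) for _ in range(n_mod)])

    @staticmethod
    def sample(stats):
        # Reparameterised Gaussian sample from concatenated (mu, logvar).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, xs):
        # One shared-latent sample per expert (modality); averaging decoder
        # terms over experts approximates the mixture-of-experts posterior.
        zs = [self.sample(enc(x)) for enc, x in zip(self.enc_z, xs)]
        ws = [self.sample(enc(x)) for enc, x in zip(self.enc_w, xs)]
        # Reconstruct modality m from every expert's shared latent,
        # always paired with m's own private latent w_m.
        return [[dec(torch.cat([z, ws[m]], dim=-1)) for z in zs]
                for m, dec in enumerate(self.dec)]
```

For example, `MMVAEPlusSketch()([torch.randn(4, 32), torch.randn(4, 32)])` returns a nested list whose entry [m][k] reconstructs modality m from the shared latent inferred from modality k; for cross-modal generation one would sample the private latent w_m from its prior instead of the encoder.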

Publication status

published

Book title

The Eleventh International Conference on Learning Representations

Publisher

OpenReview

Event

11th International Conference on Learning Representations (ICLR 2023)

Subject

Multimodal Variational Autoencoder; Variational autoencoder; Multimodal Generative Learning

Organisational unit

09670 - Vogt, Julia / Vogt, Julia
02219 - ETH AI Center / ETH AI Center
