Fast and Resolution-Agnostic Denoising Diffusion for Visual Content Generation and Computer Vision


Date

2024

Publication Type

Master Thesis

ETH Bibliography

yes

Abstract

Denoising Diffusion Probabilistic Models (DDPMs) have recently emerged as a transformative technology due to their ability to generate natural-looking images with unprecedented quality and diversity. However, several limitations still prevent them from being adapted to applications beyond pure image generation. These models typically generate images at a fixed resolution and are not directly applicable to canvases of flexible sizes. Moreover, the iterative nature of denoising places a heavy burden on computational resources and inference time. In this thesis, we investigate methods that (1) enable resolution-agnostic generation with diffusion models for canvases of varying sizes, and even for surfaces in 3D, without expensive re-training and (2) speed up generation with diffusion models by enabling one- or few-step sampling. We demonstrate that these techniques allow diffusion models to be better incorporated into various computer vision tasks. For large-scale data generation for domain generalization, we develop a Multi-Resolution Latent Fusion technique that overcomes the difficulty latent diffusion models have in generating semantically coherent small instances, improving downstream performance. For creative mesh painting, we develop a novel pipeline that combines one- and few-step Latent Consistency Models with multi-view content fusion, achieving view-consistent painting with good generation quality at high speed. Finally, we distill a state-of-the-art diffusion model for monocular depth estimation to enable one-step estimation, greatly improving runtime without compromising prediction quality.

Publication status

published

Contributors

Examiner : Schindler, Konrad
Examiner : Obukhov, Anton

Publisher

ETH Zurich

Organisational unit

03886 - Schindler, Konrad / Schindler, Konrad
