Show simple item record

dc.contributor.author
Ntavelis, Evangelos
dc.contributor.supervisor
Van Gool, Luc
dc.contributor.supervisor
Kastanis, Iason
dc.contributor.supervisor
Timofte, Radu
dc.contributor.supervisor
Isola, Phillip
dc.contributor.supervisor
Tombari, Federico
dc.date.accessioned
2024-01-04T13:10:09Z
dc.date.available
2024-01-04T09:04:10Z
dc.date.available
2024-01-04T13:10:09Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/650287
dc.identifier.doi
10.3929/ethz-b-000650287
dc.description.abstract
Recent advances in deep learning have enabled generative models to produce samples of unparalleled quality. The true value of these models, however, emerges from our ability to control them. Controllable synthesis and manipulation hold potential as a democratizing tool, enabling those without expert training to materialize creative concepts and revolutionizing various industries: entertainment, virtual and augmented reality, e-commerce, and industrial design. This thesis offers four main contributions in this domain. Firstly, we present a semantic image editing pipeline in which the user only needs to provide semantic information for the region they want to edit in order to materialize their changes. We introduce a semantic inpainting generator and a novel two-stream conditional discriminator, enabling local control and improved perceptual quality. Secondly, we design a Generative Adversarial Network (GAN) that can synthesize images at arbitrary scales. We implement scale-consistent positional encodings and train a patch-based generator with novel inter-scale augmentations. Our model facilitates the generation of a continuum of scales, even ones unseen during training. Thirdly, we propose to sample the latent vector of GANs by concatenating a list of sub-vectors independently sampled from a collection of small learnable embedding codebooks. We show that our approach uses only a limited number of parameters to create a broad and versatile latent representation, while enabling intuitive latent-space exploration, superior disentanglement, and conditional sampling through a pretrained classifier. Lastly, we introduce a latent 3D diffusion model for synthesizing static and articulated 3D assets. At first, we learn a compact 3D representation by training a volumetric autodecoder to reconstruct multi-view images. Then, we train the latent diffusion model on the intermediate features of the autodecoder.
We apply our approach to diverse multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale real video dataset of static objects. We perform both unconditional and text-driven generation; our approach is flexible enough to use either existing camera supervision or to efficiently infer the camera parameters during training. To conclude, this thesis explores different approaches to controllable synthesis and manipulation of images and 3D assets. We hope that our contributions bring us a step closer to our vision of democratizing content creation and enabling human creativity.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Generative models
en_US
dc.subject
Image synthesis
en_US
dc.subject
3D generation
en_US
dc.subject
Image manipulation
en_US
dc.title
Generative models for controllable synthesis and manipulation in 2D and 3D
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2024-01-04
ethz.size
171 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::000 - Generalities, science
en_US
ethz.identifier.diss
29701
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc / Van Gool, Luc
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc / Van Gool, Luc
en_US
ethz.date.deposited
2024-01-04T09:04:11Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-01-04T13:10:13Z
ethz.rosetta.lastUpdated
2024-02-03T08:35:39Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Generative%20models%20for%20controllable%20synthesis%20and%20manipulation%20in%202D%20and%203D&rft.date=2023&rft.au=Ntavelis,%20Evangelos&rft.genre=unknown&rft.btitle=Generative%20models%20for%20controllable%20synthesis%20and%20manipulation%20in%202D%20and%203D