Generative Models for Image and Video Processing
OPEN ACCESS
Date
2021
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The popularity of streaming services has increased significantly in recent years, but so have customers' expectations of regular access to novel high-quality film content. Movie production, however, remains a highly specialized and labour-intensive process, which makes it extremely challenging to produce novel high-quality content efficiently at scale. This thesis therefore focuses on developing machine learning methods that improve or simplify a number of tasks that occur in movie production. The term 'Machine Learning' is widely used and covers many different subfields. Within the scope of this work, we focus primarily on applications of generative models.
For the purpose of this thesis, we coarsely structure the movie production pipeline into the following steps: (1) Capturing and Editing, (2) Post-Processing, (3) Streaming and Distribution, and (4) Quality Assessment. We propose machine learning models for tasks in each of these steps.
Regarding capturing and editing, an application that has recently gained popularity is face swapping. It makes it possible, for example, to replace a stand-in actor or stunt double with a celebrity, to have long-deceased actors appear in new content, or to portray a celebrity at a younger age. While the solutions currently employed for this are very labour-intensive, we propose an algorithm for automatic neural face swapping in images and videos. Our progressively trained multi-way comb network is capable of rendering photo-realistic and temporally consistent results at megapixel resolution.
In the context of post-processing, upscaling is frequently applied for different reasons. For new content, it saves computation time because the bulk of the preceding operations can be performed at low resolution; for legacy content, it enhances image quality. Especially for legacy content, where additional degradations such as blur and noise are present, a more general image enhancement technique is desirable. To this end, we propose using normalizing flows to model the distribution of the target content and using this distribution as a prior in a maximum a posteriori formulation. We present experimental results for several different degradations on datasets of varying complexity and show competitive results compared with state-of-the-art approaches.
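As a rough illustration of what a maximum a posteriori formulation with a normalizing-flow prior generally looks like, the following sketch writes out the standard form. The symbols are assumptions introduced here and are not taken from the thesis: $y$ is the degraded observation, $x$ the clean image, $p(y \mid x)$ a degradation likelihood, $f_\theta$ an invertible flow mapping images to latents, and $p_Z$ a simple base density such as a standard Gaussian.

```latex
% Generic MAP estimate with a learned prior p_theta(x)
\hat{x} = \arg\max_{x} \; \bigl[ \log p(y \mid x) + \log p_\theta(x) \bigr]

% Prior density given by the flow through the change-of-variables formula
\log p_\theta(x) = \log p_Z\bigl(f_\theta(x)\bigr)
  + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
```

Under these assumptions, different degradations such as blur, noise, or downscaling change only the likelihood term, while the learned prior stays the same.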
Concerning distribution, streaming is becoming the most common form of consuming media content. As a consequence, efficient compression schemes are growing in significance. As a first step, we therefore present a deep image compression method that is able to go from low bit-rates to near-lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation. We demonstrate further advantages unique to our solution and compare our approach with state-of-the-art auto-encoder-based methods.
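To make the idea of a bijective mapping concrete, the sketch below implements a single affine coupling layer, a common building block of normalizing flows, in plain Python. It is a toy illustration and not the architecture used in the thesis: the function names and the hand-written conditioner `scale_and_shift` are hypothetical, whereas a real flow would use a learned network and stack many such layers.

```python
import math


def scale_and_shift(x_a):
    """Toy conditioner (hypothetical): per-element log-scale and shift.

    A real flow would compute these with a learned neural network.
    """
    log_s = [math.tanh(0.5 * v) for v in x_a]
    t = [0.1 * v * v for v in x_a]
    return log_s, t


def coupling_forward(x):
    """Map x to z; the first half passes through unchanged."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = scale_and_shift(x_a)
    z_b = [xb * math.exp(s) + ti for xb, s, ti in zip(x_b, log_s, t)]
    log_det = sum(log_s)  # log |det Jacobian| of the transform
    return x_a + z_b, log_det


def coupling_inverse(z):
    """Exact inverse of coupling_forward."""
    assert len(z) % 2 == 0, "this sketch assumes an even-length input"
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = scale_and_shift(z_a)  # same values as in the forward pass
    x_b = [(zb - ti) * math.exp(-s) for zb, s, ti in zip(z_b, log_s, t)]
    return z_a + x_b


x = [0.3, -1.2, 2.0, 0.7]
z, log_det = coupling_forward(x)
x_rec = coupling_inverse(z)
```

Because the conditioner only sees the half that passes through unchanged, the inverse can recompute the same scale and shift, so `x_rec` matches `x` up to floating-point error. This exact invertibility is what allows a flow-based codec, in principle, to approach lossless reconstruction as the latent is quantized more finely.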
In addition, we demonstrate how to leverage knowledge distillation to obtain equally capable image decoders on a subset of images, at a fraction of the original number of parameters. We develop a student decoder whose model size is reduced by a factor of 20 and achieve a 50% reduction in decoding time.
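The core of output-matching knowledge distillation can be sketched with a deliberately tiny example: a student is fitted to reproduce a teacher's outputs on a fixed subset of inputs. Everything here is a hypothetical stand-in (`teacher_decode`, the one-dimensional "latents", the linear student), and the toy does not show any size reduction; it only illustrates the training objective, not the thesis's decoders.

```python
def teacher_decode(z):
    """Stand-in for a large pretrained decoder (hypothetical toy function)."""
    return 3.0 * z + 1.0


def distill_student(latents, steps=200, lr=0.1):
    """Fit a linear student a*z + b to the teacher's outputs.

    Uses full-batch gradient descent on the mean squared error
    between student and teacher outputs.
    """
    a, b = 0.0, 0.0
    n = len(latents)
    targets = [teacher_decode(z) for z in latents]  # teacher provides labels
    for _ in range(steps):
        errors = [a * z + b - y for z, y in zip(latents, targets)]
        grad_a = 2.0 * sum(e * z for e, z in zip(errors, latents)) / n
        grad_b = 2.0 * sum(errors) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# The "subset of images" is represented here by a handful of toy latents.
student_a, student_b = distill_student([-1.0, -0.5, 0.0, 0.5, 1.0])
```

In this toy setting the student recovers the teacher's parameters. In practice the student would be a much smaller network, and it would only be expected to match the teacher on the content subset it was distilled on.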
In the context of Quality Assessment, analyzing viewers' behaviour is a common method for gaining deeper insight into how audiences engage with a movie plot. An important aspect is facial expressions, which can be related to the emotions induced in the viewer. As a first step, we present a method for facial expression classification that can be used for audience understanding. We propose a deep generative model that learns to disentangle static and dynamic representations of data from unordered input. We demonstrate our method on synthetic and real video data featuring various facial expressions.
Overall, this thesis shows that generative models are powerful tools that can be applied to different tasks in movie production, resulting in more efficient high-quality pipelines.
Publication status
published
Contributors
Examiner : Gross, Markus
Examiner : Smolic, Aljosa
Examiner : Schroers, Christopher
Publisher
ETH Zurich
Subject
Video compression; Image compression; Face swapping; Image enhancement
Organisational unit
03420 - Gross, Markus / Gross, Markus