Generative Models for Image and Video Processing


Author / Producer

Date

2021

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

The popularity of streaming services has increased significantly in recent years, but so have customers' expectations of regular access to novel, high-quality film content. Movie production, however, is still a highly specialized and labour-intensive process, which makes it extremely challenging to produce novel high-quality content efficiently at scale. This thesis therefore focuses on developing machine learning methods that improve or simplify a number of tasks in movie production. The term 'machine learning' is widely used and covers many different subfields; in the scope of this work, we primarily focus on applications of generative models. For the purpose of this thesis, we coarsely structure the movie production pipeline into the following steps: (1) Capturing and Editing, (2) Post-Processing, (3) Streaming and Distribution, and (4) Quality Assessment, and we propose machine learning models for tasks in each of these steps.
Regarding capturing and editing, an application that has recently gained popularity is face swapping. This allows, for example, replacing a stand-in actor or stunt double with a celebrity, having long-deceased actors appear in new content, or portraying a celebrity at a younger age. While the solutions currently employed for this are very labour-intensive, we propose an algorithm for automatic neural face swapping in images and videos. Our progressively trained multi-way comb network is capable of rendering photo-realistic and temporally consistent results at megapixel resolution.
In the context of post-processing, upscaling is frequently applied for different reasons. For new content, it saves computation time by allowing the bulk of the preceding operations to be carried out at low resolution; for old legacy content, it enhances the image quality. Especially for legacy content, where additional degradations such as blur and noise are present, a more general image enhancement technique would be desirable. To this end, we propose using normalizing flows to model the distribution of target content and using this distribution as a prior in a maximum a posteriori formulation. We present experimental results for several different degradations on datasets of varying complexity and show competitive results when compared with state-of-the-art approaches.
Concerning distribution, streaming is becoming the most common form of consuming media content. As a consequence, the significance of efficient compression schemes is growing. As a first step, we therefore present a deep image compression method that is able to go from low bit-rates to near-lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation. We demonstrate further advantages unique to our solution and compare our approach with state-of-the-art auto-encoder-based methods. In addition, we demonstrate how to leverage knowledge distillation to obtain equally capable image decoders on a subset of images, at a fraction of the original number of parameters. We develop a student decoder whose model size is reduced by a factor of 20 and achieve a 50% reduction in decoding time.
In the context of quality assessment, analyzing viewers' behaviour is a common method of gaining deeper insight into how audiences engage with a movie plot. An important aspect is facial expressions, which can be related to the emotions induced in the viewer. As a first step, we present a method for facial expression classification that can be used for audience understanding. We propose a deep generative model that learns to disentangle static and dynamic representations of data from unordered input. We demonstrate our method on synthetic and real video data featuring various facial expressions.
Overall, this thesis shows that generative models are powerful tools that can be applied to different tasks in movie production, resulting in more efficient high-quality pipelines.
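As a reading aid, the maximum a posteriori formulation with a normalizing-flow prior mentioned in the abstract can be sketched in its generic textbook form. The symbols below are assumptions introduced for illustration and are not taken from the thesis: y is the degraded observation, x the clean image, f an invertible flow mapping images to latents z = f(x), p_Z a simple base density, and J_f the Jacobian of f. The exact degradation model and objective used in the thesis may differ.

```latex
% Generic MAP estimate: data-fidelity term plus learned image prior
\hat{x} = \arg\max_{x} \; p(x \mid y)
        = \arg\max_{x} \; \big[ \log p(y \mid x) + \log p_X(x) \big]

% Prior density given by the flow through the change-of-variables formula
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det J_f(x) \right|
```

Because f is bijective, the same change-of-variables identity gives an exact likelihood for every image, which is the property that distinguishes a flow-based prior from one defined only implicitly.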

Publication status

published

Contributors

Examiner : Gross, Markus
Examiner : Smolic, Aljosa
Examiner : Schroers, Christopher

Publisher

ETH Zurich

Subject

Video compression; Image compression; Face-swapping; Image enhancement

Organisational unit

03420 - Gross, Markus / Gross, Markus
