Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The popularity of streaming services has increased significantly in recent years, and so have customers' expectations of regular access to novel, high-quality film content. Movie production, however, is still a highly specialized and labour-intensive process, which makes it extremely challenging to produce novel high-quality content efficiently and at scale. This thesis therefore focuses on developing machine learning methods that improve or simplify a number of tasks in movie production. The term 'machine learning' is widely used and covers many different subfields; in the scope of this work, we primarily focus on applications of generative models.
For the purpose of this thesis, we coarsely structure the movie production pipeline into the following steps: (1) Capturing and Editing, (2) Post-Processing, (3) Streaming and Distribution, and (4) Quality Assessment. We propose machine learning models for tasks in each of these steps.
Regarding capturing and editing, an application that has recently gained popularity is face swapping. It makes it possible, for example, to replace a stand-in actor or stunt double with a celebrity, to have long-deceased actors appear in new content, or to portray a celebrity at a younger age. Because currently employed solutions are very labour intensive, we propose an algorithm for automatic neural face swapping in images and videos. Our progressively trained multi-way comb network is capable of rendering photo-realistic and temporally consistent results at megapixel resolution.
In the context of post-processing, upscaling is frequently applied for different reasons. For new content, it saves computation time because the bulk of the preceding operations can be performed at low resolution; for old legacy content, it enhances image quality. Especially for legacy content, where additional degradations such as blur and noise are present, a more general image enhancement technique would be desirable. To this end, we propose using normalizing flows to model the distribution of target content and using this distribution as a prior in a maximum a posteriori formulation. We present experimental results for several different degradations on datasets of varying complexity and show competitive results compared with state-of-the-art approaches.
Concerning distribution, streaming is becoming the most common form of consuming media content. As a consequence, the significance of efficient compression schemes is growing. As a first step, we therefore present a deep image compression method that is able to go from low bit-rates to near-lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation. We demonstrate further advantages unique to our solution and compare our approach with state-of-the-art auto-encoder-based methods.
In addition, we demonstrate how to leverage knowledge distillation to obtain equally capable image decoders on a subset of images at a fraction of the original number of parameters. We develop a student decoder whose model size is reduced by a factor of 20 and achieve a 50% reduction in decoding time.
In the context of quality assessment, analyzing viewers' behaviour is a common method for gaining deeper insight into how audiences engage with a movie plot. An important aspect is facial expressions, which can be related to the emotions induced in the viewer. As a first step, we present a method for facial expression classification that can be used for audience understanding. We propose a deep generative model that learns to disentangle static and dynamic representations of data from unordered input. We demonstrate our method on synthetic and real video data featuring various facial expressions.
Overall, this thesis shows that generative models are powerful tools that can be applied to different tasks in movie production, resulting in more efficient high-quality pipelines.
Permanent link
https://doi.org/10.3929/ethz-b-000528473
Publication status
published
Publisher
ETH Zurich
Subject
Video compression; Image Compression; Face-Swapping; Image enhancement
Organisational unit
03420 - Gross, Markus / Gross, Markus