Open access

Date: 2022
Type: Doctoral Thesis
ETH Bibliography: yes

Abstract
Humans possess a comprehensive set of interaction capabilities at various levels of abstraction, including physical activities, verbal and non-verbal cues, and abstract communication skills, which allow them to interact with the physical world, express themselves, and communicate with others. In the quest to digitize humans, we must seek answers to the problems of how to represent humans and how to establish human-like interactions on digital mediums. A critical issue is that human activities exhibit complex and rich dynamic behavior that is non-linear, time-varying, and context-dependent — properties that are typically infeasible to define rigorously.
In this thesis, we are primarily interested in modeling complex processes such as how humans look, move, and communicate, and in generating novel samples that are similar to those produced by humans. To do so, we propose using the deep generative modeling framework, which is capable of learning the underlying data generation process directly from observations. Over the course of this thesis, we showcase generative modeling strategies at various levels of abstraction and demonstrate how they can be used to model humans and synthesize plausible and realistic interactions. Specifically, we present three problems that differ in modality and complexity, yet are related in their modeling strategies. We first introduce the task of modeling free-form human actions such as drawings and handwritten text. Our work focuses on personalization and generalization by learning latent representations of writing style or drawing content. Second, we present the 3D human motion modeling task, where we aim to learn spatio-temporal representations that capture motion dynamics for both accurate short-term and plausible long-term motion predictions. Finally, we focus on learning an expressive representation space for the synthesis and animation of photo-realistic face avatars. Our proposed model is able to create a personalized 3D avatar from rich training data and animate it via impoverished observations at runtime.
Our results across these tasks support our hypothesis that deep generative models are able to learn structured representations and capture human dynamics from unstructured observations. Accordingly, the contributions in this thesis aim to demonstrate that the deep generative modeling framework is a promising instrument, paving the way for digitizing humans.
Permanent link: https://doi.org/10.3929/ethz-b-000578473
Publication status: published
Publisher: ETH Zurich
Subject: Generative models; 3D motion analysis; Temporal modeling; 3D human reconstruction; Neural networks
Organisational unit: 03979 - Hilliges, Otmar