
Open access
Author
Date
2024
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Many machine learning models suffer from the well-known catastrophic forgetting problem: they perform worse on previous tasks after being trained on a new task. In the field of continual learning, researchers investigate training algorithms and architectures that produce models which perform well on both old and new tasks, similar to the human ability to continuously acquire new knowledge and skills while improving on existing ones.
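As an illustration of the forgetting effect described above, the following minimal PyTorch sketch (not from the thesis; the toy tasks, model, and hyperparameters are invented for illustration) trains one small classifier sequentially on two tasks and reports accuracy on the first task before and after training on the second.

import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(offset):
    # Toy binary classification task; the offset shifts the input distribution
    # so the two tasks require different decision boundaries.
    x = torch.randn(512, 8) + offset
    y = (x.sum(dim=1) > 8 * offset).long()
    return x, y

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

task_a, task_b = make_task(0.0), make_task(3.0)

for x, y in (task_a, task_b):          # train on task A first, then on task B
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    print("accuracy on task A:", accuracy(model, *task_a))
# Without a continual learning method, the second printout is typically much
# lower than the first: training on task B overwrites what was learned on task A.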
So far, continual learning models have predominantly targeted non-sequential data such as images, while temporal sequence inputs, e.g., video or motion, have received little attention. Furthermore, the ability of generative models to learn and generate temporal sequences within a continual learning setup is relatively unexplored, since continual learning research is predominantly done on classification tasks.
This thesis focuses on two aspects of generative learning in the domain of continual learning. The first is the investigation of biologically inspired incremental learning models that can produce temporal sequences given a conditional input, specifically, generating different human motion actions in a continual learning setup. The second is transfer learning between multiple modalities (text and visual sensor events) in a generative model. The visual sensor events come from a biologically inspired Dynamic Vision Sensor (DVS) event camera that outputs sparse spatio-temporal events.
Recent latent diffusion architectures have proven to be powerful generative models that leverage pretrained components and operate in a latent space to reduce the computational resources required for training. This thesis presents a new sparse autoencoder that encodes sensor events into an informative latent representation. This latent space is then used by a new latent diffusion model trained on a text-to-events objective.
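The following minimal PyTorch sketch illustrates the general text-to-events latent diffusion idea described above; the module names, dimensions, and noise schedule are simplified assumptions for illustration and do not correspond to the thesis architecture.

import torch
import torch.nn as nn

class EventAutoencoder(nn.Module):
    # Toy autoencoder standing in for the sparse event-frame autoencoder.
    def __init__(self, frame_dim=2 * 64 * 64, latent_dim=128):  # 2 polarity channels, 64x64 pixels
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(frame_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
        self.decode = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, frame_dim))

class LatentDenoiser(nn.Module):
    # Toy denoiser: predicts the noise added to an event latent, conditioned on a text embedding.
    def __init__(self, latent_dim=128, text_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + text_dim + 1, 256), nn.ReLU(), nn.Linear(256, latent_dim))
    def forward(self, z_noisy, text_emb, t):
        return self.net(torch.cat([z_noisy, text_emb, t], dim=-1))

ae, denoiser = EventAutoencoder(), LatentDenoiser()
event_frames = (torch.rand(4, 2 * 64 * 64) < 0.02).float()  # sparse ON/OFF event frames (toy data)
text_emb = torch.randn(4, 64)                                # stand-in for a pretrained text encoder output

# One diffusion training step in latent space: encode the event frames, add noise
# to the latent, and train the denoiser to recover that noise given the text condition.
z = ae.encode(event_frames)
t = torch.rand(4, 1)                     # continuous "timestep" in [0, 1]
noise = torch.randn_like(z)
z_noisy = (1 - t) * z + t * noise        # simplistic linear interpolation noise schedule
loss = ((denoiser(z_noisy, text_emb, t) - noise) ** 2).mean()
loss.backward()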
This work advances the state-of-the-art by demonstrating the first pipeline for generating event sequences from a text prompt describing a dynamic scene, concretely, a person performing a gesture.
The thesis contributions include training algorithms for a brain-inspired generative model that produces human motions in a continual learning scenario, investigations into motion curriculum training over a set of tasks, a model and training technique for an autoencoder of spatially and temporally sparse event frames, and a novel text-to-events model for synthetic event stream generation.
Permanent link
https://doi.org/10.3929/ethz-b-000676385
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Liu, Shih-Chii
Examiner: Delbrück, Tobias
Examiner: Yanik, Mehmet Fatih
Examiner: Siegelmann, Hava
Publisher
ETH Zurich
Subject
Machine Learning; Artificial Intelligence; Continual learning; Generative AI; sequential model; Event-based vision; human motion model; Variational autoencoder; diffusion model
Organisational unit
08836 - Delbrück, Tobias (Tit.-Prof.)