A Spatio-temporal Transformer for 3D Human Motion Prediction
OPEN ACCESS
Date
2021
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
We propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous work commonly relies on RNN-based models that consider short forecast horizons and quickly reach a stationary, often implausible state. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively, and hence on generating plausible future developments over both the short and the long term. The proposed model learns high-dimensional embeddings for skeletal joints and learns how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces the error accumulation over time that is observed in autoregressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons.
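The decoupled temporal and spatial self-attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention head, the projection weights shared across joints, the summation of the two attention outputs, and all names (`attention`, `decoupled_attention`, `w_t`, `w_s`) are assumptions made for illustration only. Temporal attention runs per joint over the current and past frames (causal mask), while spatial attention runs per frame over all joints.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention over the second-to-last axis.
    # q, k, v: (..., L, D); mask: (L, L) boolean, True = may attend.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v

def decoupled_attention(x, w_t, w_s):
    # x: (T, N, D) = frames x joints x embedding size.
    # w_t, w_s: hypothetical lists of three (D, D) matrices (query, key, value).
    T = x.shape[0]

    # Temporal attention: each joint attends to its own current and past frames.
    xt = np.swapaxes(x, 0, 1)                        # (N, T, D)
    qt, kt, vt = (xt @ w for w in w_t)
    causal = np.tril(np.ones((T, T), dtype=bool))    # no access to future frames
    temporal = np.swapaxes(attention(qt, kt, vt, causal), 0, 1)  # (T, N, D)

    # Spatial attention: within each frame, every joint attends to all joints.
    qs, ks, vs = (x @ w for w in w_s)
    spatial = attention(qs, ks, vs)                  # (T, N, D)

    # Assumption: the two streams are combined by summation.
    return temporal + spatial

rng = np.random.default_rng(0)
T, N, D = 5, 4, 8
x = rng.normal(size=(T, N, D))
w_t = [0.1 * rng.normal(size=(D, D)) for _ in range(3)]
w_s = [0.1 * rng.normal(size=(D, D)) for _ in range(3)]
out = decoupled_attention(x, w_t, w_s)
print(out.shape)  # (5, 4, 8)
```

Because the temporal stream is causally masked and the spatial stream only looks within a single frame, the output at frame t in this sketch depends only on frames up to t, which is the property an autoregressive model needs at prediction time.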
Publication status
published
Book title
2021 International Conference on 3D Vision (3DV)
Pages / Article No.
565 - 574
Publisher
IEEE
Event
9th International Conference on 3D Vision (3DV 2021)
Organisational unit
03979 - Hilliges, Otmar (former)
Notes
Conference lecture held on December 2, 2021
Funding
717054 - Optimization-based End-User Design of Interactive Technologies (EC)