Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity
METADATA ONLY
Author / Producer
Date
2020
Publication Type
Conference Paper
ETH Bibliography
yes
Data
Rights / License
Abstract
We address the challenging task of anticipating human-object interaction in first-person videos. Most existing methods either ignore how the camera wearer interacts with objects or simply treat body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about future activity. Motivated by this observation, we adopt intentional hand movement as a feature representation and propose a novel deep network that jointly models and predicts egocentric hand motion, interaction hotspots, and the future action. Specifically, we treat future hand motion as motor attention and model this attention using probabilistic variables in our deep model. The predicted motor attention is then used to select discriminative spatio-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefits of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets. Our project page is available at https://aptx4869lm.github.io/ForecastingHOI/.
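To make the described architecture concrete, the following is a minimal PyTorch sketch of the joint-modeling idea in the abstract: motor attention is modeled as a probabilistic variable (here, a reparameterized Gaussian over per-location attention logits), and the sampled attention selects spatio-temporal features for action anticipation and interaction-hotspot prediction. All module names, layer sizes, the toy backbone, and the class count are hypothetical illustrations, not the paper's actual implementation.

```python
# Illustrative sketch only; the paper's real architecture and losses differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointMotorAttentionModel(nn.Module):
    def __init__(self, in_channels=3, feat_dim=64, num_actions=10):
        super().__init__()
        # Toy 3D-conv backbone producing a spatio-temporal feature volume.
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Probabilistic motor attention: predict mean and log-variance of
        # attention logits at every space-time location.
        self.attn_mu = nn.Conv3d(feat_dim, 1, kernel_size=1)
        self.attn_logvar = nn.Conv3d(feat_dim, 1, kernel_size=1)
        # Heads: action anticipation from attention-pooled features, and
        # an interaction-hotspot map modulated by the motor attention.
        self.action_head = nn.Linear(feat_dim, num_actions)
        self.hotspot_head = nn.Conv3d(feat_dim, 1, kernel_size=1)

    def forward(self, video):
        # video: (B, C, T, H, W)
        feat = self.backbone(video)                 # (B, D, T, H, W)
        mu = self.attn_mu(feat)
        logvar = self.attn_logvar(feat)
        # Reparameterization trick: sample stochastic attention logits.
        logits = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        b = logits.size(0)
        attn = F.softmax(logits.view(b, -1), dim=1).view_as(logits)
        # Attention-weighted pooling selects discriminative features.
        pooled = (feat * attn).flatten(2).sum(dim=2)    # (B, D)
        action_logits = self.action_head(pooled)
        # Hotspots predicted per location, gated by motor attention.
        hotspots = torch.sigmoid(self.hotspot_head(feat)) * attn
        return action_logits, hotspots, attn

model = JointMotorAttentionModel()
clip = torch.randn(2, 3, 8, 32, 32)        # batch of two short clips
actions, hotspots, attn = model(clip)
print(actions.shape, hotspots.shape)       # (2, 10) and (2, 1, 8, 32, 32)
```

Sampling the attention (rather than predicting it deterministically) mirrors the abstract's use of probabilistic variables for motor attention; in practice such a model would also need supervision or a prior on the attention map, which this sketch omits.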
Permanent link
Publication status
published
External links
Book title
Computer Vision – ECCV 2020
Journal / series
Volume
12346
Pages / Article No.
704–721
Publisher
Springer
Event
16th European Conference on Computer Vision (ECCV 2020) (virtual)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
First Person Vision; Action anticipation; Motor attention
Organisational unit
09686 - Tang, Siyu
Notes
Due to the coronavirus (COVID-19) pandemic, the conference was conducted virtually.