We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: we first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated only with the action label. Our approach relies on a human detector to initialize the model learning. For robustness to various degrees of visibility, we build a detector that learns to combine a set of existing part detectors. Starting from humans detected in a set of images depicting the action, our approach determines the action object and its spatial relation to the human. Its final output is a probabilistic model of the human-object interaction, i.e. the spatial relation between the human and the object.
Journal / series: Rapport technique / Institut National de Recherche en Informatique et en Automatique, INRIA
Subject: Action recognition; Weak supervision
Organisational unit: 03803 - Ferrari, Vittorio (SNF-Professur)
Notes: See also http://e-citations.ethbib.ethz.ch/view/pub:62636.