dc.contributor.author
Song, Jie
dc.contributor.supervisor
Hilliges, Otmar
dc.contributor.supervisor
Schiele, Bernt
dc.contributor.supervisor
Van Gool, Luc
dc.date.accessioned
2021-04-12T04:58:41Z
dc.date.available
2021-04-10T22:37:58Z
dc.date.available
2021-04-12T04:58:41Z
dc.date.issued
2020
dc.identifier.uri
http://hdl.handle.net/20.500.11850/478108
dc.identifier.doi
10.3929/ethz-b-000478108
dc.description.abstract
Automatically understanding human body pose from camera inputs enables many real-life applications such as human activity recognition, autonomous driving, assistive robotics and sports analysis. Hence, this demanding task has attracted great interest from the computer vision community for decades and has seen extraordinary progress in recent years. This success can be credited to two main factors: effective appearance modeling by deep neural networks and the accessibility of large-scale annotated datasets. However, current systems are not flawless: many challenging issues remain, especially when people are in complex articulations or when several people are close together and occlude each other. We argue that incorporating prior knowledge, such as the inherent structure of the human body, into the network design is equally essential. To this end, this thesis studies how to design efficient algorithms that jointly optimize the parameters of deep feature extractors and of the probabilistic inference models that encode these priors. First, we address the problem of single-person 2D pose estimation. We present a deep structured model that explicitly incorporates skeletal priors into the network design to regularize predictions and enforce temporal consistency. Inference is performed by an embedded layer that passes messages along the edges of the loopy spatio-temporal graph. The entire architecture can be trained end-to-end. Second, we study the challenging case in which an unknown number of people appear in the image. We explore combining deep networks and graph decomposition in a jointly trainable framework by introducing an unconstrained binary cubic formulation. The cycle consistency constraints are formulated directly in the objective function. This new optimization problem can be viewed as a Conditional Random Field (CRF) in which the cycle constraints are represented as high-order potentials.
The parameters for the CRF inference are optimized together with the front-end feature extractor. The final part of the thesis concerns the estimation of 3D human pose. Combining the refinement capabilities of iterative gradient-based optimization techniques with the robustness of neural networks, we propose a method to lift 2D pose to 3D, in which a statistical 3D human shape model is leveraged to regularize the lifting.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Structured Articulated Human Pose Estimation with Neural Networks
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2021-04-12
ethz.size
141 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
27151
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02658 - Inst. Intelligente interaktive Systeme / Inst. Intelligent Interactive Systems::03979 - Hilliges, Otmar / Hilliges, Otmar
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02658 - Inst. Intelligente interaktive Systeme / Inst. Intelligent Interactive Systems::03979 - Hilliges, Otmar / Hilliges, Otmar
en_US
ethz.date.deposited
2021-04-10T22:38:09Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-04-12T04:58:54Z
ethz.rosetta.lastUpdated
2022-03-29T06:28:50Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Structured%20Articulated%20Human%20Pose%20Estimation%20with%20Neural%20Networks&rft.date=2020&rft.au=Song,%20Jie&rft.genre=unknown&rft.btitle=Structured%20Articulated%20Human%20Pose%20Estimation%20with%20Neural%20Networks