Yan Zhang
18 results
Search Results
Publications 1 - 10 of 18
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes
Item type: Conference Paper
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Zhang, Siwei; Zhang, Yan; Bogo, Federica; et al. (2021)
Recovering high-quality 3D human motion in complex scenes from monocular videos is important for many applications, ranging from AR/VR to robotics. However, capturing realistic human-scene interactions, while dealing with occlusions and partial views, is challenging; current approaches are still far from achieving compelling results. We address this problem by proposing LEMO: LEarning human MOtion priors for 4D human body capture. By leveraging the large-scale motion capture dataset AMASS, we introduce a novel motion smoothness prior, which strongly reduces the jitters exhibited by poses recovered over a sequence. Furthermore, to handle contacts and occlusions occurring frequently in body-scene interactions, we design a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training. To prove the effectiveness of the proposed motion priors, we combine them into a novel pipeline for 4D human body capture in 3D scenes. With our pipeline, we demonstrate high-quality 4D human body capture, reconstructing smooth motions and physically plausible body-scene interactions. The code and data are available at https://sanweiliti.github.io/LEMO/LEMO.html.
- We are More than Our Joints: Predicting how 3D Bodies Move
Item type: Conference Paper
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang, Yan; Black, Michael J.; Tang, Siyu (2021)
A key step towards understanding human behavior is the prediction of 3D human motion. Successful solutions have many applications in human tracking, HCI, and graphics. Most previous work focuses on predicting a time series of future 3D joint locations given a sequence of 3D joints from the past. This Euclidean formulation generally works better than predicting pose in terms of joint rotations. Body joint locations, however, do not fully constrain 3D human pose, leaving degrees of freedom (like rotation about a limb) undefined. Note that 3D joints can be viewed as a sparse point cloud. Thus the problem of human motion prediction can be seen as a problem of point cloud prediction. With this observation, we instead predict a sparse set of locations on the body surface that correspond to motion capture markers. Given such markers, we fit a parametric body model to recover the 3D body of the person. These sparse surface markers also carry detailed information about human movement that is not present in the joints, increasing the naturalness of the predicted motions. Using the AMASS dataset, we train MOJO (More than Our JOints), which is a novel variational autoencoder with a latent DCT space that generates motions from latent frequencies. MOJO preserves the full temporal resolution of the input motion, and sampling from the latent frequencies explicitly introduces high-frequency components into the generated motion. We note that motion prediction methods accumulate errors over time, resulting in joints or markers that diverge from true human bodies. To address this, we fit the SMPL-X body model to the predictions at each time step, projecting the solution back onto the space of valid bodies, before propagating the new markers in time. Quantitative and qualitative experiments show that our approach produces state-of-the-art results and realistic 3D body animations. The code is available for research purposes at https://yz-cnsdqz.github.io/MOJO/MOJO.html
- LEAP: Learning Articulated Occupancy of People
Item type: Conference Paper
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Mihajlovic, Marko; Zhang, Yan; Black, Michael J.; et al. (2021)
Substantial progress has been made on modeling rigid 3D objects using deep implicit representations. Yet, extending these methods to learn neural models of human shape is still in its infancy. Human bodies are complex and the key challenge is to learn a representation that generalizes such that it can express body shape deformations for unseen subjects in unseen, highly-articulated, poses. To address this challenge, we introduce LEAP (LEarning Articulated occupancy of People), a novel neural occupancy representation of the human body. Given a set of bone transformations (i.e. joint locations and rotations) and a query point in space, LEAP first maps the query point to a canonical space via learned linear blend skinning (LBS) functions and then efficiently queries the occupancy value via an occupancy network that models accurate identity- and pose-dependent deformations in the canonical space. Experiments show that our canonicalized occupancy estimation with the learned LBS functions greatly improves the generalization capability of the learned occupancy representation across various human shapes and poses, outperforming existing solutions in all settings.
- Generating 3D People in Scenes Without People
Item type: Conference Paper
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang, Yan; Hassan, Mohamed; Neumann, Heiko; et al. (2020)
We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer as solving it requires that (1) the generated human bodies be semantically plausible within the 3D environment (e.g. people sitting on the sofa or cooking near the stove), and (2) the generated human-scene interaction be physically feasible such that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interactions. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human poses conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with the 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications; e.g. to generate training data for human pose estimation, in video games and in VR/AR. Our project page for data and code can be seen at: https://vlg.inf.ethz.ch/projects/PSI/.
- EgoGen: An Egocentric Synthetic Data Generator
Item type: Conference Paper
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pollefeys, Marc; Tang, Siyu; Li, Gen; et al. (2024)
Understanding the world in first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural human movements and behaviors that effectively steer the embodied cameras to capture a faithful egocentric representation of the 3D world. To address this challenge, we introduce EgoGen, a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks. At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment. Combined with collision-avoiding motion primitives and a two-stage reinforcement learning approach, our motion synthesis model offers a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly coupled. Compared to previous works, our model eliminates the need for a pre-defined global path, and is directly applicable to dynamic environments. Combined with our easy-to-use and scalable data generation pipeline, we demonstrate EgoGen’s efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. EgoGen will be fully open-sourced, offering a practical solution for creating realistic egocentric training data and aiming to serve as a useful tool for egocentric computer vision research.
- Structure of the mu-opioid receptor-G(i) protein complex
Item type: Journal Article
Nature
Koehl, Antoine; Hu, Hongli; Maeda, Shoji; et al. (2018)
- A Gaussian Process-based Self-Organizing Incremental Neural Network
Item type: Conference Paper
2019 International Joint Conference on Neural Networks (IJCNN)
Wang, Xiaoyu; Casiraghi, Giona; Zhang, Yan; et al. (2019)
- Higher-order models capture changes in controllability of temporal networks
Item type: Journal Article
Journal of Physics: Complexity
Zhang, Yan; Garas, Antonios; Scholtes, Ingo (2021)
In many complex systems, elements interact via time-varying network topologies. Recent research shows that temporal correlations in the chronological ordering of interactions crucially influence network properties and dynamical processes. How these correlations affect our ability to control systems with time-varying interactions remains unclear. In this work, we use higher-order network models to extend the framework of structural controllability to temporal networks, where the chronological ordering of interactions gives rise to time-respecting paths with non-Markovian characteristics. We study six empirical data sets and show that non-Markovian characteristics of real systems can either increase or decrease the minimum time needed to control the whole system. With both empirical data and synthetic models, we further show that spectral properties of generalisations of graph Laplacians to higher-order networks can be used to analytically capture the effect of temporal correlations on controllability. Our work highlights that (i) correlations in the chronological ordering of interactions are an important source of complexity that significantly influences the controllability of temporal networks, and (ii) higher-order network models are a powerful tool to understand the temporal-topological characteristics of empirical systems.
- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
Item type: Conference Paper
Lecture Notes in Computer Science ~ Computer Vision – ECCV 2022
Zhang, Siwei; Ma, Qianli; Zhang, Yan; et al. (2022)
Understanding social interactions from egocentric views is crucial for many applications, ranging from assistive robotics to AR/VR. Key to reasoning about interactions is to understand the body pose and motion of the interaction partner from the egocentric view. However, research in this area is severely hindered by the lack of datasets. Existing datasets are limited in terms of either size, capture/annotation modalities, ground-truth quality, or interaction diversity. We fill this gap by proposing EgoBody, a novel large-scale dataset for human pose, shape and motion estimation from egocentric views, during interactions in complex 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, head and hand tracking). To obtain accurate 3D ground truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human shapes and poses relative to the scene, over time. We collect 125 sequences, spanning diverse interaction scenarios, and propose the first benchmark for 3D full-body pose and shape estimation of the interaction partner from egocentric views. We extensively evaluate state-of-the-art methods, highlight their limitations in the egocentric scenario, and address such limitations leveraging our high-quality annotations. Data and code are available at https://sanweiliti.github.io/egobody/egobody.html.
- Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Item type: Conference Paper
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang, Yan; Prokudin, Sergey; Mihajlovic, Marko; et al. (2024)
Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision, essential in enhancing applications related to scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field parameterized by a neural network to predict the movement of novel points within the same domain, without relying on any data-driven or scene-specific priors. To achieve this, our approach builds upon the recently introduced dynamic point field model [48] that learns smooth deformation fields between the canonical frame and individual observation frames. However, temporal consistency between consecutive frames is neglected, and the number of required parameters increases linearly with the sequence length due to per-frame modeling. To address these shortcomings, we exploit the intrinsic regularization provided by SIREN [53], and modify the input layer to produce a spatiotemporally smooth motion field. Additionally, we analyze the motion field Jacobian matrix, and discover that the motion degrees of freedom (DOFs) in an infinitesimal area around a point and the network hidden variables have different behaviors to affect the model's representational power. This enables us to improve the model representation capability while retaining the model compactness. Furthermore, to reduce the risk of overfitting, we introduce a regularization term based on the assumption of piece-wise motion smoothness. Our experiments assess the model's performance in predicting unseen point trajectories and its application in temporal mesh alignment with guidance. The results demonstrate its superiority and effectiveness. The code and data for the project are publicly available.