Modeling Dynamic Hand-Object Interactions with Applications to Human-Robot Handovers


Author / Producer

Date

2024

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Humans constantly grasp, manipulate, and move objects in their daily lives. Interactive systems aim to assist humans in performing these tasks, in both the real and the virtual world. Building systems capable of understanding how humans interact with objects and of generating such hand-object interactions could enable new applications, such as simulating human behavior for Embodied AI and human-robot interaction, producing animations in virtual reality settings, and augmenting pose estimation models with synthetic data during training. However, current methods for synthesizing human hand-object interaction either neglect the dynamic aspects of these interactions, e.g., by predicting only a static hand-object grasp, or rely on ground-truth hand and object poses during inference. To make these models truly effective in assisting humans, we require scalable solutions that go beyond the static prediction of grasps. In the first part of this dissertation, we introduce two novel tasks for modeling dynamic hand-object interactions in 4D (3D space + 1D time). First, we introduce the problem of dynamic grasp synthesis, which goes beyond the static generation of hand-object grasps: learning to grasp rigid objects with a single human hand and move them to a 6D target pose. We approach this task using physical simulation and reinforcement learning, splitting the interaction into a grasping stage and a motion-synthesis stage and employing a general reward function that combines incentives for grasp stability and human-like grasping. Second, since humans frequently perform bi-manual manipulation of articulated objects, we extend our approach to both hands and complex articulations, which requires more fine-grained grasping and coordination between the two hands. To address this, we introduce a learning curriculum and expand the observation space.
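The combined reward described above can be sketched as follows. This is a minimal illustration, not the thesis's actual formulation: the term names, weights, and the specific stability/human-likeness measures are assumptions chosen for clarity.

```python
import numpy as np

# Hypothetical sketch of a reward combining grasp stability, task
# progress toward a target pose, and human-likeness. All weights and
# term definitions are illustrative placeholders.
def grasp_reward(object_pos, target_pos, contact_forces,
                 hand_joints, reference_joints,
                 w_stability=1.0, w_target=0.5, w_humanlike=0.2):
    # Stability incentive: reward sustained finger-object contact,
    # squashed so very large forces do not dominate.
    r_stability = np.tanh(np.sum(contact_forces))
    # Task incentive: move the object toward the target pose
    # (position component shown for brevity).
    r_target = -np.linalg.norm(object_pos - target_pos)
    # Human-likeness incentive: stay close to a reference grasp posture.
    r_humanlike = -np.linalg.norm(hand_joints - reference_joints)
    return (w_stability * r_stability
            + w_target * r_target
            + w_humanlike * r_humanlike)
```

In practice such a reward would be evaluated at every simulation step by the reinforcement-learning environment, with the per-term weights tuned against physics and human-likeness metrics.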
Our experiments demonstrate that our methods significantly outperform baselines on these novel tasks in terms of grasp stability and physics-based metrics. Building on this foundation, the second part of this dissertation explores the application of hand-object interaction synthesis to the challenge of human-to-robot handovers. This task remains challenging due to the difficulty of accurately simulating realistic human behavior. To address this, our approach integrates captured human-object motion data into a physical simulation environment, enabling the simulation of realistic human motions. We then introduce the first framework for end-to-end training of robotic handover policies with a simulated human in the loop. Our system employs a two-stage student-teacher training framework that gradually learns to adapt to human motions. Our experiments show that this approach significantly outperforms existing learning-based solutions in both simulated and real-world settings. One key limitation of using captured human-object motion data is the restricted amount of available data. To address this, we integrate our method for synthesizing hand-object interactions into the training of robotic policies for human-to-robot handovers. Specifically, we increase the diversity of training objects for the robot by 100x and create a large-scale synthetic test set. Our experiments reveal that training the robot on a broader distribution of objects and human motions leads to improved success rates when grasping unseen objects in simulation compared to our previous work. Moreover, a qualitative user study shows that users cannot distinguish between robotic policies trained on purely synthetic human motions and those trained on real human motions. This dissertation demonstrates the capability of generating dynamic 4D hand-object interactions.
This paves the way toward better understanding and assisting humans in performing such interactions, which we showcase in the context of human-to-robot handovers. To that end, we use hand-object motions generated by our models to train handover policies in simulation and transfer them to the real system. This highlights the potential of synthetic data for scalable, human-aware robotic systems in the future.
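The two-stage student-teacher scheme mentioned in the abstract can be sketched in miniature. The linear "policies" and random data below are toy placeholders (the actual policies are learned controllers trained in simulation); the sketch only illustrates the distillation idea: a teacher trained with privileged observations supervises a student that sees only sensor observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (placeholder): a teacher policy with access to privileged
# state (e.g., exact object pose). Here it is just a fixed linear map
# from a 6-D privileged observation to a 4-D action.
teacher_w = rng.normal(size=(6, 4))

def teacher_policy(privileged_obs):
    return privileged_obs @ teacher_w

# Stage 2: distill the teacher into a student that only receives
# sensor observations, by regressing onto the teacher's actions.
def distill(privileged_obs, sensor_obs, epochs=200, lr=1e-2):
    student_w = np.zeros((sensor_obs.shape[1], 4))
    for _ in range(epochs):
        target = teacher_policy(privileged_obs)   # teacher's actions
        pred = sensor_obs @ student_w             # student's actions
        # Gradient of mean-squared imitation loss w.r.t. student_w.
        grad = sensor_obs.T @ (pred - target) / len(sensor_obs)
        student_w -= lr * grad
    return student_w
```

The same structure applies when the student is a neural network trained on point-cloud or image observations: the teacher's privileged-state actions serve as supervision, so the student never needs ground-truth poses at deployment time.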

Publication status

published

Contributors

Examiner : Coros, Stelian
Examiner : Hwangbo, Jemin
Examiner : Yi, Li

Publisher

ETH Zurich

Subject

Human-Robot Interaction; dexterous grasping; Reinforcement Learning; Robotics

Organisational unit

09620 - Coros, Stelian
