Modeling Dynamic Hand-Object Interactions with Applications to Human-Robot Handovers
OPEN ACCESS
Date
2024
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Humans constantly grasp, manipulate, and move objects in their daily lives. Interactive systems aim to assist humans in performing these tasks, in both the real and the virtual world. Building systems capable of understanding how humans interact with objects and generating such hand-object interactions could enable new applications, such as simulating human behavior for Embodied AI and human-robot interaction, producing animations in virtual reality settings, and augmenting pose estimation models with synthetic data during training. However, current methods for synthesizing human hand-object interactions either neglect the dynamic aspects of these interactions, predicting only a static hand-object grasp, or rely on ground-truth hand and object poses during inference. To make these models truly effective in assisting humans, we require scalable solutions that go beyond the static prediction of grasps.
In the first part of this dissertation, we introduce two novel tasks for modeling dynamic hand-object interactions in 4D (3D space + 1D time). First, we introduce the problem of dynamic grasp synthesis, going beyond the static generation of hand-object grasps. Dynamic grasp synthesis involves learning to grasp rigid objects with a single human hand and move them to a 6D target pose. We approach this task using physical simulation and reinforcement learning: we split the interaction into a grasping stage and a motion synthesis stage, and design a general reward function that combines incentives for grasp stability and human-like grasping. Second, since humans frequently perform bi-manual manipulation with articulated objects, we extend our approach to include both hands and complex articulations, which requires more fine-grained grasping and coordination between two hands. To address this, we introduce a learning curriculum and expand the observation space. Our experiments demonstrate that our methods significantly outperform baselines on these novel tasks in terms of grasp stability and physics metrics.
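A reward that combines incentives for grasp stability and human-likeness, as described above, can be sketched as a weighted sum of per-term rewards. The terms, weights, and distance measures below are illustrative assumptions for exposition, not the thesis's actual formulation:

```python
import numpy as np

def grasp_reward(contact_forces, hand_pose, reference_pose,
                 object_pos, target_pos,
                 w_stability=1.0, w_humanlike=0.5, w_target=1.0):
    """Toy combined reward for dynamic grasp synthesis.

    All term definitions and weights are hypothetical placeholders:
    - stability: reward sustained contact force on the object
    - human-likeness: penalize deviation from a reference hand pose
    - target: penalize distance of the object from its target position
    """
    r_stability = np.tanh(np.sum(contact_forces))
    r_humanlike = -np.linalg.norm(hand_pose - reference_pose)
    r_target = -np.linalg.norm(object_pos - target_pos)
    return (w_stability * r_stability
            + w_humanlike * r_humanlike
            + w_target * r_target)
```

In this kind of design, the weights trade off physical robustness against naturalness of the motion; an RL algorithm (e.g., PPO) would maximize the discounted sum of such rewards in simulation.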
Building on this foundation, the second part of this dissertation explores the application of hand-object interaction synthesis to the challenge of human-to-robot handovers. This task remains challenging due to the difficulty of accurately simulating realistic human behavior. To address this, our approach integrates captured human-object motion data into a physical simulation environment, enabling the simulation of realistic human motions. We then introduce the first framework for end-to-end training of robotic handover policies with a simulated human-in-the-loop. Our system employs a two-stage student-teacher training framework that gradually learns to adapt to human motions. Our experiments show that this approach significantly outperforms existing learning-based solutions in both simulated and real-world settings. A key limitation of captured human-object motion data is its restricted quantity. To address this, we integrate our method for synthesizing hand-object interactions with the training of robotic policies for human-to-robot handovers. Specifically, we increase the diversity of training objects for the robot by 100x and create a large-scale synthetic test set. Our experiments reveal that training the robot on a broader distribution of objects and human motions leads to improved success rates in grasping unseen objects in simulation compared to our previous work. Moreover, a qualitative user study finds that users cannot distinguish between robotic policies trained on purely synthetic human motions and those trained on real human motions.
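The two-stage student-teacher scheme mentioned above typically trains a teacher on privileged state available only in simulation, then distills it into a student that sees only deployable observations. The following is a minimal sketch of that distillation idea with toy linear policies; the observation split, policy forms, and learning rule are illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

# Hypothetical fixed "teacher" acting on a 6-D privileged state
# (e.g., including ground-truth object pose). In practice this would
# be an RL-trained network; here it is a toy nonlinear map.
A = np.array([[0.5, -0.3, 0.2, 0.1, 0.4, -0.2],
              [0.1, 0.4, -0.5, 0.2, -0.1, 0.3],
              [-0.2, 0.1, 0.3, -0.4, 0.2, 0.1]])

def teacher_policy(privileged_obs):
    return np.tanh(A @ privileged_obs)

class StudentPolicy:
    """Stage 2: a student that only sees deployable observations
    (here, the first 4 of 6 state dims) and imitates the teacher."""
    def __init__(self, obs_dim, act_dim, lr=0.05):
        self.W = np.zeros((act_dim, obs_dim))
        self.lr = lr

    def act(self, obs):
        return self.W @ obs

    def update(self, obs, teacher_action):
        # One gradient step on the squared imitation error.
        err = self.act(obs) - teacher_action
        self.W -= self.lr * np.outer(err, obs)
        return float(np.mean(err ** 2))

def distill(steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    student = StudentPolicy(obs_dim=4, act_dim=3)
    losses = []
    for _ in range(steps):
        privileged_obs = rng.normal(size=6)
        student_obs = privileged_obs[:4]  # robot cannot sense dims 5-6
        losses.append(student.update(student_obs,
                                     teacher_policy(privileged_obs)))
    return losses
```

The imitation loss drops as the student learns, but it does not reach zero: the student cannot account for the privileged state it never observes, which is exactly the gap a gradual, staged curriculum is meant to manage.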
This dissertation demonstrates the capability of generating dynamic 4D hand-object interactions. This paves the way toward better understanding and assisting humans in performing such interactions, which we showcase in the context of human-to-robot handovers. To that end, we use hand-object motions generated by our models to train handover policies in simulation and transfer them to the real system. This highlights the potential of synthetic data for scalable, human-aware robotic systems in the future.
Publication status
published
Contributors
Examiner : Coros, Stelian
Examiner : Hwangbo, Jemin
Examiner : Yi, Li
Publisher
ETH Zurich
Subject
Human-Robot Interaction; dexterous grasping; Reinforcement Learning; Robotics
Organisational unit
09620 - Coros, Stelian / Coros, Stelian