End-to-End Collision Avoidance from Depth Input with Memory-based Deep Reinforcement Learning
Open access
Author
Date
2019
Type
Master Thesis
ETH Bibliography
yes
Abstract
The main goal of this work is to learn a local path planning policy for mobile robots from a single depth camera input. We formulate the end-to-end local planning problem as a Partially Observable Markov Decision Process and solve it using a Deep Reinforcement Learning algorithm. The main challenges of this setting come from 1) the short-sightedness of reaction-based planners and 2) the limited field of view of the depth camera, which significantly degrades the planner's performance. We resolve these problems with memory-based Deep Reinforcement Learning. This framework represents a policy as a network with a memory unit that can remember past observations. As a result, the trained policy can generate collision-safe trajectories based not only on the current observation but also on previous observations. We also address the sample inefficiency of end-to-end learning with 1) two-stream feature extraction with a pre-trained autoencoder and 2) the Asymmetric Actor-Critic method. Our ablation study shows that these methods are effective for fast convergence. Finally, we bridge the reality gap between real and simulated depth images with a real-time depth completion algorithm and by pre-training the autoencoder on both real and simulated images. In the quantitative evaluation, our policy with memory units outperforms a standard CNN policy. Notably, the policy with Temporal Convolutional layers learned much faster than the policy with a conventional LSTM. In the subsequent real-robot experiments, we deployed the trained policy on the quadrupedal robot ANYmal with an Intel RealSense depth camera. Our policy generated collision-safe paths reactively in both stationary and dynamic environments.
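The abstract mentions a policy with Temporal Convolutional layers as one form of memory. As a rough illustration only, and not the thesis implementation, the sketch below shows the core operation such layers are generally built on: a causal (optionally dilated) 1D convolution, where the output at time t depends only on the current and earlier inputs. The function names, the scalar "obstacle closeness" feature, and the kernel weights are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def causal_conv1d(
    sequence: Sequence[float],
    kernel: Sequence[float],
    dilation: int = 1,
) -> List[float]:
    """Causal 1D convolution over a time series (hypothetical helper).

    output[t] = sum_k kernel[k] * sequence[t - k * dilation]
    History before t = 0 is treated as zero (left zero-padding),
    so no output ever depends on a future input.
    """
    outputs = []
    for t in range(len(sequence)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation  # look back k * dilation steps
            if index >= 0:
                total += weight * sequence[index]
        outputs.append(total)
    return outputs


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps (present plus past) visible after stacking one causal
    convolution per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed toy input: an obstacle is visible only at t = 1, then leaves
# the camera's field of view.
closeness = [0.0, 1.0, 0.0, 0.0]
memory_feature = causal_conv1d(closeness, kernel=[0.5, 0.3, 0.2])
print(memory_feature)  # [0.0, 0.5, 0.3, 0.2]
```

In this toy case the feature stays non-zero at t = 2 and t = 3 even though the current observation no longer contains the obstacle, which is the kind of information a purely reactive policy would lose. Stacking layers with growing dilation widens the visible history quickly: `receptive_field(3, [1, 2, 4])` evaluates to 15 time steps.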
Permanent link
https://doi.org/10.3929/ethz-b-000444961
Publication status
published
Publisher
ETH Zurich
Subject
Robotics; Collision avoidance; End-to-end Learning; Reinforcement learning; Mobile robotics; Sim-to-real
Organisational unit
09570 - Hutter, Marco / Hutter, Marco