Real-Time Monocular Dense Mapping and Localisation for Autonomous Aerial Navigation


Author / Producer

Date

2019

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

On the quest to automate the navigation of small Unmanned Aerial Vehicles (UAVs), the research community has focused on developing perception capabilities able to run onboard such platforms in real-time. Even though computer vision algorithms have been at the heart of recent advancements, their realistic deployment onboard UAVs is still in its infancy. Inspired by challenges in autonomous aerial navigation, this thesis presents a collection of monocular algorithms developed for SLAM, as well as a novel fully autonomous system for aerial inspection.

With real-time dense mapping posing an interesting problem onboard UAVs, this thesis begins by introducing three approaches with very distinct trade-offs, using either visual-only or visual-inertial SLAM to estimate the current UAV pose and the 3D landmarks in the UAV's surroundings. The first mapping approach builds a 3D mesh of the local scene from the SLAM landmarks using Delaunay Triangulation and a novel, geometry-based approach to denoising and smoothing this mesh. The algorithm is very efficient, consuming seven milliseconds on a single core of the onboard computer's CPU. Interpolated landmarks or a dense depth image of the UAV's current surroundings can be obtained by sampling the resulting mesh. Nevertheless, this mesh is only a rough representation of the underlying structure; it works best in mostly planar scenes and is not capable of capturing details such as small and thin structures (e.g. lampposts and tree leaves). To demonstrate the power of this mapping approach, this thesis also presents a pipeline enabling fully autonomous flights, which employs the mesh-based mapping algorithm and a novel aerial planner for inspection in unknown environments. This system is shown to achieve superior performance in large-scale outdoor experiments.

Motivated by the limitations of the meshing approach, the second mapping approach presented in this thesis is based on motion stereo: in essence, it mimics the visual output of a stereo camera by using two images from a moving monocular camera mounted on a UAV, together with their corresponding SLAM camera poses. Employing superpixel-based extrapolation that consults the image intensities to guide the enhancement of the motion-stereo mapping, this real-time approach produces a denser and more accurate map of the environment than the mesh-based one, albeit at the expense of increased computational cost and slightly constrained camera motion to enable stereo-like scene views.

Aiming to leverage the substantial body of literature on learning-based depth estimation, the last mapping approach presented here is based on deep learning. Like the mesh-based approach, this method extrapolates the scene depth across the camera's field of view from the current SLAM landmarks and, like the motion-stereo approach, it consults the colour values in the image to guide this extrapolation, this time eliminating the need to constrain the camera motion in any way. The approach uses convolutional neural networks to extrapolate the scene depth while simultaneously predicting the confidence in these depth estimates. Its main downside is that it requires an onboard GPU, whose extra payload can reduce the aircraft's flight time.
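As an illustration of the core idea behind the first, mesh-based mapping approach, the sketch below interpolates a dense depth image from sparse SLAM landmarks via Delaunay triangulation. It is a minimal example, not the thesis implementation: the inputs landmark_pixels and landmark_depths are hypothetical, SciPy's LinearNDInterpolator (which triangulates the points internally) stands in for the mesh construction, and the geometric denoising and smoothing described above are omitted.

```python
# Minimal sketch: dense depth from sparse SLAM landmarks via Delaunay
# triangulation and linear interpolation. Illustrative only -- the thesis
# additionally applies geometric denoising/smoothing of the mesh.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def interpolate_depth(landmark_pixels, landmark_depths, width, height):
    """landmark_pixels: (N, 2) pixel coordinates of SLAM landmarks (hypothetical input).
    landmark_depths: (N,) depths of those landmarks in the camera frame.
    Returns a (height, width) depth image; NaN outside the landmarks' convex hull."""
    # LinearNDInterpolator builds a Delaunay triangulation of the 2D points
    # and interpolates depth barycentrically inside each triangle.
    interp = LinearNDInterpolator(landmark_pixels, landmark_depths)
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    return interp(u, v)

# Example usage with synthetic landmarks:
pix = np.array([[10.0, 20.0], [600.0, 30.0], [320.0, 460.0], [50.0, 400.0]])
depth = np.array([2.0, 2.5, 4.0, 3.2])
depth_image = interpolate_depth(pix, depth, width=640, height=480)
```

Sampling the triangulation in this way yields depth only inside the convex hull of the landmarks and only as a piecewise-planar approximation, which mirrors the limitation noted above: thin structures that contribute few or no landmarks cannot be captured.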
While dense onboard scene-mapping promises to provide real-time information useful for obstacle avoidance, close interaction of the UAV with its surroundings and, more generally, path planning, applications in aerial inspection also call for robust localisation. To this end, this thesis presents a novel localisation approach for UAVs employed in such scenarios: a visual-inertial relative pose estimation approach designed for aerial manipulation and close-range collaboration between robots. Using a master-slave framework, real-time tracking of one UAV (the master), which carries a known constellation of LED markers, is achieved by a second UAV (the slave) carrying a camera and an inertial sensor with the master in view. In this way, autonomous aerial navigation can be achieved, as the slave can operate close to the structure of interest, potentially experiencing degraded vision, using the master as a reference.

With the focus on robust approaches to dense mapping and localisation that can aid autonomous aerial navigation, the approaches presented in this thesis have been evaluated in challenging indoor and outdoor scenarios and benchmarked against the state of the art. While open questions remain concerning the ability of current systems to cope with the plethora of uncertainties in real missions, this thesis demonstrates that monocular sensing for localisation and mapping has a lot to offer to aerial navigation, demonstrating unprecedented robustness and reliability in a variety of challenging scenarios.
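The geometric core of the marker-based relative localisation described above can be sketched as a Perspective-n-Point (PnP) problem: given the known 3D LED constellation on the master and its detected 2D projections in the slave's camera, the master's pose relative to the slave camera follows from a PnP solve. The snippet below is a minimal sketch under that assumption, using OpenCV's solvePnP; all numerical values are illustrative, and the LED detection, inertial fusion and outlier handling of the actual system are omitted.

```python
# Minimal sketch: pose of the marker-carrying master relative to the slave
# camera via PnP. Illustrative values only; LED detection, visual-inertial
# fusion and outlier rejection of the actual system are omitted.
import numpy as np
import cv2

# Known 3D positions of the LED markers in the master's body frame (metres).
led_constellation = np.array([[ 0.10,  0.10, 0.0],
                              [-0.10,  0.10, 0.0],
                              [-0.10, -0.10, 0.0],
                              [ 0.10, -0.10, 0.0]], dtype=np.float64)

# Detected 2D positions of the same LEDs in the slave's image (pixels),
# assumed to come from an upstream blob detector.
led_detections = np.array([[412.3, 298.7],
                           [288.1, 301.2],
                           [290.5, 421.9],
                           [415.0, 419.4]], dtype=np.float64)

# Pinhole intrinsics of the slave camera (example values) and no distortion.
K = np.array([[455.0,   0.0, 320.0],
              [  0.0, 455.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(led_constellation, led_detections, K, dist)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation: master body frame -> slave camera frame
    print("Master origin in the slave camera frame:", tvec.ravel())
```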

Publication status

published

Contributors

Examiner: Chli, Margarita
Examiner: Siegwart, Roland
Examiner: Civera, Javier

Publisher

ETH Zurich

Subject

Computer Vision; 3D reconstruction

Organisational unit

09559 - Chli, Margarita (former)
