Dense object-level robotic mapping
OPEN ACCESS
Date
2022
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Autonomous robots operating in unstructured real-world settings cannot rely on an a priori map of their surroundings to support navigation and interaction planning; they must perceive the environment and reconstruct their own internal model of the surrounding space. The more sophisticated the task to be automated, the more expressive the acquired model needs to be. Specifically, robots that are to interact with their environment in meaningful ways require maps that extend beyond the traditional monolithic reconstruction of the observed scene geometry: they require maps that enable reasoning about the individual objects in the scene.
This thesis addresses the need for richer and more functional environment models by exploring a novel object-level mapping paradigm. The proposed perception pipeline reconstructs dense environment maps augmented with an understanding of the individual semantically meaningful objects found in the scene. The present research contributes to the topic of dense mapping at the level of objects in unstructured settings in two important ways.
The first contribution of this thesis focuses on building volumetric object-centric maps of the environment in an online, incremental fashion during scanning with a localized RGB-D camera. The proposed pipeline processes incoming frames to identify and segment individual object instances therein and fuses the resulting segmentation information into an incrementally built Truncated Signed Distance Field (TSDF) volume that densely reconstructs the observed scene geometry. The segmentation scheme deployed at each frame combines learning-based instance-aware semantic segmentation with a geometry-based convexity analysis of depth images. Such an approach makes it possible to segment semantically recognized objects from a pre-defined set of classes, as well as unknown object-like elements from previously unseen categories, which are equally relevant for interaction planning in arbitrary real-world settings. Experimental evaluation within a real-world robotic setup demonstrates the ability of the proposed framework to reconstruct environment models that densely describe the geometry of the scene and contain information about the shape and pose of the individual objects therein. Further, the system achieves state-of-the-art 3D instance-aware semantic segmentation performance on a public real-world indoor dataset, while additionally being able to discover novel objects of unknown class and arbitrary shape.
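The paragraph above describes fusing per-frame segmentation labels into an incrementally built TSDF volume. The following is a minimal sketch of that idea at the level of a single voxel: the class name `TSDFVoxel`, the truncation value, and the keep-most-recent label policy are illustrative assumptions for exposition, not the thesis's actual implementation.

```python
TRUNCATION = 0.1  # truncation distance tau; the value here is an illustrative choice

class TSDFVoxel:
    """One voxel of a TSDF volume, augmented with an object-instance label."""

    def __init__(self):
        self.distance = 0.0  # truncated signed distance to the nearest observed surface
        self.weight = 0.0    # accumulated integration weight
        self.label = -1      # id of the object instance last observed at this voxel

    def integrate(self, sdf, label, obs_weight=1.0):
        """Fuse one depth observation via the standard weighted running average."""
        d = max(-TRUNCATION, min(TRUNCATION, sdf))  # truncate the signed distance
        self.distance = (self.weight * self.distance + obs_weight * d) / (
            self.weight + obs_weight
        )
        self.weight += obs_weight
        # Toy label fusion: simply keep the most recent instance id; the actual
        # pipeline resolves segment correspondences across frames more carefully.
        self.label = label
```

Repeated calls to `integrate` converge the voxel's distance toward a weighted average of the truncated observations, which is the standard TSDF fusion rule; the added `label` field is what makes the volume object-aware.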
The second part of this thesis extends the proposed object-level mapping paradigm to dynamic scenes to enable simultaneous tracking and reconstruction of multiple rigid objects of arbitrary shape moving in the foreground. The same geometric-semantic per-frame segmentation scheme is deployed at each incoming RGB-D image to identify individual object instances, and the 6 Degrees of Freedom (DoF) pose of each object is tracked in 3D space via point-to-plane Iterative Closest Point (ICP). A core contribution of this work is a novel object-aware volumetric map representation that can store at each voxel more than one implicit object surface. The first benefit of the proposed formulation is the ability to reconstruct the entire scene and all the objects therein within a single volume. Secondly, and more importantly, the novel map representation allows maintaining accurate surface reconstructions throughout occlusions caused by moving nearby objects. Experiments confirm that the proposed framework can successfully track the pose of multiple moving objects while simultaneously reconstructing their shape, and verify that the novel object-aware volumetric map formulation offers robustness to surface occlusions.
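The tracking step above relies on point-to-plane Iterative Closest Point (ICP) to estimate each object's 6-DoF pose. Below is a minimal sketch of one linearized point-to-plane update under a small-angle approximation; the function name and the assumption of already-associated correspondences are illustrative, and the thesis's tracker operates on segmented depth data rather than raw point lists.

```python
import numpy as np

def point_to_plane_icp_step(src, dst, normals):
    """One linearized point-to-plane ICP step (small-angle Gauss-Newton).

    Given associated point pairs (p_i in src, q_i in dst) and destination
    normals n_i, solves for a 6-DoF increment (rx, ry, rz, tx, ty, tz)
    minimizing sum_i ((R p_i + t - q_i) . n_i)^2.
    """
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for p, q, n in zip(src, dst, normals):
        J = np.concatenate([np.cross(p, n), n])  # one row of the Jacobian
        r = np.dot(p - q, n)                     # point-to-plane residual
        A += np.outer(J, J)                      # accumulate normal equations
        b -= J * r
    rx, ry, rz, tx, ty, tz = np.linalg.solve(A, b)
    # Small-angle rotation approximation: R = I + [w]_x
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    return R, np.array([tx, ty, tz])
```

In a full tracker this step would run inside a loop that re-associates correspondences after each update; a single step already recovers an exact small motion when the normals sufficiently constrain all six degrees of freedom.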
In all, this thesis validates the central hypothesis that physical objects provide the optimal functional unit for a high-level map of the environment. In the context of dense 3D reconstruction, object-awareness enables reasoning about the shape and pose of individual objects in the scene for autonomously planning high-level interaction tasks. In environments that exhibit dynamics, an object-oriented map representation facilitates tracking and reconstruction of multiple moving objects while addressing challenges such as surface occlusion. The proposed object-level mapping paradigm can both enrich existing methods and give rise to new robotic perception capabilities. Ultimately, the presented results have implications for allowing robots to venture further into the unstructured and ever-evolving real world.
Publication status
published
Contributors
Examiner : Siegwart, Roland
Examiner : Leutenegger, Stefan
Publisher
ETH Zurich
Subject
Semantic scene understanding; Computer vision; 3D Vision; 3D Reconstruction; Robotics
Organisational unit
03737 - Siegwart, Roland Y. (emeritus)