Doctoral Thesis
Rights / license: In Copyright - Non-Commercial Use Permitted
Robotic systems have shown impressive results at navigating in previously mapped areas, in particular in the domain of assisted (and autonomous) driving. As these systems do not physically interact with the environment, their map representations are optimized for precise localization rather than for rapidly changing scenes; environment changes are only incorporated into these maps once they have been observed repeatedly. When physical interaction between the robot and the environment is required, however, it is crucial that the map representation remains consistent with the world at all times. For instance, the new location of a manipulated (or externally moved) object must be continuously updated in the map. In this thesis, we argue that object-based maps are a more suitable representation for this purpose. Our solutions build on the hypothesis that object-based representations can deal with change and, since they contain or gather knowledge about physical objects, they capture which parts of the environment can be modified jointly. This thesis aims to find environment representations that are well suited for robotic mobile manipulation tasks.

We start by creating a system that takes measurements from localized RGB-D cameras and integrates them into an instance-based segmentation map. Each incoming depth frame is segmented with a geometric approach into locally convex segments. These segments are integrated into a 3D voxel grid as a Truncated Signed Distance Field (TSDF) with an associated instance label. By updating these labels as new segments are integrated, a consistent segmentation map is formed; a minimal sketch of this label fusion is given below. Each segment is stored with its observed position in a 3D object model database, which represents the environment using object-like segments. Beyond representing the environment, the database can be used to match and merge newly extracted map segments and to complete the scene when repeating instances appear or when an instance has been observed in a previous session.

To acquire such maps and to enable robots to interact with the environment, we show that it is beneficial to fuse information from multiple sensor modalities. For instance, cameras have proven to be a great source for creating sparse localization maps, whereas measurements from depth sensors can create dense reconstructions even in textureless regions of the environment. Before multiple sensors can be used together, however, a challenging problem is to spatially and temporally align their measurements. Hence, we focus on how to bring robotic actuators and multiple sensors into a common spatial and temporal reference frame, so that measurements can be fused and the robot can act and interact in that frame. We show how filtering and optimization techniques improve initial time synchronizations and hand-eye calibrations.

Next, we use the tools and techniques developed for the mapping and object discovery tasks in the context of manipulation. Using a set of rocks, we build vertical balancing towers with a robotic arm equipped with a wrist-mounted RGB-D camera. By identifying previously scanned rocks in a tabletop scene, we run a series of simulations in a physics engine with the detected objects to assess the stability of possible stack configurations. In a greedy manner, we select the next best rock to place and find and execute a grasping and placing motion.

The segmentation map presented in this thesis allows the extraction of single geometric instances in a priori unknown environments.
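As an illustration of the instance-label fusion summarized above, the following Python sketch fuses surface observations from a segmented depth frame into a sparse voxel grid that stores a truncated signed distance together with per-instance label votes. All names, the grid resolution, and the segment-to-instance mapping are hypothetical simplifications; the full pipeline additionally performs ray casting, observation weighting, and segment-to-instance association across frames.

```python
import numpy as np
from collections import defaultdict

VOXEL_SIZE = 0.02   # assumed grid resolution in metres
TRUNCATION = 0.08   # assumed truncation band of the signed distance in metres

class Voxel:
    """One map cell: a truncated signed distance plus instance-label votes."""
    def __init__(self):
        self.tsdf = 0.0
        self.weight = 0.0
        self.label_votes = defaultdict(int)  # persistent instance id -> vote count

    @property
    def label(self):
        # The voxel belongs to the instance that has accumulated the most votes.
        return max(self.label_votes, key=self.label_votes.get) if self.label_votes else None

grid = defaultdict(Voxel)  # sparse voxel grid indexed by integer grid coordinates

def integrate_observation(p_world, signed_dist, frame_segment_id, segment_to_instance):
    """Fuse one surface observation into the voxel containing p_world.

    p_world: 3-vector in the world frame; signed_dist: signed distance to the
    observed surface; frame_segment_id: id of the locally convex segment in the
    current frame; segment_to_instance: mapping from per-frame segment ids to
    persistent instance ids (e.g. from overlap with the existing map).
    """
    key = tuple(np.floor(np.asarray(p_world) / VOXEL_SIZE).astype(int))
    v = grid[key]
    d = float(np.clip(signed_dist, -TRUNCATION, TRUNCATION))
    v.tsdf = (v.tsdf * v.weight + d) / (v.weight + 1.0)  # running weighted average
    v.weight += 1.0
    # Vote for the persistent instance associated with this frame's segment.
    v.label_votes[segment_to_instance.get(frame_segment_id, frame_segment_id)] += 1
```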
An incremental object database is built, which can match and merge re-observed or repeating object segments. These merged instances improve the raw extracted 3D models over time and, finally, the approach even allows completing unobserved parts of the scene. Compelling results are demonstrated in extracting and creating singulated object models from RGB-D images of household objects in cluttered warehouse distribution-box scenes, furniture in indoor scenes, and cars in a parking garage. We show that our matching approach can identify an object's pose in a scene accurately enough to solve delicate manipulation tasks. Together with a newly introduced greedy next-best object target pose planning algorithm, we can stack stones into vertical balancing towers. We demonstrate that our new hand-eye calibration framework is applicable to many different robotic use cases. The integration of a time-alignment step removes the burden of manually obtaining time-aligned pose sets, while filtering and optimization techniques improve calibration results in all evaluated datasets.
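The calibration idea mentioned above can be sketched as follows, under assumed interfaces: the time offset between the arm and camera pose streams is first recovered by cross-correlating their angular-speed profiles, and the rotational part of the hand-eye transform X in AX = XB is then obtained in closed form from the rotation axes of corresponding relative motions. Function names, the uniform-sampling assumption, and the lag sign convention are illustrative; the framework described in the thesis additionally estimates the translation and refines the result with filtering and optimization.

```python
import numpy as np

def estimate_time_offset(arm_speed, cam_speed, dt):
    """Angular-speed magnitudes of both streams, resampled on a common time base with step dt."""
    a = arm_speed - arm_speed.mean()
    b = cam_speed - cam_speed.mean()
    xcorr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(xcorr)) - (len(b) - 1)
    # Positive lag: the camera signal leads the arm signal by lag * dt
    # (sign convention worth verifying on real data).
    return lag * dt

def handeye_rotation(arm_rel_rots, cam_rel_rots):
    """Rotation X such that alpha_i ~ X beta_i, where alpha_i / beta_i are the
    rotation axes (scaled by angle) of relative arm motions A_i and camera motions B_i."""
    def log_axis(R):
        # Axis-angle vector of a rotation matrix (degenerate near 180 deg,
        # which a robust implementation handles separately).
        angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
        if angle < 1e-8:
            return np.zeros(3)
        w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
        return angle * w / (2.0 * np.sin(angle))

    alphas = np.stack([log_axis(A) for A in arm_rel_rots])  # N x 3
    betas = np.stack([log_axis(B) for B in cam_rel_rots])   # N x 3
    H = betas.T @ alphas                                     # 3 x 3 covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T                                    # best-fit rotation (Kabsch)
```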
Subject: Robotics; 3D Vision; Manipulation; Hand-eye calibration; Object modelling; Object matching
Organisational unit: 03737 - Siegwart, Roland Y. / Siegwart, Roland Y.
02284 - NFS Digitale Fabrikation / NCCR Digital Fabrication