On Learning and Geometry for Visual Localization and Mapping
OPEN ACCESS
Date
2024
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Visual localization and mapping are important problems in Computer Vision with widespread use in applications such as Augmented Reality (AR) and Robotics. They have been studied extensively over the past decades, resulting in mature solutions based on correspondences across images, well-understood projective geometry, and 3D maps stored as sparse point clouds. Despite their complexity, such systems struggle with the challenges that arise from real-world data. Deep learning offers a promising avenue to address these limitations and reach higher accuracy and robustness.
One strand of research involves replacing specific components of existing algorithms with Deep Neural Networks (DNNs). While this has led to notable performance improvements, it has also increased system complexity. Additionally, these gains are often constrained because the components are trained with proxy objectives that do not fully capture the ultimate goal of localization. Alternatively, other research has focused on developing simpler black-box DNNs trained end-to-end to replace these complex systems. Such models have the potential to learn stronger priors but have so far demonstrated limited generalization and interpretability. Balancing generalization with end-to-end training calls for hybrid algorithms that effectively combine learning capacity with our existing knowledge of 3D geometry.
In the first part of this thesis, we apply this hybrid design philosophy to the prevalent paradigm based on 3D maps. We introduce two new algorithms for mapping and localization, both based on the alignment of learned features across different views. To facilitate progress in this research area, we also introduce a new benchmark tailored for AR applications. In the second part, we explore the use of more compact and interpretable 2D maps, like those used by humans. We demonstrate that end-to-end training makes it possible to effectively learn to associate such maps with visual observations. We first develop a new algorithm for localizing images within a 2D semantic map. We then extend this approach to learn a new map representation optimized for visual localization, and introduce an algorithm to construct these 2D maps from visual inputs. Overall, this thesis takes a significant step towards localization and mapping algorithms that integrate robust, data-driven priors about the real world.
Publication status
published
Contributors
Examiner: Pollefeys, Marc
Examiner: Cremers, Daniel
Examiner: Snavely, Noah
Examiner: Malisiewicz, Tomasz
Publisher
ETH Zurich
Subject
computer vision; machine learning; 3D geometry
Organisational unit
03766 - Pollefeys, Marc / Pollefeys, Marc