On Learning and Geometry for Visual Localization and Mapping


Author / Producer

Date

2024

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Visual localization and mapping are important problems in computer vision with widespread use in many applications such as Augmented Reality (AR) and robotics. These problems have been studied extensively over the past decades, resulting in mature solutions based on correspondences across images, well-understood projective geometry, and 3D maps stored as sparse point clouds. Despite their complexity, such systems struggle with the challenges that arise from real-world data. Deep learning offers a promising avenue to address these limitations and to reach higher accuracy and robustness. One strand of research replaces specific components of existing algorithms with Deep Neural Networks (DNNs). While this has led to notable performance improvements, it has also increased system complexity, and the gains are often limited because the components are trained with proxy objectives that do not fully capture the ultimate goal of localization. Alternatively, other research has focused on simpler black-box DNNs trained end-to-end to replace these complex systems. Such models have the potential to learn stronger priors but have so far shown limited generalization and interpretability. Balancing generalization with end-to-end training calls for hybrid algorithms that effectively combine learning capacity with our existing knowledge of 3D geometry. In the first part of this thesis, we apply this hybrid design philosophy to the prevalent paradigm based on 3D maps. We introduce two new algorithms for mapping and localization, both based on the alignment of learned features across different views. To facilitate progress in this research area, we also introduce a new benchmark tailored to AR applications. In the second part, we explore the use of more compact and interpretable 2D maps of the kind also used by humans. We demonstrate that end-to-end training makes it possible to learn to associate such maps with visual observations. We first develop a new algorithm for localizing images within a 2D semantic map. We then extend our approach to learn a new map representation optimized for visual localization, and we introduce an algorithm to construct these 2D maps from visual inputs. Overall, this thesis takes a significant step towards localization and mapping algorithms that integrate robust data-driven priors about the real world.
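
To give a concrete, if simplified, picture of the feature-alignment idea mentioned in the abstract, the sketch below illustrates featuremetric pose refinement in PyTorch: a camera pose is adjusted so that deep features sampled at the projections of 3D points match reference features observed in another view. This is an illustrative assumption about the mechanics, not code from the thesis; all names, shapes, and the toy random data are invented for the example.

# Illustrative sketch only (not code from the thesis): one reading of
# "aligning learned features across views" is featuremetric pose refinement,
# where a camera pose is nudged so that features sampled at the projections
# of 3D points match reference features from another view.
import torch
import torch.nn.functional as F

def project(points_3d, R, t, K):
    # Project N x 3 world points into pixel coordinates of a camera (R, t, K).
    cam = points_3d @ R.T + t           # world -> camera frame
    uvw = cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective division

def sample_features(feat_map, pixels, height, width):
    # Bilinearly sample a C x H x W feature map at N x 2 pixel locations.
    gx = 2.0 * pixels[:, 0] / (width - 1) - 1.0    # normalize to [-1, 1]
    gy = 2.0 * pixels[:, 1] / (height - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, -1, 2)
    out = F.grid_sample(feat_map[None], grid, align_corners=True)
    return out[0, :, 0].T               # N x C sampled features

# Toy inputs: random 3D points, a random 16-channel "learned" feature map for
# the target view, reference features from the source view, and an initial
# (identity) pose guess. All of this is made up for the example.
torch.manual_seed(0)
points = torch.randn(8, 3) + torch.tensor([0.0, 0.0, 5.0])
feat_map = torch.randn(16, 64, 64)
ref_feat = torch.randn(8, 16)
K = torch.tensor([[60.0, 0.0, 32.0], [0.0, 60.0, 32.0], [0.0, 0.0, 1.0]])
R = torch.eye(3)                         # rotation kept fixed for brevity
t = torch.zeros(3, requires_grad=True)   # translation is optimized

optimizer = torch.optim.Adam([t], lr=1e-2)
for step in range(100):
    optimizer.zero_grad()
    pix = project(points, R, t, K)
    pred = sample_features(feat_map, pix, 64, 64)
    loss = (pred - ref_feat).pow(2).sum(dim=1).mean()   # featuremetric residual
    loss.backward()                      # gradients flow through the sampling
    optimizer.step()

A real system would use features produced by a trained CNN and a second-order solver such as Levenberg-Marquardt rather than plain gradient descent; the loop above only demonstrates that gradients flow through the projection and feature-sampling steps, which is what makes such alignment trainable end-to-end.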

Publication status

published

Contributors

Examiner: Pollefeys, Marc
Examiner: Cremers, Daniel
Examiner: Snavely, Noah
Examiner: Malisiewicz, Tomasz

Publisher

ETH Zurich

Subject

computer vision; machine learning; 3D geometry

Organisational unit

03766 - Pollefeys, Marc / Pollefeys, Marc
