Search Results
- The Drunkard’s Odometry: Estimating Camera Motion in Deforming Scenes
(2024) Advances in Neural Information Processing Systems 36
Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume to observe also static scene parts besides deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging ...
Conference Paper
- SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding
(2024) Advances in Neural Information Processing Systems 36
Semantic 2D maps are commonly used by humans and machines for navigation purposes, whether it's walking or driving. However, these maps have limitations: they lack detail, often contain inaccuracies, and are difficult to create and maintain, especially in an automated fashion. Can we use raw imagery to automatically create better maps that can be easily interpreted by both humans and machines? We introduce SNAP, a deep network that learns ...
Conference Paper
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation
(2024) Advances in Neural Information Processing Systems 36
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, ...
Conference Paper
- LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories
(2024) 2024 International Conference on 3D Vision (3DV)
Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through ...
Conference Paper
- Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature
(2024) 2024 International Conference on 3D Vision (3DV)
Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching, and as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While it is an indispensable component of these methods, it prevents a fully end-to-end training, leaving the objective to minimize the ...
Conference Paper
- NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM
(2024) 2024 International Conference on 3D Vision (3DV)
Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, existing works either rely on RGB-D sensors or require a separate monocular SLAM approach for camera tracking, and fail to produce high-fidelity 3D dense reconstructions. To address these shortcomings, we present NICER-SLAM, a dense RGB SLAM system that simultaneously optimizes for camera ...
Conference Paper
- Handbook on Leveraging Lines for Two-View Relative Pose Estimation
(2024) 2024 International Conference on 3D Vision (3DV)
We propose an approach for estimating the relative pose between calibrated image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner. We investigate all possible configurations where these data modalities can be used together and review the minimal solvers available in the literature. Our hybrid framework combines the advantages of all configurations, enabling robust and accurate estimation in challenging ...
Conference Paper
- OpenScene: 3D Scene Understanding with Open Vocabularies
(2023) 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic ...
Conference Paper
- Four-view Geometry with Unknown Radial Distortion
(2023) 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We present novel solutions to previously unsolved problems of relative pose estimation from images whose calibration parameters, namely focal lengths and radial distortion, are unknown. Our approach enables metric reconstruction without modeling these parameters. The minimal case for reconstruction requires 13 points in 4 views for both the calibrated and uncalibrated cameras. We describe and implement the first solution to these minimal ...
Conference Paper
- Removing Objects From Neural Radiance Fields
(2023) 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects from a NeRF representation created from ...
Conference Paper