
Open access
Author
Date
2019
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Scene understanding is one of the fastest-growing areas in computer vision research. This growth is driven mainly by the emergence of deep learning techniques, which have boosted performance on popular benchmarks for well-studied tasks and have made it possible to approach tasks that were very difficult to solve with traditional techniques. This dissertation examines how traditional low-level features such as boundaries and points help in tackling higher-level scene understanding tasks such as detection, segmentation, and 3D reconstruction.
First, we propose a hierarchical grouping algorithm that uses deeply learned boundaries and their orientation. We examine how grouping from predicted boundaries can help object detection and semantic segmentation when plugged into the corresponding pipelines.
Second, we use human-generated points for guided object segmentation. We show how to obtain segmented masks by using extreme points provided by humans, and how to speed up the time-consuming process of annotating for segmentation by using this technique.
Third, we show how automatically detected keypoints help 3D reconstruction in a complicated environment for robot-assisted retinal surgery. The task is to provide visual guidance during surgery by using two stereo cameras mounted on the surgical microscope. We propose a method for calibration, 3D registration, and 3D reconstruction from a single pipeline, by detecting specific robot keypoints, and by obtaining 3D-to-2D correspondences just by moving the robot.
Last, we examine the interplay of low-level and high-level tasks when trained jointly in a single neural network. We propose ways to overcome problems such as task interference and limited capacity as a result of jointly training for many different, unrelated tasks. We propose a universal network that can tackle all tasks, but only one task at a time.
All in all, we show how to predict low-level features and how they contribute to different pipelines: a) in combination with deep networks trained for scene understanding, b) as human-generated input, c) in combination with 3D reconstruction, and d) by jointly training them with higher-level tasks.
Permanent link
https://doi.org/10.3929/ethz-b-000382817
Publication status
published
External links
Search for a print copy via ETH Library
Contributors
Examiner: Van Gool, Luc
Examiner: Tombari, Federico
Examiner: Kokkinos, Iasonas
Examiner: Pont-Tuset, Jordi
Publisher
ETH Zurich
Subject
Scene understanding; Segmentation; Multi-task learning; Boundary detection; 3D reconstruction
Organisational unit
03514 - Van Gool, Luc / Van Gool, Luc