Open access
Author
Date: 2024
Type: Doctoral Thesis
ETH Bibliography: yes
Abstract
Scene understanding, which aims to comprehend visual scenes comprehensively, stands as a pivotal element within the field of computer vision. To empower machines with human-like scene understanding, semantic segmentation emerges as a crucial tool, forming the essence of a broad range of applications, e.g., autonomous driving, robot vision, and human-computer interaction. Over the past decade, semantic segmentation models have achieved significant success, propelled by the availability of large-scale datasets and the rapid advancement of deep learning techniques. However, the generalization of these models to new and different domains remains limited. Training domain-robust models typically relies on the labor-intensive process of labeling extensive and diverse datasets, resulting in significant costs and hindering the practical deployment of these models in real-world applications.
In such cases, domain adaptation aims at adapting a semantic segmentation model trained on a labeled source domain to an unlabeled target domain, thereby eliminating the need for labeling the target domain. Traditional domain adaptation typically relies on implicit or explicit assumptions, such as assuming a single data distribution for the source or target domain, or maintaining consistent taxonomies between them. However, these assumptions prove impractical in real-world applications. Moreover, prevailing domain adaptation frameworks depend on pseudo-labels assigned to the unlabeled target domain, which introduces noise due to domain discrepancies. The presence of low-quality pseudo-labels inevitably impedes the adaptation process. To tackle these challenges, this dissertation introduces a set of domain-adaptive semantic segmentation methods that move closer to practical scenarios, ultimately enhancing scene understanding. We make four main contributions, detailed below.
Firstly, we propose a multi-source domain adaptation and label unification (mDALU) problem along with a novel method to address it. In the mDALU setting, there exist multiple source domains and an unlabeled target domain, with only a subset of classes labeled in each source domain. The objective of mDALU is to develop a model encompassing all classes in the target domain. Our approach comprises a two-stage adaptation process: a partially-supervised adaptation stage and a fully-supervised adaptation stage. In the partially-supervised stage, partial knowledge is transferred from multiple source domains to the target domain and integrated. In the fully-supervised stage, knowledge is transferred within a unified label space following a label completion process involving pseudo-labels.
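To give a concrete flavor of the partial-knowledge integration described above, the sketch below fuses predictions from multiple source-trained models, each covering only a subset of the unified label space, into a single pseudo-label map. This is an illustrative simplification, not the dissertation's actual mDALU pipeline; the function name, the confidence-threshold rule, and the `IGNORE = 255` convention are assumptions for this example.

```python
import numpy as np

IGNORE = 255  # conventional "unlabeled" index in segmentation datasets

def merge_partial_predictions(prob_maps, class_maps, conf_thresh=0.9):
    """Fuse per-source predictions, each labeling only a subset of
    classes, into one pseudo-label map for a target image.

    prob_maps:  list of (H, W) per-pixel confidence maps, one per source model
    class_maps: list of (H, W) predicted-class maps in the UNIFIED label space
    Pixels where no source is confident enough remain IGNORE.
    """
    h, w = prob_maps[0].shape
    fused = np.full((h, w), IGNORE, dtype=np.int64)
    best_conf = np.zeros((h, w))
    for prob, cls in zip(prob_maps, class_maps):
        # Take a source's prediction only if it is both confident and
        # more confident than any previously fused source.
        take = (prob > conf_thresh) & (prob > best_conf)
        fused[take] = cls[take]
        best_conf[take] = prob[take]
    return fused
```

When two sources disagree on a pixel, the more confident one wins; pixels that no source can label confidently stay ignored, which mirrors the role of the subsequent label-completion step in the fully-supervised stage.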
Secondly, we present a principled meta-learning based approach to tackle the open compound domain adaptation (OCDA) problem, wherein the target domain is considered a compound of multiple unknown sub-domains. Our approach comprises four essential steps: cluster, split, fuse, and update. These steps establish a hyper-network that uncovers and integrates knowledge from the unknown sub-domains in the target domain. Additionally, we incorporate a meta-learning strategy for online model updates during testing, achieved with just a single gradient step.
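The single-gradient-step online update can be illustrated with a minimal sketch: at test time, the model takes one gradient step on an unsupervised objective (here, prediction entropy) computed on the incoming sample. This is a generic test-time-adaptation toy, not the dissertation's meta-learned update; the linear model, the entropy objective, and the finite-difference gradient are assumptions made to keep the example dependency-free.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_loss(w, x):
    """Prediction entropy of a toy linear classifier p = softmax(W x)."""
    p = softmax(w @ x)
    return -(p * np.log(p + 1e-12)).sum()

def one_step_update(w, x, lr=0.1, eps=1e-5):
    """Single-gradient-step test-time adaptation: w <- w - lr * dL/dw.
    The gradient is estimated by central finite differences here; in
    practice an autograd framework would supply it."""
    grad = np.zeros_like(w)
    for i in np.ndindex(w.shape):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        grad[i] = (entropy_loss(wp, x) - entropy_loss(wm, x)) / (2 * eps)
    return w - lr * grad
```

A single such step sharpens the prediction on the test sample without any label, which is the appeal of the online-update strategy: adaptation cost stays negligible at inference time.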
Thirdly, we propose a taxonomy adaptive cross-domain semantic segmentation (TACS) problem, addressing both image-level and label-level domain gaps. In particular, the label-level domain gap accommodates inconsistent taxonomies between the source and target domains (e.g., the "person" class in the source domain being split into the fine-grained classes "rider" and "pedestrian" in the target domain). To tackle TACS comprehensively, we develop an approach that simultaneously handles image-level and label-level domain adaptation. At the label level, we utilize a bilateral mixed sampling strategy to augment the target domain and employ a relabelling method to harmonize and align the label spaces. To mitigate the image-level domain gap, we propose an uncertainty-rectified contrastive learning method, resulting in more domain-invariant and class-discriminative features.

Lastly, we introduce a framework based on implicit neural representations to enhance domain adaptation performance. In greater detail, the pseudo-label learning mechanism underlies the majority of domain-adaptive semantic segmentation methods. We propose estimating rectification values for predicted pseudo-labels using implicit neural representations, thereby enhancing the quality of pseudo-labels and facilitating the domain adaptation process.
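To make the pseudo-label quality issue concrete, the sketch below shows the common confidence-threshold baseline for turning soft predictions into pseudo-labels: low-confidence pixels are discarded rather than trusted. This is only the baseline rule that motivates the contribution; the dissertation's framework instead learns per-pixel rectification values with an implicit neural representation. The function name and the `IGNORE = 255` convention are assumptions.

```python
import numpy as np

IGNORE = 255  # pixels with this label are excluded from the training loss

def rectify_pseudo_labels(probs, conf_thresh=0.8):
    """Convert soft predictions (C, H, W) into hard pseudo-labels,
    masking out low-confidence pixels.

    probs: per-class probability maps, shape (num_classes, H, W)
    """
    conf = probs.max(axis=0)        # per-pixel max class probability
    labels = probs.argmax(axis=0)   # hard pseudo-label per pixel
    labels[conf < conf_thresh] = IGNORE
    return labels
```

The limitation of this fixed threshold, namely that it cannot correct confidently wrong predictions, is precisely what a learned, spatially continuous rectification signal aims to address.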
In a nutshell, we demonstrate that our proposed problems and approaches transcend the limitations of traditional domain adaptation, making domain adaptation more practical. This advancement facilitates robust scene understanding and its application in real-world scenarios.
Permanent link: https://doi.org/10.3929/ethz-b-000667814
Publication status: published
External links: Search print copy at ETH Library
Publisher: ETH Zurich
Subject: Computer Vision; Artificial Intelligence; Scene Understanding; Domain Adaptation; Machine Learning; Deep Learning
Organisational unit: 03514 - Van Gool, Luc / Van Gool, Luc
Related publications and datasets
Has part: https://doi.org/10.3929/ethz-b-000554635
Has part: https://doi.org/10.3929/ethz-b-000556709