
Open access
Author
Date
2020
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The world is structured in countless ways. It may be prudent to enforce corresponding structural properties on a learning algorithm's solution, such as incorporating prior beliefs, natural constraints, or causal structures. Doing so may translate into faster, more accurate, and more flexible models, which may directly relate to real-world impact. In this dissertation, we consider two research areas that concern structuring a learning algorithm's solution: when the structure is known and when it has to be discovered.

First, we consider the case in which the desired structural properties are known and we wish to express the solution of our learning algorithm as a sparse combination of elements from a set, given in the form of a constraint for an optimization problem. Specifically, we consider convex combinations with additional affine constraints, linear combinations, and non-negative linear combinations. In the first case, we develop a stochastic optimization algorithm suitable for minimizing non-smooth objectives, with applications to semidefinite programs. For linear combinations, we establish a connection between the analyses of matching pursuit and coordinate descent, which allows us to present a unified analysis of both algorithms, and we show the first accelerated convergence rates for both matching pursuit and steepest coordinate descent on convex objectives. On convex cones, we present the first principled definitions of non-negative matching pursuit algorithms that provably converge on convex and strongly convex objectives. Further, we consider the application of greedy optimization to approximate probabilistic inference: we present an analysis of existing boosting variational inference approaches that yields novel theoretical insights and algorithmic simplifications.

Second, we consider the case of learning the structural properties underlying a dataset by learning its factors of variation. This is an emerging field in representation learning that starts from the premise that real-world data is generated by a few explanatory factors of variation, which can be recovered by (unsupervised) learning algorithms; recovering such factors should be useful for arbitrary downstream tasks. We challenge these ideas and provide a sober look at the common practices in the training and evaluation of such models. From the modeling perspective, we discuss under which conditions factors of variation can be disentangled and perform extensive empirical evaluations in the unsupervised, semi-supervised, and weakly-supervised settings. Regarding evaluation, we discuss the biases and usefulness of disentanglement metrics and the downstream benefits of disentanglement, particularly for fairness applications. Overall, we find that the unsupervised learning of disentangled representations is theoretically impossible and that unsupervised model selection appears challenging in practice. On the other hand, we find that a small amount of imprecise explicit supervision (on the order of 0.01-0.5% of the dataset) is sufficient to train and identify disentangled representations on the seven datasets we consider. Motivated by these results, we propose a new weakly-supervised disentanglement setting that is theoretically identifiable and does not require explicit observations of the factors of variation, yielding representations useful for diverse tasks such as abstract visual reasoning, fairness, and strong generalization.
Finally, we discuss the conceptual limits of disentangled representations and propose a novel paradigm based on attentive grouping: a differentiable interface that maps perceptual features in a distributed representational format to a set of high-level, task-dependent variables, which we evaluate on set prediction tasks.
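To make the first line of work concrete, the following is a minimal sketch of classic matching pursuit, the baseline that the accelerated and non-negative variants above generalize. This is an illustration rather than code from the thesis, and it assumes a dictionary matrix A with unit-norm columns:

    import numpy as np

    def matching_pursuit(A, y, n_iter=50):
        # Greedily approximate y as a sparse linear combination of the
        # columns (atoms) of A. Assumes each column of A has unit norm.
        x = np.zeros(A.shape[1])
        residual = y.astype(float).copy()
        for _ in range(n_iter):
            corr = A.T @ residual             # correlation of each atom with the residual
            j = int(np.argmax(np.abs(corr)))  # pick the most correlated atom
            x[j] += corr[j]                   # optimal least-squares step along that atom
            residual -= corr[j] * A[:, j]     # update the residual
        return x

With unit-norm atoms, each update is exactly a steepest coordinate-descent step on f(x) = 1/2 ||Ax - y||^2, which illustrates the connection between matching pursuit and coordinate descent that the abstract refers to.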
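For the second line of work, the disentanglement models studied in this literature are typically variational autoencoders trained with a regularized objective. As an illustrative example only (a common baseline of the field, not a method proposed by the thesis), a beta-VAE-style loss upweights the KL term to push the latent code toward a factorized prior; beta, mu, and logvar below follow the usual VAE conventions:

    import torch
    import torch.nn.functional as F

    def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
        # Reconstruction term: how well the decoder explains the input.
        recon = F.mse_loss(x_recon, x, reduction="sum")
        # Closed-form KL divergence between the Gaussian posterior
        # N(mu, exp(logvar)) and the standard normal prior N(0, I).
        kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
        # beta > 1 strengthens the factorized-prior pressure, which is
        # intended to encourage disentangled latent dimensions.
        return recon + beta * kl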
Persistent link
https://doi.org/10.3929/ethz-b-000474164
Publication status
published
External links
Search for a print copy via the ETH Library
Contributors
Examiner: Rätsch, Gunnar
Examiner: Krause, Andreas
Examiner: Schölkopf, Bernhard
Examiner: Cevher, Volkan
Publisher
ETH Zurich
Organizational unit
09568 - Rätsch, Gunnar / Rätsch, Gunnar
Related publications and data
Is previous version of: https://arxiv.org/abs/2111.13693