Simple item record

dc.contributor.author: Locatello, Francesco
dc.contributor.supervisor: Rätsch, Gunnar
dc.contributor.supervisor: Krause, Andreas
dc.contributor.supervisor: Schölkopf, Bernhard
dc.contributor.supervisor: Cevher, Volkan
dc.date.accessioned: 2021-12-14T10:44:21Z
dc.date.available: 2021-03-12T11:50:54Z
dc.date.available: 2021-03-12T13:05:29Z
dc.date.available: 2021-12-14T10:44:21Z
dc.date.issued: 2020
dc.identifier.uri: http://hdl.handle.net/20.500.11850/474164
dc.identifier.doi: 10.3929/ethz-b-000474164
dc.description.abstract [en_US]:
The world is structured in countless ways. It may be prudent to enforce corresponding structural properties on a learning algorithm's solution, such as incorporating prior beliefs, natural constraints, or causal structures. Doing so may translate to faster, more accurate, and more flexible models, which may directly relate to real-world impact. In this dissertation, we consider two different research areas that concern structuring a learning algorithm's solution: when the structure is known and when it has to be discovered.

First, we consider the case in which the desired structural properties are known, and we wish to express the solution of our learning algorithm as a sparse combination of elements from a set. We assume that this set is given in the form of a constraint for an optimization problem. Specifically, we consider convex combinations with additional affine constraints, linear combinations, and non-negative linear combinations. In the first case, we develop a stochastic optimization algorithm suitable for minimizing non-smooth objectives, with applications to Semidefinite Programs. In the case of linear combinations, we establish a connection between the analyses of Matching Pursuit and Coordinate Descent, which allows us to present a unified analysis of both algorithms. We also show the first accelerated convergence rates for both Matching Pursuit and steepest Coordinate Descent on convex objectives. On convex cones, we present the first principled definitions of non-negative Matching Pursuit algorithms that provably converge on convex and strongly convex objectives. Further, we consider the application of greedy optimization to approximate probabilistic inference and present an analysis of existing boosting variational inference approaches that yields novel theoretical insights and algorithmic simplifications.

Second, we consider the case of learning the structural properties underlying a dataset by learning its factors of variation. This is an emerging field in representation learning that starts from the premise that real-world data is generated by a few explanatory factors of variation, which can be recovered by (unsupervised) learning algorithms. Recovering such factors should be useful for arbitrary downstream tasks. We challenge these ideas and provide a sober look at the common practices in the training and evaluation of such models. From the modeling perspective, we discuss under which conditions factors of variation can be disentangled and perform extensive empirical evaluations in the unsupervised, semi-supervised, and weakly-supervised settings. Regarding the evaluation, we discuss the biases and usefulness of the disentanglement metrics and the downstream benefits of disentanglement, particularly for fairness applications. Overall, we find that the unsupervised learning of disentangled representations is theoretically impossible and that unsupervised model selection appears challenging in practice. On the other hand, we also find that little and imprecise explicit supervision (on the order of 0.01–0.5% of the dataset) is sufficient to train and identify disentangled representations on the seven datasets we consider. Motivated by these results, we propose a new weakly-supervised disentanglement setting that is theoretically identifiable and does not require explicit observations of the factors of variation, providing useful representations for diverse tasks such as abstract visual reasoning, fairness, and strong generalization.

Finally, we discuss the conceptual limits of disentangled representations and propose a novel paradigm based on attentive grouping. We propose a differentiable interface that maps perceptual features in a distributed representational format to a set of high-level, task-dependent variables, and we evaluate it on set prediction tasks.
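The connection between Matching Pursuit and (steepest) Coordinate Descent mentioned in the abstract can be illustrated in a few lines: for a least-squares objective, greedily selecting the dictionary atom most correlated with the residual is exactly a steepest-coordinate step in coefficient space. The sketch below is a generic textbook-style illustration, not code from the thesis; the function name, the quadratic objective, and the fixed iteration budget are assumptions made for this example.

    import numpy as np

    def matching_pursuit(A, b, n_iters=50):
        # Greedy Matching Pursuit for min_x 0.5 * ||A x - b||^2: the solution
        # is built as a sparse combination of the columns (atoms) of A.
        # (Illustrative sketch; not the thesis implementation.)
        x = np.zeros(A.shape[1])
        for _ in range(n_iters):
            grad = A.T @ (A @ x - b)      # gradient in coefficient space
            i = np.argmax(np.abs(grad))   # steepest coordinate = most correlated atom
            x[i] -= grad[i] / (A[:, i] @ A[:, i])  # exact line search along atom i
        return x

    # Hypothetical usage on a random over-complete dictionary.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 100))
    b = rng.standard_normal(20)
    x = matching_pursuit(A, b, n_iters=30)
    print("atoms used:", np.count_nonzero(x))

Viewed this way, the atom-selection rule is (up to normalization of the atoms) the Gauss-Southwell rule of Coordinate Descent applied to the coefficient vector, which is one way to read the unified analysis the abstract refers to.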
dc.format: application/pdf [en_US]
dc.language.iso: en [en_US]
dc.publisher: ETH Zurich [en_US]
dc.rights.uri: http://rightsstatements.org/page/InC-NC/1.0/
dc.title: Enforcing and Discovering Structure in Machine Learning [en_US]
dc.type: Doctoral Thesis
dc.rights.license: In Copyright - Non-Commercial Use Permitted
dc.date.published: 2021-03-12
ethz.size: 282 p. [en_US]
ethz.code.ddc: DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science [en_US]
ethz.identifier.diss: 27248 [en_US]
ethz.publication.place: Zurich [en_US]
ethz.publication.status: published [en_US]
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar [en_US]
ethz.relation.isPreviousVersionOf: https://arxiv.org/abs/2111.13693
ethz.date.deposited: 2021-03-12T11:51:04Z
ethz.source: FORM
ethz.eth: yes [en_US]
ethz.availability: Open access [en_US]
ethz.rosetta.installDate: 2021-03-12T13:05:40Z
ethz.rosetta.lastUpdated: 2022-03-29T16:35:14Z
ethz.rosetta.versionExported: true