Show simple item record

dc.contributor.author
Sutter, Thomas M.
dc.contributor.supervisor
Vogt, Julia E.
dc.contributor.supervisor
Mandt, Stephan
dc.contributor.supervisor
Rätsch, Gunnar
dc.date.accessioned
2023-10-04T05:58:47Z
dc.date.available
2023-10-03T12:53:20Z
dc.date.available
2023-10-04T05:58:47Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/634822
dc.identifier.doi
10.3929/ethz-b-000634822
dc.description.abstract
Humans naturally integrate various senses to understand our surroundings, enabling us to compensate for partially missing sensory input. In contrast, machine learning models excel at harnessing extensive datasets but face challenges in handling missing data effectively. While utilizing multiple data types provides a more comprehensive perspective, it also raises the likelihood of encountering missing values, underscoring the importance of proper missing-data management in machine learning techniques. In this thesis, we advocate for developing machine learning models that emulate the human approach of merging diverse sensory inputs into a unified representation, demonstrating resilience in the face of missing input sources. Generating labels for multiple data types is laborious and often costly, resulting in a scarcity of fully annotated multimodal datasets. On the other hand, multimodal data naturally possesses a form of weak supervision: the samples within a group describe the same event, and we assume that certain underlying generative factors are shared among the group members, which provides weak guidance. Our thesis focuses on learning from such weakly supervised data, delving into the interrelationships among group members. We start by exploring novel techniques for machine learning models capable of processing multimodal inputs while effectively handling missing data. Our emphasis is on variational autoencoders (VAEs) for learning from weakly supervised data. We introduce a generalized formulation of probabilistic aggregation functions, designed to overcome the limitations of previous methods, and we show how this generalized formulation correlates with performance enhancements. At a higher level, we investigate the impact of implicit assumptions regarding group structure on a model's learning behavior and efficacy.
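The resilience to missing input sources described above can be sketched with a product-of-Gaussian-experts aggregation, one common instance of the probabilistic aggregation functions used in multimodal VAEs. This is a minimal illustrative example, not the thesis's exact (generalized) formulation: it assumes Gaussian unimodal posteriors and a standard-normal prior, and a missing modality is handled by simply omitting its expert from the product.

```python
import numpy as np

def poe_aggregate(mus, logvars):
    """Product of Gaussian experts: combine the available unimodal
    posteriors q(z|x_m) = N(mu_m, sigma_m^2) into one joint Gaussian.
    A missing modality is handled by leaving its expert out of the
    lists, so the aggregation degrades gracefully."""
    # Precision (inverse variance) of each available expert.
    precisions = [np.exp(-lv) for lv in logvars]
    # Joint precision: sum of expert precisions, plus a standard-normal
    # prior expert with precision 1.
    joint_prec = 1.0 + sum(precisions)
    joint_var = 1.0 / joint_prec
    # Precision-weighted mean (the prior's mean is 0).
    joint_mu = joint_var * sum(p * m for p, m in zip(precisions, mus))
    return joint_mu, np.log(joint_var)

# Both modalities observed:
mu_full, _ = poe_aggregate([np.array([1.0]), np.array([3.0])],
                           [np.array([0.0]), np.array([0.0])])
# One modality missing: just drop its expert.
mu_partial, _ = poe_aggregate([np.array([1.0])], [np.array([0.0])])
```

Because the product of Gaussians is closed form, the joint posterior stays well defined for any subset of observed modalities, which is the property the abstract highlights.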
We find that the assumption of a single shared latent space is overly restrictive for generating coherent, high-quality samples. To overcome this limitation, we introduce modality-specific latent subspaces within multimodal VAEs, reflecting a more flexible modeling approach. While we observe that greater flexibility in modeling assumptions, or assumptions aligned with the actual data-generating process, leads to improved performance, we still depend on prior knowledge about the relationship within a group of multimodal or weakly supervised samples. As the number of group members grows, their underlying relationships become potentially more intricate, increasing the risk of overly rigid assumptions. Therefore, in the final section, we shift our focus to minimizing the assumptions required when learning from weakly supervised data while simultaneously inferring the group structure during learning. In this context, we introduce a novel differentiable formulation of a random partition model, which follows a two-stage process: in the first stage, we estimate the number of elements per subset using a newly proposed differentiable formulation of the hypergeometric distribution; in the second stage, we allocate the appropriate number of elements to each subset. We demonstrate that our differentiable random partition model can learn shared and independent generative factors in the weakly supervised setting. We hope that this thesis and its contributions will benefit future applications in multimodal machine learning and reduce the assumptions necessary for learning from weakly supervised data in general.
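The two-stage partition process can be illustrated with an ordinary (non-differentiable) sampling analogue. The thesis's contribution is to make stage one differentiable via a relaxed hypergeometric formulation so gradients can flow through both stages; the sketch below only mirrors the structure of the procedure, using NumPy's standard multivariate hypergeometric sampler with equal per-subset colour counts (an assumption for illustration):

```python
import numpy as np

def two_stage_partition(n_elements, n_subsets, rng):
    """Illustrative analogue of a two-stage random partition model.
    Stage 1: draw how many elements each subset receives.
    Stage 2: allocate concrete elements to subsets accordingly."""
    # Stage 1: subset sizes from a multivariate hypergeometric draw.
    # Equal colour counts per subset are an illustrative assumption.
    colors = [n_elements] * n_subsets
    sizes = rng.multivariate_hypergeometric(colors, n_elements)
    # Stage 2: randomly assign the n_elements according to the sizes.
    perm = rng.permutation(n_elements)
    partition, start = [], 0
    for s in sizes:
        partition.append(sorted(perm[start:start + s].tolist()))
        start += s
    return partition

rng = np.random.default_rng(0)
parts = two_stage_partition(6, 3, rng)  # 3 disjoint subsets covering {0..5}
```

Every element lands in exactly one subset, so the output is a valid partition; replacing the hard sampling in stage one with a differentiable relaxation is what allows the model to learn the partition structure end to end.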
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Machine Learning
en_US
dc.subject
Computer Science
en_US
dc.title
Imposing and Uncovering Group Structure in Weakly-Supervised Learning
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2023-10-04
ethz.size
212 p.
en_US
ethz.code.ddc
DDC - DDC::6 - Technology, medicine and applied sciences::600 - Technology (applied sciences)
en_US
ethz.identifier.diss
29385
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09670 - Vogt, Julia / Vogt, Julia
en_US
ethz.relation.cites
10.3929/ethz-b-000392472
ethz.relation.cites
10.3929/ethz-b-000520268
ethz.relation.cites
10.3929/ethz-b-000588775
ethz.relation.cites
20.500.11850/520270
ethz.relation.cites
20.500.11850/459226
ethz.relation.cites
20.500.11850/594122
ethz.date.deposited
2023-10-03T12:53:20Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-10-04T05:59:02Z
ethz.rosetta.lastUpdated
2023-10-04T05:59:02Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Imposing%20and%20Uncovering%20Group%20Structure%20in%20Weakly-Supervised%20Learning&rft.date=2023&rft.au=Sutter,%20Thomas%20M.&rft.genre=unknown&rft.btitle=Imposing%20and%20Uncovering%20Group%20Structure%20in%20Weakly-Supervised%20Learning