dc.contributor.author
Horn, Max
dc.contributor.supervisor
Borgwardt, Karsten M.
dc.contributor.supervisor
Vogt, Julia
dc.contributor.supervisor
Müller, Christian L.
dc.date.accessioned
2023-03-09T16:22:24Z
dc.date.available
2023-03-09T12:37:40Z
dc.date.available
2023-03-09T16:22:24Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/602440
dc.identifier.doi
10.3929/ethz-b-000602440
dc.description.abstract
Machine learning has the potential to revolutionize the fields of biology and healthcare by providing new tools to help scientists and clinicians do research and decide what would be the right treatment for patients. However, while recent approaches in representation learning give the impression of being universal black-box solutions to all problems, research has shown that this is not generally true. Even though models can perform well in a black-box fashion, they often suffer from low generalization and are sensitive to distribution shifts. This highlights the need for developing approaches that are informed by their downstream application and tailored to incorporate symmetries of the problem into the model architecture. These inductive biases are essential for performance on new data and for models to remain robust even when the data distribution changes. Nevertheless, constructing good models is only half of the solution. To ensure that models translate well into clinical applications, they also need to be evaluated appropriately with this goal in mind. In this thesis, I address the above points while taking a detailed look at structured data types present at the intersection of biology, medicine, and machine learning. In terms of algorithmic contributions, I first present a new non-linear dimensionality reduction algorithm that aims to preserve multi-scale relations. The cost reduction of genome sequencing and the ability to sequence individual cells have led to exponentially increasing amounts of high-dimensional data in the life sciences. Such data cannot be intuitively understood, making dimensionality reduction approaches, which can capture the nested relationships present in biology, essential. Second, I develop methods for clinical applications where irregularly-sampled data are present. Conventional machine learning models either require the conversion of such data into fixed-size representations or the imputation of missing values prior to their application.
I present two approaches tailored for irregularly-sampled data that do not require such preprocessing steps. The first is a new kernel for peaks derived from MALDI-TOF spectra, whereas the second is a deep learning model that can be applied to irregularly-sampled time series by phrasing them as sets of observations. Third, I present an extension to graph neural networks that allows the models to account for global information instead of requiring nodes to only exchange information with their neighbors. Graphs are an important data structure for pharmacology as they are often used to represent small molecules. In order to address the appropriate evaluation of such models, I present a detailed study of medical time series models with a focus on their capability to transfer to other datasets in the context of a sepsis early prediction task. Further, I show that the conventional approach for the evaluation of graph generative models is highly sensitive to the selection of hyperparameters, which can lead to biased performance estimates. In summary, my thesis addresses many problems at the intersection of machine learning, healthcare, and biology. It demonstrates how models can be improved by including more (domain-specific) knowledge and where to pay attention when evaluating said models.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Machine Learning
en_US
dc.subject
Dimensionality reduction
en_US
dc.subject
Time Series
en_US
dc.subject
Graphs
en_US
dc.subject
Healthcare
en_US
dc.title
Representation Learning for Dimensionality Reduction, Irregularly-Sampled Sequences and Graphs
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2023-03-09
ethz.size
198 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.code.ddc
DDC - DDC::6 - Technology, medicine and applied sciences::600 - Technology (applied sciences)
en_US
ethz.identifier.diss
28721
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02060 - Dep. Biosysteme / Dep. of Biosystems Science and Eng.::09486 - Borgwardt, Karsten M. (ehemalig) / Borgwardt, Karsten M. (former)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02060 - Dep. Biosysteme / Dep. of Biosystems Science and Eng.::09486 - Borgwardt, Karsten M. (ehemalig) / Borgwardt, Karsten M. (former)
en_US
ethz.date.deposited
2023-03-09T12:37:40Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-03-09T16:22:25Z
ethz.rosetta.lastUpdated
2024-02-02T20:50:13Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Representation%20Learning%20for%20Dimensionality%20Reduction,%20Irregularly-Sampled%20Sequences%20and%20Graphs&rft.date=2023&rft.au=Horn,%20Max&rft.genre=unknown&rft.btitle=Representation%20Learning%20for%20Dimensionality%20Reduction,%20Irregularly-Sampled%20Sequences%20and%20Graphs