Machine Learning on Clinical Time Series: Classification and Representation Learning

Open access
Author
Date
2022
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
The life sciences of the digital era are driven by its most fundamental and irreplaceable currency: data. The advent of big data and machine learning (ML) algorithms has promised to revolutionise biomedical sciences and medical practice by means of automated diagnostics, data-driven disease subtyping and personalised treatments. However, while ML in health has become a vibrant field, in many cases the translation into practice has turned out to be more challenging than expected, or, to put it more bluntly, the revolution is still pending. In this dissertation, we identify a set of challenges that arise when trying to leverage ML on clinical data, specifically for time series classification problems. Even though raw patient data are now being routinely collected in unprecedented amounts of electronic health records, these data typically first need to be carefully curated, preprocessed and annotated in order to arrive at a dataset that can be used in an ML pipeline to solve a downstream prediction problem. Because this process is already complex for a single dataset, external validations, albeit crucial, are frequently missing in existing studies.
In the first part of this thesis, we consider the classification of clinical time series, in particular the application domain of sepsis prediction, where the goal is to detect sepsis, a potentially fatal complication of infections, early. We propose mitigation strategies for the aforementioned issues by creating a large, multi-centric cohort of intensive care unit (ICU) patients with temporally annotated sepsis labels. This allowed us to perform the first international development and validation of sepsis prediction models using ML. Along the way, we found that federated learning and model sharing (as opposed to data sharing) lead to convincing performance without requiring sensitive patient data to be physically exported from the source site.
Moreover, we encountered clinical time series of vital and laboratory measurements that were irregularly spaced and, for a given time step, incompletely observed. Throughout this thesis, we addressed informative missingness of data using Gaussian process models.
After an application-focused first part, the second part of this dissertation considers the models' inner workings more closely. Starting with irregularly sampled time series, we investigate path signatures, a powerful transform (that can be used as a neural network layer) to encode paths of data with virtually no loss of information. In particular, we explore how these signatures may be used to learn time series representations that benefit classification performance. We thereby uncover that the way the signature "interprets" raw data has drastic implications that are reflected in downstream performance. We then propose a novel variant of Gaussian process adapters that leads to greater robustness in signature-based models. Finally, having considered a model's implicit interpretation of data, we explore in the final chapter how models can learn and preserve structures that are available in the raw (and potentially high-dimensional) input data. For this, we leverage concepts from topological data analysis and propose topological autoencoders, a novel deep learning architecture that can preserve complex structures and shapes of intangibly high-dimensional data in low-dimensional visualisations.
In summary, we hope that our contributions to clinical time series classification will pave the way for the deployment of robust and validated models that create clinical value for the monitored patients. Moreover, we envision that our findings in temporal and topological representation learning will illuminate the analysis and understanding of the ever-growing wealth of large and high-dimensional biomedical datasets.
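The remark that the signature's "interpretation" of raw data matters can be illustrated with a small sketch. The code below is not taken from the thesis; it is a minimal NumPy example, under the assumption that a time series is turned into a piecewise-linear path before the signature is computed. The function names `signature_depth2` and `rectilinear` are hypothetical and chosen for illustration only. It computes the signature truncated at depth 2 and compares two ways of connecting the same observations: straight-line interpolation and an axis-by-axis (staircase) interpolation.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 signature of the piecewise-linear path through the rows of `path`.

    Returns (level1, level2): level1 is the total increment (shape (d,)),
    level2 holds the twice-iterated integrals (shape (d, d)).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one increment per linear segment
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in increments:
        # Chen's identity for appending one linear segment to the path so far
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


def rectilinear(path):
    """Illustrative staircase interpolation: only one coordinate changes per step."""
    path = np.asarray(path, dtype=float)
    points = [path[0].copy()]
    for nxt in path[1:]:
        for i in range(path.shape[1]):
            step = points[-1].copy()
            step[i] = nxt[i]
            points.append(step)
    return np.array(points)


# Two observations of a 2-dimensional series, connected in two different ways.
observations = [[0.0, 0.0], [1.0, 1.0]]
lin1, lin2 = signature_depth2(observations)
rec1, rec2 = signature_depth2(rectilinear(observations))

print("level 1 equal:", np.allclose(lin1, rec1))
print("level 2 (linear):\n", lin2)
print("level 2 (rectilinear):\n", rec2)
```

Both constructions pass through the same observations and share the same level-1 terms, yet their level-2 terms differ, so a downstream classifier would receive different features depending on the chosen interpolation. Which path constructions the thesis actually studies is not stated in the abstract.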
Permanent link
https://doi.org/10.3929/ethz-b-000532377
Publication status
published
Publisher
ETH Zurich
Subject
Machine learning; Deep learning; Healthcare; Time series analysis; Sepsis; Early warning systems; Topological data analysis; Dimensionality reduction
Organisational unit
09486 - Borgwardt, Karsten M. (former)