
Open access
Author
Date
2022
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The following thesis presents three independent studies which were carried out as part of the author's doctoral studies in the Computational Biology Group at the Department of Biosystems Science and Engineering at ETH Zurich in Basel.
These projects deal with the development of statistical methods for the detection of pathway dysregulations and with the processing and analysis of next-generation sequencing data, with a particular focus on benchmarking the methods' performance in a sustainable way.
The first two studies build on the observation that cancer is a heterogeneous disease in which the same phenotype can arise from different mutational patterns, and they propose novel methods for the computation of pathway enrichments.
The first study takes a causal approach and computes edge-specific pathway dysregulations, whereas the second study computes global pathway dysregulation scores while accounting for term-term relations.
Both studies include an extensive benchmark workflow which tests performance on both synthetic and real data sets and runs exploratory analyses.
The third study describes the development of a pipeline for the analysis of viral high-throughput sequencing data and an extensive benchmark of global haplotype reconstruction methods.
The dissertation is organized as follows.
The first chapter provides an overview of workflow management systems that can be used to create reproducible benchmarking workflows, comments on the distinction between reproducible and sustainable data science, and discusses their relevance in the fields of cancer genomics and virology.
The second chapter presents dce, a computational method for the edge-specific detection of pathway dysregulations using a causal framework.
The third chapter presents pareg, a regression-based method which addresses the issue of large and redundant pathway databases by incorporating term-term relations into the enrichment computation. It accomplishes this goal by adding regularization terms to the loss function of a generalized linear model.
The fourth chapter presents a scalable, reproducible and transparent pipeline for the analysis of viral sequencing data as well as a benchmark of global haplotype reconstruction methods.
The fifth chapter concludes the thesis by summarizing its findings as well as suggesting potential future directions.
Permanent link
https://doi.org/10.3929/ethz-b-000571292
Publication status
published
Publisher
ETH Zurich
Organisational unit
03790 - Beerenwinkel, Niko / Beerenwinkel, Niko