Beyond reproducibility: Knocking on sustainability's door
OPEN ACCESS
Date
2022
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The following thesis presents three independent studies which were carried out as part of the author's doctoral studies in the Computational Biology Group at the Department of Biosystems Science and Engineering at ETH Zurich in Basel.
These projects develop statistical methods for detecting pathway dysregulations and for processing and analyzing next-generation sequencing data, with a particular focus on benchmarking method performance in a sustainable way.
The first two studies build on the observation that cancer is a heterogeneous disease in which the same phenotype can arise from different mutational patterns, and they propose novel methods for computing pathway enrichments.
The first study takes a causal approach and computes edge-specific pathway dysregulations, whereas the second computes global pathway dysregulation scores while accounting for term-term relations.
Both studies include an extensive benchmark workflow which evaluates performance on synthetic and real data sets and runs exploratory analyses.
The third study describes the development of a pipeline for the analysis of viral high-throughput sequencing data and an extensive benchmark of global haplotype reconstruction methods.
The dissertation is organized as follows.
The first chapter provides an overview of different workflow management systems which can be used to create reproducible benchmarking workflows, a comment on the distinction between reproducible and sustainable data science, and their relevance in the fields of cancer genomics as well as virology.
The second chapter presents dce, a computational method for the edge-specific detection of pathway dysregulations using a causal framework.
The third chapter presents pareg, a regression-based method which addresses the issue of large and redundant pathway databases by incorporating term-term relations into the enrichment computation. It accomplishes this goal by adding regularization terms to the loss function of a generalized linear model.
The fourth chapter presents a scalable, reproducible and transparent pipeline for the analysis of viral sequencing data as well as a benchmark of global haplotype reconstruction methods.
The fifth chapter concludes the thesis by summarizing its findings as well as suggesting potential future directions.
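To make the idea of adding regularization terms to a generalized linear model's loss more concrete, the following is a minimal sketch under stated assumptions. It uses a Gaussian (squared-error) data-fit term, a lasso penalty, and a similarity-weighted fusion penalty that pulls the coefficients of related terms towards each other. The function name, the parameter names, and the specific penalty forms are illustrative assumptions; the abstract does not specify which penalties the method from the third chapter uses, and this is not its implementation.

```python
# Illustrative sketch only: a penalized loss for a linear (Gaussian) GLM.
# All names and the choice of penalties are assumptions made for this
# example; they are not taken from the thesis or its software.

def penalized_loss(X, y, beta, similarity, lam_lasso, lam_fusion):
    """Mean squared error plus a lasso term and a fusion term.

    X: list of rows, one row per gene and one column per pathway term
       (for example, 1 if the gene belongs to the term, else 0).
    y: one response value per gene.
    beta: one coefficient per pathway term.
    similarity: symmetric matrix of term-term similarities.
    """
    n = len(y)

    # Data-fit term: mean squared residual of the linear predictor.
    fit = sum(
        (y_i - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))) ** 2
        for row, y_i in zip(X, y)
    ) / n

    # Lasso term: encourages sparse coefficients.
    lasso = sum(abs(b) for b in beta)

    # Fusion term: penalizes coefficient differences between similar terms.
    fusion = sum(
        similarity[j][k] * (beta[j] - beta[k]) ** 2
        for j in range(len(beta))
        for k in range(j + 1, len(beta))
    )

    return fit + lam_lasso * lasso + lam_fusion * fusion


# Hypothetical toy input: three genes, two overlapping pathway terms.
X = [[1, 0], [1, 1], [0, 1]]
y = [1.0, 2.0, 1.0]
similarity = [[1.0, 0.5], [0.5, 1.0]]

loss_equal = penalized_loss(X, y, [1.0, 1.0], similarity, 0.1, 0.2)
loss_unequal = penalized_loss(X, y, [1.0, 0.0], similarity, 0.1, 0.2)
```

In this toy setting, giving both similar terms the same coefficient fits the data exactly and incurs no fusion penalty, so it yields a lower loss than assigning the effect to only one of the two terms.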
Publication status
published
Publisher
ETH Zurich
Organisational unit
03790 - Beerenwinkel, Niko / Beerenwinkel, Niko