Gunnar Rätsch
Last Name: Rätsch
First Name: Gunnar
Organisational unit: 09568 - Rätsch, Gunnar / Rätsch, Gunnar
Publications 1 - 10 of 176
- A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation
  Item type: Working Paper
  arXiv
  Locatello, Francesco; Bauer, Stefan; Lucic, Mario; et al. (2020)
  The idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over 14000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered "disentangled" and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
- A Bayesian Nonparametric Approach to Discover Clinico-Genetic Associations across Cancer Types
  Item type: Working Paper
  bioRxiv
  Pradier, Melanie F.; Hyland, Stephanie L.; Stark, Stefan G.; et al. (2019)
  Motivation: Personalized medicine aims at combining genetic, clinical, and environmental data to improve medical diagnosis and disease treatment, tailored to each patient. This paper presents a Bayesian nonparametric (BNP) approach to identify genetic associations with clinical/environmental features in cancer. We propose an unsupervised approach to generate data-driven hypotheses and bring potentially novel insights about cancer biology. Our model combines somatic mutation information at gene-level with features extracted from the Electronic Health Record. We propose a hierarchical approach, the hierarchical Poisson factor analysis (H-PFA) model, to share information across patients having different types of cancer. To discover statistically significant associations, we combine Bayesian modeling with bootstrapping techniques and correct for multiple hypothesis testing.
  Results: Using our approach, we empirically demonstrate that we can recover well-known associations in cancer literature. We compare the results of H-PFA with two other classical methods in the field: case-control (CC) setups, and linear mixed models (LMMs).
- Deep Mean Functions for Meta-Learning in Gaussian Processes
  Item type: Working Paper
  arXiv
  Fortuin, Vincent; Rätsch, Gunnar (2019)
- Generalizable Single-Source Cross-modality Medical Image Segmentation via Invariant Causal Mechanisms
  Item type: Working Paper
  arXiv
  Chen, Boqi; Zhu, Yuanzhi; Ao, Yunke; et al. (2024)
  Single-source domain generalization (SDG) aims to learn a model from a single source domain that can generalize well on unseen target domains. This is an important task in computer vision, particularly relevant to medical imaging where domain shifts are common. In this work, we consider a challenging yet practical setting: SDG for cross-modality medical image segmentation. We combine causality-inspired theoretical insights on learning domain-invariant representations with recent advancements in diffusion-based augmentation to improve generalization across diverse imaging modalities. Guided by the "intervention-augmentation equivariant" principle, we use controlled diffusion models (DMs) to simulate diverse imaging styles while preserving the content, leveraging rich generative priors in large-scale pretrained DMs to comprehensively perturb the multidimensional style variable. Extensive experiments on challenging cross-modality segmentation tasks demonstrate that our approach consistently outperforms state-of-the-art SDG methods across three distinct anatomies and imaging modalities. The source code is available at https://github.com/ratschlab/ICMSeg.
- Deep Multiple Instance Learning for Taxonomic Classification of Metagenomic read sets
  Item type: Working Paper
  arXiv
  Georgiou, Andreas; Fortuin, Vincent; Mustafa, Harun; et al. (2019)
- On Matching Pursuit and Coordinate Descent
  Item type: Working Paper
  arXiv
  Locatello, Francesco; Raj, Anant; Karimireddy, Sai P.; et al. (2018)
- Multi-modal Graph Learning over UMLS Knowledge Graphs
  Item type: Conference Paper
  Proceedings of Machine Learning Research ~ Proceedings of the 3rd Machine Learning for Health Symposium
  Burger, Manuel; Rätsch, Gunnar; Kuznetsova, Rita (2023)
  Clinicians are increasingly looking towards machine learning to gain insights about patient progression. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the unified medical language system. These concept representations are aggregated to represent a patient visit and then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge. We provide our code on GitHub at https://github.com/ratschlab/mmugl.
- Topology-based sparsification of graph annotations
  Item type: Journal Article
  Bioinformatics
  Danciu, Daniel; Karasikov, Mikhail; Mustafa, Harun; et al. (2021)
  Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently-used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving.
  Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, the previously known most compact annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST.
- Mutant SF3B1 promotes malignancy in PDAC
  Item type: Journal Article
  eLife
  Simmler, Patrik; Ioannidi, Eleonora I.; Mengis, Tamara; et al. (2023)
  The splicing factor SF3B1 is recurrently mutated in various tumors, including pancreatic ductal adenocarcinoma (PDAC). The impact of the hotspot mutation SF3B1ᴷ⁷⁰⁰ᴱ on the PDAC pathogenesis, however, remains elusive. Here, we demonstrate that Sf3b1ᴷ⁷⁰⁰ᴱ alone is insufficient to induce malignant transformation of the murine pancreas, but that it increases aggressiveness of PDAC if it co-occurs with mutated KRAS and p53. We further show that Sf3b1ᴷ⁷⁰⁰ᴱ already plays a role during early stages of pancreatic tumor progression and reduces the expression of TGF-β1-responsive epithelial–mesenchymal transition (EMT) genes. Moreover, we found that SF3B1ᴷ⁷⁰⁰ᴱ confers resistance to TGF-β1-induced cell death in pancreatic organoids and cell lines, partly mediated through aberrant splicing of Map3k7. Overall, our findings demonstrate that SF3B1ᴷ⁷⁰⁰ᴱ acts as an oncogenic driver in PDAC, and suggest that it promotes the progression of early stage tumors by impeding the cellular response to tumor suppressive effects of TGF-β.
- ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
  Item type: Journal Article
  PLoS Computational Biology
  Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; et al. (2023)
  The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.