Journal: Bioinformatics

Loading...

Abbreviation

Bioinformatics

Publisher

Oxford University Press

Journal Volumes

ISSN

1367-4803
1460-2059

Description

Search Results

Publications1 - 10 of 228
  • ISMB/ECCB 2015
    Item type: Other Journal Item
    Moreau, Yves; Beerenwinkel, Niko (2015)
    Bioinformatics
  • Firtina, Can; Ghiasi, Nika Mansouri; Lindegger, Joël; et al. (2023)
    Bioinformatics
    Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) 25.8× and 3.4× better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
  • Lippert, Christoph; Ghahramani, Zoubin; Borgwardt, Karsten M. (2010)
    Bioinformatics
  • Gehlenborg, Nils; Yan, Wei; Lee, Inyoul Y.; et al. (2009)
    Bioinformatics
  • Wissel, David; Janakarajan, Nikita; Schulte, Julius; et al. (2024)
    Bioinformatics
    Motivation Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can subsequently help interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox Proportional Hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of regularization hyperparameter.Results In this work, we develop a software package and demonstrate how knowledge distillation, a powerful technique in machine learning that aims to transfer knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, which currently do not have Python implementations. It also contains in-house survival function estimators, removing the need for external packages. Sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a sklearn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis for high-dimensional datasets through fitting sparse survival models via knowledge distillation.Availability and implementation sparsesurv is freely available under a BSD 3 license on GitHub (https://github.com/BoevaLab/sparsesurv) and The Python Package Index (PyPi) (https://pypi.org/project/sparsesurv/).
  • Bengtsson-Palme, Johan; Richardson, Rodney T.; Meola, Marco; et al. (2018)
    Bioinformatics
    Motivation Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data. Results We evaluated the Metaxa2 Database Builder on 11 commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality.
  • Rakitsch, Barbara; Lippert, Christoph; Stegle, Oliver; et al. (2013)
    Bioinformatics
    Motivation: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest seem to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is non-trivial and often compromised by limited statistical power. At the same time, confounding influences, such as population structure, cause spurious association signals that result in false-positive findings.Results: We propose linear mixed models LMM-Lasso, a mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters; it effectively controls for population structure and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker–based phenotype prediction from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual single nucleotide polymorphism effects and population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine.Availability: Code available under http://webdav.tuebingen.mpg.de/u/karsten/Forschung/research.html.Contact:  rakitsch@tuebingen.mpg.de, ippert@microsoft.com or stegle@ebi.ac.ukSupplementary information:  Supplementary data are available at Bioinformatics online.
  • Pellizzoni, Paolo; Muzio, Giulia; Borgwardt, Karsten (2023)
    Bioinformatics
    Motivation Complex phenotypes, such as many common diseases and morphological traits, are controlled by multiple genetic factors, namely genetic mutations and genes, and are influenced by environmental conditions. Deciphering the genetics underlying such traits requires a systemic approach, where many different genetic factors and their interactions are considered simultaneously. Many association mapping techniques available nowadays follow this reasoning, but have some severe limitations. In particular, they require binary encodings for the genetic markers, forcing the user to decide beforehand whether to use, e.g. a recessive or a dominant encoding. Moreover, most methods cannot include any biological prior or are limited to testing only lower-order interactions among genes for association with the phenotype, potentially missing a large number of marker combinations. Results We propose HOGImine, a novel algorithm that expands the class of discoverable genetic meta-markers by considering higher-order interactions of genes and by allowing multiple encodings for the genetic variants. Our experimental evaluation shows that the algorithm has a substantially higher statistical power compared to previous methods, allowing it to discover genetic mutations statistically associated with the phenotype at hand that could not be found before. Our method can exploit prior biological knowledge on gene interactions, such as protein–protein interaction networks, genetic pathways, and protein complexes, to restrict its search space. Since computing higher-order gene interactions poses a high computational burden, we also develop a more efficient search strategy and support computation to make our approach applicable in practice, leading to substantial runtime improvements compared to state-of-the-art methods. Availability and implementation Code and data are available at https://github.com/BorgwardtLab/HOGImine
  • Dmitrenko, Andrei; Reid, Michelle; Zamboni, Nicola (2023)
    Bioinformatics
    Motivation: Untargeted metabolomics by mass spectrometry is the method of choice for unbiased analysis of molecules in complex samples of biological, clinical or environmental relevance. The exceptional versatility and sensitivity of modern high-resolution instruments allows profiling of thousands of known and unknown molecules in parallel. Inter-batch differences constitute a common and unresolved problem in untargeted metabolomics, and hinder the analysis of multi-batch studies or the intercomparison of experiments. Results: We present a new method, Regularized Adversarial Learning Preserving Similarity (RALPS), for the normal- ization of multi-batch untargeted metabolomics data. RALPS builds on deep adversarial learning with a three-term loss function that mitigates batch effects while preserving biological identity, spectral properties and coefficients of variation. Using two large metabolomics datasets, we showcase the superior performance of RALPS as compared with six state-of-the-art methods for batch correction. Further, we demonstrate that RALPS scales well, is robust, deals with missing values and can handle different experimental designs. Availability and implementation: https://github.com/zamboni-lab/RALPS. Contact: nzamboni@ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
  • Luo, Xiang Ge; Kuipers, Jack; Rupp, Kevin; et al. (2025)
    Bioinformatics
    Motivation The complex dynamics of cancer evolution, driven by mutation and selection, underlies the molecular heterogeneity observed in tumors. The evolutionary histories of tumors of different patients can be encoded as mutation trees and reconstructed in high resolution from single-cell sequencing data, offering crucial insights for studying fitness effects of and epistasis among mutations. Existing models, however, either fail to separate mutation and selection or neglect the evolutionary histories encoded by the tumor phylogenetic trees.Results We introduce FiTree, a tree-structured multi-type branching process model with epistatic fitness parameterization and a Bayesian inference scheme to learn fitness landscapes from single-cell tumor mutation trees. Through simulations, we demonstrate that FiTree outperforms state-of-the-art methods in inferring the fitness landscape underlying tumor evolution. Applying FiTree to a single-cell acute myeloid leukemia dataset, we identify epistatic fitness effects consistent with known biological findings and quantify uncertainty in predicting future mutational events. The new model unifies probabilistic graphical models of cancer progression with population genetics, offering a principled framework for understanding tumor evolution and informing therapeutic strategies.Availability and implementation The Python package FiTree and the analysis workflows are available at https://github.com/cbg-ethz/FiTree.
Publications1 - 10 of 228