Journal: NAR Genomics and Bioinformatics

Abbreviation

NAR Genom. Bioinform.

Publisher

Oxford University Press

ISSN

2631-9268

Search Results

Publications 1 - 10 of 11
  • Prummer, Michael; Bertolini, Anne; Bosshard, Lars; et al. (2023)
    NAR Genomics and Bioinformatics
    Identifying cell types based on expression profiles is a pillar of single-cell analysis. Existing machine-learning methods identify predictive features from annotated training data, which are often not available in early-stage studies. This can lead to overfitting and inferior performance when applied to new data. To address these challenges we present scROSHI, which utilizes previously obtained cell type-specific gene lists and does not require training or the existence of annotated data. By respecting the hierarchical nature of cell type relationships and assigning cells consecutively to more specialized identities, excellent prediction performance is achieved. In a benchmark based on publicly available PBMC data sets, scROSHI outperforms competing methods when training data are limited or the diversity between experiments is large.
  • Rapsomaniki, Maria Anna; Maxouri, Stella; Nathanailidou, Patroula; et al. (2021)
    NAR Genomics and Bioinformatics
    DNA replication is a complex and remarkably robust process: despite its inherent uncertainty, manifested through stochastic replication timing at a single-cell level, multiple control mechanisms ensure its accurate and timely completion across a population. Disruptions in these mechanisms lead to DNA re-replication, closely connected to genomic instability and oncogenesis. Here, we present a stochastic hybrid model of DNA re-replication that accurately portrays the interplay between discrete dynamics, continuous dynamics and uncertainty. Using experimental data on the fission yeast genome, model simulations show how different regions respond to re-replication and permit insight into the key mechanisms affecting re-replication dynamics. Simulated and experimental population-level profiles exhibit a good correlation along the genome, robust to model parameters, validating our approach. At a single-cell level, copy numbers of individual loci are affected by intrinsic properties of each locus, in cis effects from adjoining loci and in trans effects from distant loci. In silico analysis and single-cell imaging reveal that cell-to-cell heterogeneity is inherent in re-replication and can lead to genome plasticity and a plethora of genotypic variations.
  • Altenhoff, Adrian Michael; Nevers, Yannis; Tran, Vinh; et al. (2024)
    NAR Genomics and Bioinformatics
    The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation. It is supported and maintained by the QfO consortium, and is used to gather ortholog predictions and to examine strengths and weaknesses of newly developed and existing orthology inference methods. The web server allows different inference methods to be compared in a standardized way using the same proteome data. The benchmark results are useful for developing new methods and can help researchers to guide their choice of orthology method for applications in comparative genomics and phylogenetic analysis. We here present a new release of the Orthology Benchmark Service with a new benchmark based on feature architecture similarity as well as updated reference proteomes. We further provide a meta-analysis of the public predictions from 18 different orthology assignment methods to reveal how they relate in terms of ortholog predictions and benchmark performance. These results can guide users of orthologs to the best suited method for their purpose.
  • Mak, Lauren; Tierney, Braden; Wei, Wei; et al. (2026)
    NAR Genomics and Bioinformatics
    Computational analysis of large-scale metagenomics sequencing datasets provides valuable isolate-level taxonomic and functional insights from complex microbial communities. However, the ever-expanding ecosystem of metagenomics-specific methods and file formats makes designing scalable workflows and seamlessly exploring output data increasingly challenging. Although one-click bioinformatics pipelines can help organize these tools into workflows, they face compatibility and maintainability challenges that can prevent replication. To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed the Core Analysis Modular Pipeline (CAMP), a module-based metagenomics analysis system written in Snakemake, with a standardized module and directory architecture. Each module can run independently or in sequence to produce target data formats (e.g. short-read preprocessing alone or followed by de novo assembly), and provides output summary statistics reports and Jupyter notebook-based visualizations. We applied CAMP to a set of 10 metagenomics samples, demonstrating how a modular analysis system with built-in data visualization facilitates rich seamless communication between outputs from different analytical purposes. The CAMP ecosystem (module template and analysis modules) can be found at https://github.com/Meta-CAMP.
  • Fuhrmann, Lara; Langer, Benjamin; Topolsky, Ivan; et al. (2024)
    NAR Genomics and Bioinformatics
    RNA viruses exist as large heterogeneous populations within their host. The structure and diversity of virus populations affect disease progression and treatment outcomes. Next-generation sequencing allows detailed viral population analysis, but inferring diversity from error-prone reads is challenging. Here, we present VILOCA (VIral LOcal haplotype reconstruction and mutation CAlling for short and long read data), a method for mutation calling and reconstruction of local haplotypes from short- and long-read viral sequencing data. Local haplotypes refer to genomic regions that have approximately the length of the input reads. VILOCA recovers local haplotypes by using a Dirichlet process mixture model to cluster reads around their unobserved haplotypes and leveraging quality scores of the sequencing reads. We assessed the performance of VILOCA in terms of mutation calling and haplotype reconstruction accuracy on simulated and experimental Illumina, PacBio and Oxford Nanopore data. On simulated and experimental Illumina data, VILOCA performed better than or similarly to existing methods. On the simulated long-read data, VILOCA is able to recover on average 82% of the ground truth mutations with perfect precision compared to only 69% recall and 68% precision of the second-best method. In summary, VILOCA provides significantly improved accuracy in mutation and haplotype calling, especially for long-read sequencing data, and therefore facilitates the comprehensive characterization of heterogeneous within-host viral populations.
  • Mädler, Sophia Clara; Julien-Laferriere, Alice; Wyss, Luis; et al. (2021)
    NAR Genomics and Bioinformatics
    Single-cell RNA sequencing (scRNA-seq) revolutionized our understanding of disease biology. The promise it presents to also transform translational research requires highly standardized and robust software workflows. Here, we present the toolkit Besca, which streamlines scRNA-seq analyses and their use to deconvolute bulk RNA-seq data according to current best practices. Beyond a standard workflow covering quality control, filtering, and clustering, two complementary Besca modules, utilizing hierarchical cell signatures and supervised machine learning, automate cell annotation and provide harmonized nomenclatures. Subsequently, the gene expression profiles can be employed to estimate cell type proportions in bulk transcriptomics data. Using multiple, diverse scRNA-seq datasets, some stemming from highly heterogeneous tumor tissue, we show how Besca aids acceleration, interoperability, reusability and interpretability of scRNA-seq data analyses, meeting crucial demands in translational research and beyond.
  • Firtina, Can; Park, Jisung; Alser, Mohammed; et al. (2023)
    NAR Genomics and Bioinformatics
    Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds leads to either (i) increased use of costly sequence alignment or (ii) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, which can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×-83.9× (on average 19.3×), has a lower memory footprint by 0.9×-14.1× (on average 3.8×), and finds higher quality overlaps leading to more accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×-4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.
  • Yermanos, Alexander; Agrafiotis, Andreas; Kuhn, Raphael; et al. (2021)
    NAR Genomics and Bioinformatics
    High-throughput single-cell sequencing (scSeq) technologies are revolutionizing the ability to molecularly profile B and T lymphocytes by offering the opportunity to simultaneously obtain information on adaptive immune receptor repertoires (VDJ repertoires) and transcriptomes. An integrated quantification of immune repertoire parameters, such as germline gene usage, clonal expansion, somatic hypermutation and transcriptional states opens up new possibilities for the high-resolution analysis of lymphocytes and the inference of antigen-specificity. While multiple tools now exist to investigate gene expression profiles from scSeq of transcriptomes, there is a lack of software dedicated to single-cell immune repertoires. Here, we present Platypus, an open-source software platform providing a user-friendly interface to investigate B-cell receptor and T-cell receptor repertoires from scSeq experiments. Platypus provides a framework to automate and ease the analysis of single-cell immune repertoires while also incorporating transcriptional information involving unsupervised clustering, gene expression and gene ontology. To showcase the capabilities of Platypus, we use it to analyze and visualize single-cell immune repertoires and transcriptomes from B and T cells from convalescent COVID-19 patients, revealing unique insight into the repertoire features and transcriptional profiles of clonally expanded lymphocytes. Platypus will expedite progress by facilitating the analysis of single-cell immune repertoire and transcriptome sequencing.
  • Pettersen, Jens Sivkær; Nielsen, Flemming Damgaard; Andreassen, Patrick Rosendahl; et al. (2024)
    NAR Genomics and Bioinformatics
    Two-component systems are key signal-transduction systems that enable bacteria to respond to a wide variety of environmental stimuli. The human pathogen Streptococcus pneumoniae (pneumococcus) encodes 13 two-component systems and a single orphan response regulator, most of which are significant for pneumococcal pathogenicity. Mapping the regulatory networks governed by these systems is key to understanding pneumococcal host adaptation. Here we employ a novel bioinformatic approach to predict the regulons of each two-component system based on publicly available whole-genome sequencing data. By employing pangenome-wide association studies (panGWAS) to predict genotype-genotype associations for each two-component system, we predicted regulon genes of 11 of the pneumococcal two-component systems. Through validation via next-generation RNA-sequencing on response regulator overexpression mutants, several top candidate genes predicted by the panGWAS analysis were confirmed as regulon genes. This study presents novel details on multiple pneumococcal two-component systems, including an expansion of regulons, identification of candidate response regulator binding motifs, and identification of candidate response regulator-regulated small non-coding RNAs. We also demonstrate a use for panGWAS as a complementary tool in target gene identification via identification of genotype-to-genotype links. Expanding our knowledge on two-component systems in pathogens is crucial to understanding how these bacteria sense and respond to their host environment, which could prove useful in future drug development.
  • Story, Benjamin; Velten, Lars; Mönke, Gregor; et al. (2024)
    NAR Genomics and Bioinformatics
    Clonal cell population dynamics play a critical role in both disease and development. Due to high mitochondrial mutation rates under both healthy and diseased conditions, mitochondrial genomic variability is a particularly useful resource in facilitating the identification of clonal population structure. Here we present mitoClone2, an all-inclusive R package allowing for the identification of clonal populations through integration of mitochondrial heteroplasmic variants discovered from single-cell sequencing experiments. Our package streamlines the investigation of this phenomenon by providing: built-in compatibility with commonly used tools for the delineation of clonal structure, the ability to directly use multiplexed BAM files as input, annotations for both human and mouse mitochondrial genomes, and helper functions for calling, filtering, clustering, and visualizing variants.