Search

Results

Now showing items 1-8 of 8

MetaGraph-MLA: Label-guided alignment to variable-order De Bruijn graphs

Mustafa, Harun; Karasikov, Mikhail; Rätsch, Gunnar; et al. (2022)

bioRxiv

The amount of data stored in genomic sequence databases is growing exponentially, far exceeding traditional indexing strategies’ processing capabilities. Many recent indexing methods organize sequence data into a sequence graph to succinctly represent large genomic data sets from reference genome and sequencing read set databases. These methods typically use De Bruijn graphs as the graph model or the underlying index model, with auxiliary ...

Working Paper

Aligning Distant Sequences to Graphs using Long Seed Sketches

Joudaki, Amir; Meterez, Alexandru; Mustafa, Harun; et al. (2022)

bioRxiv

Sequence-to-graph alignment is an important step in applications such as variant genotyping, read error correction and genome assembly. When a query sequence requires a substantial number of edits to align, approximate alignment tools that follow the seed-and-extend approach require shorter seeds to get any matches. However, in large graphs with high variation, relying on a shorter seed length leads to an exponential increase in spurious ...

Working Paper

Lossless Indexing with Counting de Bruijn Graphs

Karasikov, Mikhail; Mustafa, Harun; Rätsch, Gunnar; et al. (2021)

bioRxiv

High-throughput sequencing data is rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in solving the experiment discovery problem and building compressed representations of annotated de Bruijn graphs where k-mer sets can be efficiently indexed and interactively queried. However, ...

Working Paper

Using Genome Graph Topology to Guide Annotation Matrix Sparsification

Danciu, Daniel; Karasikov, Mikhail; Mustafa, Harun; et al. (2020)

bioRxiv

Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are needed more than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for ...

Working Paper

MetaGraph: Indexing and Analysing Nucleotide Archives at Petabase-scale

Karasikov, Mikhail; Mustafa, Harun; Danciu, Daniel; et al. (2020)

bioRxiv

The amount of biological sequencing data available in public repositories is growing exponentially, forming an invaluable biomedical research resource. Yet, making all this sequencing data searchable and easily accessible to life science and data science researchers is an unsolved problem. We present MetaGraph, a versatile framework for the scalable analysis of extensive sequence repositories. MetaGraph efficiently indexes vast collections ...

Working Paper

Sparse Binary Relation Representations for Genome Graph Annotation

Karasikov, Mikhail; Mustafa, Harun; Joudaki, Amir; et al. (2018)

bioRxiv

High-throughput DNA sequencing data is accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and allow for efficient querying of sequences. In particular, the concept of colored de Bruijn graphs has been explored by several groups. While there has been good progress ...

Working Paper

Efficient graph-color compression with neighborhood-informed Bloom filters

Schilken, Ingo; Mustafa, Harun; Rätsch, Gunnar; et al. (2017)

bioRxiv

Technological advancements in high-throughput DNA sequencing have led to an exponential growth of sequencing data being produced and stored as a byproduct of biomedical research. Despite its public availability, a majority of this data remains inaccessible to the research community through a lack of efficient data representation and indexing solutions. One of the available techniques to represent read data on a more abstract level is its ...

Working Paper

Metannot: A succinct data structure for compression of colors in dynamic de Bruijn graphs

Mustafa, Harun; Kahles, André; Karasikov, Mikhail; et al. (2017)

bioRxiv

Much of the DNA and RNA sequencing data available is in the form of high-throughput sequencing (HTS) reads and is currently unindexed by established sequence search databases. Recent succinct data structures for indexing both reference sequences and HTS data, along with associated metadata, have been based on either hashing or graph models, but many of these structures are static in nature, and thus, not well-suited as backends for dynamic ...

Working Paper

Research Collection