dc.contributor.author
Schilken, Ingo
dc.contributor.author
Mustafa, Harun
dc.contributor.author
Rätsch, Gunnar
dc.contributor.author
Eickhoff, Carsten
dc.contributor.author
Kahles, Andre
dc.date.accessioned
2018-01-30T11:12:26Z
dc.date.available
2018-01-29T12:51:39Z
dc.date.available
2018-01-30T10:25:36Z
dc.date.available
2018-01-30T11:12:26Z
dc.date.issued
2017
dc.identifier.other
10.1101/239806
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/236144
dc.identifier.doi
10.3929/ethz-b-000236144
dc.description.abstract
Technological advancements in high-throughput DNA sequencing have led to an exponential growth of sequencing data being produced and stored as a byproduct of biomedical research. Despite its public availability, a majority of this data remains inaccessible to the research community through a lack of efficient data representation and indexing solutions. One of the available techniques to represent read data on a more abstract level is its transformation into an assembly graph. Although the sequence information is now accessible, any contextual annotation and metadata is lost. We present a new approach for a compressed representation of a graph coloring based on a set of Bloom filters. By dropping the requirement of a fully lossless compression and using the topological information of the underlying graph to decide on false positives, we can reduce the memory requirements for a given set of colors per edge by three orders of magnitude. As insertion and query on a Bloom filter are constant-time operations, the complexity to compress and decompress an edge color is linear in the number of color bits. Representing individual colors as independent filters, our approach is fully dynamic and can be easily parallelized. These properties allow for an easy upscaling to the problem sizes common in the biomedical domain. A prototype implementation of our method is available in Java.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Cold Spring Harbor Laboratory
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc/4.0/
dc.title
Efficient graph-color compression with neighborhood-informed Bloom filters
en_US
dc.type
Working Paper
dc.rights.license
Creative Commons Attribution-NonCommercial 4.0 International
ethz.journal.title
bioRxiv
ethz.pages.start
239806
en_US
ethz.size
11 p.
en_US
ethz.grant
Scalable Genome Graph Data Structures for Metagenomics and Genome Annotation
en_US
ethz.publication.place
Cold Spring Harbor, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
ethz.grant.agreementno
167331
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
ethz.date.deposited
2018-01-29T12:51:40Z
ethz.source
BATCH
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2018-01-30T10:25:40Z
ethz.rosetta.lastUpdated
2018-11-06T07:39:07Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Efficient%20graph-color%20compression%20with%20neighborhood-informed%20Bloom%20filters&rft.jtitle=bioRxiv&rft.date=2017&rft.spage=239806&rft.au=Schilken,%20Ingo&Mustafa,%20Harun&R%C3%A4tsch,%20Gunnar&Eickhoff,%20Carsten&Kahles,%20Andre&rft.genre=preprint&