Show simple item record

dc.contributor.author
Joudaki, Amir
dc.contributor.author
Meterez, Alexandru
dc.contributor.author
Mustafa, Harun
dc.contributor.author
Groot Koerkamp, Ragnar
dc.contributor.author
Kahles, André
dc.contributor.author
Rätsch, Gunnar
dc.date.accessioned
2024-01-31T08:59:13Z
dc.date.available
2024-01-31T08:59:13Z
dc.date.issued
2023-07
dc.identifier.issn
1088-9051
dc.identifier.issn
1549-5469
dc.identifier.other
10.1101/gr.277659.123
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/656730
dc.identifier.doi
10.3929/ethz-b-000629439
dc.description.abstract
Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and show that it yields a better time-accuracy trade-off in settings with up to a [Formula: see text] mutation rate. We use sketches of a subset of graph nodes, which are more robust to indels, and store them in a k-nearest neighbor index to avoid the curse of dimensionality. Our approach contrasts with existing methods and highlights the important role that sketching into vector space can play in bioinformatics applications. We show that our method scales to graphs with 1 billion nodes and has quasi-logarithmic query time for queries with an edit distance of [Formula: see text]. For such queries, longer sketch-based seeds yield a [Formula: see text] increase in recall compared with exact seeds. Our approach can be incorporated into other aligners, providing a novel direction for sequence-to-graph alignment.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Cold Spring Harbor Laboratory Press
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc/4.0/
dc.title
Aligning distant sequences to graphs using long seed sketches
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution-NonCommercial 4.0 International
dc.date.published
2023-04-18
ethz.journal.title
Genome Research
ethz.journal.volume
33
en_US
ethz.journal.issue
7
en_US
ethz.journal.abbreviated
Genome res.
ethz.pages.start
1208
en_US
ethz.pages.end
1217
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.grant
Dynamic reference indexes for selective sequencing with application to diagnostics
en_US
ethz.grant
Scalable Genome Graph Data Structures for Metagenomics and Genome Annotation
en_US
ethz.grant
A unifying theoretical framework for optimal sequence sketching: Towards fast, accurate, and interpretable computation on biological sequences
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
Woodbury, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
ethz.grant.agreementno
200550
ethz.grant.agreementno
167331
ethz.grant.agreementno
ETH-17 21-1
ethz.grant.fundername
SNF
ethz.grant.fundername
SNF
ethz.grant.fundername
ETHZ
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.funderDoi
10.13039/501100003006
ethz.grant.program
Projekte MINT
ethz.grant.program
NFP 75: Gesuch
ethz.grant.program
ETH Grants
ethz.relation.isNewVersionOf
10.3929/ethz-b-000595157
ethz.date.deposited
2023-09-03T03:47:08Z
ethz.source
BATCH
ethz.source
SCOPUS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-01-31T08:59:24Z
ethz.rosetta.lastUpdated
2024-02-03T09:09:04Z
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/656242
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/629439
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Aligning%20distant%20sequences%20to%20graphs%20using%20long%20seed%20sketches&rft.jtitle=Genome%20Research&rft.date=2023-07&rft.volume=33&rft.issue=7&rft.spage=1208&rft.epage=1217&rft.issn=1088-9051&1549-5469&rft.au=Joudaki,%20Amir&Meterez,%20Alexandru&Mustafa,%20Harun&Groot%20Koerkamp,%20Ragnar&Kahles,%20Andr%C3%A9&rft.genre=article&rft_id=info:doi/10.1101/gr.277659.123&