Open access
Date
2022-09-23Type
- Working Paper
ETH Bibliography
yes
Altmetrics
Abstract
Motivation Sequence alignment has been a core problem in computational biology for the last half-century. It is an open problem whether exact pairwise alignment is possible in linear time for related sequences (Medvedev, 2022b).Methods We solve exact global pairwise alignment with respect to edit distance by using the A* shortest path algorithm on the edit graph. In order to efficiently align long sequences with high error rate, we extend the seed heuristic for A* (Ivanov et al., 2022) with match chaining, inexact matches, and the novel match pruning optimization. We prove the correctness of our algorithm and provide an efficient implementation in A*PA.Results We evaluate A*PA on synthetic data (random sequences of length n with uniform mutations with error rate e) and on real long ONT reads of human data. On the synthetic data with e=5% and n<=107 bp, A*PA exhibits a near-linear empirical runtime scaling of n1.08 and achieves >250 speedup compared to the leading exact aligners EDLIB and BIWFA. Even for a high error rate of e=15%, the empirical scaling is n1.28 for n<=107 bp. On two real datasets, A*PA is the fastest aligner for 58% of the alignments when the reads contain only sequencing errors, and for 17% of the alignments when the reads also include biological variation.Availability github.com/RagnarGrootKoerkamp/astar-pairwise-alignerContact ragnar.grootkoerkampatinf.ethz.ch, peshoatinf.ethz.chCompeting Interest StatementThe authors have declared no competing interest. Show more
Permanent link
https://doi.org/10.3929/ethz-b-000595180Publication status
publishedExternal links
Journal / series
bioRxivPages / Article No.
Publisher
Cold Spring Harbor LaboratoryEdition / version
v2Organisational unit
09568 - Rätsch, Gunnar / Rätsch, Gunnar
Funding
ETH-17 21-1 - A unifying theoretical framework for optimal sequence sketching: Towards fast, accurate, and interpretable computation on biological sequences (ETHZ)
Related publications and datasets
Is previous version of: https://doi.org/10.3929/ethz-b-000666109
More
Show all metadata
ETH Bibliography
yes
Altmetrics