Towards unified approximate pattern matching for hamming and L1 Distance


Loading...

Date

2018

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the sub- string. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L1 distance only relatively slow O(n√m) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as ∞, while for L1 distance asking for a(1±ε)-approximation seems more natural. For k -bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O(n√k) time algorithm, and Clifford et al. [SODA 2016] designed an O((m+k2) ·n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O((m+k√m) ·n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, show-ing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L1 distance. Finally, for (1±ε)- approximate L1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(ε−2n). We improve this to O(ε−1n), thus essentially matching the complexity of the best known algorithm for (1±ε) -approximate Hamming distance.

Publication status

published

Editor

Book title

Leibniz International Proceedings in Informatics (LIPIcs)

Journal / series

Volume

107

Pages / Article No.

62

Publisher

Schloss Dagstuhl – Leibniz-Zentrum für Informatik

Event

45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

approximate pattern matching; conditional lower bounds; L1 distance; Hamming distance

Organisational unit

03340 - Widmayer, Peter (emeritus) / Widmayer, Peter (emeritus) check_circle

Notes

Funding

Related publications and datasets