Towards unified approximate pattern matching for hamming and L1 Distance
OPEN ACCESS
Loading...
Author / Producer
Date
2018
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
OPEN ACCESS
Data
Rights / License
Abstract
Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the sub-
string. This naturally generalizes the standard notion of exact pattern matching to incorporate
dissimilarity score. For both Hamming and L1 distance only relatively slow O(n√m) solutions
are known for this generalization. This can be overcome by relaxing the question. For Hamming
distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as ∞, while for L1 distance asking for a(1±ε)-approximation seems more natural. For k
-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O(n√k) time algorithm, and Clifford et al. [SODA 2016] designed an O((m+k2) ·n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O((m+k√m) ·n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, show-ing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L1 distance. Finally, for (1±ε)- approximate L1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(ε−2n). We improve this to O(ε−1n), thus essentially matching the complexity of the best known algorithm for
(1±ε) -approximate Hamming distance.
Permanent link
Publication status
published
External links
Editor
Book title
Leibniz International Proceedings in Informatics (LIPIcs)
Journal / series
Volume
107
Pages / Article No.
62
Publisher
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Event
45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
approximate pattern matching; conditional lower bounds; L1 distance; Hamming distance
Organisational unit
03340 - Widmayer, Peter (emeritus) / Widmayer, Peter (emeritus)