- Conference Paper
Rights / licenseCreative Commons Attribution 3.0 Unported
Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the sub- string. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L1 distance only relatively slow O(n√m) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as ∞, while for L1 distance asking for a(1±ε)-approximation seems more natural. For k -bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O(n√k) time algorithm, and Clifford et al. [SODA 2016] designed an O((m+k2) ·n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O((m+k√m) ·n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, show-ing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L1 distance. Finally, for (1±ε)- approximate L1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(ε−2n). We improve this to O(ε−1n), thus essentially matching the complexity of the best known algorithm for (1±ε) -approximate Hamming distance Show more
Book titleLeibniz International Proceedings in Informatics (LIPIcs)
Pages / Article No.
PublisherSchloss Dagstuhl, Leibniz-Zentrum für Informatik
Subjectapproximate pattern matching; conditional lower bounds; L1 distance; Hamming distance
MoreShow all metadata