ABSTRACT
We introduce fast-decodable indexing schemes for edit distance, which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string I. In particular, for every length n and every ε > 0, one can construct in near-linear time a string I ∈ Σ′^n with |Σ′| = O_ε(1), such that indexing any string S ∈ Σ^n, symbol by symbol, with I results in a string S′ ∈ Σ″^n, where Σ″ = Σ × Σ′, for which edit distance computations are easy, i.e., one can compute a (1+ε)-approximation of the edit distance between S′ and any other string in O(n poly(log n)) time.
Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC ’17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP ’18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
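As a rough illustration of what symbol-by-symbol indexing means, the sketch below pairs each symbol of S with the symbol of an index string at the same position, producing a string over Σ × Σ′, and then compares two such strings with the textbook quadratic dynamic program. It is not the near-linear approximation algorithm, and the index string used is an arbitrary placeholder rather than the construction of I. The function names and the choice to count insertions, deletions, and substitutions are assumptions made for this example.

```python
from typing import Hashable, List, Sequence, Tuple


def index_string(s: Sequence[Hashable], index: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """Pair s[i] with index[i], giving a string over the product alphabet."""
    if len(s) != len(index):
        raise ValueError("s and index must have the same length")
    return list(zip(s, index))


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Exact edit distance by the standard O(|a| * |b|) dynamic program.

    Assumption: insertions, deletions, and substitutions each cost 1.
    This is only a baseline for comparison, not a near-linear method.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]  # deleting the first i symbols of a
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    placeholder_index = "abcabc"  # hypothetical; NOT a real indexing string
    s_indexed = index_string("banana", placeholder_index)
    t_indexed = index_string("bandna", placeholder_index)
    print(s_indexed[:2])
    print(edit_distance(s_indexed, t_indexed))
```

Because the indexed symbols are ordinary tuples, any edit distance routine that only tests symbols for equality works unchanged on S′.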
REFERENCES
- Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. 2015. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 59–78.
- Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. 2016. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the Annual Symposium on Theory of Computing (STOC). 375–388.
- Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. 2014. Consequences of Faster Alignment of Sequences. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP). 39–51.
- Noga Alon, Jeff Edmonds, and Michael Luby. 1995. Linear time erasure codes with nearly optimal recovery. In Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. IEEE, 512–519.
- Alexandr Andoni and Robert Krauthgamer. 2008. The Smoothed Complexity of Edit Distance. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP). 357–369.
- Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. 2010. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 377–386.
- Alexandr Andoni and Krzysztof Onak. 2012. Approximating edit distance in near-linear time. SIAM J. Comput. 41, 6 (2012), 1635–1648.
- Arturs Backurs and Piotr Indyk. 2015. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the forty-seventh annual ACM symposium on Theory of computing. ACM, 51–58.
- Ziv Bar-Yossef, TS Jayram, Robert Krauthgamer, and Ravi Kumar. 2004. Approximating edit distance efficiently. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on. IEEE, 550–559.
- Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. 2006. Oblivious string embeddings and edit distance approximations. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 792–801.
- Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi HajiAghayi, and Saeed Seddighin. 2018. Approximating Edit Distance in Truly Subquadratic Time: Quantum and MapReduce. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1170–1189.
- Karl Bringmann and Marvin Künnemann. 2015. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 79–97.
- Karl Bringmann and Marvin Künnemann. 2018. Multivariate Fine-Grained Complexity of Longest Common Subsequence. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1216–1235.
- Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucky, and Michael Saks. 2018. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 979–990.
- Kuan Cheng, Bernhard Haeupler, Xin Li, Amirbehshad Shahrasbi, and Ke Wu. 2019. Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 2185–2204.
- Václav Chvátal, David A. Klarner, and Donald E. Knuth. 1972. Selected combinatorial research problems. Technical Report. Computer Science Department, Stanford University.
- Pawel Gawrychowski. 2012. Faster Algorithm for Computing the Edit Distance between SLP-Compressed Strings. In String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Cartagena de Indias, Colombia, October 21-25, 2012. Proceedings. 229–236.
- Shafi Goldwasser and Dhiraj Holden. 2017. The Complexity of Problems in P Given Correlated Instances. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA. 13:1–13:19.
- Venkatesan Guruswami and Piotr Indyk. 2005. Linear-time encodable/decodable codes with near-optimal rate. IEEE Transactions on Information Theory 51, 10 (2005), 3393–3400.
- Venkatesan Guruswami and Ray Li. 2016. Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes. In Information Theory (ISIT), 2016 IEEE International Symposium on. IEEE, 620–624.
- Venkatesan Guruswami and Carol Wang. 2017. Deletion codes in the high-noise and high-rate regimes. IEEE Transactions on Information Theory 63, 4 (2017), 1961–1970.
- Bernhard Haeupler and Amirbehshad Shahrasbi. 2017. Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound. In Proceedings of the Annual Symposium on Theory of Computing (STOC).
- Bernhard Haeupler and Amirbehshad Shahrasbi. 2018. Synchronization strings: explicit constructions, local decoding, and applications. In Proceedings of the Annual Symposium on Theory of Computing (STOC). 841–854.
- Bernhard Haeupler, Amirbehshad Shahrasbi, and Madhu Sudan. 2018. Synchronization Strings: List Decoding for Insertions and Deletions. In 45th International Colloquium on Automata, Languages, and Programming (ICALP).
- Bernhard Haeupler, Amirbehshad Shahrasbi, and Ellen Vitercik. 2018. Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions. In 45th International Colloquium on Automata, Languages, and Programming (ICALP). 75:1–75:14.
- Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. 2017. Local List Recovery of High-rate Tensor Codes and Applications. Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS) (2017).
- Daniel S. Hirschberg. 1977. Algorithms for the Longest Common Subsequence Problem. J. ACM 24, 4 (1977), 664–675.
- James W Hunt and Thomas G Szymanski. 1977. A fast algorithm for computing longest common subsequences. Commun. ACM 20, 5 (1977), 350–353.
- William Kuszmaul. 2019. Efficiently Approximating Edit Distance Between Pseudorandom Strings. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1165–1180.
- Gad M Landau, Eugene W Myers, and Jeanette P Schmidt. 1998. Incremental string comparison. SIAM J. Comput. 27, 2 (1998), 557–582.
- Vladimir Levenshtein. 1965. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR 163, 4 (1965), 845–848.
- William J Masek and Michael S Paterson. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 1 (1980), 18–31.
- Aviad Rubinstein. 2018. Approximating Edit Distance. https://theorydish.blog/2018/07/20/approximatingeditdistance/.
- Leonard J. Schulman and David Zuckerman. 1999. Asymptotically good codes correcting insertions, deletions, and transpositions. IEEE Transactions on Information Theory 45, 7 (1999), 2552–2557.
- Michael Sipser and Daniel A Spielman. 1994. Expander codes. In Foundations of Computer Science, 1994 Proceedings., 35th Annual Symposium on. IEEE, 566–576.
- Daniel A Spielman. 1996. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory 42, 6 (1996), 1723–1731.
Index Terms
- Near-linear time insertion-deletion codes and (1+ε)-approximating edit distance via indexing