DOI: 10.1145/3313276.3316371
research-article
Open Access

Near-linear time insertion-deletion codes and (1+ε)-approximating edit distance via indexing

Published: 23 June 2019

ABSTRACT

We introduce fast-decodable indexing schemes for edit distance, which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon > 0$, one can construct in near-linear time a string $I \in \Sigma'^n$ with $|\Sigma'| = O_\varepsilon(1)$ such that indexing any string $S \in \Sigma^n$ symbol-by-symbol with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy: one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n \operatorname{poly}(\log n))$ time.
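The abstract does not spell out the algorithm, so the following is only a minimal, hedged illustration of what "indexing symbol-by-symbol" means and why an index can make edit distance easier. It assumes edit distance counts insertions and deletions only (no substitutions), and every function name in it (`index_string`, `indel_distance`, `indel_distance_distinct`) is hypothetical. It pairs each symbol of $S$ with the matching symbol of $I$, computes the exact distance with the textbook quadratic LCS dynamic program, and then treats the degenerate index $I = (0, 1, \dots, n-1)$: every symbol of $S'$ is then distinct, so the LCS reduces to a longest increasing subsequence computable in $O(n \log n)$ time, in the spirit of Hunt and Szymanski. This trivial index needs $|\Sigma'| = n$ rather than $O_\varepsilon(1)$ and returns an exact answer; it is not the authors' construction, which achieves a $(1+\varepsilon)$-approximation with a constant-size index alphabet.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple, TypeVar

# Hypothetical helper names; insertion/deletion-only edit distance is assumed.
A = TypeVar("A")
B = TypeVar("B")


def index_string(s: Sequence[A], idx: Sequence[B]) -> List[Tuple[A, B]]:
    """Index s symbol-by-symbol with idx: S'[i] = (S[i], I[i]), a symbol of Sigma x Sigma'."""
    if len(s) != len(idx):
        raise ValueError("S and I must have the same length n")
    return list(zip(s, idx))


def indel_distance(x: Sequence, y: Sequence) -> int:
    """Exact insertion/deletion distance via the O(|x| * |y|) LCS dynamic program."""
    prev = [0] * (len(y) + 1)  # prev[j] = LCS(x[:i], y[:j]) for the previous row i
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    # Every symbol outside a longest common subsequence is inserted or deleted.
    return len(x) + len(y) - 2 * prev[-1]


def indel_distance_distinct(x: Sequence, y: Sequence) -> int:
    """Same quantity in O((|x| + |y|) log |x|) time, valid only if x has distinct symbols.

    Each symbol of y then matches at most one position of x, so LCS(x, y) equals the
    longest strictly increasing subsequence of those matched positions.
    """
    pos = {sym: i for i, sym in enumerate(x)}
    if len(pos) != len(x):
        raise ValueError("x must have pairwise distinct symbols")
    tails: List[int] = []  # tails[k] = smallest last position of an increasing run of length k + 1
    for sym in y:
        p = pos.get(sym)
        if p is None:  # symbol of y absent from x: it must be inserted
            continue
        k = bisect_left(tails, p)  # bisect_left keeps the subsequence strictly increasing
        if k == len(tails):
            tails.append(p)
        else:
            tails[k] = p
    return len(x) + len(y) - 2 * len(tails)
```

For example, with S = "abcab" and I = (0, 1, 2, 3, 4), deleting the second indexed symbol of S′ and inserting ("z", 9) at the front gives a string at distance 2 from S′, and both functions report 2. The quadratic routine works for any strings; the fast one exploits the index, which is the kind of speedup the abstract describes, here bought with a large index alphabet.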

Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error-correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC ’17] and faster decoding algorithms for the list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP ’18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. 2015. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 59–78.
  2. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. 2016. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the Annual Symposium on Theory of Computing (STOC). 375–388.
  3. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. 2014. Consequences of Faster Alignment of Sequences. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP). 39–51.
  4. Noga Alon, Jeff Edmonds, and Michael Luby. 1995. Linear time erasure codes with nearly optimal recovery. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 512–519.
  5. Alexandr Andoni and Robert Krauthgamer. 2008. The Smoothed Complexity of Edit Distance. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP). 357–369.
  6. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. 2010. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 377–386.
  7. Alexandr Andoni and Krzysztof Onak. 2012. Approximating edit distance in near-linear time. SIAM J. Comput. 41, 6 (2012), 1635–1648.
  8. Arturs Backurs and Piotr Indyk. 2015. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing (STOC). ACM, 51–58.
  9. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. 2004. Approximating edit distance efficiently. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS). IEEE, 550–559.
  10. Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. 2006. Oblivious string embeddings and edit distance approximations. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 792–801.
  11. Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi HajiAghayi, and Saeed Seddighin. 2018. Approximating Edit Distance in Truly Subquadratic Time: Quantum and MapReduce. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 1170–1189.
  12. Karl Bringmann and Marvin Künnemann. 2015. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 79–97.
  13. Karl Bringmann and Marvin Künnemann. 2018. Multivariate Fine-Grained Complexity of Longest Common Subsequence. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1216–1235.
  14. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucky, and Michael Saks. 2018. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 979–990.
  15. Kuan Cheng, Bernhard Haeupler, Xin Li, Amirbehshad Shahrasbi, and Ke Wu. 2019. Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 2185–2204.
  16. Václav Chvátal, David A. Klarner, and Donald E. Knuth. 1972. Selected combinatorial research problems. Technical Report. Computer Science Department, Stanford University.
  17. Pawel Gawrychowski. 2012. Faster Algorithm for Computing the Edit Distance between SLP-Compressed Strings. In String Processing and Information Retrieval: 19th International Symposium (SPIRE 2012), Cartagena de Indias, Colombia, October 21–25, 2012. Proceedings. 229–236.
  18. Shafi Goldwasser and Dhiraj Holden. 2017. The Complexity of Problems in P Given Correlated Instances. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), January 9–11, 2017, Berkeley, CA, USA. 13:1–13:19.
  19. Venkatesan Guruswami and Piotr Indyk. 2005. Linear-time encodable/decodable codes with near-optimal rate. IEEE Transactions on Information Theory 51, 10 (2005), 3393–3400.
  20. Venkatesan Guruswami and Ray Li. 2016. Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT). IEEE, 620–624.
  21. Venkatesan Guruswami and Carol Wang. 2017. Deletion codes in the high-noise and high-rate regimes. IEEE Transactions on Information Theory 63, 4 (2017), 1961–1970.
  22. Bernhard Haeupler and Amirbehshad Shahrasbi. 2017. Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound. In Proceedings of the Annual Symposium on Theory of Computing (STOC).
  23. Bernhard Haeupler and Amirbehshad Shahrasbi. 2018. Synchronization strings: explicit constructions, local decoding, and applications. In Proceedings of the Annual Symposium on Theory of Computing (STOC). 841–854.
  24. Bernhard Haeupler, Amirbehshad Shahrasbi, and Madhu Sudan. 2018. Synchronization Strings: List Decoding for Insertions and Deletions. In 45th International Colloquium on Automata, Languages, and Programming (ICALP).
  25. Bernhard Haeupler, Amirbehshad Shahrasbi, and Ellen Vitercik. 2018. Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions. In 45th International Colloquium on Automata, Languages, and Programming (ICALP). 75:1–75:14.
  26. Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. 2017. Local List Recovery of High-rate Tensor Codes and Applications. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS).
  27. Daniel S. Hirschberg. 1977. Algorithms for the Longest Common Subsequence Problem. J. ACM 24, 4 (1977), 664–675.
  28. James W. Hunt and Thomas G. Szymanski. 1977. A fast algorithm for computing longest common subsequences. Commun. ACM 20, 5 (1977), 350–353.
  29. William Kuszmaul. 2019. Efficiently Approximating Edit Distance Between Pseudorandom Strings. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1165–1180.
  30. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. 1998. Incremental string comparison. SIAM J. Comput. 27, 2 (1998), 557–582.
  31. Vladimir Levenshtein. 1965. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR 163, 4 (1965), 845–848.
  32. William J. Masek and Michael S. Paterson. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 1 (1980), 18–31.
  33. Aviad Rubinstein. 2018. Approximating Edit Distance. https://theorydish.blog/2018/07/20/approximatingeditdistance/.
  34. Leonard J. Schulman and David Zuckerman. 1999. Asymptotically good codes correcting insertions, deletions, and transpositions. IEEE Transactions on Information Theory 45, 7 (1999), 2552–2557.
  35. Michael Sipser and Daniel A. Spielman. 1994. Expander codes. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 566–576.
  36. Daniel A. Spielman. 1996. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory 42, 6 (1996), 1723–1731.

      • Published in

        STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
        June 2019
        1258 pages
        ISBN:9781450367059
        DOI:10.1145/3313276

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%
