Practical linear-time O(1)-workspace suffix sorting for constant alphabets

Published: 05 August 2013

Abstract

This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an input string T[0, n-1] over an alphabet A[0, K-1]. Sorting the suffixes of T is also known as constructing the suffix array (SA) for T. The theoretical memory usage of SACA-K is n log K + n log n + K log n bits. We also provide a practical implementation of SACA-K that uses n bytes + (n + 256) words and is suitable for strings over any alphabet up to full ASCII, where a word is log n bits. In our experiment, SACA-K outperforms SA-IS, which was previously the most time- and space-efficient linear-time SA construction algorithm (SACA). SACA-K is around 33% faster and uses a smaller deterministic workspace of K words, where the workspace is the space needed beyond the input string and the output SA. Given K = O(1), SACA-K runs in linear time and O(1) workspace. To the best of our knowledge, this is the first such result reported in the literature with practical source code publicly available.
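
To make the problem statement concrete, the sketch below builds a suffix array by plain comparison sorting and evaluates the practical memory formula quoted in the abstract. It is not SACA-K and not linear time; it only illustrates what the output SA contains. The function names and the 4-byte word size are assumptions made for illustration and do not come from the article.

```python
def naive_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    This is a plain comparison sort over suffix slices, so it is far from
    linear time. It is NOT SACA-K; it only shows what a suffix array is.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def practical_footprint_bytes(n, word_bytes=4):
    """Evaluate the abstract's practical formula: n bytes + (n + 256) words.

    word_bytes=4 is an assumption (32-bit words); the abstract defines a
    word as log n bits.
    """
    return n + (n + 256) * word_bytes


if __name__ == "__main__":
    text = "banana"
    sa = naive_suffix_array(text)
    for pos in sa:
        print(pos, text[pos:])
    print(practical_footprint_bytes(1000), "bytes for n = 1000")
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, which gives the suffix array [5, 3, 1, 0, 4, 2]. In the practical formula, the n-word term is the output SA itself, so only the remaining 256 words count as workspace in the abstract's sense.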

        • Published in

          ACM Transactions on Information Systems, Volume 31, Issue 3
          July 2013
          202 pages
          ISSN: 1046-8188
          EISSN: 1558-2868
          DOI: 10.1145/2493175

          Copyright © 2013 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 5 August 2013
          • Accepted: 1 March 2013
          • Revised: 1 January 2013
          • Received: 1 June 2012

          Qualifiers

          • research-article
          • Research
          • Refereed
