Abstract
This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an input string T[0, n-1] over an alphabet A[0, K-1]. Sorting the suffixes of T is also known as constructing the suffix array (SA) for T. The theoretical memory usage of SACA-K is n log K + n log n + K log n bits. We also provide a practical implementation of SACA-K that uses n bytes + (n + 256) words and is suitable for strings over any alphabet up to full ASCII, where a word is log n bits. In our experiment, SACA-K outperforms SA-IS, which was previously the most time- and space-efficient linear-time SA construction algorithm (SACA). SACA-K is around 33% faster and uses a smaller deterministic workspace of K words, where the workspace is the space needed beyond the input string and the output SA. Given K = O(1), SACA-K runs in linear time and O(1) workspace. To the best of our knowledge, this is the first such result reported in the literature with practical source code publicly available.
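To make the problem statement and the memory figure concrete, here is a minimal Python sketch. It is not SACA-K: `naive_suffix_array` sorts suffixes by direct comparison, which is far slower than linear time, and serves only to show what the output SA contains. `practical_footprint_bytes` evaluates the stated n bytes + (n + 256) words. Both function names, the 4-byte word size, and the reading of the 256 words as the K-word workspace for a 256-symbol alphabet are assumptions made for this illustration.

```python
def naive_suffix_array(t: str) -> list[int]:
    """Return the start positions of the suffixes of t in lexicographic order.

    Illustrative only: this sorts by direct suffix comparison and is
    NOT the linear-time SACA-K algorithm described in the abstract.
    """
    return sorted(range(len(t)), key=lambda i: t[i:])


def practical_footprint_bytes(n: int, word_bytes: int = 4, k: int = 256) -> int:
    """Evaluate n bytes + (n + k) words for an input of n characters.

    Assumptions: one byte per input character, word_bytes bytes per word
    (4 is an assumed value), n words for the output SA, and k words of
    workspace (k = 256 matches the figure quoted in the abstract).
    """
    return n + (n + k) * word_bytes


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
    print(naive_suffix_array("banana"))          # [5, 3, 1, 0, 4, 2]
    print(practical_footprint_bytes(1_000_000))  # 5001024
```

Under these assumptions, a string of one million characters needs about 5 MB in total, of which only the 256 words (1 KB) are workspace beyond the input and the output SA.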
Index Terms
- Practical linear-time O(1)-workspace suffix sorting for constant alphabets