Abstract
A suffix array represents the suffixes of a string in sorted order. Being a simpler and more compact alternative to suffix trees, it is an important tool for full text indexing and other string processing tasks. We introduce the skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine:
1. Recursively sort suffixes beginning at positions i mod 3 ≠ 0.
2. Sort the remaining suffixes using the information obtained in step one.
3. Merge the two sorted sequences obtained in steps one and two.
The algorithm is much simpler than previous linear time algorithms that are all based on the more complicated suffix tree data structure. Since sorting is a well studied problem, we obtain optimal algorithms for several other models of computation, e.g. external memory with parallel disks, cache oblivious, and parallel. The adaptations for BSP and EREW-PRAM are asymptotically faster than the best previously known algorithms.
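The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `skew_suffix_array` and `suffix_array_of_string`, the use of 0 as a padding symbol, and the mapping of characters to `ord(c) + 1` are assumptions made for illustration. The sketch also uses Python's built-in comparison sort for brevity, so it runs in O(n log n) time per level; a linear-time version would substitute radix sort on the integer keys, as the abstract describes.

```python
def skew_suffix_array(text):
    """Suffix array of a sequence of positive integers (0 is reserved as padding)."""
    n = len(text)
    if n < 2:
        return list(range(n))
    t = list(text) + [0, 0, 0]  # padding so that every triple is defined
    n0, n1 = (n + 2) // 3, (n + 1) // 3
    # Sample positions i mod 3 != 0. When n mod 3 == 1, the dummy position n
    # is included so that the last mod-1 triple always contains padding.
    mod1 = list(range(1, n + (n0 - n1), 3))
    mod2 = list(range(2, n, 3))
    positions = mod1 + mod2

    def triple(i):
        return (t[i], t[i + 1], t[i + 2])

    # Step 1: sort the sample suffixes, recursing if triples are not unique.
    by_triple = sorted(positions, key=triple)
    rank = [0] * (n + 3)
    name, prev = 0, None
    for i in by_triple:
        if triple(i) != prev:
            name, prev = name + 1, triple(i)
        rank[i] = name
    if name < len(positions):
        reduced = [rank[i] for i in positions]
        sample_sorted = [positions[j] for j in skew_suffix_array(reduced)]
        for r, i in enumerate(sample_sorted, start=1):
            rank[i] = r
    else:
        sample_sorted = by_triple

    # Step 2: sort the suffixes at positions i mod 3 == 0 by the pair
    # (first symbol, rank of the following sample suffix).
    sa0 = sorted(range(0, n, 3), key=lambda i: (t[i], rank[i + 1]))

    # Step 3: merge, comparing a sample suffix i with a mod-0 suffix j.
    def sample_first(i, j):
        if i % 3 == 1:
            return (t[i], rank[i + 1]) < (t[j], rank[j + 1])
        return (t[i], t[i + 1], rank[i + 2]) < (t[j], t[j + 1], rank[j + 2])

    sa12 = [i for i in sample_sorted if i < n]  # drop the dummy position
    result, a, b = [], 0, 0
    while a < len(sa12) and b < len(sa0):
        if sample_first(sa12[a], sa0[b]):
            result.append(sa12[a])
            a += 1
        else:
            result.append(sa0[b])
            b += 1
    result.extend(sa12[a:])
    result.extend(sa0[b:])
    return result


def suffix_array_of_string(s):
    """Convenience wrapper: shift code points by one so that 0 stays free."""
    return skew_suffix_array([ord(c) + 1 for c in s])
```

For example, `suffix_array_of_string("banana")` returns `[5, 3, 1, 0, 4, 2]`, which corresponds to the sorted suffixes a, ana, anana, banana, na, nana.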
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186(ALCOM-FT).
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kärkkäinen, J., Sanders, P. (2003). Simple Linear Work Suffix Array Construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds) Automata, Languages and Programming. ICALP 2003. Lecture Notes in Computer Science, vol 2719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45061-0_73
DOI: https://doi.org/10.1007/3-540-45061-0_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40493-4
Online ISBN: 978-3-540-45061-0
eBook Packages: Springer Book Archive