Simple Linear Work Suffix Array Construction

  • Conference paper
Automata, Languages and Programming (ICALP 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2719))

Abstract

A suffix array represents the suffixes of a string in sorted order. Being a simpler and more compact alternative to suffix trees, it is an important tool for full-text indexing and other string processing tasks. We introduce the skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine:

  1. recursively sort suffixes beginning at positions i mod 3 ≠ 0.

  2. sort the remaining suffixes using the information obtained in step one.

  3. merge the two sorted sequences obtained in steps one and two.

The algorithm is much simpler than previous linear-time algorithms that are all based on the more complicated suffix tree data structure. Since sorting is a well-studied problem, we obtain optimal algorithms for several other models of computation, e.g. external memory with parallel disks, cache oblivious, and parallel. The adaptations for BSP and EREW-PRAM are asymptotically faster than the best previously known algorithms.
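The three steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `skew_suffix_array` and `suffix_array` are hypothetical, the input is assumed to be a sequence of positive integers (0 is reserved as a padding sentinel), and Python's built-in comparison sort stands in for the integer (radix) sort, so this version runs in O(n log n) per level rather than in linear time.

```python
def skew_suffix_array(seq):
    """Suffix array of a sequence of positive integers, following the
    three steps of the skew scheme. Assumption: every symbol is > 0."""
    n = len(seq)
    if n < 2:
        return list(range(n))
    s = list(seq) + [0, 0, 0]        # sentinel padding, smaller than any symbol
    extra = 1 if n % 3 == 1 else 0   # dummy sample suffix at position n
    # Sample positions i with i mod 3 != 0: all "mod 1" first, then "mod 2".
    sample = list(range(1, n + extra, 3)) + list(range(2, n + extra, 3))

    # Step 1: name each sample triple by its rank; recurse if names repeat.
    # (Built-in sort used here; the paper relies on integer sorting instead.)
    triples = [tuple(s[i:i + 3]) for i in sample]
    names = {t: r + 1 for r, t in enumerate(sorted(set(triples)))}
    reduced = [names[t] for t in triples]
    if len(names) < len(reduced):
        order = skew_suffix_array(reduced)
    else:
        order = [0] * len(reduced)
        for j, name in enumerate(reduced):
            order[name - 1] = j
    sa12 = [sample[j] for j in order]
    rank = [0] * (n + 3)             # rank 0 means "past the end of the text"
    for r, pos in enumerate(sa12):
        rank[pos] = r + 1

    # Step 2: sort the remaining (i mod 3 == 0) suffixes by the pair
    # (first symbol, rank of the sample suffix that follows).
    sa0 = sorted(range(0, n, 3), key=lambda i: (s[i], rank[i + 1]))

    # Step 3: merge the two sorted sequences.
    def key(i, two_symbols):
        if two_symbols:
            return (s[i], s[i + 1], rank[i + 2])
        return (s[i], rank[i + 1])

    sa12 = [p for p in sa12 if p < n]   # drop the dummy suffix, if any
    result, a, b = [], 0, 0
    while a < len(sa12) and b < len(sa0):
        i, j = sa12[a], sa0[b]
        two = (i % 3 == 2)              # mod-2 suffixes need two symbols
        if key(i, two) < key(j, two):
            result.append(i)
            a += 1
        else:
            result.append(j)
            b += 1
    return result + sa12[a:] + sa0[b:]


def suffix_array(text):
    """Convenience wrapper: map characters to positive integers first."""
    return skew_suffix_array([ord(c) + 1 for c in text])


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Replacing the two `sorted` calls with a radix sort over triples and pairs is what would bring the sketch in line with the linear-time claim; the recursion itself already shrinks the problem to roughly two thirds of its size at each level.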

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186(ALCOM-FT).

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kärkkäinen, J., Sanders, P. (2003). Simple Linear Work Suffix Array Construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds) Automata, Languages and Programming. ICALP 2003. Lecture Notes in Computer Science, vol 2719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45061-0_73

  • DOI: https://doi.org/10.1007/3-540-45061-0_73

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40493-4

  • Online ISBN: 978-3-540-45061-0

  • eBook Packages: Springer Book Archive
