Strategic Pattern Search in Factor-Compressed Text

  • Conference paper
String Processing and Information Retrieval (SPIRE 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Abstract

We consider the problem of pattern search in compressed text in a setting in which: (a) the text is stored as a sequence of factors against a static phrase-book; (b) factors are decoded from right to left; and (c) extracting each symbol of each factor requires Θ(log σ) time, where σ is the size of the original alphabet. To determine possible alignments given information about decoded characters, we introduce two Boyer-Moore-like searching mechanisms, including one that makes use of a suffix array constructed over the pattern. The new mechanisms decode fewer than half the symbols that are required by a sequential left-to-right search such as the Knuth-Morris-Pratt approach, a saving that translates directly into improved execution time. Experiments with a two-level suffix array index structure for 4 GB of English text demonstrate the usefulness of the new techniques.
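The symbol-count argument in the abstract can be illustrated with a minimal sketch. It is not the paper's mechanism: it uses plain, uncompressed strings, the standard Boyer-Moore-Horspool shift rule, and a textbook Knuth-Morris-Pratt scan, and it only counts how many text symbols each approach inspects. The function names `horspool_search` and `kmp_search` are hypothetical, chosen for this illustration.

```python
def horspool_search(text, pattern):
    """Return (match positions, number of text symbol inspections).

    Standard Boyer-Moore-Horspool on plain strings; an illustration only,
    not the factor-aware mechanisms described in the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift for each symbol: distance from its last occurrence in
    # pattern[:-1] to the final pattern position.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, inspected, pos = [], 0, 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:  # compare right to left within the window
            inspected += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # The last window symbol has already been inspected above.
        pos += shift.get(text[pos + m - 1], m)
    return matches, inspected


def kmp_search(text, pattern):
    """Return (match positions, number of text symbols read).

    Textbook Knuth-Morris-Pratt; every text symbol is read exactly once.
    """
    m = len(pattern)
    if m == 0:
        return [], 0
    fail, k = [0] * m, 0
    for i in range(1, m):  # build the failure function
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    matches, k = [], 0
    for i, c in enumerate(text):  # sequential left-to-right scan
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = fail[k - 1]
    return matches, len(text)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    print(horspool_search(text, "lazy"))
    print(kmp_search(text, "lazy"))
```

When each inspected symbol must first be decoded at Θ(log σ) cost, as in the setting above, skipping symbols in this way is what reduces decoding work; how much is saved depends on the pattern and the text.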

References

  1. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. C. ACM 20, 1075–1091 (1977)
  2. Colussi, L.: Fastest pattern matching in strings. J. Alg. 16, 163–189 (1994)
  3. Faro, S., Lecroq, T.: The exact online string matching problem: A review of the most recent results. ACM Comput. Surv. 45(2), 13:1–13:42 (2013)
  4. Ferragina, P., Grossi, R.: The string B-tree: A new data structure for search in external memory and its applications. J. ACM 46(2), 236–280 (1999)
  5. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
  6. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: Plug and play with succinct data structures. In: Proc. Symp. Experimental Algorithms, pp. 326–337 (2014)
  7. Gog, S., Moffat, A.: Adding compression and blended search to a compact two-level suffix array. In: Proc. Symp. String Processing and Inf. Retrieval, pp. 141–152 (2013)
  8. Gog, S., Moffat, A., Culpepper, J.S., Turpin, A., Wirth, A.: Large-scale pattern search using reduced-space on-disk suffix arrays. IEEE Trans. Knowledge and Data Engineering 26(8), 1 (2014)
  9. Horspool, R.N.: Practical fast searching in strings. Soft. Prac. & Exp. 10(6), 501–506 (1980)
  10. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comp. 6(1), 323–350 (1977)
  11. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. Cambridge University Press (2002)
  12. Raita, T.: Tuning the Boyer-Moore-Horspool string searching algorithms. Soft. Prac. & Exp. 22(10), 879–884 (1992)
  13. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. ACM-SIAM Symp. Discrete Algorithms, pp. 233–242 (2002)
  14. Sinha, R., Puglisi, S.J., Moffat, A., Turpin, A.: Improving suffix array locality for fast pattern matching on disk. In: Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 661–672 (2008)
  15. Smith, P.D.: Experiments with a very fast substring search algorithm. Soft. Prac. & Exp. 21(10), 1065–1074 (1991)
  16. Sunday, D.M.: A very fast substring search algorithm. C. ACM 33(8), 132–142 (1990)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gog, S., Moffat, A., Petri, M. (2014). Strategic Pattern Search in Factor-Compressed Text. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_1

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer Science, Computer Science (R0)