Skip to main content

An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

  • Conference paper
  • First Online:
Algorithms and Computation (ISAAC 2015)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9472))

Included in the following conference series:

Abstract

We revisit the exact shortest unique substring (SUS) finding problem, and propose its approximate version where mismatches are allowed, due to its applications in subfields such as computational biology. We design a generic in-place framework that fits to solve both the exact and approximate k-mismatch SUS finding, using the minimum 2n memory words plus n bytes space, where n is the input string size. By using the in-place framework, we can find the exact and approximate k-mismatch SUS for every string position using a total of O(n) and \(O(n^2)\) time, respectively, regardless of the value of k. Our framework does not involve any compressed or succinct data structures and thus is practical and easy to implement.

Authors are listed in alphabetical order. Due to the page limit, a run example for Sect. 5 and all missing proofs and pseudocode can be found in the arXiv version of this paper.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    It is our arbitrary decision to resolve the ties by picking the rightmost choice. Our solution can also be easily modified to find the leftmost choice.

  2. 2.

    Note that the work of [4] studies a different problem and its computation is of the query-answer model, and thus is not comparable with [5, 7, 8] and ours.

  3. 3.

    In actual run, \(t_i^{-1}\) saves the largest number in that list, as we will see more clearly later.

  4. 4.

    For each \(j' < j\), if \(I_{j'}\) covers i, \(I_j\) would also cover i; in such a case, \(B[j] + 1 \ge B[j'] + 1\). For each \(j' \in [{\texttt {pred}}[i], i-1]\), \(I_{j'}\) is longer than \(I_i\).

  5. 5.

    Intuitively, \({\texttt {pred}}\) defines the shortcuts so that we can skip some intervals in \(I_{<i}\) to compute \(t_i\).

References

  1. Adaş, B., Bayraktar, E., Faro, S., Moustafa, I.E., Külekci, M.O.: Nucleotide sequence alignment and compression via shortest unique substring. In: Ortuño, F., Rojas, I. (eds.) IWBBIO 2015, Part II. LNCS, vol. 9044, pp. 363–374. Springer, Heidelberg (2015)

    Google Scholar 

  2. Derrien, T., Estell, J., Marco Sola, S., Knowles, D.G., Raineri, E., Guig, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012)

    Article  Google Scholar 

  3. Haubold, B., Pierstorff, N., Möller, F., Wiehe, T.: Genome comparison without alignment using shortest unique substrings. BMC Bioinform. 6, 123 (2005)

    Article  Google Scholar 

  4. Hu, X., Pei, J., Tao, Y.: Shortest Unique Queries on Strings. In: Moura, E., Crochemore, M. (eds.) SPIRE 2014. LNCS, vol. 8799, pp. 161–172. Springer, Heidelberg (2014)

    Google Scholar 

  5. İleri, A.M., Külekci, M.O., Xu, B.: A simple yet time-optimal and linear-space algorithm for shortest unique substring queries. Theor. Comput. Sci. 562, 621–633 (2015). Also in CPM 2014

    Article  MathSciNet  MATH  Google Scholar 

  6. Nong, G.: Practical linear-time \(O(1)\)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 15:1–15:15 (2013)

    Article  MathSciNet  Google Scholar 

  7. Pei, J., Wu, W.C.H., Yeh, M.Y.: On shortest unique substring queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 937–948 (2013)

    Google Scholar 

  8. Tsuruta, K., Inenaga, S., Bannai, H., Takeda, M.: Shortest unique substrings queries in optimal time. In: Geffert, V., Preneel, B., Rovan, B., Štuller, J., Tjoa, A.M. (eds.) SOFSEM 2014. LNCS, vol. 8327, pp. 503–513. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bojian Xu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hon, WK., Thankachan, S.V., Xu, B. (2015). An In-place Framework for Exact and Approximate Shortest Unique Substring Queries. In: Elbassioni, K., Makino, K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science(), vol 9472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48971-0_63

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-48971-0_63

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48970-3

  • Online ISBN: 978-3-662-48971-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics