Compressing Integer Sequences

  • Living reference work entry
Encyclopedia of Algorithms

Years and Authors of Summarized Original Work

1966; Golomb

1974; Elias

2000; Moffat, Stuiver

Problem Definition

An n-symbol message \(M = \langle s_{0}, s_{1}, \ldots, s_{n-1}\rangle\) is given, where each symbol \(s_{i}\) is an integer in the range \(0 \leq s_{i} < U\). If the \(s_{i}\) are strictly increasing, then M identifies an n-subset of \(\{0, 1, \ldots, U-1\}\).

Objective. To encode M economically as a binary string over {0, 1}.

Constraints

  1. Short messages. The message length n may be small relative to U.

  2. Monotonic equivalence. Message M is converted to a strictly increasing message \(M^{\prime}\) over the alphabet \(U^{\prime} \leq Un\) by taking prefix sums, \(s_{i}^{\prime} = i + \sum_{j=0}^{i} s_{j}\) and \(U^{\prime} = s_{n-1}^{\prime} + 1\). The inverse is to “take gaps,” \(g_{i} = s_{i} - s_{i-1} - 1\), with \(g_{0} = s_{0}\).

  3. Combinatorial limit. If M is monotonic then \(\left\lceil \log_{2} \binom{U}{n} \right\rceil \leq U\) bits are required in the worst case. When \(n \ll U\), ...
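
The monotonic equivalence and the combinatorial limit can be illustrated with a minimal Python sketch. The function names (to_monotonic, take_gaps, combinatorial_limit_bits) are illustrative assumptions and do not come from the entry; the formulas are the prefix-sum, gap, and binomial expressions stated in the constraints above.

```python
from itertools import accumulate
from math import comb  # available in Python 3.8 and later


def to_monotonic(message):
    """Map symbols s_i to s'_i = i + sum_{j<=i} s_j, a strictly increasing list."""
    return [i + total for i, total in enumerate(accumulate(message))]


def take_gaps(increasing):
    """Inverse map: g_0 = s_0 and g_i = s_i - s_{i-1} - 1 for i >= 1."""
    gaps = []
    previous = -1  # sentinel chosen so that the first gap equals s_0
    for value in increasing:
        gaps.append(value - previous - 1)
        previous = value
    return gaps


def combinatorial_limit_bits(universe, n):
    """ceil(log2(C(universe, n))), computed with exact integer arithmetic."""
    # For an integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return (comb(universe, n) - 1).bit_length()


message = [3, 0, 2, 0, 5]            # n = 5 symbols, each below U = 6
increasing = to_monotonic(message)   # [3, 4, 7, 8, 14], so U' = 15 <= U * n = 30
assert take_gaps(increasing) == message
print(increasing, combinatorial_limit_bits(increasing[-1] + 1, len(increasing)))
```

For this small message the transformed universe is U′ = 15, and selecting 5 of 15 values needs 12 bits in the worst case, which is within the bound of U′ bits.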

Recommended Reading

  1. Anh VN, Moffat A (2010) Index compression using 64-bit words. Softw Pract Exp 40(2):131–147

  2. Boldi P, Vigna S (2005) Codes for the world-wide web. Internet Math 2(4):405–427

  3. Brisaboa NR, Fariña A, Navarro G, Esteller MF (2003) (S, C)-dense coding: an optimized compression code for natural language text databases. In: Proceedings of the symposium on string processing and information retrieval, Manaus, pp 122–136

  4. Culpepper JS, Moffat A (2005) Enhanced byte codes with restricted prefix properties. In: Proceedings of the symposium on string processing and information retrieval, Buenos Aires, pp 1–12

  5. de Moura ES, Navarro G, Ziviani N, Baeza-Yates R (2000) Fast and flexible word searching on compressed text. ACM Trans Inf Syst 18(2):113–139

  6. Elias P (1974) Efficient storage and retrieval by content and address of static files. J ACM 21(2):246–260

  7. Elias P (1975) Universal codeword sets and representations of the integers. IEEE Trans Inf Theory IT-21(2):194–203

  8. Fenwick P (2003) Universal codes. In: Sayood K (ed) Lossless compression handbook. Academic, Boston, pp 55–78

  9. Fraenkel AS, Klein ST (1985) Novel compression of sparse bit-strings: preliminary report. In: Apostolico A, Galil Z (eds) Combinatorial algorithms on words. NATO ASI series F, vol 12. Springer, Berlin, pp 169–183

  10. Golomb SW (1966) Run-length encodings. IEEE Trans Inf Theory IT-12(3):399–401

  11. Lemire D, Boytsov L (2014, to appear) Decoding billions of integers per second through vectorization. Softw Pract Exp. http://dx.doi.org/10.1002/spe.2203

  12. Moffat A, Anh VN (2006) Binary codes for locally homogeneous sequences. Inf Process Lett 99(5):75–80

  13. Moffat A, Stuiver L (2000) Binary interpolative coding for effective index compression. Inf Retr 3(1):25–47

  14. Moffat A, Turpin A (2002) Compression and coding algorithms. Kluwer Academic, Boston

  15. Vigna S (2013) Quasi-succinct indices. In: Proceedings of the international conference on web search and data mining, Rome, pp 83–92

  16. Zukowski M, Héman S, Nes N, Boncz P (2006) Super-scalar RAM-CPU cache compression. In: Proceedings of the international conference on data engineering, Atlanta. IEEE Computer Society, Washington, DC, paper 59

Corresponding author

Correspondence to Alistair Moffat.

Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Moffat, A. (2014). Compressing Integer Sequences. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-3-642-27848-8_84-2

  • DOI: https://doi.org/10.1007/978-3-642-27848-8_84-2

  • Publisher Name: Springer, Boston, MA

  • Online ISBN: 978-3-642-27848-8

  • eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering
