Years and Authors of Summarized Original Work
1966; Golomb
1974; Elias
2000; Moffat, Stuiver
Problem Definition
An n-symbol message \(M =\langle s_{0},s_{1},\ldots ,s_{n-1}\rangle\) is given, where each symbol \(s_{i}\) is an integer in the range \(0 \leq s_{i} < U\). If the \(s_{i}\) are strictly increasing, then M identifies an n-subset of {0, 1, …, U − 1}.
Objective
To economically encode M as a binary string over {0, 1}.
Constraints
1. Short messages. The message length n may be small relative to U.
2. Monotonic equivalence. Message M is converted to a strictly increasing message \(M^{\prime}\) over the alphabet \(U^{\prime} \leq Un\) by taking prefix sums, \(s_{i}^{\prime} = i +\sum _{j=0}^{i}s_{j}\) and \(U^{\prime} = s_{n-1}^{\prime} + 1\). The inverse is to “take gaps,” \(g_{i} = s_{i} - s_{i-1} - 1\), with \(g_{0} = s_{0}\).
3. Combinatorial limit. If M is monotonic then \(\left\lceil \log _{2}\binom{U}{n}\right\rceil \leq U\) bits are required in the worst case. When \(n \ll U\), \...
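The monotonic equivalence and the combinatorial limit stated in the constraints can be checked on a small example. The following is a minimal Python sketch, not taken from the entry itself: the function names `to_monotonic`, `to_gaps`, and `combinatorial_bits` are hypothetical, chosen only for illustration, and `math.comb` assumes Python 3.8 or later.

```python
from itertools import accumulate
from math import comb


def to_monotonic(symbols):
    """Prefix sums s'_i = i + sum_{j<=i} s_j; the result is strictly increasing."""
    return [i + total for i, total in enumerate(accumulate(symbols))]


def to_gaps(increasing):
    """Inverse mapping: g_0 = s_0 and g_i = s_i - s_{i-1} - 1 for i >= 1."""
    gaps = []
    previous = -1  # sentinel chosen so that the first gap equals s_0
    for s in increasing:
        gaps.append(s - previous - 1)
        previous = s
    return gaps


def combinatorial_bits(universe, n):
    """ceil(log2(C(U, n))), computed exactly with integer arithmetic."""
    # For any integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (comb(universe, n) - 1).bit_length()


message = [3, 0, 2, 0]               # arbitrary message M
monotone = to_monotonic(message)     # [3, 4, 7, 8]
universe = monotone[-1] + 1          # U' = s'_{n-1} + 1 = 9
print(monotone, to_gaps(monotone), combinatorial_bits(universe, len(monotone)))
```

For this message the gaps recover the original symbols, and identifying a 4-subset of a 9-element universe needs 7 bits in the worst case, which is within the bound of U = 9 bits.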
Recommended Reading
Anh VN, Moffat A (2010) Index compression using 64-bit words. Softw Pract Exp 40(2):131–147
Boldi P, Vigna S (2005) Codes for the world-wide web. Internet Math 2(4):405–427
Brisaboa NR, Fariña A, Navarro G, Esteller MF (2003) (S, C)-dense coding: an optimized compression code for natural language text databases. In: Proceedings of the symposium on string processing and information retrieval, Manaus, pp 122–136
Culpepper JS, Moffat A (2005) Enhanced byte codes with restricted prefix properties. In: Proceedings of the symposium on string processing and information retrieval, Buenos Aires, pp 1–12
de Moura ES, Navarro G, Ziviani N, Baeza-Yates R (2000) Fast and flexible word searching on compressed text. ACM Trans Inf Syst 18(2):113–139
Elias P (1974) Efficient storage and retrieval by content and address of static files. J ACM 21(2):246–260
Elias P (1975) Universal codeword sets and representations of the integers. IEEE Trans Inf Theory IT-21(2):194–203
Fenwick P (2003) Universal codes. In: Sayood K (ed) Lossless compression handbook. Academic, Boston, pp 55–78
Fraenkel AS, Klein ST (1985) Novel compression of sparse bit-strings—preliminary report. In: Apostolico A, Galil Z (eds) Combinatorial algorithms on words. NATO ASI series F, vol 12. Springer, Berlin, pp 169–183
Golomb SW (1966) Run-length encodings. IEEE Trans Inf Theory IT-12(3):399–401
Lemire D, Boytsov L (2014, to appear) Decoding billions of integers per second through vectorization. Softw Pract Exp. http://dx.doi.org/10.1002/spe.2203
Moffat A, Anh VN (2006) Binary codes for locally homogeneous sequences. Inf Process Lett 99(5):75–80
Moffat A, Stuiver L (2000) Binary interpolative coding for effective index compression. Inf Retr 3(1):25–47
Moffat A, Turpin A (2002) Compression and coding algorithms. Kluwer Academic, Boston
Vigna S (2013) Quasi-succinct indices. In: Proceedings of the international conference on web search and data mining, Rome, pp 83–92
Zukowski M, Héman S, Nes N, Boncz P (2006) Super-scalar RAM-CPU cache compression. In: Proceedings of the international conference on data engineering, Atlanta. IEEE Computer Society, Washington, DC, paper 59
© 2014 Springer Science+Business Media New York
About this entry
Cite this entry
Moffat, A. (2014). Compressing Integer Sequences. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-3-642-27848-8_84-2
Publisher Name: Springer, Boston, MA
Online ISBN: 978-3-642-27848-8