Skip to main content

Loglog Counting of Large Cardinalities

(Extended Abstract)

  • Conference paper
Algorithms - ESA 2003 (ESA 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2832))

Included in the following conference series:

Abstract

Using an auxiliary memory smaller than the size of this abstract, the LogLog algorithm makes it possible to estimate in a single pass and within a few percents the number of different words in the whole of Shakespeare’s works. In general the LogLog algorithm makes use of m “small bytes” of auxiliary memory in order to estimate in a single pass the number of distinct elements (the “cardinality”) in a file, and it does so with an accuracy that is of the order of 1/sqrtm. The “small bytes” to be used in order to count cardinalities till N max comprise about loglog N max bits, so that cardinalities well in the range of billions can be determined using one or two kilobytes of memory only. The basic version of the LogLog algorithm is validated by a complete analysis. An optimized version, super–LogLog, is also engineered and tested on real-life data. The algorithm parallelizes optimally.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58, 137–147 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  2. Estan, C., Varghese, G.: New directions in traffic measurement and accounting. In: Proceedings of SIGCOMM 2002. ACM Press, New York (2002) (Also: UCSD technical report CS2002-0699, February, 2002; available electronically)

    Google Scholar 

  3. Estan, C., Varghese, G., Fisk, M.: Bitmap algorithms for counting active flows on high speed links. Technical Report CS2003-0738, UCSD (Mar 2003)

    Google Scholar 

  4. Flajolet, P.: Approximate counting: A detailed analysis. BIT 25, 113–134 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  5. Flajolet, P.: On adaptive sampling. Computing 34, 391–400 (1990)

    Article  MathSciNet  Google Scholar 

  6. Flajolet, P., Gourdon, X., Dumas, P.: Mellin transforms and asymptotics: Harmonic sums. Theoretical Computer Science 144(1-2), 3–58 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  7. Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences 31(2), 182–209 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  8. Gibbons, P.B., Poosala, V., Acharya, S., Bartal, Y., Matias, Y., Muthukrishnan, S., Ramaswamy, S., Suel, T.: AQUA: System and techniques for approximate query answering. Tech. report, Bell Laboratories, Murray Hill, New Jersey (February 1998)

    Google Scholar 

  9. Jacquet, P., Szpankowski, W.: Analytical depoissonization and its applications. Theoretical Computer Science 201, 1–2 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  10. Knuth, D.E.: The Art of Computer Programming. Sorting and Searching, vol. 3, 2nd edn. Addison-Wesley, Reading (1998)

    Google Scholar 

  11. Morris, R.: Counting large numbers of events in small registers. Communications of the ACM 21, 840–842 (1978)

    Article  MATH  Google Scholar 

  12. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)

    MATH  Google Scholar 

  13. Palmer, C.R., Siganos, G., Faloutsos, M., Faloutsos, C., Gibbons, P.: The connectivity and fault-tolerance of the Internet topology. In: Workshop on Network-Related Data Management (NRDM 2001) (2001)

    Google Scholar 

  14. Prodinger, H.: Combinatorics of geometrically distributed random variables: Leftto- right maxima. Discrete Mathematics 153, 253–270 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  15. Szpankowski, W.: Average-Case Analysis of Algorithms on Sequences. John Wiley, New York (2001)

    MATH  Google Scholar 

  16. Whang, K.-Y., Zanden, B.T.V., Taylor, H.M.: A linear-time probabilistic counting algorithm for database applications. TODS 15(2), 208–229 (1990)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Durand, M., Flajolet, P. (2003). Loglog Counting of Large Cardinalities. In: Di Battista, G., Zwick, U. (eds) Algorithms - ESA 2003. ESA 2003. Lecture Notes in Computer Science, vol 2832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39658-1_55

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-39658-1_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20064-2

  • Online ISBN: 978-3-540-39658-1

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics