DOI: 10.1145/2745844.2745870
research-article

Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing

Published: 15 June 2015

ABSTRACT

Cardinality estimation over big network data consisting of numerous flows is a fundamental problem with many practical applications. Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to $10^9$). However, although the memory needed per flow has been greatly compressed, the overall memory demand can still be very high when the number of flows is extremely large, exceeding what is available in some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this paper, instead of allocating a separate data structure (called an estimator) to each flow, we take a different path by viewing all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We find that sharing at the register (multi-bit) level is superior to sharing at the bit level. We propose a framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, achieving far better memory efficiency than the best existing work. Our experiment shows that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, a goal that has not been achieved before.
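
To make the idea of register sharing concrete, the following is a minimal sketch and not the paper's implementation. It assumes HyperLogLog-style 5-bit registers, a hash-based mapping from each flow's virtual registers to a shared physical array, and BLAKE2b as the hash function. The names `SharedRegisterArray`, `record`, `virtual_registers`, and `bits_per_flow`, and the parameters `m` and `s`, are hypothetical and chosen only for illustration.

```python
import hashlib

MAX_RANK = 31  # assumption: 5-bit registers, so stored values fit in 0..31


def _hash64(*parts):
    """Deterministic 64-bit hash of the given parts (illustrative choice)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class SharedRegisterArray:
    """One physical array of m registers shared by all flows.

    Each flow owns a *virtual* estimator of s registers, which are
    pseudo-randomly selected from the shared array by hashing.
    """

    def __init__(self, m=4096, s=64):
        self.m = m                  # number of physical registers
        self.s = s                  # registers per virtual estimator
        self.registers = [0] * m

    def _physical_index(self, flow, i):
        # Map the i-th virtual register of `flow` to a physical register.
        return _hash64("reg", flow, i) % self.m

    def record(self, flow, element):
        h = _hash64("elem", flow, element)
        i = h % self.s              # which virtual register to update
        rest = h // self.s          # remaining hash bits
        # Rank = 1 + number of trailing zero bits, capped at MAX_RANK.
        rank = 1
        while rest & 1 == 0 and rank < MAX_RANK:
            rest >>= 1
            rank += 1
        idx = self._physical_index(flow, i)
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def virtual_registers(self, flow):
        # Read back the flow's virtual estimator from the shared array.
        return [self.registers[self._physical_index(flow, i)]
                for i in range(self.s)]

    def bits_per_flow(self, num_flows, bits_per_register=5):
        # Total shared memory divided by the number of flows using it.
        return self.m * bits_per_register / num_flows
```

Because the physical registers are shared, a flow's virtual registers also carry contributions from other flows. A full estimator would therefore apply a cardinality formula to `virtual_registers(flow)` and then correct for that noise; this sketch omits both steps. The `bits_per_flow` helper only illustrates how a fixed shared array can amount to less than one bit per flow once the number of flows exceeds the number of bits in the array.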


Published in

SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
June 2015
488 pages
ISBN: 9781450334860
DOI: 10.1145/2745844

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

SIGMETRICS '15 paper acceptance rate: 32 of 239 submissions, 13%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
