ABSTRACT
Cardinality estimation over big network data consisting of numerous flows is a fundamental problem with many practical applications. Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to $10^9$). However, although the memory needed for each flow has been greatly compressed, when the number of flows is extremely large the overall memory demand can still be very high, exceeding what is available in some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this paper, instead of allocating a separate data structure (called an estimator) for each flow, we take a different path and view all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We find that sharing at the register (multi-bit) level is superior to sharing at the bit level. We propose a framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, achieving far better memory efficiency than the best existing work. Our experiments show that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, which has not been achieved before.
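The register-sharing idea can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a HyperLogLog-style estimator, a shared pool of `m` registers, and `s` virtual registers per flow that are mapped into the pool by hashing. The class name `SharedRegisterSketch`, the helpers `_h` and `_hll_estimate`, the 5-bit rank cap, the parameter values, and the noise-correction formula used in `estimate` are all assumptions made for this illustration; the abstract itself gives none of these details.

```python
import hashlib
import math


def _h(*parts):
    """64-bit hash of the given parts (BLAKE2b from the standard library)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _hll_estimate(regs):
    """HyperLogLog estimate with the usual small-range (linear counting) correction."""
    k = len(regs)
    alpha = 0.7213 / (1 + 1.079 / k)  # bias constant, intended for k >= 128
    raw = alpha * k * k / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * k and zeros > 0:
        return k * math.log(k / zeros)
    return raw


class SharedRegisterSketch:
    """Hypothetical illustration: every flow's virtual estimator lives in one pool."""

    def __init__(self, m=4096, s=256):
        self.m = m              # physical registers shared by all flows
        self.s = s              # virtual registers per flow
        self.regs = [0] * m

    def _index(self, flow, i):
        # Physical position of the i-th virtual register of this flow.
        return _h("reg", flow, i) % self.m

    def record(self, flow, element):
        x = _h("elem", flow, element)
        i = x % self.s          # which virtual register the element updates
        bits = x // self.s      # remaining hash bits
        rank = 1                # 1 + number of trailing zero bits (geometric)
        while bits & 1 == 0 and rank < 31:  # cap so the value fits in 5 bits
            bits >>= 1
            rank += 1
        p = self._index(flow, i)
        if rank > self.regs[p]:
            self.regs[p] = rank

    def estimate(self, flow):
        virtual = [self.regs[self._index(flow, i)] for i in range(self.s)]
        n_s = _hll_estimate(virtual)    # flow plus noise from other flows
        n = _hll_estimate(self.regs)    # estimate over the whole pool
        m, s = self.m, self.s
        # Assumed correction: subtract the average noise per register.
        return (m * s / (m - s)) * (n_s / s - n / m)
```

A flow's estimate is read from its `s` virtual registers only, so the memory cost per flow is `m` registers divided by the number of flows rather than a full estimator each. For example, after recording elements with `sk.record(flow, element)` for many flows, `sk.estimate(flow)` returns an approximate cardinality for that one flow; the result can be slightly negative for flows with few or no elements, because the subtracted noise term is itself an estimate.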
Index Terms
- Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing