research-article

Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing

Authors:
Qingjun Xiao, University of Florida, Gainesville, FL, USA
Shigang Chen, University of Florida, Gainesville, FL, USA
Min Chen, University of Florida, Gainesville, FL, USA
Yibei Ling, Telcordia Technologies & Applied Research, Piscataway, NJ, USA

SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, June 2015, Pages 417–428
https://doi.org/10.1145/2745844.2745870

Published: 15 June 2015

ABSTRACT

Cardinality estimation over big network data consisting of numerous flows is a fundamental problem with many practical applications. Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to $10^9$). However, although the memory needed per flow has been greatly compressed, the overall memory demand can still be very high when the number of flows is extremely large, exceeding what is available in some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this paper, instead of allocating a separate data structure (called an estimator) for each flow, we take a different path by viewing all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We find that sharing at the register (multi-bit) level is superior to sharing at the bit level. We propose a framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, achieving far better memory efficiency than the best existing work. Our experiments show that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, a goal that has not been realized before.
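
The register-sharing idea in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a HyperLogLog-style register as the shared unit, and every name and parameter (`SharedRegisterSketch`, `m`, `s`, `reg_bits`, the BLAKE2b-based hash) is hypothetical. The noise correction assumes that elements of other flows land uniformly over the shared array, so a flow's virtual estimator sees roughly $n_s \approx n_f + (n - n_f)\,s/m$, which rearranges to $n_f \approx \frac{ms}{m-s}\left(\frac{n_s}{s} - \frac{n}{m}\right)$.

```python
import hashlib
import math


def _hash64(*parts):
    """Deterministic 64-bit hash of the given parts (hypothetical helper)."""
    data = "|".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _hll_estimate(values):
    """HyperLogLog estimate with the usual small-range (linear counting) fix."""
    k = len(values)
    alpha = 0.7213 / (1.0 + 1.079 / k)  # bias constant, intended for k >= 128
    raw = alpha * k * k / sum(2.0 ** -v for v in values)
    zeros = values.count(0)
    if raw <= 2.5 * k and zeros > 0:
        return k * math.log(k / zeros)
    return raw


class SharedRegisterSketch:
    """One physical array of m registers shared by all flows.

    Each flow's virtual estimator is a set of s registers picked
    pseudo-randomly from the array (assumption: s << m). A Python list
    is used for clarity; a real deployment would pack reg_bits per register.
    """

    def __init__(self, m=1 << 16, s=128, reg_bits=5):
        self.m = m
        self.s = s
        self.max_rank = (1 << reg_bits) - 1
        self.regs = [0] * m

    def slots(self, flow):
        """Physical positions of the flow's s virtual registers."""
        return [_hash64("slot", flow, i) % self.m for i in range(self.s)]

    def record(self, flow, element):
        """Record one element of a flow; duplicates leave the array unchanged."""
        i = _hash64("index", element) % self.s   # which virtual register
        bits = _hash64("rank", element)
        rank = 1                                  # geometric rank, capped
        while bits & 1 == 0 and rank < self.max_rank:
            bits >>= 1
            rank += 1
        j = _hash64("slot", flow, i) % self.m    # shared physical register
        self.regs[j] = max(self.regs[j], rank)

    def estimate(self, flow):
        """Noise-corrected cardinality estimate for one flow."""
        n_s = _hll_estimate([self.regs[j] for j in self.slots(flow)])
        n = _hll_estimate(self.regs)              # all flows together
        m, s = self.m, self.s
        return (m * s / (m - s)) * (n_s / s - n / m)
```

A caller would create one sketch, call `record(flow, element)` for every packet (for example with the source address as the flow and the destination address as the element), and query `estimate(flow)` afterwards. Memory per flow is then `m * reg_bits` divided by the number of flows, which is how a budget below one bit per flow becomes possible when many flows share the same array.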

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics

Published in
SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
June 2015
488 pages
ISBN: 9781450334860
DOI: 10.1145/2745844
General Chairs: Bill Lin (University of California, San Diego), Jun (Jim) Xu (Georgia Tech)
Program Chairs: Sudipta Sengupta (Microsoft Research), Devavrat Shah (Massachusetts Institute of Technology)
ACM SIGMETRICS Performance Evaluation Review, Volume 43, Issue 1
June 2015
468 pages
ISSN: 0163-5999
DOI: 10.1145/2796314
Editors: Derek Eager (University of Saskatchewan), Carey Williamson (University of Calgary)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
big network data
cardinality estimation
network stream monitoring
Qualifiers
- research-article
Acceptance Rates
SIGMETRICS '15 paper acceptance rate: 32 of 239 submissions, 13%
Overall acceptance rate: 459 of 2,691 submissions, 17%