Abstract
We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes per second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.
We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/√(nm) with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We also show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
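The abstract does not spell out the deterministic algorithm, so the following is only a minimal sketch of one well-known counter-based technique (often called the Frequent or Misra-Gries algorithm) that meets the stated guarantee: with m counters, every category whose frequency exceeds 1/(m+1) is among the surviving candidates. The function name `frequent_candidates` and its parameters are hypothetical names chosen for illustration, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable


def frequent_candidates(stream: Iterable[Hashable], m: int) -> Dict[Hashable, int]:
    """Return at most m candidate categories after one pass over the stream.

    Illustrative sketch (assumed, not the paper's code): any category that
    occurs more than n/(m+1) times in a stream of n items is guaranteed to
    be a key of the returned dictionary.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Category already monitored: count this occurrence.
            counters[item] += 1
        elif len(counters) < m:
            # A free counter exists: start monitoring this category.
            counters[item] = 1
        else:
            # All m counters are busy: decrement every counter and free
            # those that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    packets = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
    print(frequent_candidates(packets, m=2))
```

Each decrement step cancels m+1 distinct occurrences (one per counter plus the incoming item), so at most n/(m+1) such steps can happen, which is why a category above that count cannot be evicted. The stored values are lower bounds on the true counts, and the result may also contain infrequent categories, so a second pass is needed if exact frequencies are required.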
This research is partially supported by the Natural Sciences and Engineering Research Council of Canada, by the Canada Research Chairs Program, and by the Nippon Telegraph and Telephone Corporation through the NTT-MIT research collaboration.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Demaine, E.D., López-Ortiz, A., Munro, J.I. (2002). Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_33
DOI: https://doi.org/10.1007/3-540-45749-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44180-9
Online ISBN: 978-3-540-45749-7
eBook Packages: Springer Book Archive