Abstract
Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine, and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas, including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results, identify general techniques, and highlight some simple algorithms that illustrate basic ideas.
Index Terms
- Graph stream algorithms: a survey