Abstract
Data mining techniques for understanding how graphs evolve over time have become increasingly important. Evolving graphs arise naturally in diverse applications such as computer network topologies, multiplayer games and medical imaging. A natural and interesting problem in evolving graph analysis is the discovery of compact subgraphs that change in a similar manner. Such subgraphs are known as regions of correlated change; they can both summarise change patterns in graphs and help identify the underlying events causing these changes. However, previous techniques for discovering regions of correlated change scale poorly, which makes them unsuitable for analysing the evolution of very large graphs. In this paper, we introduce a new algorithm, ciForager, that addresses this scalability challenge. Its efficiency rests on two ideas: new incremental techniques for detecting change, and Voronoi representations for efficiently determining distance. We show experimentally that ciForager achieves speedups of up to 1000 times over previous approaches. As a result, it becomes feasible for the first time to discover regions of correlated change in extremely large graphs, such as the entire BGP routing topology of the Internet.
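The abstract does not describe how ciForager builds or uses its Voronoi representation. As a rough illustration of the general idea only, the sketch below computes a generic graph Voronoi diagram with a single multi-source Dijkstra run. Each vertex is assigned to its nearest seed vertex, and its distance to that seed is recorded, so later nearest-seed distance queries become dictionary lookups. The function name `graph_voronoi`, the adjacency-list layout, and the choice of seeds are hypothetical assumptions. This is not the paper's implementation.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Hashable
# Hypothetical layout: vertex -> list of (neighbour, non-negative edge weight).
Graph = Dict[Node, List[Tuple[Node, float]]]


def graph_voronoi(
    graph: Graph, seeds: Iterable[Node]
) -> Tuple[Dict[Node, Node], Dict[Node, float]]:
    """Assign every reachable vertex to its nearest seed (a graph Voronoi diagram).

    All seeds start at distance 0 in one shared priority queue, so each vertex
    is settled by whichever seed reaches it first (multi-source Dijkstra).
    Vertices unreachable from every seed are absent from the result.
    """
    dist: Dict[Node, float] = {}
    owner: Dict[Node, Node] = {}
    heap: List[Tuple[float, int, Node, Node]] = []
    counter = 0  # tie-breaker so heapq never has to compare node objects
    for s in seeds:
        heapq.heappush(heap, (0.0, counter, s, s))
        counter += 1
    while heap:
        d, _, v, seed = heapq.heappop(heap)
        if v in dist:  # already settled by a closer (or equally close, earlier) seed
            continue
        dist[v] = d
        owner[v] = seed
        for w, weight in graph.get(v, []):
            if w not in dist:
                heapq.heappush(heap, (d + weight, counter, w, seed))
                counter += 1
    return owner, dist


if __name__ == "__main__":
    # Undirected path a-b-c-d-e with unit weights; seeds at both ends.
    path = {
        "a": [("b", 1)],
        "b": [("a", 1), ("c", 1)],
        "c": [("b", 1), ("d", 1)],
        "d": [("c", 1), ("e", 1)],
        "e": [("d", 1)],
    }
    owner, dist = graph_voronoi(path, ["a", "e"])
    print(owner)  # c is equidistant; the seed pushed first ("a") wins the tie
    print(dist)
```

One Dijkstra pass costs O((V + E) log V) regardless of the number of seeds. That is why a precomputed Voronoi partition is attractive when many distance queries follow. Keeping such a partition up to date as the graph changes would need an incremental scheme, which this sketch does not attempt.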
ciForager: Incrementally discovering regions of correlated change in evolving graphs