Inferring Networks of Diffusion and Influence

Authors:
Manuel Gomez-Rodriguez (Stanford University and MPI for Intelligent Systems),
Jure Leskovec (Stanford University),
Andreas Krause (ETH Zürich and California Institute of Technology)

ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 4, Article No. 21, pp. 1–37. https://doi.org/10.1145/2086737.2086741

Published: 01 February 2012

Abstract

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish the information, observing individual transmissions (who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks.

We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure, with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence, with more general news media sites acting as connectors between them.
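
The inference task described in the abstract (recovering a network from infection times alone) can be illustrated with a small greedy sketch. This is not the authors' implementation and carries no near-optimality claim of its own. It assumes an exponential transmission-delay model, a fixed edge budget, and that each infected node is explained by its single best parent among the edges chosen so far. The function name `infer_network` and the parameters `alpha` (delay scale) and `eps` (weight of an unexplained infection) are hypothetical names introduced only for this example.

```python
import math
from collections import defaultdict


def infer_network(cascades, k, alpha=1.0, eps=1e-8):
    """Greedily pick up to k directed edges that best explain the cascades.

    cascades: list of dicts mapping node -> infection time.
    alpha:    assumed scale of the exponential delay model.
    eps:      assumed weight for an infection with no chosen parent.
    """
    log_eps = math.log(eps)

    # Log-weight of every time-consistent candidate edge, per cascade.
    # Under the assumed model, weight = exp(-(t_v - t_u) / alpha).
    edge_logw = defaultdict(dict)  # (u, v) -> {cascade index: log weight}
    for ci, times in enumerate(cascades):
        for u, tu in times.items():
            for v, tv in times.items():
                if tu < tv:
                    edge_logw[(u, v)][ci] = -(tv - tu) / alpha

    # best[(ci, v)]: log weight of the best explanation found so far
    # for node v in cascade ci (starts at the "unexplained" baseline).
    best = defaultdict(lambda: log_eps)
    chosen = []
    for _ in range(k):
        best_edge, best_gain = None, 0.0
        for (u, v), per_cascade in edge_logw.items():
            # Marginal gain: how much this edge improves the explanation
            # of v, summed over the cascades in which it could have fired.
            gain = sum(max(0.0, lw - best[(ci, v)])
                       for ci, lw in per_cascade.items())
            if gain > best_gain:
                best_edge, best_gain = (u, v), gain
        if best_edge is None:
            break  # no remaining edge improves the objective
        _, v = best_edge
        for ci, lw in edge_logw.pop(best_edge).items():
            best[(ci, v)] = max(best[(ci, v)], lw)
        chosen.append(best_edge)
    return chosen
```

Calling `infer_network(cascades, k=2)` on a handful of cascades that spread along a chain of three nodes returns the two chain edges. An edge that skips the middle node adds no gain once the shorter-delay edges are chosen, which is why the loop stops early when the budget exceeds the number of useful edges.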

Index Terms

Inferring Networks of Diffusion and Influence
1. Information systems
  1. Information systems applications
    1. Data mining


Published in

ACM Transactions on Knowledge Discovery from Data Volume 5, Issue 4
February 2012
176 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2086737

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 February 2012
- Accepted: 1 November 2011
- Revised: 1 October 2011
- Received: 1 December 2010

Author Tags
Networks of diffusion
blogs
information cascades
meme-tracking
news media
social networks
Qualifiers
- research-article
- Research
- Refereed