research-article

Inferring Networks of Diffusion and Influence

Published: 01 February 2012

Abstract

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish the information, observing individual transmissions (who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks.
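To make the inference task concrete, here is a minimal sketch of one way to choose edges from observed infection times. The abstract does not specify the algorithm, so everything here is an assumption made for illustration: the greedy selection strategy, the exponential transmission-delay model, the small baseline probability `eps` for an infection with no explaining edge, and the names `edge_score`, `infer_network`, `alpha`, and `eps`. Each cascade is a dictionary mapping a node to its infection time, and each infected node is credited with its single best already-selected parent.

```python
import math


def edge_score(dt, alpha=1.0, eps=1e-3):
    """Log-likelihood gain of explaining a delay dt with a real edge
    instead of the baseline (assumed exponential delay model)."""
    return math.log(alpha) - alpha * dt - math.log(eps)


def infer_network(cascades, k, alpha=1.0, eps=1e-3):
    """Greedily pick up to k directed edges that best explain the cascades."""
    # An edge (u, v) is a candidate if u was infected before v in some cascade.
    candidates = {
        (u, v)
        for c in cascades
        for u in c
        for v in c
        if c[u] < c[v]
    }
    # best[i][v]: score of the best selected parent of v in cascade i so far.
    best = [dict.fromkeys(c, 0.0) for c in cascades]

    def gain(edge):
        """Total improvement over all cascades if this edge were added."""
        u, v = edge
        total = 0.0
        for c, b in zip(cascades, best):
            if u in c and v in c and c[u] < c[v]:
                total += max(0.0, edge_score(c[v] - c[u], alpha, eps) - b[v])
        return total

    chosen = []
    for _ in range(k):
        if not candidates:
            break
        # Sorting makes tie-breaking deterministic.
        edge = max(sorted(candidates), key=gain)
        if gain(edge) <= 0.0:
            break  # no remaining edge improves the explanation
        chosen.append(edge)
        candidates.remove(edge)
        u, v = edge
        for c, b in zip(cascades, best):
            if u in c and v in c and c[u] < c[v]:
                b[v] = max(b[v], edge_score(c[v] - c[u], alpha, eps))
    return chosen


# Toy cascades generated by a hypothetical chain a -> b -> c.
cascades = [
    {"a": 0.0, "b": 1.0, "c": 2.0},
    {"a": 0.0, "b": 1.0},
    {"b": 0.0, "c": 1.0},
]
edges = infer_network(cascades, k=3)
print(edges)
```

In this toy input the shortcut edge from `a` to `c` is never selected, because once the edge from `b` to `c` is chosen it already explains the infection of `c` with a shorter delay. Crediting each node with only its best parent keeps the marginal gain of an edge from growing as more edges are added, which is the kind of diminishing-returns property that greedy selection relies on.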

We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure, with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence, with more general news media sites acting as connectors between them.



    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 4
      February 2012
      176 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/2086737

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2012
      • Accepted: 1 November 2011
      • Revised: 1 October 2011
      • Received: 1 December 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
