research-article

A Scalable Redefined Stochastic Blockmodel

Published: 21 April 2021

Abstract

The stochastic blockmodel (SBM) is a widely used statistical model for network representation. Its interpretability, expressiveness, generalization, and flexibility have made it prevalent and important in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. The computational overhead of existing SBMs and of their learning methods therefore severely limits their application to large-scale networks. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of SBM, remains an unresolved problem. In this work, we address this challenging task from the perspective of model redefinition. We propose a redefined SBM based on the Poisson distribution, together with a block-wise learning algorithm, that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that the proposed method significantly outperforms state-of-the-art methods in the trade-off between accuracy and scalability.
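To make the idea of a Poisson SBM concrete, the following is a minimal sketch of a generic Poisson-edge blockmodel, not the redefined model or the learning algorithm proposed in the article. It assumes an undirected network without self-loops in which the edge count between nodes i and j is Poisson distributed with a rate that depends only on their blocks. The function names, the toy graph, and the two candidate partitions are illustrative assumptions. The sketch shows that, under this assumption, the likelihood of a partition depends on the data only through block-wise edge and node-pair counts, which is one reason block-level computation can be cheap.

```python
import math
from collections import Counter


def block_statistics(edges, z, k):
    """Return edge counts m[(r, s)] and node-pair counts n[(r, s)] for r <= s."""
    sizes = Counter(z)
    m = Counter()
    for i, j in edges:
        r, s = sorted((z[i], z[j]))
        m[(r, s)] += 1
    n = {}
    for r in range(k):
        for s in range(r, k):
            if r == s:
                # unordered pairs of distinct nodes inside block r
                n[(r, s)] = sizes[r] * (sizes[r] - 1) // 2
            else:
                n[(r, s)] = sizes[r] * sizes[s]
    return m, n


def poisson_sbm_loglik(edges, z, k):
    """Profile log-likelihood of partition z under a generic Poisson SBM.

    Each block-pair rate is set to its maximum-likelihood value m_rs / n_rs.
    The sum of log(A_ij!) terms is dropped because it does not depend on z.
    """
    m, n = block_statistics(edges, z, k)
    ll = 0.0
    for pair, n_rs in n.items():
        m_rs = m[pair]
        if n_rs > 0 and m_rs > 0:
            ll += m_rs * math.log(m_rs / n_rs) - m_rs
    return ll


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
z_triangles = [0, 0, 0, 1, 1, 1]   # one block per triangle
z_mixed = [0, 1, 0, 1, 0, 1]       # blocks cut across the triangles

print(poisson_sbm_loglik(edges, z_triangles, 2))
print(poisson_sbm_loglik(edges, z_mixed, 2))
```

On this toy graph the partition that follows the two triangles receives the higher score. Finding the best partition in general is the hard search problem the abstract refers to, and this sketch does not attempt it.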

    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3
      June 2021
      533 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/3454120

      Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 April 2021
      • Accepted: 1 December 2020
      • Revised: 1 October 2020
      • Received: 1 April 2020

      Qualifiers

      • research-article
      • Refereed