Abstract
The stochastic blockmodel (SBM) is a widely used statistical model for network representation. Its interpretability, expressiveness, generalization, and flexibility have made it prevalent and important in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. Existing SBMs and their learning methods incur significant computational overhead, which severely limits their application to large-scale networks. Reducing the cost of SBM learning so that it scales to large networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM based on the Poisson distribution, together with a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that the proposed method achieves a better trade-off between accuracy and scalability than state-of-the-art methods.
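As a point of reference for the Poisson formulation mentioned above, the following is a minimal sketch of the log-likelihood of a generic Poisson SBM for an undirected network. It is not the redefined model or the block-wise learning algorithm proposed in this work, whose details the abstract does not give. The function name `poisson_sbm_log_likelihood`, the toy adjacency matrix, and the rate values are all illustrative assumptions.

```python
import math


def poisson_sbm_log_likelihood(adj, blocks, rates):
    """Log-likelihood of an undirected network under a generic Poisson SBM.

    Assumption (not from the paper): the edge count between nodes i and j is
    Poisson with rate rates[blocks[i]][blocks[j]]; self-loops are ignored.
    All rates must be strictly positive.
    """
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            lam = rates[blocks[i]][blocks[j]]
            a = adj[i][j]
            # log Poisson(a; lam) = a*log(lam) - lam - log(a!)
            total += a * math.log(lam) - lam - math.lgamma(a + 1)
    return total


# Toy network (hypothetical): two disconnected pairs of nodes.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical block-to-block rates: dense within blocks, sparse between.
rates = [[0.9, 0.1], [0.1, 0.9]]

good = poisson_sbm_log_likelihood(adj, [0, 0, 1, 1], rates)
bad = poisson_sbm_log_likelihood(adj, [0, 1, 0, 1], rates)
print(good > bad)  # the assignment matching the two pairs scores higher
```

This naive evaluation visits every node pair, so its cost grows quadratically with the number of nodes. That kind of overhead is one reason scalable learning methods for SBMs are of interest.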
Index Terms
- A Scalable Redefined Stochastic Blockmodel