research-article

A Scalable Redefined Stochastic Blockmodel

Authors:
Xueyan Liu, Jilin University, Changchun, China
Bo Yang, Jilin University, Changchun, China
Hechang Chen, Jilin University, Changchun, China
Katarzyna Musial, University of Technology Sydney, NSW, Australia
Hongxu Chen, University of Technology Sydney, NSW, Australia
Yang Li, Aviation University of Air Force and Jilin University, Changchun, China
Wanli Zuo, Jilin University, Changchun, China

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3, Article No. 46, pp. 1–28. https://doi.org/10.1145/3442589

Published: 21 April 2021


Abstract

The stochastic blockmodel (SBM) is a widely used statistical model for network representation, valued for its interpretability, expressiveness, generalization, and flexibility, and it has become prevalent in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. Existing SBMs and their learning methods therefore carry significant computational overhead, which severely limits their application to large-scale networks. Reducing the cost of SBM learning so that it scales to large networks, while maintaining the good theoretical properties of SBM, remains an unresolved problem. In this work, we address this challenging task from a novel perspective of model redefinition. We propose a redefined SBM based on the Poisson distribution, together with a block-wise learning algorithm, that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that the proposed method significantly outperforms state-of-the-art methods in terms of the trade-off between accuracy and scalability.
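
For orientation, the snippet below is a minimal sketch of the log-likelihood of a standard Poisson SBM for an undirected network. It is not the redefined model or the block-wise learning algorithm proposed in the article, whose details are not given in this abstract. The function name `poisson_sbm_log_likelihood`, the toy adjacency matrix, the block assignments, and the rate matrix `omega` are all illustrative assumptions.

```python
import math


def poisson_sbm_log_likelihood(adj, z, omega):
    """Log-likelihood of an undirected network under a standard Poisson SBM.

    Assumption (not taken from the article): the edge count between nodes
    i and j is Poisson-distributed with rate omega[z[i]][z[j]], and
    self-loops are ignored.
    """
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is counted once
            rate = omega[z[i]][z[j]]
            a = adj[i][j]
            # log Poisson(a | rate) = a*log(rate) - rate - log(a!)
            total += a * math.log(rate) - rate - math.lgamma(a + 1)
    return total


# Hypothetical toy network: two triangles joined by a single edge (2-3).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

# Illustrative rates: dense within blocks, sparse between blocks.
omega = [[0.9, 0.1], [0.1, 0.9]]

z_good = [0, 0, 0, 1, 1, 1]  # assignment matching the two triangles
z_bad = [0, 1, 0, 1, 0, 1]   # assignment that splits both triangles

ll_good = poisson_sbm_log_likelihood(adj, z_good, omega)
ll_bad = poisson_sbm_log_likelihood(adj, z_bad, omega)
print(ll_good > ll_bad)
```

In this sketch the assignment that follows the two triangles scores higher than the one that splits them. The double loop visits every node pair, so a single evaluation costs time quadratic in the number of nodes, which hints at why learning cost becomes a concern on large networks.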


Index Terms

Information systems → Information systems applications → Data mining → Clustering


Published in

ACM Transactions on Knowledge Discovery from Data Volume 15, Issue 3
June 2021
533 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3454120

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 April 2021
- Accepted: 1 December 2020
- Revised: 1 October 2020
- Received: 1 April 2020
Author Tags
Complex networks
redefined stochastic blockmodel
structural pattern detection
Qualifiers
- research-article
- Refereed
