Elsevier

Neurocomputing

Volume 297, 5 July 2018, Pages 71-81
Incorporating network structure with node contents for community detection on large networks using deep learning

https://doi.org/10.1016/j.neucom.2018.01.065

Abstract

Community detection is an important task in social network analysis. In general, community detection models fall into two types: those that use network topology and those that use node contents. Some studies have tried to incorporate both types of models under the framework of spectral clustering to improve community detection. However, these attempts achieved only limited gains because they combined the models in a simple way. Better community detection requires a seamless combination of the two methods. For this purpose, we re-examine the properties of the modularity maximization and normalized-cut models and find an approach that realizes a seamless combination of the two. These two models seek a low-rank embedding that represents the community structure and reconstructs the network topology and the node contents, respectively. Meanwhile, we found that the autoencoder and spectral clustering share a similar framework in their low-rank matrix reconstruction. Based on this property, we propose a new approach that seamlessly combines the modularity and normalized-cut models via the autoencoder. The proposed method also exploits the advantages of a deep structure by means of deep learning. Experiments demonstrate that the proposed method provides a nonlinear deep representation for large-scale networks and achieves efficient community detection. The evaluation results show that the proposed method outperforms the existing leading methods on nine real-world networks.

Introduction

The development of the Internet has produced an ever greater variety of data, such as online comments, product reviews and co-author networks. These data affect all aspects of people's lives, and their analysis has attracted growing attention from researchers in various fields. One hot topic in the study of such social media or online data is to discover the underlying structure with group effect, the so-called community structure. The vertices (users) of a network can be divided into groups within which vertices are densely connected, while connections across the network as a whole are relatively sparse. Individuals or users belonging to the same community share common profiles or have common interests. Identifying communities of similar users is very important and has been applied in many areas, e.g., sociology, biology and computer science. In biology, for example, different units belonging to an organization have related functions and are interconnected through special structures that characterize the overall effect of the organization: the interaction among a set of proteins in a cell can form an RNA polymerase for the transcription of genes. In computer science, finding the salient communities in an organization of people can guide web marketing, the prediction of the behavior of users belonging to a community, and the understanding of the functions of a complex system [1].

For community detection, several methods have been put forward that can be cast as graph clustering. In normalized cut (n-cut) [4], the Laplacian matrix is the main object to be processed. The eigenvectors with non-zero eigenvalues, obtained by the eigenvalue decomposition (EVD) of the graph Laplacian matrix, are treated as the graph representation. Some other works can also be transformed into spectral clustering. For example, the modularity maximization model [10] first constructs a graph that is based on feature vectors, and then solves for the top k eigenvectors as the network representation for clustering. Here, the modularity matrix can be regarded as playing the role of the graph Laplacian matrix. These two methods (i.e. modularity optimization and n-cut) can easily capture topology-based and content-based features, respectively, by EVD of the corresponding spectral matrices, as shown in Fig. 1. However, methods that consider only topological structure (or only node contents) are often limited in the information they can obtain about the structure of the communities in a network. It has been demonstrated that this problem can be overcome by fusing the vertex information (i.e. node contents) with the linkage information for community detection [2].
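The two spectral matrices mentioned above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy graph, the bag-of-words features `F`, the use of cosine similarity for node contents, and the helper names `modularity_matrix`, `markov_matrix` and `top_eigenvectors` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def modularity_matrix(A):
    """Newman's modularity matrix B = A - k k^T / (2m) for an undirected graph."""
    k = A.sum(axis=1)                      # node degrees
    return A - np.outer(k, k) / k.sum()    # k.sum() equals 2m

def markov_matrix(S):
    """Row-normalise a nonnegative similarity matrix: M = D^{-1} S."""
    return S / S.sum(axis=1, keepdims=True)

def top_eigenvectors(X, k):
    """Eigenvectors of a symmetric matrix X for its k largest eigenvalues."""
    w, V = np.linalg.eigh(X)               # eigenvalues in ascending order
    return V[:, np.argsort(w)[::-1][:k]]

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Toy node contents (hypothetical): one bag-of-words row per node.
F = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [1, 1, 1, 0],
              [0, 1, 1, 1], [0, 0, 2, 1], [0, 0, 1, 2]], dtype=float)
Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
S = Fn @ Fn.T                              # cosine similarity (an assumption)

B = modularity_matrix(A)                   # topology-based spectral matrix
M = markov_matrix(S)                       # content-based spectral matrix

# The sign pattern of the leading eigenvector of B gives a two-way split.
v = top_eigenvectors(B, 1)[:, 0]
labels = (v > 0).astype(int)
print(labels)
```

On this toy graph the sign split separates the two triangles; which triangle receives label 0 depends on the arbitrary sign of the eigenvector.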

When considering both topological and content information for community discovery, we can directly combine the two objective functions into one in a linear form. However, some classical graph embedding methods, such as locally linear embedding (LLE) [11], show that the relations among vertices in real-world networks are not necessarily linear. A model based on this linear combination strategy is therefore still limited on real-world networks. Moreover, although a network representation can be obtained by fusing the two types of information, optimizing the combined model is inefficient, because an appropriate ratio between the two kinds of information has to be tuned manually.

In recent years, deep learning has been used in many areas, such as speech recognition [6] and image classification [7]. As is well known, the neural network is a good framework for nonlinear computation, with elements that simulate the structure and properties of neurons [8]. Among neural network models, the autoencoder, as described by Ng [5], aims to learn features from the input data. We found that the autoencoder and spectral methods both aim to obtain a low-dimensional approximation of the corresponding matrix. Based on this similarity, we adopt the autoencoder as the entry point for overcoming the disadvantages of linear optimization and for incorporating the two different spectral methods.
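The low-dimensional approximation that spectral methods compute can be illustrated numerically. The sketch below assumes a random symmetric matrix `X` as a stand-in for a spectral matrix (it is not data from the paper) and uses the classical low-rank approximation result of Eckart and Young, which appears in the reference list: keeping the k eigenvalues of largest magnitude gives the best rank-k reconstruction in the Frobenius norm. The helper name `rank_k_reconstruction` is hypothetical.

```python
import numpy as np

def rank_k_reconstruction(X, k):
    """Best rank-k approximation of a symmetric matrix X via its EVD."""
    w, V = np.linalg.eigh(X)
    keep = np.argsort(np.abs(w))[::-1][:k]   # k eigenvalues largest in magnitude
    H = V[:, keep]                           # n x k low-dimensional representation
    return H, (H * w[keep]) @ H.T, w

rng = np.random.default_rng(0)
R = rng.normal(size=(8, 8))
X = (R + R.T) / 2                            # hypothetical symmetric matrix

H, X_hat, w = rank_k_reconstruction(X, k=3)
err = np.linalg.norm(X - X_hat)              # Frobenius reconstruction error

# The error equals the root sum of squares of the discarded eigenvalues.
discarded = np.sort(np.abs(w))[:-3]
bound = np.sqrt(np.sum(discarded ** 2))

# Any other rank-3 reconstruction, e.g. a random projection, does no better.
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
err_random = np.linalg.norm(X - Q @ Q.T @ X)
print(err, bound, err_random)
```

An autoencoder with a k-unit hidden layer minimizes the same kind of reconstruction error, but with a nonlinear encoder; this is the similarity the text relies on.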

To take advantage of spectral methods while also incorporating linkage and node content information, we propose an autoencoder-based method for community detection that uses the normalized-cut and modularity maximization models. Our work is inspired by the theoretical similarity between the autoencoder and spectral methods in capturing the intrinsic structure of the spectral matrix. The framework of our main idea is shown in Fig. 1. Because the autoencoder is an unsupervised learning method, we treat only the low-dimensional encoding in the hidden layer as the network representation. In our method, we adopt the modularity maximization model and normalized cut to portray linkage and content information, respectively, and construct the spectral matrices (i.e. the modularity matrix and the Markov matrix) as the input of the autoencoder. We design a unified objective function to obtain the best reconstruction of the combination matrix that consists of the modularity matrix and the Markov matrix, and use the autoencoder to obtain the best encoding in the hidden layer as the network representation, which is then used to find communities. Furthermore, by building a multi-layer autoencoder, we adopt a deep autoencoder to obtain a powerful representation by means of the deep structure, and combine it with the intrinsic information of the original data to improve community discovery. In total, our framework makes three main contributions:

  • First, in theory both the autoencoder and spectral methods are related to the low-dimensional approximation of a specified matrix, here the modularity matrix and the Markov matrix. This study uses the autoencoder to obtain a low-dimensional encoding that best reconstructs the joint matrix consisting of the modularity matrix and the Markov matrix, and treats this encoding as the graph representation for community detection. The key point is an autoencoder-based method that achieves seamless joint optimization of the modularity model and normalized cut.

  • Second, this encoding supplies a nonlinear way to integrate the linkage and content information, which helps to further improve the performance of community detection when both types of information are used. The autoencoder not only encodes the important factors of the data while reducing the dimension, but also automatically learns the weights of the relationships among the various factors so as to minimize the reconstruction error. In this framework, therefore, the performance improvement of our method comes from its self-tuning characteristic and does not depend on adjusting a balance factor.

  • Third, by stacking a series of autoencoders, we build a multi-layer autoencoder that improves the generalization ability of the encoding in the hidden layer. Benefitting from the deep structure, we obtain a powerful encoding with both topological and content information, which effectively aids network community detection.
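The idea in these contributions can be sketched as follows: concatenate the modularity matrix and the Markov matrix, train an autoencoder to reconstruct the joint matrix, and stack a second autoencoder on the first encoding. This is a minimal sketch under loud assumptions, not the authors' implementation: the activation (tanh), the linear decoder, the squared-error loss, the layer sizes, the plain gradient descent training, the toy data and the function name `train_autoencoder` are all hypothetical choices, since this excerpt does not specify them.

```python
import numpy as np

def train_autoencoder(X, hidden, steps=2000, lr=0.1, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.normal(size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = 0.1 * rng.normal(size=(hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)             # encoding in the hidden layer
        diff = (H @ W2 + b2) - X             # reconstruction minus input
        losses.append(np.mean(diff ** 2))
        g = 2.0 * diff / diff.size           # gradient w.r.t. the reconstruction
        gH = (g @ W2.T) * (1.0 - H ** 2)     # back-propagate through tanh
        W2 -= lr * (H.T @ g); b2 -= lr * g.sum(axis=0)
        W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(axis=0)
    return np.tanh(X @ W1 + b1), losses

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()             # modularity matrix (topology)

# Toy node contents (hypothetical) and their cosine similarity.
F = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [1, 1, 1, 0],
              [0, 1, 1, 1], [0, 0, 2, 1], [0, 0, 1, 2]], dtype=float)
Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
S = Fn @ Fn.T
M = S / S.sum(axis=1, keepdims=True)         # Markov matrix (content)

X = np.hstack([B, M])                        # joint input, one row per node
H1, loss1 = train_autoencoder(X, hidden=4)   # first autoencoder
H2, loss2 = train_autoencoder(H1, hidden=2)  # second autoencoder, stacked on H1
print(H2.shape, loss1[0], loss1[-1])
```

Each row of `H2` is a low-dimensional representation of one node; a clustering step applied to these rows would then assign community labels. No balance factor between `B` and `M` appears in this sketch, because the weights on both halves of the input are learned during training.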

The rest of the paper is organized as follows. In Section 2, we give a brief review of the related work. The proposed framework and the relevant algorithms are introduced in Section 3. Datasets and experimental settings are described in Section 4, followed by the experimental evaluation and an analysis of the balance factor, which demonstrate the effectiveness of the proposed method. The paper is concluded in Section 5.

Related work

Relevant work on this topic falls into three categories: community detection using topological structure alone, community detection using content information alone, and methods that combine links and node contents. As described above, network topology or content information alone is not sufficient to determine node community memberships. Combining topology and content improves community detection, as shown in studies [1], [2], [3], [17], [19], [20]. However, they

Framework of community detection using deep learning

To fully utilize the advantages of the deep neural network (DNN) for combining network topology and content information, we re-examine the properties of the modularity maximization model and normalized cut, which are the leading models for community detection, and search the DNN framework for an approach that realizes a seamless combination of the different modalities. These two models seek a low-rank embedding to represent the community structure and reconstruct

Experiments

Here we compare our algorithm with several state-of-the-art community detection algorithms on a variety of real-world networks. We also describe the baseline methods, the network datasets and the experimental setups in detail.

Conclusion

In this paper, we proposed a new method that fuses topological and content information for community detection using the deep learning framework. This study is inspired by the similarity between the autoencoder and spectral methods in terms of a low-dimensional approximation of the spectral matrix. The proposed method provides an effective approach for finding a low-dimensional encoding of the community structure and for achieving seamless joint optimization of modularity and normalized cut.

Acknowledgments

This work was supported by the National Basic Research Program of China (2013CB329301) and the Natural Science Foundation of China (61772361, 61503281, 61303110).

Jinxin Cao received his B.S. degree from Shandong Normal University, China, in 2010. Since 2011, he has been a student in the joint post-graduate and Ph.D. program of the School of Computer Science and Technology at Tianjin University, China. His research interests include data mining and the analysis of complex networks.

References (39)

  • L. Shao et al., Performance evaluation of deep feature learning for RGB-D image/video classification, Inf. Sci. (2017)
  • A.Z. Broder et al., Min-wise independent permutations, J. Comput. Syst. Sci. (2000)
  • K. Chen et al., Network cross-validation for determining the number of communities in network data, J. Am. Stat. Assoc. (2017)
  • M. Sachan et al., Using content and interactions for discovering communities in social networks
  • S. Ganguly et al., Author2vec: learning author representations by combining content and link information
  • D. He et al., Joint identification of network communities and semantics via integrative modeling of network topologies and node contents
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • A. Ng, Sparse autoencoder, CS294A Lecture Notes (2011)
  • L. Deng, Deep learning: from speech recognition to language and multimodal processing, APSIPA Trans. Signal and Inf. Process. (2016)
  • H. Bourlard et al., Auto-association by multilayer perceptrons and singular value decomposition, Biol. Cybern. (1988)
  • C. Eckart et al., The approximation of one matrix by another of lower rank, Psychometrika (1936)
  • M.E. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. (2006)
  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • R. Balasubramanyan et al., Block-LDA: jointly modeling entity-annotated text and entity-entity links
  • I. Derényi et al., Clique percolation in random networks, Phys. Rev. Lett. (2005)
  • A.P. Streich et al., Multi-assignment clustering for Boolean data
  • L. Danon et al., Comparing community structure identification, J. Stat. Mech. Theory Exp. (2005)
  • J. McAuley et al., Discovering social circles in ego networks, ACM Trans. Knowl. Discov. Data (2014)
  • Y. Zhang et al., Community detection in networks with node features, Electron. J. Stat. (2016)

Di Jin received his B.S., M.S. and Ph.D. degrees from the College of Computer Science and Technology, Jilin University, China, in 2005, 2008 and 2012, respectively. Since 2012, he has been an associate professor at Tianjin University. His current research interests include artificial intelligence, complex network analysis, and network community detection.

Liang Yang received his Ph.D. degree from the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, in 2016. He is an assistant professor in the School of Information Engineering, Tianjin University of Commerce. His current research interests include community detection, machine learning and computer vision.

Jianwu Dang graduated from Tsinghua University, China, in 1982, and received his M.S. from the same university in 1984. He worked at Tianjin University as a lecturer from 1984 to 1988. He was awarded a Ph.D. by Shizuoka University, Japan, in 1992. Since 2001, he has been with the Japan Advanced Institute of Science and Technology (JAIST). His research interests cover all fields of speech production, speech synthesis, and speech cognition.
