Elsevier

Computer Communications

Volume 108, 1 August 2017, Pages 110-122
A label-based evolutionary computing approach to dynamic community detection

https://doi.org/10.1016/j.comcom.2017.04.009

Highlights

  • The combination of a multi-objective genetic algorithm and a label propagation algorithm fully diversifies initial individual clusters and yields high clustering quality.

  • The label propagation algorithm integrated into the mutation operator improves clustering quality and convergence speed.

  • The proposed solution has linear time complexity with respect to the number of edges in the network.

Abstract

Dynamic community detection is the process of discovering the structure and determining the number of communities in dynamic networks, which consist of a series of temporal network snapshots. Because such networks vary over time, community detection must consider both the quality of the community structure and the temporal cost that quantifies the difference between the current network snapshot and previous ones. In this paper, we propose a label-based multi-objective optimization algorithm for dynamic community detection, which employs a genetic algorithm to optimize two objectives, i.e. clustering quality and temporal cost. A label propagation method is designed and used to initialize the network’s communities and to restrict the conditions of the mutation process, which further improves detection efficiency and effectiveness. We conduct experiments on both synthesized and empirical datasets, and the results show that the proposed method outperforms a state-of-the-art algorithm in terms of detection quality and speed. This suggests that it is applicable to a wide range of complex networks with dynamic structures, such as rapidly growing online social networks.

Introduction

Many real-life systems are modeled as networks, exemplified by communication networks, collaboration networks, and social networks. Due to the rapid increase in participating nodes and interaction dynamics, such networks have become very complex. They share some common characteristics, including the small-world and scale-free properties, but the community structure is typically of more importance. In general, a community is formed based on high internal cohesion and low coupling with outsiders.

Community detection has found many important applications, such as recommendation, data clustering [1], social network analysis [2], and network vulnerability assessment [3], [4], [5]. Community detection in static networks has been extensively investigated, and the label propagation algorithm [6], [7] is well recognized as one of the fastest algorithms suitable for large-scale networks. However, most real-life complex networks evolve over time with changing structures, and these dynamic characteristics make community detection a challenging task.
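To make the label propagation idea concrete, the following is a minimal sketch of a generic asynchronous label propagation pass on a static, unweighted graph. It is an illustration only, not the variant used in [6], [7] or in this paper; the function name `label_propagation`, the adjacency-dictionary input, and the `max_rounds` and `seed` parameters are assumptions made for the example.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_rounds=100, seed=0):
    """Generic asynchronous label propagation (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbors
               (undirected graph, so every edge appears in both lists).
    Returns a dict mapping each node to a community label.
    Labels are the node ids themselves, so they must be sortable.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each round
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adjacency[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # current label is already a most frequent one
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # no label changed in a full round: converged
    return labels
```

Each round touches every edge a constant number of times, which is why this family of methods is attractive for large networks.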

A number of algorithms have been proposed for dynamic community detection, mainly in two categories, i.e. incremental clustering [8] and evolutionary clustering [9]. Incremental clustering performs a clustering of the initial network followed by fine-tuning in subsequent networks. It runs fast, but at the expense of clustering quality. Evolutionary clustering, which is the focus of this work, was proposed by Chakrabarti et al. [9] to define a framework of slowly varying networks where two metrics are used in community detection: (i) snapshot quality, which measures the clustering quality based on the current network topology, and (ii) temporal cost, which quantifies the difference between the current clustering result and the previous one. A successfully detected community structure therefore has high snapshot quality and low temporal cost.

In this paper, we formulate dynamic community detection as a multi-objective optimization problem and design a label-based dynamic multi-objective genetic algorithm, referred to as L-DMGA, to detect the community structure in dynamic networks with two objectives: (i) maximize the snapshot quality and (ii) minimize the temporal cost. To take advantage of the efficiency and performance of the label propagation algorithm, we integrate it into the proposed genetic algorithm to achieve desirable clustering results. We employ the modularity approach introduced by Newman and Girvan [10] to maximize the snapshot quality and the Normalized Mutual Information (NMI) proposed by Danon et al. [11] to minimize the temporal cost. NMI is a well-known entropy-based measure from information theory, used here to quantify the similarity of the community structures between the current and previous time steps.
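The two measures named above can be sketched in a few lines. The code below is a minimal illustration, assuming an unweighted, undirected graph given as an edge list and partitions given as node-to-label dictionaries; the function names `modularity` and `nmi` and these input formats are choices made for the example, not the paper's implementation (which is in MATLAB and handles weighted graphs).

```python
import math
from collections import Counter


def modularity(edges, labels):
    """Newman-Girvan modularity: sum over communities of l_c/m - (d_c/2m)^2.

    edges:  list of (u, v) pairs, each undirected edge listed once, no self-loops.
    labels: dict mapping node -> community label.
    """
    m = len(edges)
    internal = Counter()  # l_c: edges with both endpoints in community c
    degree = Counter()    # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    nodes = list(labels_a)
    n = len(nodes)
    count_a = Counter(labels_a[x] for x in nodes)
    count_b = Counter(labels_b[x] for x in nodes)
    joint = Counter((labels_a[x], labels_b[x]) for x in nodes)
    # n times the mutual information I(A; B)
    mutual = sum(c * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    # n times the sum of the entropies H(A) + H(B)
    entropy = (-sum(c * math.log(c / n) for c in count_a.values())
               - sum(c * math.log(c / n) for c in count_b.values()))
    if entropy == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mutual / entropy
```

In this reading, `modularity` of the current snapshot plays the role of snapshot quality, and `nmi` between the partitions at time steps t-1 and t measures temporal smoothness: a higher NMI corresponds to a lower temporal cost.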

We conduct experiments on both synthesized and real-life datasets. The results show that the method performs well in terms of detection quality and speed compared with existing algorithms, which suggests that it is applicable to a wide range of complex networks with dynamic structures, such as rapidly growing online social networks. The main contributions of our work are summarized as follows:

  • (1) The combination of a multi-objective genetic algorithm and a label propagation algorithm fully diversifies initial individual clusters and yields high clustering quality.

  • (2) The label propagation algorithm integrated into the mutation operator improves clustering quality and convergence speed.

  • (3) The proposed solution has linear time complexity with respect to the number of edges.
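As a rough illustration of contribution (2), the sketch below shows one way a mutation step could be guided by label propagation: a mutated node adopts the most frequent label among its neighbors instead of a random label. This is an assumption made for illustration, not the operator defined in Section 4; the function name `lpa_mutation` and its parameters are hypothetical.

```python
import random
from collections import Counter


def lpa_mutation(labels, adjacency, mutation_rate=0.1, seed=None):
    """Hypothetical label-propagation-guided mutation (illustrative only).

    labels:    dict mapping node -> community label (one individual).
    adjacency: dict mapping node -> list of neighbors.
    Each node is selected with probability mutation_rate and, if selected,
    takes the most frequent label among its neighbors (ties broken at random).
    Returns a new dict; the input individual is left unchanged.
    """
    rng = random.Random(seed)
    mutated = dict(labels)
    for node, neighbors in adjacency.items():
        if not neighbors or rng.random() >= mutation_rate:
            continue  # node not selected for mutation
        counts = Counter(mutated[nb] for nb in neighbors)
        best = max(counts.values())
        candidates = sorted(lab for lab, c in counts.items() if c == best)
        mutated[node] = rng.choice(candidates)
    return mutated
```

Because a mutated node can only take a label that already exists in its neighborhood, such a step never creates a community that is disconnected from the node, which is one plausible reason a label-guided mutation converges faster than a purely random one.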

The rest of the paper is organized as follows. Section 2 conducts a survey of related work. Section 3 describes dynamic networks and evolutionary clustering, and formulates dynamic community detection as a multi-objective optimization problem. Section 4 designs the proposed algorithm with focus on gene representation and mutation process. Section 5 presents the community detection results from synthesized and real-life datasets, and compares the performance with other existing algorithms. Section 6 analyzes the algorithm scalability and Section 7 concludes our work.

Section snippets

Related work

The analysis of dynamic networks and the detection of community structures have attracted an increasing amount of attention [12], [13], [14]. Due to the dynamic nature of time-varying networks, the traditional methods for community detection in static networks may not perform well in dynamic networks. Hence, recent research efforts have shifted to the design of community detection algorithms for dynamic networks, including social networks [15], [16].

Dynamic networks and community

We consider a network snapshot modeled by a weighted graph $G_t = (V_t, E_t)$, where $V_t$ is the set of nodes and $E_t$ is the set of edges at time step $t$. A time-varying network over $T$ time steps is denoted as $G = \{G_1, G_2, \ldots, G_T\}$.

Community is a well-studied property of many networks. Generally, nodes in the same community are densely connected, while nodes from different communities are sparsely connected. We denote the set of $k$ communities of network $G_t$ at time step $t$ as $C_t = \{C_t^1, C_t^2, \ldots, C_t^k\}$, where $C_t^p \cap C_t^q = \emptyset$ for all $p \neq q$
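The notation above maps naturally onto simple data structures. The following sketch, with made-up example data, stores each snapshot as a node set plus a weighted edge dictionary and each community structure as a list of node sets, and checks the pairwise-disjointness condition. The names `is_disjoint`, `snapshots`, and `structures` are assumptions for illustration.

```python
def is_disjoint(communities):
    """Return True if no node appears in more than one community."""
    seen = set()
    for community in communities:
        if seen & community:
            return False  # some node was already assigned to another community
        seen |= community
    return True


# A hypothetical dynamic network with T = 2 snapshots.
# Each snapshot is (node set V_t, weighted edge dict E_t).
snapshots = [
    ({1, 2, 3, 4}, {(1, 2): 1.0, (3, 4): 2.0}),
    ({1, 2, 3, 4, 5}, {(1, 2): 1.0, (3, 4): 2.0, (4, 5): 0.5}),
]

# One candidate community structure C_t per snapshot.
structures = [
    [{1, 2}, {3, 4}],
    [{1, 2}, {3, 4, 5}],
]
```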

L-DMGA algorithm

We propose a label-based multi-objective evolutionary community detection algorithm that optimizes both snapshot quality and temporal cost without the need to fix a balancing factor α in advance. The solutions on the Pareto front of the multi-objective optimization problem (MOP) represent the best compromises between the two objectives.
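The notion of a Pareto front can be illustrated with a small filter that keeps only non-dominated candidates. In this sketch each candidate is a pair of scores that are both to be maximized, for example (modularity, NMI with the previous partition); treating the temporal objective as a similarity to maximize is an assumption made here, and the function name `pareto_front` is hypothetical.

```python
def pareto_front(solutions):
    """Return the non-dominated solutions, preserving input order.

    solutions: list of score tuples in which every objective is maximized.
    A solution a dominates b if a is at least as good as b in every
    objective and strictly better in at least one.
    """
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]
```

For instance, among the candidates (0.40, 0.90), (0.45, 0.80), (0.35, 0.85), and (0.45, 0.70), only the first two survive: neither is better than the other in both objectives, so choosing between them is a trade-off that no single weight has to settle beforehand.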

We leverage DYNMOGA, proposed by Folino et al. [26], in the design of our multi-objective genetic algorithm (MOGA). DYNMOGA generates a population that represents partitions of a network and then ranks them

Experimental results

We implement L-DMGA in MATLAB. As is well recognized, the parameter settings significantly affect the performance of genetic algorithms. The work in [36] shows that it is a great challenge to find appropriate parameter values for evolutionary algorithms. Following a general guideline, we choose a high crossover rate and a low mutation rate. In our experiment, we set the crossover rate to 0.9 and the mutation rate to 0.1. In addition, we set the population size to 100, the number of iterations to be

Scalability analysis

One limitation of genetic algorithms is the repeated calculation of the fitness function, which often leads to a high time complexity. L-DMGA performs an efficient calculation of the fitness function in large-scale networks.

To show the scalability of L-DMGA, we use dataset 1 (avgDegree=16, z=5, nC=10%) and increase the number of nodes n through 128, 256, 512, 1024, 2048, and 4096. The population size p takes the values 50, 100, and 200, and the number g of generations varies within the range of [50,

Conclusion

In this paper, we proposed a new multi-objective optimization algorithm that incorporates the label propagation algorithm to detect desirable community structures in dynamic networks. During the initialization of the algorithm, the label propagation algorithm generates satisfactory community structures, and at each time step, our algorithm achieves a good tradeoff between maximizing the quality of the community structure at the current time step and minimizing the variation of the community structures in

Acknowledgments

This work was supported by the Soft Science Research Project of Chengdu Science and Technology Bureau (2015-RK00-00247-ZF), the Scientific Research Project of Sichuan Provincial Public Security Department (2015SCYYCX06), and the National Natural Science Foundation of China (61300192).

References (45)

  • Z. Lin et al.

    CK-LPA: efficient community detection algorithm based on label propagation with community kernel

    Physica A

    (2014)
  • U.V. Luxburg

    A tutorial on spectral clustering

    Stat. Comput.

    (2007)
  • M.E.J. Newman

    Modularity and community structure in networks

    2006 APS March Meeting

    (2006)
  • P.Y. Chen et al.

    Node removal vulnerability of the largest component of a network

    IEEE GLOBALSIP 2013

    (2014)
  • P.Y. Chen et al.

    Assessing and safeguarding network resilience to nodal attacks

    IEEE Commun. Mag.

    (2014)
  • M. Coscia et al.

    A classification for community discovery methods in complex networks

    Stat. Anal. Data Mining

    (2011)
  • J. Xie et al.

    LabelRankT: incremental community detection in dynamic networks via label propagation

    Proceedings of the Workshop on Dynamic Networks Management and Mining

    (2013)
  • S.Z. Bo Shan et al.

    IC: incremental algorithm for community identification in dynamic social networks

    J. Softw.

    (2009)
  • D. Chakrabarti et al.

    Evolutionary clustering

    Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

    (2006)
  • M.E.J. Newman et al.

    Finding and evaluating community structure in networks

    Phys. Rev. E Stat. Nonlinear Soft Matter Phys.

    (2004)
  • L. Danon et al.

    Comparing community structure identification

    J. Stat. Mech. Theory Exp.

    (2005)
  • T. Aynaud et al.

    Communities in evolving networks: definitions, detection, and analysis techniques

    Dynamics On and Of Complex Networks, Volume 2

    (2013)
  • R. Cazabet et al.

    Dynamic community detection

    Encyclopedia of Social Network Analysis and Mining

    (2014)
  • P. Holme

    Temporal Networks

    (2014)
  • S. Papadopoulos et al.

    Community detection in social media

    Data Min. Knowl. Discovery

    (2012)
  • M. Spiliopoulou

    Evolution in Social Networks: A Survey

    (2011)
  • L.D. Yang B

    Force-based incremental algorithm for mining community structure in dynamic network

    J. Comput. Sci. Technol.

    (2006)
  • J. Sun et al.

    GraphScope: parameter-free mining of large time-evolving graphs

    Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12–15, 2007

    (2007)
  • Y. Chi et al.

    Evolutionary spectral clustering by incorporating temporal smoothness

    KDD '07: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    (2007)
  • L. Tang et al.

    Identifying evolving groups in dynamic multi-mode networks

    IEEE Trans. Knowl. Data Eng.

    (2011)
  • K.S. Xu et al.

    Adaptive evolutionary clustering

    Data Min. Knowl. Discovery

    (2014)
  • M.S. Kim et al.

    A particle-and-density based evolutionary clustering method for dynamic networks

    Proc. VLDB Endowment

    (2009)