A label-based evolutionary computing approach to dynamic community detection

doi:10.1016/j.comcom.2017.04.009

Computer Communications

Volume 108, 1 August 2017, Pages 110-122

Highlights

• The combination of a multi-objective genetic algorithm and a label propagation algorithm fully diversifies initial individual clusters and yields high clustering quality.
• The label propagation algorithm integrated into the mutation operator improves clustering quality and convergence speed.
• The proposed solution is of a linear time complexity with respect to the number of edges in the network.

Abstract

Dynamic community detection is the process of discovering the structure and determining the number of communities in dynamic networks, which consist of a series of temporal network snapshots. Because such networks vary over time, community detection must consider both the quality of the community structure and the temporal cost that quantifies the difference between the current network snapshot and previous ones. In this paper, we propose a label-based multi-objective optimization algorithm for dynamic community detection, which employs a genetic algorithm to optimize two objectives, i.e., clustering quality and temporal cost. A label propagation method is designed and used to initialize the network’s communities and to restrict the conditions of the mutation process, further improving detection efficiency and effectiveness. We conduct experiments on both synthesized and empirical datasets, and extensive results show that the proposed method outperforms a state-of-the-art algorithm in terms of detection quality and speed, which suggests its applicability to various complex networks with dynamic structures such as rapidly growing online social networks.

Introduction

Many real-life systems are modeled as networks, for example communication networks, collaboration networks, and social networks. Due to the rapid increase in participating nodes and interaction dynamics, such networks have become very complex and share common characteristics such as the small-world and scale-free properties, but the community structure is typically of more importance. In general, a community is a group of nodes with high internal cohesion and low coupling with outsiders.

Community detection has found many important applications, such as recommendation, data clustering [1], social network analysis [2], and network vulnerability assessment [3], [4], [5]. Community detection in static networks has been extensively investigated, and the label propagation algorithm [6], [7] is well recognized as one of the fastest algorithms suitable for large-scale networks. However, most real-life complex networks evolve over time with changing structures and hence exhibit dynamic characteristics, which makes community detection a challenging task.
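To make the label propagation idea concrete, the following is a minimal sketch of the generic asynchronous variant for static graphs, not the specific method designed in this paper. The function name `label_propagation`, the adjacency-dict input, and the `max_sweeps` and `seed` parameters are assumptions made for illustration.

```python
import random
from collections import Counter


def label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj maps each node to an iterable of its neighbours.
    Returns a dict node -> community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)                # visit nodes in a fresh random order
        changed = False
        for v in nodes:
            counts = Counter(labels[u] for u in adj[v])
            if not counts:                # isolated node keeps its label
                continue
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:   # already holds a most frequent label
                continue
            labels[v] = rng.choice(candidates)   # ties are broken at random
            changed = True
        if not changed:                   # stable: no node wants to switch
            break
    return labels
```

Each sweep touches every edge a constant number of times, which is why this family of methods is attractive for large networks. The result depends on the visiting order and on tie-breaking, so different seeds can yield different partitions.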

A number of algorithms have been proposed for dynamic community detection, mainly in two categories, i.e., incremental clustering [8] and evolutionary clustering [9]. Incremental clustering performs a clustering of the initial network followed by fine-tuning in subsequent networks; it runs fast at the expense of clustering quality. Evolutionary clustering, which is the focus of this work, was proposed by Chakrabarti et al. [9] to define a framework of slowly varying networks where two metrics are used in community detection: (i) snapshot quality, which measures the clustering quality based on the current network topology, and (ii) temporal cost, which quantifies the difference between the current clustering result and the previous one. A successfully detected community structure therefore has high snapshot quality and low temporal cost.
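As a point of reference, evolutionary clustering objectives of this kind are commonly combined into a single weighted cost. The form below is a generic sketch rather than the exact expression used in [9]; the symbols $SC_t$ (snapshot cost, low when snapshot quality is high), $TC_t$ (temporal cost), and the weight $\alpha$ are notation assumed here for illustration.

```latex
cost_t = \alpha \cdot SC_t + (1 - \alpha) \cdot TC_t, \qquad 0 \le \alpha \le 1
```

A larger $\alpha$ favors fitting the current snapshot, while a smaller one favors continuity with the previous clustering. Choosing $\alpha$ ahead of time is the main practical difficulty of the weighted form.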

In this paper, we formulate dynamic community detection as a multi-objective optimization problem and design a label-based dynamic multi-objective genetic algorithm, referred to as L-DMGA, to detect the community structure in dynamic networks with two objectives: (i) maximize the snapshot quality and (ii) minimize the temporal cost. To take advantage of the efficiency and performance of the label propagation algorithm, we integrate it into the proposed genetic algorithm to achieve desirable clustering results. We employ the modularity introduced by Newman and Girvan [10] as the measure of snapshot quality and the Normalized Mutual Information (NMI) proposed by Danon et al. [11] as the measure of temporal cost. NMI is a well-known entropy-based measure from information theory, used here to quantify the similarity of the community structures between the current and previous time steps.
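The two measures named above can be sketched in a few lines. This is an illustrative implementation of the standard definitions for unweighted, undirected graphs, not the authors' MATLAB code; the function names `modularity` and `nmi`, the input formats, and the convention of returning 1.0 when both partitions consist of a single community are assumptions.

```python
import math
from collections import Counter


def modularity(edges, community):
    """Newman-Girvan modularity Q of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = Counter()   # number of edges with both endpoints in community c
    degree = Counter()     # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def nmi(part_a, part_b):
    """Normalized mutual information between two partitions of the same nodes.

    part_a, part_b: dict node -> community label.
    """
    nodes = list(part_a)
    n = len(nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)   # confusion matrix
    size_a = Counter(part_a[v] for v in nodes)
    size_b = Counter(part_b[v] for v in nodes)
    num = -2.0 * sum(c * math.log(c * n / (size_a[a] * size_b[b]))
                     for (a, b), c in joint.items())
    den = (sum(s * math.log(s / n) for s in size_a.values())
           + sum(s * math.log(s / n) for s in size_b.values()))
    # Assumed convention: two single-community partitions count as identical.
    return 1.0 if den == 0 else num / den
```

Both functions run in time linear in the number of edges or nodes they are given. In a two-objective setting, one would evaluate `modularity` on the current snapshot and `nmi` between the current and previous partitions, restricted to the nodes present in both.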

We conduct experiments on both synthesized and real-life datasets, and the results show that the method performs well in terms of detection quality and speed compared with existing algorithms, which suggests its applicability to various complex networks with dynamic structures such as rapidly growing online social networks. The main contributions of our work are summarized as follows:

(1) The combination of a multi-objective genetic algorithm and a label propagation algorithm fully diversifies initial individual clusters and yields high clustering quality.
(2) The label propagation algorithm integrated into the mutation operator improves clustering quality and convergence speed.
(3) The proposed solution is of a linear time complexity with respect to the number of edges.

The rest of the paper is organized as follows. Section 2 conducts a survey of related work. Section 3 describes dynamic networks and evolutionary clustering, and formulates dynamic community detection as a multi-objective optimization problem. Section 4 designs the proposed algorithm with focus on gene representation and mutation process. Section 5 presents the community detection results from synthesized and real-life datasets, and compares the performance with other existing algorithms. Section 6 analyzes the algorithm scalability and Section 7 concludes our work.

Related work

The analysis of dynamic networks and the detection of community structures have attracted an increasing amount of attention [12], [13], [14]. Due to the dynamic nature of time-varying networks, traditional methods for community detection in static networks may not perform well in dynamic networks. Hence, recent research efforts have shifted to the design of community detection algorithms for dynamic networks, including social networks [15], [16].

Dynamic networks and community

We consider a network snapshot modeled by a weighted graph $G_{t} = (V_{t}, E_{t})$, where $V_{t}$ is the set of nodes and $E_{t}$ is the set of edges at time step $t$. A time-varying network over $T$ time steps is denoted as $G = \{G_{1}, G_{2}, \dots, G_{T}\}$.
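One simple way to hold such a sequence of weighted snapshots in memory is sketched below. The `Snapshot` class, the `build_dynamic_network` helper, and the `(t, u, v, w)` record format are hypothetical choices for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One weighted, undirected snapshot G_t = (V_t, E_t)."""
    nodes: set = field(default_factory=set)
    weights: dict = field(default_factory=dict)   # frozenset({u, v}) -> weight

    def add_edge(self, u, v, w=1.0):
        self.nodes.update((u, v))
        self.weights[frozenset((u, v))] = w

    def neighbours(self, v):
        # Scan all edges; fine for a sketch, an adjacency index would be faster.
        return {u for e in self.weights if v in e for u in e if u != v}


def build_dynamic_network(timed_edges):
    """Group (t, u, v, w) records into a list of snapshots ordered by time step t."""
    by_time = {}
    for t, u, v, w in timed_edges:
        by_time.setdefault(t, Snapshot()).add_edge(u, v, w)
    return [by_time[t] for t in sorted(by_time)]
```

Keeping each snapshot separate mirrors the notation $G = \{G_{1}, \dots, G_{T}\}$ and lets per-snapshot measures be computed independently before consecutive snapshots are compared.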

Community is a well-studied property of many networks. Generally, nodes in the same community are densely connected, while nodes from different communities are sparsely connected. We denote the set of $k$ communities of network $G_{t}$ at time step $t$ as $C_{t} = \{C_{t1}, C_{t2}, \dots, C_{tk}\}$, where $C_{tp} \cap C_{tq} = \emptyset$, $p, q \in$

L-DMGA algorithm

We propose a label-based multi-objective evolutionary community detection algorithm to optimize both snapshot quality and temporal cost without the need to fix the balance factor α in advance. The solutions that constitute the Pareto front of the multi-objective optimization problem (MOP) represent the best compromises between the two objectives.
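The notion of a Pareto front can be illustrated with a small filter over candidate solutions. This sketch assumes each candidate is scored by a tuple of objectives that are all maximized, for example snapshot quality and similarity to the previous partition; the function names `dominates` and `pareto_front` are hypothetical, and the quadratic-time scan is chosen for clarity rather than efficiency.

```python
def dominates(p, q):
    """True if p is at least as good as q in every objective and strictly
    better in at least one. All objectives are maximized."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

No weighting between the objectives is needed here: a candidate is discarded only when another candidate is at least as good in both objectives and better in one, which is what removes the need to choose α beforehand.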

We leverage DYNMOGA, proposed by Folino et al. [26], in the design of our multi-objective genetic algorithm (MOGA). DYNMOGA generates a population that represents partitions of a network and then ranks them

Experimental results

We implement L-DMGA in MATLAB. As is well recognized, the parameter settings significantly affect the performance of genetic algorithms. The work in [36] shows that it is a great challenge to find appropriate parameter values for evolutionary algorithms. Following a general guideline, we choose a high crossover rate and a low mutation rate. In our experiment, we set the crossover rate to 0.9 and the mutation rate to 0.1. Besides, we set the population size to be 100, the number of iterations to be

Scalability analysis

One limitation of genetic algorithms is the repeated calculation of the fitness function, which often leads to a high time complexity. L-DMGA performs an efficient calculation of the fitness function in large-scale networks.

To show the scalability of L-DMGA, we use dataset 1 ($avgDegree = 16$, $z = 5$, $nC = 10\%$) and increase the number of nodes $n$ from 128, 256, 512, 1024, 2048, to 4096. The population size $p$ takes values in [50, 100, 200], and the number $g$ of generations varies within the range of [50,

Conclusion

In this paper, we proposed a new multi-objective optimization algorithm that incorporates the label propagation algorithm to detect desirable community structures in dynamic networks. During the initialization of the algorithm, the label propagation algorithm generates satisfactory community structures, and at each time step, our algorithm achieves a good tradeoff between maximizing the quality of the community structure at the current time step and minimizing the variation of the community structures in

Acknowledgments

This work was supported by the Soft Science Research Project of Chengdu Science and Technology Bureau (2015-RK00-00247-ZF), the Scientific Research Project of Sichuan Provincial Public Security Department (2015SCYYCX06), and the National Natural Science Foundation of China (61300192).

References (45)

Z. Lin et al. Ck-lpa: efficient community detection algorithm based on label propagation with community kernel. Physica A (2014)
U.V. Luxburg. A tutorial on spectral clustering. Stat. Comput. (2007)
M.E.J. Newman. Modularity and community structure in networks. 2006 APS March Meeting (2006)
P.Y. Chen et al. Node removal vulnerability of the largest component of a network. IEEE GLOBALSIP 2013 (2014)
P.Y. Chen et al. Assessing and safeguarding network resilience to nodal attacks. IEEE Commun. Mag. (2014)
M. Coscia et al. A classification for community discovery methods in complex networks. Stat. Anal. Data Mining (2011)
J. Xie et al. LabelRankT: Incremental community detection in dynamic networks via label propagation. Proceedings of the Workshop on Dynamic Networks Management and Mining (2013)
S.Z. Bo Shan et al. Ic: incremental algorithm for community identification in dynamic social networks. J. Softw. (2009)
D. Chakrabarti et al. Abstract evolutionary clustering. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006)
M.E.J. Newman et al. Finding and evaluating community structure in networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. (2004)
L. Danon et al. Comparing community structure identification. J. Stat. Mech. Theory Exp. (2005)
T. Aynaud et al. Communities in evolving networks: definitions, detection, and analysis techniques. Dynamics On and Of Complex Networks, Volume 2 (2013)
R. Cazabet et al. Dynamic community detection. Encyclopedia of Social Network Analysis and Mining (2014)
P. Holme. Temporal Networks (2014)
S. Papadopoulos et al. Community detection in social media. Data Min. Knowl. Discovery (2012)
M. Spiliopoulou. Evolution in Social Networks: A Survey (2011)
L.D. Yang B. Force-based incremental algorithm for mining community structure in dynamic network. J. Comput. Sci. Technol. (2006)
J. Sun et al. GraphScope: parameter-free mining of large time-evolving graphs. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12–15, 2007 (2007)
Y. Chi et al. Evolutionary spectral clustering by incorporating temporal smoothness. KDD 07 ACM SIGKDD International Conference on Knowledge Discovery & Data (2007)
L. Tang et al. Identifying evolving groups in dynamic multi-mode networks. IEEE Trans. Knowl. Data Eng. (2011)
K.S. Xu et al. Adaptive evolutionary clustering. Data Min. Knowl. Discovery (2014)
M.S. Kim et al. A particle-and-density based evolutionary clustering method for dynamic networks. Proc. VLDB Endowment (2009)
