An overlapping community detection algorithm based on rough clustering of links

https://doi.org/10.1016/j.datak.2019.101777

Highlights

  • A new algorithm for community detection in complex networks has been proposed.

  • Experiments have been performed on ten benchmark networks from diverse domains.

  • Comparative analysis has been performed with the relevant state-of-the-art methods.

  • The proposed algorithm competes favorably with state-of-the-art methods.

Abstract

The growth of networks is prevalent in almost every field due to the digital transformation of consumers, business and society at large. The unfolding of community structure in such real-world complex networks is crucial since it aids in gaining strategic insights leading to informed decisions. Moreover, the co-occurrence of disjoint, overlapping and nested community patterns in such networks demands methodologically rigorous community detection algorithms so as to foster cumulative tradition in data and knowledge engineering. In this paper, we introduce an algorithm for overlapping community detection based on granular information of links and concepts of rough set theory. First, neighborhood links around each pair of nodes are utilized to form initial link subsets. Subsequently, constrained linkage upper approximation of the link subsets is computed iteratively until convergence. The upper approximation subsets obtained during each iteration are constrained and merged using the notion of mutual link reciprocity. The experimental results on ten real-world networks and comparative evaluation with state-of-the-art community detection algorithms demonstrate the effectiveness of the proposed algorithm.

Introduction

Technological advancement in the contemporary digital world has led to the formation and mapping of systems that consist of many interconnected dynamical units. Such systems are collectively referred to as complex systems because their constituent units interact not only with each other but also with the environment [1]. Moreover, the workings of the interconnected units of a complex system are not well understood and may change over time [2]. Examples of complex systems include a social club that requires cooperation among members to achieve a common goal, the Internet consisting of millions of interconnected routers, the World Wide Web comprising billions of interconnected webpages, a telecom network consisting of billions of cell phones, the human brain comprising billions of synaptically connected neurons, and a power grid consisting of electric substations connected through transmission lines [3]. Digital innovation offering cheap storage and fast data sharing has enabled us to keep track of the data pertaining to the intricate networks operating behind such complex systems [4]. For instance, social network companies such as LinkedIn, Twitter and Facebook have developed accurate repositories of professional and friendship ties, and the Human Connectome Project aims to systematically map the network of neural connections in the human brain [5]. Since all networks arising from complex systems, irrespective of the diversity in their origin, nature, size and scope, follow a common set of organizing principles, they are essentially similar in architecture [6]. The organization of network nodes into cohesive subgroups (generally known as communities) is one of the fundamental principles of complex networks. Therefore, community detection is essential to explore and understand the dynamics of complex real-world systems.

Communities are defined as densely connected subgroups of nodes within a complex network such that the link density within subgroups is much higher than the density of links between subgroups [7]. Such dense intra-community connectedness arises from organizational or functional components within a network, such as groups of friends in a social network of students, groups of companies with interlocking directorates in organizational networks, and groups of actors associated with common movies in actor networks [3]. Given the applicability and complexity of the community detection problem, an effective community identification approach can transform business and society by extracting unforeseen insights [6]. In today’s digital world, applications of community detection range from topic detection in collaborative tagging systems to event detection on social media [8]. In the past, community detection techniques have been used to devise antiterrorism strategies and to identify key players in terrorist networks [9]. In the marketing domain, the community structure of neuronal connections can reveal functional patterns of the human brain, which can in turn help in the positioning and pricing of products [10]. Moreover, community detection in the underlying interaction network of proteins in human cells can help in understanding the alterations between healthy and disease states [11].
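The density contrast in this definition can be checked directly. The following is a minimal sketch, assuming an undirected, unweighted edge list and one common normalization: internal links are divided by the k(k − 1)/2 possible pairs inside a community of k nodes, and boundary links by the k(n − k) possible pairs between the community and the remaining nodes of an n-node network. The function `link_densities` and the toy graph are hypothetical illustrations, not part of the paper.

```python
def link_densities(edges, community):
    """Return (internal, external) link density of a node set in an undirected graph.

    ASSUMED normalization (one common choice, not taken from the paper):
    internal = links with both endpoints inside / k*(k-1)/2
    external = links with exactly one endpoint inside / k*(n-k)
    """
    nodes = {u for edge in edges for u in edge}
    inside = set(community)
    k, n = len(inside), len(nodes)
    internal = sum(1 for u, v in edges if u in inside and v in inside)
    boundary = sum(1 for u, v in edges if (u in inside) != (v in inside))
    possible_internal = k * (k - 1) / 2
    possible_external = k * (n - k)
    return (
        internal / possible_internal if possible_internal else 0.0,
        boundary / possible_external if possible_external else 0.0,
    )


# Two triangles joined by a single bridge link (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(link_densities(edges, {1, 2, 3}))
```

For this toy graph the first triangle has internal density 1.0 and external density 1/9, the kind of contrast the definition describes.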

Communities within complex networks can be interpreted in two ways: as a closely connected group of nodes (node-centric) or as a group of closely interrelated links (link-centric). Researchers have been investigating community detection in complex networks for more than a decade, with a major focus on node-centric community detection and relatively few attempts at link-centric community detection. Moreover, state-of-the-art community detection algorithms still have several limitations that restrict their real-world applications [11]. A majority of the existing algorithms assign each node to only one community, so that the identified communities are disjoint; some algorithms are domain-specific; some require a priori knowledge of the number of communities in a complex network; some are not scalable to large networks; and some are insensitive to variations in the size of communities. One of the major challenges currently encountered in community detection is the identification of overlapping communities, which are closer to the reality of today’s interconnected world [12].

The community structure within a network may contain not only overlapping communities but also nested communities, wherein one community is contained within another. For example, a number of ethnic communities may be present inside a location-based community [13]. Fig. 1 shows two examples: (a) a schematic network with only disjoint communities, and (b) a schematic network in which disjoint, overlapping and nested communities occur simultaneously. Since links are more idiosyncratic in nature than nodes, pervasive overlaps in community structure are better discovered by clustering links rather than nodes [14]. This fundamental difference in the characteristics of nodes and links motivated us to develop a new link-centric community detection algorithm that is receptive to distinctive features of community structure in complex networks while being capable of detecting co-occurring disjoint, nested and overlapping communities.
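A small example shows why clustering links accommodates overlap. The sketch below is a hypothetical illustration rather than LUAMCOM: the link grouping is fixed by hand, each link is placed in exactly one group, and a node is assumed to belong to every group that contains at least one of its links. The helper names `node_communities` and `memberships` are illustrative only.

```python
def node_communities(link_groups):
    """Map link communities (sets of undirected links) to node communities.

    Assumption for illustration: a node belongs to every community containing
    at least one of its links, so disjoint link groups can yield overlapping node sets.
    """
    return [set().union(*(set(link) for link in group)) for group in link_groups]


def memberships(node_sets):
    """Count how many node communities each node belongs to."""
    counts = {}
    for members in node_sets:
        for node in members:
            counts[node] = counts.get(node, 0) + 1
    return counts


# Two triangles sharing node 3; each link is placed in exactly one group.
link_groups = [
    {(1, 2), (1, 3), (2, 3)},
    {(3, 4), (3, 5), (4, 5)},
]
communities = node_communities(link_groups)
print(communities)
print(memberships(communities))
```

Node 3, where the two triangles meet, ends up in both node communities even though no link is shared between the two link groups.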

In this paper, we design a community detection algorithm based on the unexplored theoretical synergy of the concepts of link communities in network science and upper approximation in the rough set paradigm. The proposed Link Upper Approximation Method for COMmunity detection is abbreviated as LUAMCOM. To illustrate the functioning of the proposed algorithm, we explain it on a schematic network shown in Fig. 1(b). Further, experiments have been performed on ten complex network datasets from diverse domains. Experimental results and comparative analysis of the proposed algorithm with relevant state-of-the-art algorithms demonstrate its effectiveness and superiority. The contributions of this paper are as follows:

  1. We introduce the concept of mutual link reciprocity to deduce similarity between links of a complex network. The constraining and merging of link subsets in each iteration are performed using mutual link reciprocity.

  2. Based on the theoretical underpinning that relationships (links) are more unique than actors (nodes), we interpret communities as groups of closely interrelated links and propose a community detection algorithm based on rough clustering of links.

  3. The integrated use of mutual link reciprocity and link-based rough clustering enables the detection of disjoint, nested and overlapping communities. The experimental and comparative evaluation on real-world networks from diverse domains establishes the viability of the proposed algorithm.
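The rough-set notion behind the upper approximation of link subsets can be illustrated on a toy graph. In classical rough set theory, the upper approximation of a set X contains every element whose granule intersects X. The sketch below assumes, purely for illustration, that the granule of a link is the link itself plus every link sharing an endpoint with it, and it computes a plain, unconstrained upper approximation that is then iterated to a fixed point. It does not implement the paper's constrained linkage upper approximation or mutual link reciprocity, whose definitions are not reproduced here; all function names are hypothetical.

```python
def link_key(u, v):
    """An undirected link as an order-independent key."""
    return frozenset((u, v))


def link_granules(edges):
    """ASSUMED granule of each link: the link plus every link sharing an endpoint."""
    incident = {}
    for u, v in edges:
        e = link_key(u, v)
        incident.setdefault(u, set()).add(e)
        incident.setdefault(v, set()).add(e)
    return {link_key(u, v): incident[u] | incident[v] for u, v in edges}


def upper_approximation(subset, granules):
    """Classical upper approximation: all links whose granule meets the subset."""
    return {e for e, granule in granules.items() if granule & subset}


def iterate_to_fixed_point(subset, granules):
    """Apply the unconstrained upper approximation until the subset stops growing."""
    current = set(subset)
    while True:
        nxt = upper_approximation(current, granules)
        if nxt == current:
            return current
        current = nxt


# Path 1-2-3-4 plus a separate link 5-6.
edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
granules = link_granules(edges)
seed = {link_key(1, 2)}
print(upper_approximation(seed, granules))
print(iterate_to_fixed_point(seed, granules))
```

Without any constraint, the iteration simply grows the seed to every link of its connected component ({1,2}, {2,3}, {3,4} here), which illustrates why some constraint on each upper approximation is needed before it can delimit a community.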

This paper is organized as follows. In Section 2, we review the state-of-the-art community detection algorithms. Section 3 explains the methodology and complexity of the proposed algorithm. The functioning of the proposed algorithm on a schematic network dataset is illustrated in Section 4. Section 5 presents experimental results and comparative analysis with state-of-the-art algorithms. Finally, Section 6 concludes the paper with business implications and directions for future research.

Related work

Researchers across varied disciplines have devoted significant resources to discovering communities in complex networks, wherein communities have been referred to as clusters, modules, cohesive subgroups or complexes depending on the research setting and discipline [15]. The increasing importance of networks has led to a number of research papers and survey reports that have discussed the advances in the field of community detection [8], [16], [17].

“Modularity” was the first metric proposed for
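For reference, the standard Newman–Girvan modularity of an undirected, unweighted network with $m$ links is $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, which can equivalently be written as $Q = \sum_c \left[\frac{l_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, where $l_c$ is the number of links inside community $c$ and $d_c$ is the total degree of its nodes. The sketch below evaluates the second form. It assumes a disjoint node partition (the classical setting, not the overlapping one targeted in this paper), and `modularity` here is a hypothetical helper, not a library function.

```python
from collections import Counter


def modularity(edges, membership):
    """Newman-Girvan modularity of a disjoint node partition.

    edges: list of undirected, unweighted links (u, v) without duplicates.
    membership: dict mapping each node to a single community label (assumed disjoint).
    Computes Q = sum_c [ l_c / m - (d_c / (2m))^2 ].
    """
    m = len(edges)
    internal = Counter()    # l_c: links with both endpoints in community c
    degree_sum = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge link (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
membership = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(round(modularity(edges, membership), 4))
```

Splitting the graph into its two triangles gives Q = 5/14 ≈ 0.357, whereas placing every node in a single community gives Q = 0.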

Proposed algorithm: Link Upper Approximation Method for Community Detection

This section presents the proposed community detection algorithm in detail. A complex network can be represented by $\Gamma = (N, L)$, where $N = \{n_1, n_2, \ldots, n_p\}$ denotes a finite set of nodes and $L \subseteq N \times N$ denotes a finite set of links. In an undirected and unweighted network, a link is an unordered pair of nodes, such that $(i, j)$ and $(j, i)$ represent the same link. Formally, the objective of detecting communities in $\Gamma$ is to unravel a partition $C = \{C_1, C_2, C_3, \ldots, C_m\}$ where nodes may have more than one community membership such
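The notation can be mirrored in a few lines of code. This is a minimal sketch, assuming Python sets: links are stored as frozensets so that $(i, j)$ and $(j, i)$ are the same link, and the community structure $C$ is a list of possibly overlapping node sets. The class `Network` and the helper `memberships_of` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Undirected, unweighted network Gamma = (N, L)."""
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)  # each link is a frozenset {i, j}

    def add_link(self, i, j):
        self.nodes.update((i, j))
        self.links.add(frozenset((i, j)))  # (i, j) and (j, i) map to the same key


def memberships_of(node, communities):
    """Indices of all communities C_k containing the node (may be more than one)."""
    return [k for k, members in enumerate(communities) if node in members]


g = Network()
for i, j in [(1, 2), (2, 1), (2, 3), (3, 4)]:
    g.add_link(i, j)
print(len(g.links))  # 3: (1, 2) and (2, 1) are the same link
C = [{1, 2, 3}, {3, 4}]  # overlapping communities C_1 and C_2
print(memberships_of(3, C))  # [0, 1]
```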

Illustration on a schematic network

In order to explain the functioning of the proposed algorithm, we illustrate it on the schematic network shown in Fig. 1(b), which was constructed for evaluating the detection of co-occurring disjoint, overlapping and nested communities [27]. Real-world social, biological and technological networks, collectively referred to as complex networks, exhibit intricate community configurations similar to those in the schematic network. A visual inspection of this schematic network consisting of 23 nodes and 64

Experimental setup

In order to examine the effectiveness of the proposed community detection algorithm, we conducted experiments on network datasets representing complex systems in diverse domains. These networks include a network of friendships at a karate club [64], network of bones in the human skull [65], network of associations between dolphins [66], network of friendships in a high school [67], network of political books sold online by Amazon.com [68], co-appearance network of football teams [69],

Conclusion

The digital revolution has led to the pervasiveness of complex networks in all domains of socio-techno-economic life. The detection of communities in complex networks aims to understand the dynamics between entities of a system and the behavior of the system as a whole. Community detection techniques aid in designing efficient routing protocols for mobile ad hoc networks (MANETs) and in search engine optimization for the World Wide Web. Community detection can be used not only for portfolio management

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (76)

  • Shu W. et al.

    An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory

    Data Knowl. Eng.

    (2015)
  • Teoh H.J. et al.

    Fuzzy time series model based on probabilistic approach and rough set rule induction for empirical research in stock markets

    Data Knowl. Eng.

    (2008)
  • Parmar D. et al.

    MMR: An algorithm for clustering categorical data using Rough Set Theory

    Data Knowl. Eng.

    (2007)
  • Kumar P. et al.

    Rough clustering of sequential data

    Data Knowl. Eng.

    (2007)
  • Kundu S. et al.

    Fuzzy-rough community in social networks

    Pattern Recognit. Lett.

    (2015)
  • Zadeh L.A.

    Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic

    Fuzzy Sets and Systems

    (1997)
  • Kumar P. et al.

    An upper approximation based community detection algorithm for complex networks

    Decis. Support Syst.

    (2017)
  • Schaeffer S.E.

    Graph clustering

    Comp. Sci. Rev.

    (2007)
  • Dash M. et al.

    Fast hierarchical clustering and its validation

    Data Knowl. Eng.

    (2003)
  • Yang B. et al.

    Hierarchical community detection with applications to real-world network analysis

    Data Knowl. Eng.

    (2013)
  • Javed M.A. et al.

    Community detection in networks: A multidisciplinary review

    J. Netw. Comput. Appl.

    (2018)
  • Amaral L.A.N. et al.

    Complex networks

    Eur. Phys. J. B

    (2004)
  • Albert R. et al.

    Statistical mechanics of complex networks

    Rev. Modern Phys.

    (2002)
  • Yoo Y. et al.

    Research commentary—the new organizing logic of digital innovation: an agenda for information systems research

    Inf. Syst. Res.

    (2010)
  • Barabási A.L.

    Network Science Introduction

    (2014)
  • Zhang K. et al.

    Large-scale network analysis for online social brand advertising

    MIS Q.

    (2016)
  • Papadopoulos S. et al.

    Community detection in Social Media: Performance and application considerations

    Data Min. Knowl. Discov.

    (2012)
  • Wiil U.K. et al.

    Detecting new trends in terrorist networks

  • De Schotten M.T. et al.

    A lateralized brain network for visuospatial attention

    Nature Neurosci.

    (2011)
  • Liu W. et al.

    Detecting communities based on network topology

    Sci. Rep.

    (2014)
  • Yang J. et al.

    Overlapping community detection at scale: a nonnegative matrix factorization approach

  • Shi Z. et al.

    Network structure and observational learning: Evidence from a location-based social network

    J. Manage. Inf. Syst.

    (2013)
  • Ahn Y.-Y. et al.

    Link communities reveal multiscale complexity in networks

    Nature

    (2010)
  • Yang Z. et al.

    A comparative analysis of community detection algorithms on artificial networks

    Sci. Rep.

    (2016)
  • Xie J. et al.

    Overlapping community detection in networks: The state-of-the-art and comparative study

    ACM Comput. Surv.

    (2013)
  • Newman M.E. et al.

    Finding and evaluating community structure in networks

    Phys. Rev. E

    (2004)
  • Blondel V.D. et al.

    Fast unfolding of communities in large networks

    J. Stat. Mech. Theory Exp.

    (2008)
  • Clauset A. et al.

    Finding community structure in very large networks

    Phys. Rev. E

    (2004)
Cited by (32)

  • Fake news believability: The effects of political beliefs and espoused cultural values

    2023, Information and Management
    Citation Excerpt:

    Thus, misinformation/disinformation cascades (based on fake news) could be more easily dissipated by focusing on the SNSs in which they are more likely to occur. Furthermore, fact-checking efforts could be even more focalized, and perhaps stay ahead of the fake news game, by means of well-intentioned, nonintrusive online community detection efforts [128–130]. In addition, specific fake news items being shared virally by conservatives using SNSs could be identified and then addressed in terms of their argument consistency, as well as their argument-induced belief change characteristics.

  • A spatial filtering inspired three-way clustering approach with application to outlier detection

    2021, International Journal of Approximate Reasoning
    Citation Excerpt:

    There is no such constraint in 3WC and it is possible that an object belongs to the upper bound or support set of only one cluster. Some recent and notable works in rough clustering are found in the references [24,39,45,48]. 3WC and shadowed set clustering: In shadowed sets, it is possible that an object is not part of the core or shadowed set (or equivalently fringe set) of any cluster [47].

Samrat Gupta is currently an Assistant Professor in the Information Systems area at the Indian Institute of Management (IIM), Ahmedabad. His current research interests include Analysis of Complex Networks, Soft Computing and Crowdsourcing. He received his doctorate in IT & Systems from the Indian Institute of Management (IIM), Lucknow and his B.E. in Information Technology from Punjab Engineering College (PEC), Chandigarh. Prior to his doctoral work, he worked as a Software Engineer with Computer Sciences Corporation (CSC), India. This work is part of his doctoral research, which has received awards at reputed conferences and consortiums such as the IDRBT Doctoral Colloquium (2016), Hyderabad, the Ph.D. Consortium (2017), IIT Mumbai and the ALLDATA Conference (2019), Valencia, Spain.

Pradeep Kumar is currently an Associate Professor in the IT and Systems area at the Indian Institute of Management Lucknow, India. Prior to joining IIM, he was associated with SET Labs, Infosys Technologies Ltd. as a researcher. He served the Institute for Development and Research in Banking Technology (IDRBT), established by the Reserve Bank of India (RBI), as a Research Fellow. He received his Ph.D. from the Department of Computer and Information Sciences, Hyderabad University, India. He holds M.Tech. and B.Sc.(Engg.) degrees in Computer Science. His areas of interest include Data Warehousing, Data Mining, Web Mining, Text Mining and Big Data analytics. To his credit, he has authored more than 40 research papers in international journals and conferences of repute.