An overlapping community detection algorithm based on rough clustering of links
Introduction
Technological advancement in the contemporary digital world has led to the formation and mapping of systems that consist of many interconnected dynamical units. Such systems are collectively referred to as complex systems because their constituent units are capable of interacting not only with each other but also with the environment [1]. Moreover, the working of the interconnected units of a complex system is not well understood and could change with time [2]. Examples of complex systems include a social club that requires cooperation among members to achieve a common goal, the Internet consisting of millions of interconnected routers, the World Wide Web comprising billions of interconnected webpages, a telecom network consisting of billions of cell phones, the human brain comprising billions of synaptically connected neurons, and a power grid consisting of electric substations connected through transmission lines [3]. Digital innovation offering cheap storage and fast data sharing has enabled us to keep track of the data pertaining to the intricate networks operating behind such complex systems [4]. For instance, social network companies such as LinkedIn, Twitter and Facebook have developed accurate repositories of professional and friendship ties, and the Human Connectome Project aims to systematically map the network of neural connections in the human brain [5]. Since all networks arising from complex systems follow a common set of organizing principles, irrespective of the diversity in their origin, nature, size and scope, they are essentially similar in architecture [6]. The organization of network nodes into cohesive subgroups (generally known as communities) is one of the fundamental principles of complex networks. Therefore, community detection is essential to explore and understand the dynamics of complex real-world systems.
Communities are defined as densely connected subgroups of nodes within a complex network such that the link density within subgroups is much higher than the density of links between subgroups [7]. This dense intra-community connectedness exists due to organizational or functional components within a network, such as groups of friends in a social network of students, groups of companies with interlocking directorates in organizational networks, and groups of actors associated with common movies in actor networks [3]. Given the applicability and complexity of the community detection problem, an effective community identification approach can transform business and society by extracting unforeseen insights [6]. In today’s digital world, applications of community detection range from topic detection in collaborative tagging systems to event detection on social media [8]. In the past, community detection techniques have been used to devise antiterrorism strategies and identify key players in terrorist networks [9]. In the marketing domain, the community structure of neuronal connections can reveal functional patterns of the human brain, which in turn can help in the positioning and pricing of products [10]. Moreover, community detection in the underlying interaction network of proteins in human cells can help in understanding the alterations between healthy and disease states [11].
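The density-based definition above can be made concrete with a small sketch. The function name `link_densities` and the toy network (two triangles joined by a single link) are illustrative assumptions, not taken from the paper; the sketch also assumes a disjoint node partition and no self-loops.

```python
def link_densities(links, communities):
    """Return (intra_density, inter_density) for a disjoint node partition."""
    # Map each node to the index of its (single) community.
    membership = {node: idx for idx, comm in enumerate(communities) for node in comm}
    # Undirected links: (u, v) and (v, u) collapse to the same frozenset.
    unique_links = {frozenset(link) for link in links}

    # A link is intra-community when both endpoints share a community index.
    intra_links = sum(
        1 for link in unique_links
        if len({membership[node] for node in link}) == 1
    )
    inter_links = len(unique_links) - intra_links

    # Count the node pairs that could be linked inside and across communities.
    n = sum(len(comm) for comm in communities)
    intra_pairs = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    inter_pairs = n * (n - 1) // 2 - intra_pairs

    intra_density = intra_links / intra_pairs if intra_pairs else 0.0
    inter_density = inter_links / inter_pairs if inter_pairs else 0.0
    return intra_density, inter_density


# Hypothetical toy network: two triangles connected by the link (3, 4).
links = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
communities = [{1, 2, 3}, {4, 5, 6}]
intra, inter = link_densities(links, communities)
print(intra, inter)
```

In this toy case every possible link inside a triangle is present, while only one of the nine possible links between the two groups exists, which is the kind of contrast the definition describes.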
Communities within complex networks can be interpreted in two ways: as a closely connected group of nodes (node-centric) or as a group of closely interrelated links (link-centric). Researchers have been investigating community detection in complex networks for more than a decade, with a major focus on node-centric community detection but relatively few attempts at link-centric community detection. Moreover, state-of-the-art community detection algorithms still have several limitations that confine their applications in the real world [11]. A majority of the existing algorithms assign each node to only one community, so that the identified communities are disjoint; some algorithms are domain-specific; some require a priori knowledge about the number of communities in a complex network; some are not scalable to large networks; and some are insensitive to variations in the size of communities. One of the major challenges currently encountered in community detection in complex networks is the identification of overlapping communities, which are closer to the reality of today’s interconnected world [12].
The community structure within a network may contain not only overlapping communities but also nested communities, wherein one community is contained within another. For example, a number of ethnic communities may be present inside a location-based community [13]. Fig. 1 shows two examples: the first is a schematic network with only disjoint communities; the second is a schematic network with disjoint, overlapping and nested communities occurring simultaneously. Since links are more idiosyncratic in nature than nodes, pervasive overlaps in community structure are better discovered by clustering links rather than nodes [14]. This fundamental difference in the characteristics of nodes and links motivated us to develop a new link-centric community detection algorithm that is receptive to the distinctive features of community structure in complex networks while being capable of detecting co-occurring disjoint, nested and overlapping communities.
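To see why clustering links can expose overlaps, consider the following minimal sketch. It assumes the links have already been grouped by some method (the grouping shown is hand-made, not the output of LUAMCOM), and the helper name `nodes_of` is hypothetical. Each link belongs to exactly one cluster, yet a node inherits membership from every cluster that contains one of its links.

```python
def nodes_of(link_cluster):
    """Collect the endpoints of all links in one link cluster."""
    nodes = set()
    for link in link_cluster:
        nodes.update(link)
    return nodes


# Hand-made link clusters for a "bow-tie" network: two triangles sharing node "c".
link_clusters = [
    [("a", "b"), ("b", "c"), ("a", "c")],
    [("c", "d"), ("d", "e"), ("c", "e")],
]

# Node communities induced by the link clusters.
node_communities = [nodes_of(cluster) for cluster in link_clusters]

# Nodes that appear in both induced communities form the overlap.
shared = node_communities[0] & node_communities[1]
print(node_communities, shared)
```

A node-centric partition would have to place node "c" in one triangle only, whereas the link-centric view assigns it to both without any special handling.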
In this paper, we design a community detection algorithm based on the unexplored theoretical synergy of the concepts of link communities in network science and upper approximation in the rough set paradigm. The proposed Link Upper Approximation Method for COMmunity detection is abbreviated as LUAMCOM. To illustrate the functioning of the proposed algorithm, we explain it on a schematic network shown in Fig. 1(b). Further, experiments have been performed on ten complex network datasets from diverse domains. Experimental results and comparative analysis of the proposed algorithm with relevant state-of-the-art algorithms demonstrate its effectiveness and superiority. The contributions of this paper are as follows:
1. We introduce the concept of mutual link reciprocity to deduce similarity between links of a complex network. The constraining and merging of link subsets in each iteration are performed using mutual link reciprocity.
2. Based on the theoretical underpinning that relationships (links) are more unique than actors (nodes), we interpret communities as groups of closely interrelated links and propose a community detection algorithm based on rough clustering of links.
3. The integrated use of mutual link reciprocity and link-based rough clustering enables the detection of disjoint, nested and overlapping communities. The experimental and comparative evaluation on real-world networks from diverse domains establishes the viability of the proposed algorithm.
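For readers unfamiliar with the rough set paradigm, the following sketch shows the classical (Pawlak) lower and upper approximations of a target set with respect to a family of granules. It is a generic illustration only: the function name `approximations` and the toy data are assumptions, and the link-based upper approximation used by the proposed algorithm is not specified in this excerpt.

```python
def approximations(granules, target):
    """Return (lower, upper) approximations of `target` given `granules`.

    lower: union of granules entirely contained in the target set.
    upper: union of granules that share at least one element with it.
    """
    target = set(target)
    lower, upper = set(), set()
    for granule in granules:
        granule = set(granule)
        if granule <= target:   # granule certainly belongs to the target
            lower |= granule
        if granule & target:    # granule possibly belongs to the target
            upper |= granule
    return lower, upper


# Hypothetical granules (e.g. groups of indiscernible objects) and a target set.
granules = [{1, 2}, {3, 4}, {5}]
target = {1, 2, 3}
lower, upper = approximations(granules, target)
boundary = upper - lower
print(lower, upper, boundary)
```

The boundary region (upper minus lower) holds the objects whose membership is uncertain, which is what allows rough clustering to place an object in more than one cluster.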
This paper is organized as follows. In Section 2, we review the state-of-the-art community detection algorithms. Section 3 explains the methodology and complexity of the proposed algorithm. The functioning of the proposed algorithm on a schematic network dataset is illustrated in Section 4. Section 5 presents experimental results and comparative analysis with state-of-the-art algorithms. Finally, Section 6 concludes the paper with business implications and directions for future research.
Section snippets
Related work
Researchers across varied disciplines have devoted significant resources to discovering communities in complex networks, wherein communities have been referred to as clusters, modules, cohesive subgroups or complexes depending on the research setting and discipline [15]. The increasing importance of networks has led to a number of research papers and survey reports that discuss the advances in the field of community detection [8], [16], [17].
“Modularity” was the first metric proposed for
Proposed algorithm: Link Upper Approximation Method for Community Detection
This section presents the proposed community detection algorithm in detail. A complex network can be represented by G = (V, E), where V denotes a finite set of nodes and E denotes a finite set of links. In an undirected and unweighted network, a link is an unordered pair of nodes such that (u, v) and (v, u) represent the same link. Formally, the objective of detecting communities in G is to unravel a partition where nodes may have more than one community membership such
Illustration on a schematic network
To explain the functioning of the proposed algorithm, we illustrate it on the schematic network shown in Fig. 1(b), constructed for the evaluation of co-occurring disjoint, overlapping and nested community detection [27]. Real-world social, biological and technological networks, collectively referred to as complex networks, exhibit intricate community configurations similar to those in the schematic network. A visual inspection of this schematic network consisting of 23 nodes and 64
Experimental setup
To examine the effectiveness of the proposed community detection algorithm, we conducted experiments on network datasets representing complex systems in diverse domains. These networks include a network of friendships at a karate club [64], a network of bones in the human skull [65], a network of associations between dolphins [66], a network of friendships in a high school [67], a network of political books sold online by Amazon.com [68], a co-appearance network of football teams [69],
Conclusion
The digital revolution has led to the pervasiveness of complex networks in all domains of socio-techno-economic life. The detection of communities in complex networks aims to understand the dynamics between the entities of a system and the behavior of the system as a whole. Community detection techniques aid in designing efficient routing protocols for mobile ad hoc networks (MANETs) and in search engine optimization for the World Wide Web. Community detection can not only be used for portfolio management
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (76)
- et al., Complex networks: Structure and dynamics, Phys. Rep. (2006)
- et al., The WU–Minn human connectome project: An overview, Neuroimage (2013)
- Community detection in graphs, Phys. Rep. (2010)
- et al., Identifying and evaluating community structure in complex networks, Pattern Recognit. Lett. (2010)
- et al., Detect overlapping and hierarchical community structure in networks, Physica A (2009)
- et al., Core discovery in hidden networks, Data Knowl. Eng. (2019)
- et al., An overlapping community detection algorithm in complex networks based on information theory, Data Knowl. Eng. (2018)
- et al., Identification of overlapping community structure in complex networks using fuzzy c-means clustering, Physica A (2007)
- et al., A link clustering based overlapping community detection algorithm, Data Knowl. Eng. (2013)
- et al., An ant colony based algorithm for overlapping community detection in complex networks, Physica A (2015)
- An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory, Data Knowl. Eng.
- Fuzzy time series model based on probabilistic approach and rough set rule induction for empirical research in stock markets, Data Knowl. Eng.
- MMR: An algorithm for clustering categorical data using Rough Set Theory, Data Knowl. Eng.
- Rough clustering of sequential data, Data Knowl. Eng.
- Fuzzy-rough community in social networks, Pattern Recognit. Lett.
- Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems
- An upper approximation based community detection algorithm for complex networks, Decis. Support Syst.
- Graph clustering, Comp. Sci. Rev.
- Fast hierarchical clustering and its validation, Data Knowl. Eng.
- Hierarchical community detection with applications to real-world network analysis, Data Knowl. Eng.
- Community detection in networks: A multidisciplinary review, J. Netw. Comput. Appl.
- Complex networks, Eur. Phys. J. B
- Statistical mechanics of complex networks, Rev. Modern Phys.
- Research commentary: The new organizing logic of digital innovation: An agenda for information systems research, Inf. Syst. Res.
- Network Science Introduction
- Large-scale network analysis for online social brand advertising, MIS Q.
- Community detection in Social Media: Performance and application considerations, Data Min. Knowl. Discov.
- Detecting new trends in terrorist networks
- A lateralized brain network for visuospatial attention, Nature Neurosci.
- Detecting communities based on network topology, Sci. Rep.
- Overlapping community detection at scale: A nonnegative matrix factorization approach
- Network structure and observational learning: Evidence from a location-based social network, J. Manage. Inf. Syst.
- Link communities reveal multiscale complexity in networks, Nature
- A comparative analysis of community detection algorithms on artificial networks, Sci. Rep.
- Overlapping community detection in networks: The state-of-the-art and comparative study, ACM Comput. Surv.
- Finding and evaluating community structure in networks, Phys. Rev. E
- Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp.
- Finding community structure in very large networks, Phys. Rev. E
Samrat Gupta is currently an Assistant Professor in the Information Systems area at the Indian Institute of Management (IIM), Ahmedabad. His current research interests include Analysis of Complex Networks, Soft Computing and Crowdsourcing. He received his doctorate in IT & Systems from the Indian Institute of Management (IIM), Lucknow and his B.E. in Information Technology from Punjab Engineering College (PEC), Chandigarh. Prior to his doctoral work, he worked as a Software Engineer with Computer Sciences Corporation (CSC), India. This work is a part of his doctoral research, which has received awards at reputed conferences and consortiums such as IDRBT Doctoral Colloquium (2016), Hyderabad, Ph.D. Consortium (2017), IIT Mumbai and ALLDATA Conference (2019), Valencia, Spain.
Pradeep Kumar is currently an Associate Professor in the IT and Systems area at the Indian Institute of Management Lucknow, India. Prior to joining IIM, he was associated with SET Labs, Infosys Technologies Ltd. as a researcher. He served the Institute for Development and Research in Banking Technology (IDRBT), established by the Reserve Bank of India (RBI), as a Research Fellow. He received his Ph.D. from the Department of Computer and Information Sciences, Hyderabad University, India. He holds an M.Tech. and a B.Sc.(Engg.) in Computer Science. His areas of interest include Data Warehousing, Data Mining, Web Mining, Text Mining and Big Data analytics. He has authored more than 40 research papers in reputed international journals and conferences.