Abstract
We propose an algorithm for finding overlapping community structure in very large networks. The algorithm is based on the label propagation technique of Raghavan, Albert and Kumara, but is able to detect communities that overlap. As in the original algorithm, vertices have labels that propagate between neighbouring vertices so that members of a community reach a consensus on their community membership. Our main contribution is to extend the label and the propagation step to include information about more than one community: each vertex can now belong to up to v communities, where v is the parameter of the algorithm. Our algorithm can also handle weighted and bipartite networks. Tests on an independently designed set of benchmarks, and on real networks, show the algorithm to be highly effective in recovering overlapping communities. It is also very fast and can process very large and dense networks in a short time.
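The propagation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how labels are combined or pruned, so the sketch assumes that each vertex holds a set of (community, belonging coefficient) pairs summing to 1, that a vertex takes the average of its neighbours' labels, and that communities with a coefficient below 1/v are dropped. The function names, the synchronous update, and the deterministic tie-break are all hypothetical choices made for illustration; weighted and bipartite networks are not handled.

```python
from collections import defaultdict


def init_labels(adj):
    """Start every vertex in its own community with full belonging (assumption)."""
    return {x: {x: 1.0} for x in adj}


def propagate_once(adj, labels, v):
    """One synchronous propagation step (illustrative sketch).

    adj    : dict mapping each vertex to a list of its neighbours
    labels : dict mapping each vertex to {community: belonging coefficient}
    v      : maximum number of communities a vertex may belong to
    """
    new_labels = {}
    for x, nbrs in adj.items():
        if not nbrs:
            # An isolated vertex has nothing to receive; keep its label.
            new_labels[x] = dict(labels[x])
            continue

        # Average the neighbours' belonging coefficients per community.
        summed = defaultdict(float)
        for y in nbrs:
            for community, b in labels[y].items():
                summed[community] += b / len(nbrs)

        # Assumed pruning rule: keep communities with coefficient >= 1/v,
        # which guarantees that at most v communities survive.
        kept = {c: b for c, b in summed.items() if b >= 1.0 / v}
        if not kept:
            # Nothing reached the threshold: keep the strongest community.
            # Ties go to the smallest id here purely for reproducibility.
            best = min(summed, key=lambda c: (-summed[c], c))
            kept = {best: summed[best]}

        # Renormalise so the coefficients sum to 1 again.
        total = sum(kept.values())
        new_labels[x] = {c: b / total for c, b in kept.items()}
    return new_labels


# Usage: a three-vertex path 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = init_labels(adj)
overlapping = propagate_once(adj, labels, v=2)  # vertex 1 may keep two communities
disjoint = propagate_once(adj, labels, v=1)     # vertex 1 is forced to pick one
```

With v = 2 the middle vertex retains both neighbouring communities in equal proportion, whereas with v = 1 it is forced into a single community, which is the sense in which v bounds the overlap. In practice the step would be repeated until some stopping condition holds; that condition is not given in the abstract and is left out of the sketch.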
GENERAL SCIENTIFIC SUMMARY
Introduction and background. Many networks possess community structure, such that the density of edges within some groups of vertices is greater than the density of edges between groups. Numerous methods exist for detecting these communities (or 'modules' or 'clusters') in networks, but most of them assume that communities are disjoint. However, in some networks it is possible for vertices to appear in more than one community. Different methods are needed to find these overlapping communities, and fewer such algorithms have been designed.
Main results. We present a very fast algorithm to detect overlapping communities in networks. It extends an earlier algorithm by Raghavan, Albert and Kumara, which finds only disjoint communities, so that it can find overlapping ones. It can also handle weighted and bipartite networks. We show experimentally that it is usually highly effective in discovering overlapping communities and that it is faster than other algorithms.
Wider implications. Many real-world network datasets contain overlapping communities, and extremely large networks are becoming increasingly common. A large network is potentially easier to understand and analyse at a mesoscopic level, organized into communities instead of individual vertices. This requires the development of overlapping community detection algorithms with a very low time complexity. This work is a step in that direction.