Research article · Open Access

Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis

Published: 13 December 2022


Abstract

Label Propagation is not only a well-known machine learning algorithm for classification but also an effective method for discovering communities and connected components in networks. We propose a new Direction-optimizing Label Propagation Algorithm (DOLPA) framework that enhances the performance of the standard Label Propagation Algorithm (LPA), increases its scalability, and extends its versatility and application scope. As a central feature, the DOLPA framework relies on the use of frontiers and alternates between label push and label pull operations to attain high performance. It is formulated in such a way that the same basic algorithm can be used for finding communities or connected components in graphs simply by changing the objective function used. Additionally, DOLPA has parameters for tuning the processing order of vertices in a graph to reduce the number of edges visited and improve the quality of the solution obtained. We present the design and implementation of the enhanced algorithm as well as our shared-memory parallelization of it using OpenMP. We also present an extensive experimental evaluation of our implementations using the LFR benchmark and real-world networks drawn from various domains. Compared with an implementation of LPA for community detection available in a widely used network analysis software package, we achieve up to five times the F-Score while maintaining similar runtime for graphs with overlapping communities. We also compare DOLPA against an implementation of the Louvain method for community detection using the same LFR graphs and show that DOLPA achieves about three times the F-Score at just 10% of the runtime. For connected component decomposition, our algorithm achieves orders-of-magnitude speedups over the basic LP-based algorithm on large-diameter graphs, up to 13.2× speedup over the Shiloach-Vishkin algorithm, and up to 1.6× speedup over Afforest on an Intel Xeon processor using 40 threads.


1 INTRODUCTION

Background. The label propagation algorithm (LPA) is a machine learning approach for data classification where label information is propagated from labeled to unlabeled entities within a network [73]. The method is iterative: starting with an initial (typically small) subset of data points that have labels, the method successively propagates labels to unlabeled data points until all data points are properly labeled. Raghavan et al. [46] showed that LPA could be an effective method for identifying communities in networks. LPA has also been applied for finding other graph structures, including connected components [22, 29, 51].

Community detection is a fundamental structure and function discovery tool in network analysis. Its goal is to identify groups of tightly knit entities—known as communities—in social, biological, technological, and other types of complex networks. For example, video-sharing services such as YouTube cluster users with similar viewing interests together to enable recommendation systems that provide better services. As sizes of networks continue to increase, we generally need fast algorithms to enable large-scale graph analytics. Two of the main advantages of LPA over many other community detection algorithms are that its runtime is nearly linear in the size of the network and that it requires no a priori information about community structures in a network. Both of these make it practical for processing graphs with billions of edges.

Finding connected components in graphs is another well-studied fundamental problem in graph theory. Given an undirected, unweighted graph \(G=(V, E)\), finding a connected component amounts to finding a labeling \(L\) such that for any two vertices \(u\) and \(v\), \(L(u)=L(v)\) if \(u\) and \(v\) are in the same connected component and \(L(u)\ne L(v)\) if they are not.

Discovering connected components and discovering community structures using LPA are essentially similar tasks. They differ only in how the objective function is defined and used. In both cases, each vertex is initially assigned a unique label, for example, the ID of the vertex. In finding connected components, the goal then is to select the minimum label among a vertex’s adjacent vertices. The analogous goal in community detection is to select the most frequently used label among a vertex’s adjacent vertices (or the label associated with the largest weight in a weighted graph). Therefore, the same algorithmic framework can handle the two problems by appropriately defining the objective function and interpreting the labeling information \(L\) when the algorithms converge.

Proposed framework. In this article, we propose a frontier-based Direction-optimizing Label Propagation Algorithm (DOLPA) that enforces a desirable processing order on vertices, enhances performance, and offers a flexible design for scalable implementations for community detection and connected components identification in graphs within a unified framework.

In parallel graph processing, different graph algorithms use different processing orders on vertices. One extreme is the case where the processing order is random. Another extreme is where the processing order is strictly sequential. The Parallel Label Propagation (PLP) algorithm proposed by Staudt and Meyerhenke [56], which is a parallelization of the standard LPA algorithm, belongs to the first type (random order), achieved via multi-threading. DOLPA offers a tunable middle ground between the two extreme processing orders. More specifically, it provides a “knob” through which a tradeoff between runtime and quality of solution is achieved. To achieve this, DOLPA maintains frontiers and switches between two abstractions, called a pull operation and a push operation. DOLPA can be used for finding either connected components or community structures by merely customizing the objective functions. In this work, we apply DOLPA to both community detection and connected component decomposition. We experimentally show that the processing order that DOLPA enforces reduces the runtime compared to state-of-the-art algorithms in both community detection and connected component decomposition, and improves the quality of solution in community detection.

To jump-start DOLPA, we need to pre-select a set of vertices as “seeds” for the initial frontier using some strategy. Seed selection in the context of the LP algorithm has been examined by a few previous studies, but only to a limited extent. As part of the contributions of this article, we propose nine seeding strategies, conduct an extensive empirical study of their performance, and provide insights into their operation.

We evaluate the performance of our OpenMP DOLPA implementation and the quality of the solution produced on both synthetic and real-world graphs. We show that, compared with PLP, DOLPA achieves up to five times the F-Score while maintaining similar runtime. We also compare DOLPA with a state-of-the-art parallel implementation of the Louvain method available in Grappolo [40] and find DOLPA to be both faster (a 10× speedup) and more accurate (three times the F-Score). The Louvain method is based on modularity maximization.

Summary of contributions. In summary, in this work, we:

  • Introduce a new LP algorithm (called DOLPA) that uses frontiers and applies direction optimization to LPA for graph structure detection (Section 3).

  • Apply DOLPA to community detection (Section 4) and connected component decomposition (Section 5).

  • Propose a variety of seeding strategies for community detection grouped as random, exact, and approximate (Section 4.4). The approximate seeding strategies use subgraph sampling to approximate a parent graph and save runtime.

  • Conduct extensive experimental analysis along three objectives:

    (i) Evaluate the runtime and the number of iterations obtained by DOLPA for connected component decomposition on synthetic and real-world graphs (Section 6)

    (ii) Empirically study how different seeding strategies behave in DOLPA in terms of time and quality of solution (Section 8)

    (iii) Evaluate the runtime and quality of solution obtained by DOLPA for community detection using both synthetic and real-world graphs (Section 9)

The DOLPA source code, scripts to reproduce all the experiments, and scripts to generate the plots in this article are made available at https://datascience.aeolus.wsu.edu/tlieu/dolpa. A portion of the material presented in this article has appeared in the conference paper [39], which focused only on community detection. The present work extends the conference paper in two major ways. First, the DOLPA framework is extended to apply to both community detection and finding connected components. In particular, the entire discussion of connected components algorithms (Section 5) and their experimental evaluations (Section 6) is new to this article. Second, the proposed seeding strategies (Section 4.4) and their detailed evaluation (Section 8) are also new to this article. In addition, the seeding strategies have informed design choices we made in crafting variants of DOLPA tailored for the best runtime performance and quality of solution depending on properties of the input graph and requirements of the application.


2 LABEL PROPAGATION

Loosely speaking, the goal of community detection is to find a grouping (or clustering) of the vertices in a graph such that connectivity within each group is maximized and connectivity between groups is minimized. The problem has attracted a lot of attention in the literature, and a variety of algorithms have been suggested for solving it. See the survey paper by Fortunato [14] for a comprehensive review.

One of the simplest and fastest algorithms for community detection is label propagation (LP). Garza and Schaeffer give a detailed investigation of LP-based approaches [16], and Section 10 of this article offers a brief review of related work. Since it forms the basis for our proposed algorithms, we briefly discuss the standard LP algorithm here.

In the standard LP algorithm, each vertex is initially assigned a unique label and at every iteration a vertex adopts a label that is the most commonly used among its adjacent vertices. As the algorithm progresses, densely connected groups of nodes form a consensus on their labels, and vertices that attain the same labels are grouped as communities.

Algorithm 1 outlines a parallelization of the standard LP as given in the PLP work [56]. In Line 1, a unique label is initially assigned to each vertex in the graph. In each iteration, each vertex’s adjacent vertices are examined in parallel, and the vertex’s label is updated with a maximum label among its adjacent vertices (Lines 6 to 1). The maximum label here is either the most common label or the label associated with the highest weight. The stopping criterion in Line 4 is that the number of updated labels is less than a threshold \(\theta\), an input to the algorithm. The algorithm outputs an array of vertex labels, where a set of vertices having the same label signifies membership in the same community.

Lines 7–8 take O(\(d(v)\)) time, where \(d(v)\) is the degree of the vertex \(v\). In practice, LPA requires only a few iterations to converge, and therefore the runtime is nearly linear, i.e., \(O(k(m+n))\), where \(k\) is the number of iterations, \(m\) is the number of edges, and \(n\) is the number of vertices.
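As a concrete reference point, the per-sweep logic just described can be sketched in C++ as follows. This is a minimal sequential sketch, not the authors' parallel implementation; ties are broken by first occurrence here, whereas practical implementations typically break them randomly.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// One sequential form of standard label propagation: in each sweep, every
// vertex adopts the most frequent label among its neighbors. The sweeps stop
// once fewer than `theta` labels change, mirroring the threshold above.
std::vector<int> label_propagation(const std::vector<std::vector<int>>& g,
                                   std::size_t theta = 1) {
    std::vector<int> label(g.size());
    for (std::size_t v = 0; v < g.size(); ++v) label[v] = static_cast<int>(v);
    std::size_t updated = theta; // force at least one sweep
    while (updated >= theta) {
        updated = 0;
        for (std::size_t v = 0; v < g.size(); ++v) {
            if (g[v].empty()) continue;
            std::unordered_map<int, int> freq; // label -> frequency
            int best = label[v], best_count = 0;
            for (int u : g[v]) {
                int c = ++freq[label[u]];
                if (c > best_count) { best_count = c; best = label[u]; }
            }
            if (best != label[v]) { label[v] = best; ++updated; }
        }
    }
    return label;
}
```

On a graph made of two disjoint triangles, each triangle converges to a single shared label, giving two communities.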


3 DIRECTION-OPTIMIZING LABEL PROPAGATION

In the early iterations of LPA, instead of passively propagating labels, we choose a more “important” label for a vertex \(v\) in the network and eagerly broadcast the label to \(v\)’s adjacent vertices. We abstract this kind of label update as a push operation. Furthermore, in LPA, for each vertex \(v\), the labels of \(v\)’s adjacent vertices are queried to obtain the maximum label or the minimum label. We abstract this label update operation as a pull operation.

To make the best use of both label update operations, we use a frontier as a processing queue. The frontier is the set of vertices reached along paths originating from a given set of start vertices. By pre-selecting “seed” vertices into a frontier, we can jump-start the first iteration (we discuss seeding strategies in Section 4.4). When the workload is small in the early iterations, we apply push for label updates; when the workload reaches a specified threshold or the iteration number reaches a direction-switch threshold, we switch to pull for label updates. This idea of direction optimization was pioneered by Beamer et al. [6] in parallel Breadth-First Search and inspires our work.

DOLPA is summarized in Algorithm 2. The push abstraction is outlined in Algorithm 3, and the pull abstraction is given in Algorithm 4. In both Algorithms 3 and 4, the operations are specialized for community detection. The push abstraction tailored for finding connected components is given in Algorithm 5, and the variant of pull specialized for connected components is given in Algorithm 6.

Lines 1 to 5 in Algorithm 2 constitute the pre-processing steps. The purpose of Line 1 is to give a unique label to each vertex in the graph. We select a fraction \(\tau\) of vertices randomly as seed vertices and add them to the initial frontier. The fraction \(\tau\) is a seeding parameter passed as a part of the input. The pullswitch Boolean variable is initially turned off.

In each iteration, we process each vertex in the frontier and add all of its adjacent vertices to the next frontier if there is any label update. The direction optimization is triggered at Line 8 by the switch threshold \(\omega\), which is an input parameter. Specifically, when the number of iterations equals \(\omega\), pullswitch is turned on. Note that if \(\omega\) equals one, no push operation is applied. The stopping criterion for the iterations at Line 6 is an empty frontier. The output of DOLPA is a set of labels associated with vertices, where vertices having the same label belong to the same structure.
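The frontier-driven loop with a direction switch can be sketched as follows, instantiated with the min-label objective (connected components) for brevity. This is a simplified sequential sketch, not Algorithm 2 verbatim: every vertex is seeded into the initial frontier (\(\tau = 1\)), and push runs for the first omega iterations under this sketch's convention.

```cpp
#include <algorithm>
#include <vector>

// Frontier-driven label propagation with a direction switch: push for the
// first `omega` iterations, pull afterwards. The min-label objective is used,
// so the result is a connected-component labeling.
std::vector<int> dolpa_cc(const std::vector<std::vector<int>>& g, int omega) {
    int n = static_cast<int>(g.size());
    std::vector<int> label(n), frontier(n);
    for (int v = 0; v < n; ++v) label[v] = frontier[v] = v;
    std::vector<char> in_next(n, 0); // bitmap guarding duplicate insertions
    for (int iter = 0; !frontier.empty(); ++iter) {
        bool pull = (iter >= omega); // direction switch
        std::fill(in_next.begin(), in_next.end(), 0);
        for (int v : frontier) {
            bool changed = false;
            if (pull) { // pull: v adopts the minimum label in its neighborhood
                for (int u : g[v])
                    if (label[u] < label[v]) { label[v] = label[u]; changed = true; }
            } else {    // push: v broadcasts its label to larger-labeled neighbors
                for (int u : g[v])
                    if (label[v] < label[u]) { label[u] = label[v]; changed = true; }
            }
            if (changed) // any update schedules the whole neighborhood
                for (int u : g[v]) in_next[u] = 1;
        }
        frontier.clear();
        for (int v = 0; v < n; ++v)
            if (in_next[v]) frontier.push_back(v);
    }
    return label;
}
```

The loop terminates on an empty frontier, matching the stopping criterion above; labels only decrease, so termination is guaranteed.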

Complexity of DOLPA. We first analyze DOLPA’s complexity in the sequential setting. Both Push and Pull take O(\(d(v)\)) time each, where \(d(v)\) is the degree of the vertex \(v\). Selecting vertices and adding to the frontier is O(\(n\)) in time and also in space, where \(n\) is the number of vertices. The worst-case performance of DOLPA happens when the frontier holds all of the vertices in the graph, in which case the space complexity of the frontier is O(\(n\)). Therefore, the runtime of DOLPA is bounded by O(\(k(m+n)\)), where \(k\) is the number of iterations, \(m\) is the number of edges, and \(n\) is the number of vertices.

Let us consider next the upper bound for \(k\). For finding connected components, the upper bound for \(k\) is the diameter \(D\) of the graph, the maximum number of steps required for a label to propagate from one end of a connected component to the other. For community detection, the upper bound for \(k\) is in practice close to \(D\). However, there may be cases where two labels keep swapping (discussed shortly in Section 4.2), and \(k\) could be much larger than \(D\).

Now we analyze DOLPA in the parallel setting using the work-depth model [20]. In the work-depth model, the work \(w\) is the total number of operations executed by a computation, and the depth \(d\) is the longest chain of sequential dependencies in the computation. If \(p\) processors are available, with a randomized work-stealing scheduler, Brent’s scheduling principle dictates that the runtime is \(O(w/p + d)\). In DOLPA, the overall work is \(O(k(m+n))\), and the depth is \(O(D \log n)\). In the worst case, each frontier contains at most \(n\) vertices and there are at most \(D\) frontiers built throughout the algorithm. In this case, the depth is \(O(Dn)\), and the total work is \(O(D(m+n))\).
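Plugging DOLPA's work and depth into Brent's bound makes the parallel runtime explicit:

```latex
% Brent's bound with DOLPA's work and depth:
% w = O(k(m+n)), d = O(D \log n), p processors.
T_p = O\!\left(\frac{w}{p} + d\right)
    = O\!\left(\frac{k(m+n)}{p} + D \log n\right)
```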


4 APPLICATION TO COMMUNITY DETECTION

We discuss in this section in more detail how we adopt DOLPA for community detection. We also outline and address various challenges that arise in parallelizing DOLPA. Finally, we discuss nine seeding strategies for community detection proposed in this work.

4.1 Frontier Expansion

In DOLPA, the next frontier expands as either a push or pull operation updates the label of a vertex. For each vertex, the algorithm inserts all of the vertex’s neighbors into the next frontier without knowing whether any of them has already been inserted. This situation arises in both the serial and the parallel cases. In DOLPA, we use a bitmap [43] to avoid duplicate entries during frontier expansion. A set bit indicates a pending insertion of that vertex, and hence a later insertion attempt is aborted. Line 15 and Line 21 in Algorithm 2 show how we test and set the bitmap.
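The test-and-set discipline can be sketched as follows (a sequential sketch; in the parallel version the test-and-set on the bitmap would be performed atomically):

```cpp
#include <vector>

// Duplicate-free frontier expansion with a bitmap: a set bit means the
// vertex is already scheduled for the next frontier, so a later insertion
// attempt is aborted.
std::vector<int> expand_unique(const std::vector<std::vector<int>>& g,
                               const std::vector<int>& frontier,
                               std::vector<char>& bitmap) {
    std::vector<int> next;
    for (int v : frontier)
        for (int u : g[v])
            if (!bitmap[u]) {  // test ...
                bitmap[u] = 1; // ... and set: the first insertion wins
                next.push_back(u);
            }
    return next;
}
```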

The frontier expansion rate of DOLPA determines how fast the workload increases in each iteration. The rate is determined by the branching factor of the vertices in the current frontier. (The branching factor of a vertex is the number of its unvisited neighbors.)

4.2 Local Maxima and Label Swapping

Figure 1 shows a scenario where two vertices reach a local maximum and swap labels in parallel. The upper left vertex finds label 2 as the most common in its neighborhood, while the lower right vertex finds label 1 as the most common label in its neighborhood. In this scenario, these two vertices keep swapping their labels in each iteration. This is called label oscillation [46]. An oscillation can also happen in the serial algorithm when the input graph is bipartite. To detect and prevent label oscillation, before a label is updated, the maximum label currently received is compared with the vertex’s previous label. If the maximum label is the same as the previous label, a label swap is detected. When this happens, the vertex is marked as inactive and the algorithm moves on.
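The swap check described above can be sketched as follows; prev_label and inactive are illustrative names for the per-vertex history and activity flags, not identifiers from Algorithm 2:

```cpp
#include <vector>

// Oscillation guard: if the new candidate label equals the label the vertex
// held before its last update, the vertex is flipping between two labels;
// mark it inactive instead of updating. Returns true iff the label changed.
bool update_with_swap_check(int v, int candidate,
                            std::vector<int>& label,
                            std::vector<int>& prev_label,
                            std::vector<char>& inactive) {
    if (inactive[v] || candidate == label[v]) return false;
    if (candidate == prev_label[v]) { // swapping back and forth: stop v
        inactive[v] = 1;
        return false;
    }
    prev_label[v] = label[v];
    label[v] = candidate;
    return true;
}
```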

Fig. 1.

Fig. 1. An illustration of swapping labels.

A more practical way to handle label swapping is to ignore it. This can be achieved by applying the same stopping criterion as PLP: if the number of labels updated in the previous iteration is less than a certain threshold, the iteration stops (Line 6). However, this stopping criterion works only if the termination threshold is greater than the number of swapping labels.

4.3 Parallelizing Push and Pull

To effectively parallelize push and pull, we exploit two levels of parallelism in Algorithm 2: task level (coarse-grained) and vertex level (fine-grained). The coarse-grained parallelism is reflected in Line 11 and Line 17, where a push or pull task is performed on each vertex in the frontier. The fine-grained parallelism happens within push and pull, where the label propagation is performed on the neighborhood of a vertex \(v\). The reason we introduce two levels of parallelism is to address workload imbalance at the task level, where vertices can have widely varying numbers of neighbors. With the help of the two-level parallelism, DOLPA achieves a balanced workload.

When multiple vertices perform push concurrently, they access their neighborhoods in parallel. Because of shared neighbors, these vertices will “fight” for the labels of their common neighbors. This is a benign race that does not harm the correctness of the algorithm but can hurt performance [32]. The benign race entails repeated work and makes the algorithm non-deterministic.

Each pull operation is almost data-independent, apart from the insertion of vertices into the next frontier. If we allowed concurrent insertions into a frontier, the order of insertion would force a processing order on the pull operations in the next iteration. We want to prevent this from happening. Therefore, we do not physically insert vertices into the next frontier during label propagation. Instead, we simply mark the bit belonging to that vertex. The frontier insertion happens, and does so only once, at the end of each iteration (at Line 25).

The frontier insertion is implemented serially at the end of each iteration (at Line 25), rather than within the multi-threaded region. A possible parallel implementation here is to let each thread insert vertices into a thread-local frontier and then merge these into the global frontier in a critical section. Each thread would randomize the order of insertion into the frontier, which is the same randomization that the PLP algorithm uses. In practice, we observed that such a parallel implementation converges faster than the serial implementation; however, the quality of the solution it produces is much lower. Hence, we adopt the serial implementation in DOLPA.

Within pull, there are concurrent reads on the labels of the neighbors of a vertex \(v\), which is embarrassingly parallelizable. In practice, we accumulate the frequencies of the labels using std::unordered_map, which does not support concurrent writes. Since we already have sufficient parallelism at the coarse-grained level, we adopt a serial pull in our implementation.
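A sketch of the frequency accumulation inside pull, using std::unordered_map as described; each pull task owns its map, so no concurrent writes occur even when many pulls run in parallel at the coarse-grained level:

```cpp
#include <unordered_map>
#include <vector>

// The pull step for community detection: accumulate the frequency of each
// neighbor label in a (task-local) std::unordered_map and return the most
// frequent one. Ties are broken by first occurrence in this sketch.
int pull_most_frequent(const std::vector<int>& neighbors,
                       const std::vector<int>& label) {
    std::unordered_map<int, int> freq; // label -> frequency
    int best = -1, best_count = 0;
    for (int u : neighbors) {
        int c = ++freq[label[u]];
        if (c > best_count) { best_count = c; best = label[u]; }
    }
    return best;
}
```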

4.4 Seeding Strategies

A set of seed (or influential) vertices forms the potential core of a community structure in the network. Nodes with high degree or clustering coefficient, fully connected cliques, and maximal cliques are usually treated as seeds of a community structure [72]. In general, a node with more connections could be viewed as more important in the network [64]. It has been proven that the membership contribution of a vertex to a community is highly related to its degree [48]. However, there has not been a principled study on how a seeding strategy should be chosen and how the combination of seeding strategy and parameter affects the performance of a community detection algorithm. We propose and investigate nine seeding strategies, grouped under three categories: random seeding strategy, exact seeding strategies, and approximate seeding strategies. Table 1 gives an overview of the strategies.

Table 1. Nine Seeding Strategies in Three Categories: Random, Exact, and Approximate Seeding

  1.1 Random: Randomly sample a fraction \(\tau\) of vertices as seeds.
  2.1 High-degree: Sort vertices on degree and select a fraction \(\tau\) of high-degree vertices.
  2.2 Low-degree: Sort vertices on degree and select a fraction \(\tau\) of low-degree vertices.
  2.3 High-total-degree: Compute the total degree of each vertex, sort the vertices on their total degrees, and select a fraction \(\tau\) of high-total-degree vertices.
  2.4 Low-total-degree: Compute the total degree of each vertex, sort the vertices on their total degrees, and select a fraction \(\tau\) of low-total-degree vertices.
  3.1 High-degree sampling: Randomly sample a fraction \(\tau\) of vertices and sort them on degree in descending order.
  3.2 Low-degree sampling: Randomly sample a fraction \(\tau\) of vertices and sort them on degree in ascending order.
  3.3 High-total-degree sampling: Randomly sample a fraction \(\tau\) of vertices, compute the total degree of these vertices, and sort them on total degree in descending order.
  3.4 Low-total-degree sampling: Randomly sample a fraction \(\tau\) of vertices, compute the total degree of these vertices, and sort them on total degree in ascending order.

4.4.1 Random Seeding Strategy.

Our baseline strategy, called the random seeding strategy, selects seed vertices randomly. In particular, each vertex is selected to be a seed independently with equal probability. Since the random seeding strategy has no bias over the vertices, it maintains a random order of the vertices in the frontier.

4.4.2 Exact Seeding Strategies.

Kloumann and Kleinberg [24] found that performance was higher when a large fraction of seed vertices’ edges were to vertices that lie within the same community. This suggests that a seed should have a good internal connectivity within the community relative to the rest of the graph. We adopt this idea and select high-degree vertices as seeds, a strategy we refer to as the High-degree seeding strategy.

In contrast to the High-degree seeding strategy, we also propose the Low-degree seeding strategy. The rationale for considering low degree as a seeding strategy is that a low-degree vertex has a high probability of joining the community to which its neighbors belong. Because a low-degree vertex has only a small number of adjacent vertices, it often maintains its label information throughout the propagation process.

Gao and Zhang [15, 67] proposed selecting seed vertices based on the total degree of the neighborhood of a vertex. The total degree of a vertex is the sum of the degrees of its adjacent vertices. A neighborhood with a high total degree thus has a relatively high number of links to the community structure it resides in. We adopt this idea and propose selecting vertices with a high total degree, a strategy we call the High-total-degree seeding strategy. For a similar reason as in the Low-degree seeding strategy, we also propose selecting vertices whose neighborhood has a low total degree, which we call the Low-total-degree seeding strategy.

Because the High-degree, Low-degree, High-total-degree, and Low-total-degree seeding strategies all compute the exact ordering of the selected seeds within the vertex set of the graph and maintain this ordering in the initial frontier, we categorize the four strategies as exact seeding strategies.
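An exact seeding strategy can be sketched as follows for the High-degree case (illustrative code, not the authors' implementation; the other exact strategies differ only in the sort key or direction):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Exact High-degree seeding: sort all vertex IDs by degree (descending) and
// take the top fraction tau as seeds. Low-degree seeding reverses the
// comparison; the total-degree variants sort on the sum of neighbor degrees.
std::vector<int> high_degree_seeds(const std::vector<std::vector<int>>& g,
                                   double tau) {
    std::vector<int> order(g.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return g[a].size() > g[b].size();
    });
    auto k = static_cast<std::size_t>(tau * g.size());
    order.resize(k);
    return order; // seeds, kept in degree order for the initial frontier
}
```

The sort is the dominant cost here, which motivates the approximate strategies below.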

4.4.3 Approximate Seeding Strategies.

As a part of a pre-processing step, the cost of an exact seeding strategy can be quite high as the size of the network increases, due to the need to sort vertices based on either their degrees or the total degrees of their neighborhoods. A remedy here is to randomly sample a small portion of the graph that approximates the original graph. The two most common sampling schemes are subgraph sampling and neighborhood sampling [25]. A subgraph sampling scheme samples each vertex independently with equal probability and observes the subgraph induced by the sampled vertices. A neighborhood sampling scheme further observes the edges between the sampled vertices and their adjacent vertices.

Combining these sampling schemes with the exact seeding strategies, we propose several approximate seeding strategies. These strategies are High-degree sampling seeding strategy, Low-degree sampling seeding strategy, High-total-degree sampling seeding strategy, and Low-total-degree sampling seeding strategy. The first two of these use the subgraph sampling scheme, while the latter two use the neighborhood sampling scheme. In all of these approximate seeding strategies, we sample a fraction \(\tau\) of the vertices as seeds and sort them based on either the degrees of the sampled vertices or the total degrees of the sampled neighborhood. Then we insert seeds into the initial frontier.


5 APPLICATION TO CONNECTED COMPONENT DECOMPOSITION

We discuss in this section in more detail how we adopt DOLPA for connected component decomposition. Finding connected components in a graph is an extensively studied problem. We focus only on parallel algorithms for the problem in this article. We group the algorithms into four categories: (1) algorithms based on breadth-first search graph traversal [51, 53]; (2) the Shiloach-Vishkin (SV) algorithm [50] and its variants, such as Afforest [59], a state-of-the-art SV variant specifically optimized for power-law degree graphs; (3) Minimum Label Propagation for connected component decomposition (LPCC) [9, 36, 57, 65]; and (4) others [52]. The authors of these algorithms propose many practical optimizations for finding connected components. We discuss some of these optimizations in this section and apply them to our algorithm. We also show in our discussion that the SV algorithm and LPCC are not that different; most of the optimizations applicable to SV can also be applied to LPCC. We discuss the ones we adopt in our algorithms.

5.1 Fast SV Algorithm

Shiloach and Vishkin [50] introduced the first parallel connected components algorithm on the Parallel Random Access Machine (PRAM) model, as shown in Algorithm 7. The SV algorithm relies on two operations on a union-find data structure: “hook” (also known as “union”) and “compress” (also known as “pointer jumping”). The two operations are outlined in Line 9 and Line 18 of Algorithm 7, respectively. For a given edge, the hook operation combines vertices into trees such that the vertices in the same component belong to the same tree. The compress operation shortens the find path in the data structure by pointing the current vertex’s parent pointer directly to the root of the tree. Both operations can be performed in parallel. The complexity of this algorithm is \(O(m \log n)\), where \(m\) is the number of edges and \(n\) is the number of vertices.
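The two SV primitives can be sketched on a parent (label) array as follows (a sequential sketch of the operations described above, not the PRAM formulation):

```cpp
#include <cstddef>
#include <vector>

// hook: for an edge (u, w), attach the larger-labeled parent under the
// smaller one, so vertices in the same component end up in the same tree.
void hook(int u, int w, std::vector<int>& parent) {
    int pu = parent[u], pw = parent[w];
    if (pu < pw) parent[pw] = pu;
    else if (pw < pu) parent[pu] = pw;
}

// compress (pointer jumping): every vertex points one step closer to its
// root. Iterating hook over all edges and then compress until nothing
// changes yields the component labeling.
void compress(std::vector<int>& parent) {
    for (std::size_t v = 0; v < parent.size(); ++v)
        parent[v] = parent[parent[v]];
}
```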

5.2 push, pull, and hook

Notice that the purpose of the hook operation is essentially to “hook” the labels of two vertices: if the label of a vertex \(u\) is smaller than that of its adjacent vertex \(w\), the hook operation pushes the label of \(u\) to vertex \(w\); if the label of \(u\) is bigger than that of \(w\), the hook operation pulls the label of \(w\) to \(u\). In contrast, the push function in Algorithm 5 only pushes the label of \(u\) to \(w\) if label[\(u\)] is smaller than that of \(w\). The pull function in Algorithm 6 only pulls the label of \(w\) to \(u\) if label[\(w\)] is smaller than that of \(u\). Therefore, hook is a bidirectional operation combining both push and pull operations. Notice that hook may propagate labels two hops per iteration: it pulls a label to itself first and then pushes the newly updated label.

Figure 2 shows how the labels are updated on a vertex (vertex 3 in the example) and its neighbors using the push, pull, and hook operations, respectively. As we can see in the figure, each vertex starts with its own vertex ID as a label. push(vertex 3) pushes label 3 to vertices 1, 2, and 4, but only vertex 4 updates its label because its label is bigger than 3. pull(vertex 3) pulls labels from vertices 1, 2, and 4 and updates its label to be the smallest one among them, 1. hook(vertex 3) pushes label 3 to vertex 4, which has a bigger label than vertex 3, and pulls the labels of vertices 1 and 2, which have smaller labels than vertex 3. The label update in all three operations is monotonic, meaning that the labels updated by these three operations never increase. Because of this, the order of visiting vertex 1 and vertex 2 does not matter: the label of vertex 3 ends up as the smaller of the labels of vertex 1 and vertex 2 at the end of the neighborhood traversal.

Fig. 2.

Fig. 2. An illustration of label updates on vertex 3 and its neighbors using push, pull, and hook, respectively. Each vertex starts with its own vertex ID as the initial label. The arrows represent the direction of label propagation. The subfigures at the top show the labels of the vertices at the beginning of each operation; the subfigures at the bottom show the labels at the end of each operation.

If we combine push (Algorithm 5) with pull (Algorithm 6) for connected component decomposition, we end up with Algorithm 8, which is equivalent to the hook operation in the SV algorithm. Based on the observations we have made about the relationships among the operations push, pull, and hook, the SV algorithm is indeed an LP-based algorithm for connected component decomposition. Therefore, any optimizations/heuristics that are proposed for the SV algorithm are also applicable to LP-based algorithms.

5.3 Combining push, pull, and hook with Compaction Methods

Recent research [21] proposed randomized concurrent algorithms for disjoint set union that combine compaction methods with find operations. There are three classical compaction methods: path compression, path splitting, and path halving. Earlier works [12, 71] have shown that the hook operation can also be combined with these compaction methods and that, with them, SV takes fewer iterations to reach convergence. We refer to this combined method as hookAndCompress.

Adopting the same idea, we propose to combine one of the compaction methods with our push and pull operations. We show in Algorithm 9 the pull operation combined with full path compression; we refer to this algorithm as pullAndCompress. Similarly, the algorithm pushAndCompress combines push with full path compression; its pseudocode is omitted for space considerations. Algorithm pullAndCompress pulls the smallest label among \(v\)’s neighbors and then updates the labels along the labeling path up to its root with the smallest label met along the path. Similarly, pushAndCompress pushes \(v\)’s label to its neighbors as well as to every vertex on the labeling path up to the root, but only where its label is smaller. Without path compression, each label can propagate only one hop per iteration. With full path compression, each label can propagate \(O(\log n)\) hops per iteration, where \(O(\log n)\) is the depth of the labeling path from the root to a leaf. In the best case, such as a line graph, each label can propagate \(O(D)\) hops per iteration, where \(D\) is the diameter of the graph, making the LP-based algorithm with path compression converge in one iteration. Other compaction methods can also be combined with the push and pull operations; we leave this as future work. Because full path compression flattens the connected component labeling tree into a star, we expect pullAndCompress and pushAndCompress to require fewer iterations to reach convergence than the basic push and pull operations.

5.4 Slow Convergence for Large-diameter Graphs

A known issue of LPCC is its slow convergence on large-diameter graphs [57], i.e., graphs whose diameter \(D\) is a large integer. Recall that the goal of finding connected components via Label Propagation is to propagate the minimum label of each component to all of its members; at the end of the algorithm, all vertices with the same label belong to the same component. In LPCC, each label propagates one hop per iteration, and the minimum label requires \(O(D)\) steps to travel from one end of the largest component to the other. Therefore, a basic LPCC converges in \(O(D)\) iterations, resulting in \(O(D(m+n))\) total work. In the worst case, such as in line graphs, where the diameter is proportional to \(n\), LPCC converges in \(O(n)\) iterations.
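The \(O(D)\) behavior is easy to reproduce. The sketch below (illustrative, not the paper's implementation) runs synchronous pull iterations on a path graph with \(n\) vertices and observes exactly \(n-1\) iterations, i.e., the diameter, until convergence:

```python
def pull_iteration(adj, label):
    # One synchronous pull: every vertex reads the old labels and takes
    # the minimum over itself and its neighbors. Returns True on change.
    new = [min([label[v]] + [label[w] for w in adj[v]])
           for v in range(len(adj))]
    changed = new != label
    label[:] = new
    return changed

n = 8
adj = [[w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)]  # path
label = list(range(n))
iterations = 0
while pull_iteration(adj, label):
    iterations += 1

assert label == [0] * n
assert iterations == n - 1  # one hop per iteration: D = n - 1 iterations
```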

One solution to this issue is to contract the original graph repeatedly. However, contraction mutates the input graph, and we are interested in solutions that do not. Another solution is to use the compress operation of the SV algorithm. Stergiou et al. [57] introduce this operation in their distributed LPCC algorithm as a “shortcut” and prove that it guarantees convergence of LPCC in \(O(\log n)\) iterations with \(O(m \log n)\) total work. We implement this approach in our algorithms to speed up convergence (see Algorithm 10).
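The effect of the compress operation can be sketched as follows. This illustrative version performs full shortcutting to the root; the classical SV compress is a single pointer-jumping step, label[v] = label[label[v]]:

```python
def compress(label):
    # Shortcutting: jump each vertex's label to its label's label until
    # it points at a root, collapsing long labeling chains into a star.
    for v in range(len(label)):
        while label[v] != label[label[v]]:
            label[v] = label[label[v]]

# A labeling chain 4 -> 3 -> 2 -> 1 -> 0 collapses into a star rooted at 0.
label = [0, 0, 1, 2, 3]
compress(label)
assert label == [0, 0, 0, 0, 0]
```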

5.5 Skipping the Largest Component in Scale-free Graphs

In scale-free graphs, the degree distribution follows a power law, and a large fraction of the vertices belong to the largest component [59]. Based on this observation, Afforest uses subgraph sampling (we use a similar technique in the approximate seeding strategy for community detection discussed in Section 4.4) to identify the largest component, and in the later stages of hooking it skips the remaining unprocessed edges in that component. This either defers edge processing or skips it altogether. Afforest first uses neighbor sampling to generate a subgraph: within each neighborhood, a fixed number of neighbors is randomly selected (for example, two neighbors in Afforest), creating a subgraph of \(O(n)\) random edges. This subgraph is a forest consisting of spanning trees of the connected components. Next, a fixed number of labels (for example, 1,024 in Afforest) is sampled from the intermediate component labels of the subgraph, and the most frequent label is taken as the component ID of the largest component in the graph. Assume that the largest component contains \(m_L\) edges and the subgraph sampling phase processes \(m_{process}\) edges. It has been shown that by skipping the largest component, the connected component algorithm processes only \(m_{process} + m - m_L\) edges [12]. In a scale-free graph, \(m_L\) accounts for a substantial fraction of the edges.

We adopt the subgraph sampling heuristic to find the largest component ID as shown in Algorithm 11. Subgraph sampling consists of two steps: generating the subgraph (via push or pull) and sampling the most frequent label (the largest component ID). Since there are two ways to propagate labels in DOLPA, we propose two ways to generate the subgraph in Algorithm 11, one using pull and the other using push. The input parameter \(neighbor\_rounds\) controls how many neighbors are selected per vertex. In practice, we do not select \(neighbor\_rounds\) neighbors at random but instead take the first \(neighbor\_rounds\) neighbors of each vertex during subgraph generation.
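The sampling idea can be sketched as follows (function names are ours; for determinism this sketch counts all labels rather than sampling 1,024 of them as Afforest does, and it hooks rather than pushes or pulls on the sampled edges):

```python
from collections import Counter

def sample_largest_label(adj, label, neighbor_rounds=2):
    # Build the sampled-subgraph labeling: hook each vertex with its
    # first `neighbor_rounds` neighbors.
    for r in range(neighbor_rounds):
        for v in range(len(adj)):
            if r < len(adj[v]):
                w = adj[v][r]
                m = min(label[v], label[w])  # hook on the sampled edge
                label[v] = label[w] = m
    # The most frequent label identifies the (likely) largest component.
    return Counter(label).most_common(1)[0][0]

# A six-vertex path component {0..5} plus a small component {6, 7}:
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4], [7], [6]]
label = list(range(8))
assert sample_largest_label(adj, label) == 0
```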

5.6 DOLPA with Subgraph Sampling, Path Compression, and Compress

With the above optimizations, we present our DOLPA set of algorithms for connected component decomposition as shown in Algorithm 12. Our algorithms adopt a combination of push and pull (with or without path compression) to propagate labels, incorporate the compress operation to speed up the convergence rate (especially for large-diameter graphs), and include a subgraph sampling operation to skip the largest component for scale-free graphs.


6 EXPERIMENTAL EVALUATION: CONNECTED COMPONENT DECOMPOSITION PERFORMANCE

In this first experimental section, we evaluate the speed of convergence as well as the runtime performance obtained by our connected component algorithms in comparison with other parallel connected component algorithms listed in Table 2. We use the synthetic graphs and real-world graphs listed in Table 3 for the evaluation. We also implement the frontier-based LPCC using push and hook to evaluate the cost of creating an active frontier in each iteration.

| Notation | Description |
| PULL     | pull (Algorithm 6) |
| PUSH     | push (Algorithm 5) |
| HOOK     | hook (Algorithm 8) |
| PUSHF    | push, frontier-based |
| HOOKF    | hook, frontier-based |
| PSPLCC   | GenerateSubgraph via push; pull, Path Compression, and Compress (Algorithm 12) |
| PLPSCC   | GenerateSubgraph via pull; push, Path Compression, and Compress (Algorithm 12) |
| PSPSCC   | GenerateSubgraph via push; push, Path Compression, and Compress (Algorithm 12) |
| PLPLCC   | GenerateSubgraph via pull; pull, Path Compression, and Compress (Algorithm 12) |
| SV       | SV algorithm (our rewriting) [50] |
| Afforest | SV algorithm with subgraph sampling [59] |

  • Notations of the different algorithms are derived from the combinations of setups and techniques applied.

Table 2. Connected Component Decomposition Algorithms Studied

| Input | Description | \(|V|\) | \(|E|\) | Ref. |
| road_usa | US road network | 1M | 9.5M | [4] |
| europe_osm | European OpenStreetMap road network | 51M | 108M | [4] |
| rgg_n_2_24_s0 | Random geometric graph | 17M | 265M | [11] |
| kron_g500-logn20 | Synthetic graph from the Graph500 benchmark | 1M | 89M | [11] |
| uk-2005 | 2005 crawl of the .uk domain performed by UbiCrawler | 39M | 936M | [8] |
| LiveJournal | LiveJournal social network | 4M | 69M | [66] |
| Orkut | Orkut social network | 3M | 234M | [66] |

Table 3. Synthetic and Real-world Graphs for Performance Evaluation

6.1 Experimental Setup

For the connected component decomposition evaluation, our experiments are run on a machine with two Intel Xeon Gold 6230 processors (one per socket), each with 20 physical cores running at 2.1 GHz and a 28 MB L3 cache. The system has 188 GB of main memory. Our code is compiled with the GCC 10.2 compiler using the -Ofast -march=native flags. We use OpenMP 4.0 for parallelization with guided scheduling.

6.2 Dataset

We conducted experiments on real-world graphs from various domains and on synthetic graphs, all listed in Table 3. We selected two road networks (of the United States and Europe) from the 10th DIMACS Implementation Challenge [4], two synthetic random graphs [11], one web graph [8, 11], and two social network datasets from the Stanford Large Network Dataset Collection (SNAP) [33]. The road networks are large-diameter graphs; the social networks are scale-free graphs.

6.3 Speed of Convergence and Runtime Performance

In this subsection, we present the runtime and the number of iterations each algorithm in Table 2 takes to converge on the datasets in Table 3. We omit the first iteration of each algorithm, in which every vertex initializes its label to its vertex ID. We set the \(neighbor\_rounds\) parameter to two when generating subgraphs; i.e., we propagate two labels in each vertex’s neighborhood. All results are averages over five runs using 40 threads.

In Table 4, we can see that our algorithm (PSPSCC) is 3.7 to 2,303.2 times faster than the basic LP-based algorithm (HOOK), especially on large-diameter datasets (road networks). DOLPA is 4.8 to 13.2 times faster than the SV algorithm, and 1.1 to 1.6 times faster than Afforest.

| Algo.    | road_usa        | europe_osm       | rgg_n_2_24_s0 | uk-2005      | LiveJournal | Orkut     |
| PULL     | 314.947 / 3,071 | 353.232 / 4,031  | 28.596 / 287  | 19.230 / 79  | 0.432 / 7   | 0.244 / 3 |
| PUSH     | 346.139 / 3,071 | 335.507 / 4,034  | 33.360 / 288  | 40.681 / 247 | 0.489 / 7   | 0.361 / 3 |
| HOOK     | 198.277 / 1,613 | 313.284 / 3,860  | 19.820 / 74   | 6.866 / 11   | 0.325 / 5   | 0.180 / 2 |
| PUSHF    | 128.754 / 6,259 | 550.423 / 17,323 | 20.562 / 166  | 46.436 / 199 | 0.136 / 8   | 0.170 / 3 |
| HOOKF    | 113.551 / 6,250 | 552.083 / 17,309 | 24.507 / 187  | 80.928 / 15  | 0.126 / 6   | 0.163 / 2 |
| PSPLCC   | 0.109 / 3       | 0.177 / 3        | 0.069 / 4     | 0.597 / 35   | 0.020 / 4   | 0.017 / 4 |
| PLPSCC   | 0.103 / 3       | 0.173 / 3        | 0.068 / 3     | 0.339* / 3   | 0.087 / 3   | 0.125 / 3 |
| PSPSCC   | 0.086* / 3      | 0.154* / 3       | 0.064* / 3    | 0.518 / 3    | 0.017* / 3  | 0.014* / 3 |
| PLPLCC   | 0.130 / 3       | 0.169 / 2        | 0.065 / 1     | 0.844 / 34   | 0.088 / 1   | 0.117 / 1 |
| SV       | 0.768 / 9       | 1.409 / 7        | 0.849 / 6     | 1.626 / 7    | 0.154 / 3   | 0.200 / 2 |
| Afforest | 0.097 / 3       | 0.196 / 3        | 0.069 / 3     | 0.381 / 3    | 0.027 / 3   | 0.019 / 3 |

  • Each cell reports Time (s) / #Iterations; the best runtime for each graph is marked with an asterisk. As we can see, subgraph sampling does not reduce the number of iterations to reach convergence. Path compression dramatically reduces the number of iterations, and the compress operation at the end of each iteration also reduces the number of iterations greatly.

Table 4. Runtime and Number of Iterations Obtained on the Connected Component Algorithms in Table 2 Using the Datasets from Various Domains in Table 3

(i) Subgraph Sampling. Subgraph sampling works well when combined with pushAndCompress, pullAndCompress, and hookAndCompress. Comparing PSPSCC with PUSH (or PLPLCC with PULL) in Table 4, we see that both the number of iterations and the runtime of PSPSCC are greatly reduced. With path compression in GenerateSubgraphViapushAndCompress, each propagated label now travels \(O(\log n)\) hops. The newly constructed subgraph consists of \(neighbor\_rounds \times n\) edges and contains a large intermediate component along with a number of small components. Many vertices within each component hold the minimum component label, so sampling the most frequent label within this subgraph is very likely to identify the largest component of the graph.

(ii) Path Compression and the compress Operation. Path compression dramatically reduces the number of iterations, especially on large-diameter graphs. In Table 4, comparing PSPSCC with PUSH, which applies neither path compression nor the compress operation, PSPSCC needs only about 0.065% of the iterations of PUSH on the road_usa graph. Path compression lets the LP-based algorithm propagate labels more than one hop per iteration; in fact, with full path compression, each label propagates \(O(\log n)\) hops per iteration, the depth of the labeling tree within each component.

(iii) Direction Optimization. To evaluate the effect of direction optimization (switching between the push and pull operations), we compare PSPLCC, PLPSCC, PSPSCC, and PLPLCC. The method PSPSCC performs best overall, while PLPSCC gives the best performance on the web graph. In general, we find that direction optimization does not give connected component algorithms substantial performance benefits. This is because labels are propagated conditionally; i.e., the (minimum) label is propagated only if it is smaller than the others. Consequently, if there is only one label smaller than \(v\)’s label in \(v\)’s neighborhood, both push and pull visit all neighbors but propagate one label per operation; switching between push and pull does not lead to fewer processed edges or more efficient label propagation. However, push propagates labels more efficiently (with a lower amortized cost per label propagation) than pull, which is why PSPSCC has the best overall performance.

(iv) Frontier. The frontier-based algorithms (PUSHF vs. PUSH, HOOKF vs. HOOK) are noticeably faster than their non-frontier-based counterparts (except on europe_osm), even though the number of iterations increases. However, after applying a frontier to DOLPA (not reported in Table 4), we find that the frontier-based variants are always slower than their non-frontier-based counterparts. This is because path compression and the compress operation substantially reduce the number of iterations to convergence, so the overhead of creating a shared frontier and ensuring it contains no duplicate entries outweighs its benefit of processing only the vertices with recently updated labels.

6.4 Strong Scaling

Figure 3 shows the runtime scaling results of four parallel implementations of connected component algorithms on the real-world networks listed in Table 3. In the plot, PSPSCC has the best performance on the four input graphs, followed by PSPLCC. The push-based variants perform better than the pull-based ones on social networks. Our algorithms make the basic implementation more versatile and more adaptable to different types of graphs: our implementations show no performance degradation across graph types, ranging from road networks to random graphs to social networks.


Fig. 3. Strong scaling results on up to 40 threads on four graphs for the connected component algorithm variants listed in Table 2. The variant PSPSCC has the best performance on four graphs. PSPLCC has close to the best performance.


7 EXPERIMENTAL SETUP: COMMUNITY DETECTION

In this and the subsequent two sections, we present various experimental evaluation results around community detection. This section presents the experimental setup and datasets. Section 8 focuses on evaluation of the proposed seeding strategies. Section 9 deals with evaluating the performance and scalability of DOLPA in comparison with other community detection methods.

We use the Lancichinetti-Fortunato-Radicchi (LFR) benchmark [30] with ground-truth communities to study the behavior of DOLPA, and real-world graphs to evaluate the quality of solution and performance. The LFR benchmark is the most commonly used graph generator for evaluating community detection algorithms. We compare our implementation with PLP and with the state-of-the-art parallel Louvain method implementation in Grappolo [40]. We begin the present section by detailing the experimental setup and the datasets used in this article. We quantify the quality of solutions (of community detection) using Precision, Recall, and F-Score (or F-Measure); we omit the definitions of these metrics and refer the reader to [39] for details.
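As a reminder of the standard formulas (the exact convention for matching detected communities to ground-truth communities is defined in [39] and not reproduced here), for one detected community C and one ground-truth community T, precision is the fraction of C that is correct, recall is the fraction of T that is recovered, and the F-Score is their harmonic mean:

```python
def f_score(detected, truth):
    # detected, truth: sets of vertex IDs for one community pair.
    overlap = len(detected & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(detected)  # |C ∩ T| / |C|
    recall = overlap / len(truth)        # |C ∩ T| / |T|
    return 2 * precision * recall / (precision + recall)

assert f_score({1, 2, 3, 4}, {2, 3, 4, 5}) == 0.75
```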

7.1 Experiment Platform

We used a system having one node with two Intel Xeon E5-2699v3 processors operating at 2.3 GHz. The system has 18 physical cores per socket, 72 logical cores per node with hyper-threading, 48 MB L3 cache per processor, and 128 GB total main memory. We used GCC 9.3.0 compiler with the -O3 compilation option to build the codes. We used OpenMP 4.0 for parallelization with guided scheduling.

7.2 Datasets

We generated two groups of LFR benchmarks, Big and Small: the community size ranges from 20 to 200 in the Big group and from 10 to 100 in the Small group. The fraction of overlapping vertices is 10%, and the number of memberships of the overlapping vertices is 2. We generated six benchmarks in the Big group with mixing parameter values \(\mu =\lbrace 0, 0.1, 0.3, 0.5, 0.7, 0.9\rbrace\) and two benchmarks in the Small group with \(\mu =\lbrace 0.1, 0.3\rbrace\). The mixing parameter \(\mu\) of the LFR benchmark indicates the amount of noise in the network, as it controls the fraction of edges that run between communities. The higher the mixing parameter, the more difficult it is for an algorithm to detect communities.

Each benchmark was generated with 1 million vertices, 9.5 million edges, average degree 20, and maximum degree 100. The Big and Small groups share the same parameters except for the community size range. The synthetic graphs generated using the LFR benchmark are listed in the top portion of Table 5. The scripts for generating the LFR benchmarks, including the parameters used, are provided in the GitLab repo.1

| Input | Description | \(|V|\) | \(|E|\) | \(\Delta _{v}\) | Ground Truth | Ref. |
| B0 | Generated using the LFR benchmark with \(\mu =0\) | 1M | 9.5M | 100 | Yes | [30] |
| B1 | Generated using the LFR benchmark with \(\mu =0.1\) | 1M | 9.5M | 100 | Yes | [30] |
| B3 | Generated using the LFR benchmark with \(\mu =0.3\) | 1M | 9.5M | 100 | Yes | [30] |
| B5 | Generated using the LFR benchmark with \(\mu =0.5\) | 1M | 9.5M | 100 | Yes | [30] |
| B7 | Generated using the LFR benchmark with \(\mu =0.7\) | 1M | 9.5M | 100 | Yes | [30] |
| B9 | Generated using the LFR benchmark with \(\mu =0.9\) | 1M | 9.5M | 100 | Yes | [30] |
| S1 | Generated using the LFR benchmark with \(\mu =0.1\) | 1M | 9.5M | 100 | Yes | [30] |
| S3 | Generated using the LFR benchmark with \(\mu =0.3\) | 1M | 9.5M | 100 | Yes | [30] |
| LL | Low block overlap and low block size variation | 1M | 24M | 122 | Yes | [23] |
| LH | Low block overlap and high block size variation | 1M | 24M | 137 | Yes | [23] |
| HL | High block overlap and low block size variation | 1M | 24M | 104 | Yes | [23] |
| HH | High block overlap and high block size variation | 1M | 24M | 180 | Yes | [23] |
| fbnt | Facebook network | 4M | 24M | 4,915 | No | [47] |
| dblp | Co-authorship network from DBLP | 0.5M | 15M | 3,299 | No | [4] |
| zbrp | Zhishi Baidu related pages | 416K | 2.4M | 127,090 | No | [27] |
| cond | Condensed matter collaborations | 40K | 176K | 278 | No | [4] |

  • The number of vertices (\(|V|\)) and edges (\(|E|\)) along with the maximum degree (\(\Delta _{v}\)) for the inputs are tabulated here.

Table 5. Synthetic and Real-world Graphs for Performance and Quality of Solution Evaluation

The middle portion of Table 5 lists synthetic graphs we downloaded from the 2019 Stochastic Block Partitioning Graph Challenge [23]. The bottom portion of the table lists the real-world graphs in our testbed. Since these graphs do not have ground-truth communities, we use the community structure produced by the fast-tracking resistance (FTR) algorithm as ground-truth data. The FTR algorithm is a hierarchical multiresolution method designed to overcome the resolution limit of community detection in complex networks [17]. It is not true ground truth, but we use it as a reference.

7.3 Parameter Choice for the push-pull Switch

In previous work [39], we designed a microbenchmark to study the behaviors of push and pull and used its results to find the best switch threshold \(\omega\) for DOLPA. To achieve the best runtime with reasonable quality of solution, we set \(\omega = 2\); to obtain the best quality of solution in reasonable runtime, we set \(\omega = 1\). Note that when \(\omega =1\), DOLPA performs only pull operations for label propagation.


8 EXPERIMENTAL EVALUATION: SEEDING STRATEGIES AND PARAMETERS

We present in this section results of experiments conducted to understand the behavior and performance of the proposed seeding strategies in conjunction with the choice of the seeding parameter \(\tau\).

8.1 Experimental Design

For each of the nine seeding strategies, we run DOLPA on the Big group (B0–B9) of the LFR benchmark and collect the average runtime and F-Score. While we report results from only one of the three groups of benchmarks, similar results were observed in the other two groups.

To determine the seeding parameter \(\tau\) that produces the best performance for each seeding strategy, we perform a grid search over a manually specified set of \(\tau\) values, whose lower bound corresponds to selecting a single vertex as seed and whose upper bound corresponds to selecting all of \(V\) as seeds. The set of values is as follows.

Let the density of the graph be \(\delta _G = \frac{2|E|}{|V|(|V|-1)}\) and the average degree be \(\overline{\rm d}_G = \frac{2|E|}{|V|}\). We run DOLPA with \(\tau =\lbrace 1e-6, 0.025, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1, \frac{1}{\overline{\rm d}_G}, \frac{2}{\overline{\rm d}_G}, \delta _G\rbrace\), five times for each input. Considering that selecting one seed vertex out of 1 million vertices and running only five times may not reliably reveal the actual behavior, we run DOLPA 20 times when \(\tau =1e-6\). Notice that for the benchmarks we generated (B0–B9), \(\frac{1}{\overline{\rm d}_G} \approx 0.05\) and \(\frac{2}{\overline{\rm d}_G} \approx 0.1\).
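As a quick sanity check of the approximations quoted above, computed from the listed benchmark sizes (\(|V|\) = 1M, \(|E|\) = 9.5M for B0–B9):

```python
V, E = 1_000_000, 9_500_000
avg_deg = 2 * E / V              # average degree: 2|E| / |V| = 19.0
density = 2 * E / (V * (V - 1))  # graph density: 2|E| / (|V|(|V| - 1))

assert avg_deg == 19.0
assert abs(1 / avg_deg - 0.05) < 0.003  # 1 / avg_deg ≈ 0.05
assert abs(2 / avg_deg - 0.1) < 0.006   # 2 / avg_deg ≈ 0.1
assert density < 2e-5                    # a tiny seeding fraction
```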

8.2 Runtime and Quality of Solution Results

We collect two performance metrics: runtime and F-Score. Figure 4 shows runtime (in log scale) versus the mixing parameter, and Figure 6 shows F-Score versus the mixing parameter, for the nine seeding strategies under various values of \(\tau\). We also plot runtime versus the seeding parameter \(\tau\) for the nine seeding strategies on the various LFR benchmarks in Figure 5; each subplot in Figure 5 corresponds to one mixing parameter. In the experiments behind Figure 6, similar patterns were observed for \(\tau = 0.2, 0.4, 0.6, 0.8, 1.0\); hence, we plot only \(\tau =0.2\). We do not report the F-Score results for \(\tau =1e-6\) and \(\tau =\delta _G\) because they are too low to be meaningful.


Fig. 4. Runtime results (in log scale) versus mixing parameter for the nine seeding strategies under various seeding parameters \(\tau\) and various LFR benchmarks. For \(\tau\) between 0.025 and 0.1, we observe that the High-degree seeding strategy achieves the best runtime compared to all other seeding strategies. For \(\tau\) greater than 0.2, there is no clear winner in terms of runtime among all nine seeding strategies.


Fig. 5. Runtime versus seeding parameter \(\tau\) for the nine seeding strategies on LFR benchmarks generated with different mixing parameters. The mixing parameters \(\mu\) from the top subfigure to the bottom are \(0, 0.3, 0.7\). A clear pattern is that when \(\tau \gt 0.4\), every seeding strategy shows a dramatically increased runtime within each subfigure. The y-axis scale also increases from the top subfigure to the bottom. While we do not report the F-Score for \(\tau =1e-6\) and \(\tau =0.025\) in Figure 6, we report their runtime results here for reference.


Fig. 6. F-Score versus mixing parameter plots for the nine seeding strategies under various seeding parameters \(\tau\) and various LFR benchmarks. Similar patterns are observed for \(\tau \ge 0.2\) ; hence, we only plot results for \(\tau =0.2\) . Moreover, we do not report the F-Score results for \(\tau =1e-6\) and \(\tau =\delta _G\) because their results are too low.

As can be seen in Figure 4, the approximate seeding strategies (except for highdegreeSD) have much higher runtimes than the random and exact seeding strategies. Figure 5 shows that the normalized runtimes of all strategies for \(\tau \ge 0.6\) are much larger than the corresponding results for \(\tau \lt 0.6\). It can also be seen that the runtime of every strategy grows as the benchmark graphs become harder for an algorithm to tackle (i.e., as the mixing parameter increases).

In Figure 6, the F-Score results of the Random, Low-degree, and Low-total-degree seeding strategies deteriorate more slowly than the others when \(\tau\) is between 0.025 and 0.1 and the mixing parameter of the LFR benchmark varies from 0 to 0.7. The F-Score results are similar across all strategies when \(\tau \ge 0.2\). Low F-Score results are observed in Figure 6 for the approximate strategies when the mixing parameter of the LFR benchmark is 0.

8.3 Analysis

Below we make several important observations from an analysis of the results in Figure 4, Figure 5, and Figure 6.

(i) A tiny \(\tau\) does not produce a reasonable solution. The F-Score results for \(\tau =1e-6\) and \(\tau =\delta _G\) are too low to be reported in Figure 6, even though both cases converge fairly fast in Figure 5. This shows that a tiny \(\tau\) (i.e., selecting only one or a few seeds) does not allow DOLPA to achieve a reasonable quality of solution: it is unrealistic to assume there are only one or a few communities in the graph.

(ii) A large \(\tau\) dramatically increases runtime. Figure 5 shows that the runtime of every seeding strategy increases dramatically when \(\tau \ge 0.6\) (we call this a large \(\tau\)). The approximate seeding strategies are particularly affected, especially when the mixing parameter is 0. This indicates that \(\tau\) should not exceed 0.6 if reasonable runtime is to be achieved; in fact, we conclude that \(\tau\) should always be smaller than 0.5. Let us elaborate. When \(\tau \gt 0.5\), more than 50% of the vertices are treated as seeds of a community structure, which means there exists an edge \(e \in E\) of which at least one endpoint has its label updated twice. The larger \(\tau\) is, the more labels are updated redundantly, which results in slow convergence.

(iii) The approximate seeding strategies perform poorly when the sample size is small. When the sample size is small, there is a higher probability that the sampled vertex set or the sampled neighborhood set is biased relative to the original graph, either in vertex degree or in the total degree of a neighborhood. Hence, a small \(\tau\) for an approximate seeding strategy often fails to represent the parent graph faithfully. When the sample size is large enough, the sampled vertex set or neighborhood set approximates the parent graph well. This is why, when \(\tau \ge 0.2\), the approximate seeding strategies become as robust as the exact seeding strategies and achieve similar F-Scores.

Surprisingly, the runtimes of the approximate seeding strategies in Figure 4 become higher than those of the exact seeding strategies as \(\tau\) increases: the time saved by avoiding sorting is outweighed by the increased time spent sampling. In addition, the approximate strategies involve random selection of vertices/neighborhoods during sampling, which in turn results in slower convergence.

(iv) Random, Low-degree, and Low-total-degree seeding strategies are robust. First, these three strategies are not sensitive to the value of \(\tau\): they have stable F-Score results regardless of \(\tau\) (excluding a tiny \(\tau\) of \(1e-6\) or \(\delta _G\)), as can be seen in Figure 6. Second, they are also not sensitive to noise (benchmark graphs with mixing parameter \(\mu \gt 0\)): their F-Score results do not deteriorate dramatically as the mixing parameter increases, as Figure 6 also shows. Among the three, Low-degree and Low-total-degree (the top two lines in Figure 6) perform better than Random.

However, the reasons behind the robustness of the three seeding strategies differ. The Random seeding strategy is robust because it has no bias over the vertices in the graph: each vertex is selected independently with equal probability. The Low-degree and Low-total-degree seeding strategies are robust because the label of a low-degree vertex or a low-total-degree neighborhood has a higher probability of surviving the early propagation stages. Such a vertex or neighborhood has few neighbors, so its label has fewer competitors; this makes the label persistent and thus makes a low-degree vertex or a low-total-degree neighborhood an effective seed.

(v) Noise makes DOLPA converge more slowly. No matter which seeding strategy is used in Figure 5, DOLPA converges more slowly (the y-axis scale of each subplot increases) as the mixing parameter \(\mu\) of the LFR benchmark graph increases: DOLPA requires more time to overcome noise as the noise increases. We observe a linear correlation between the runtime of DOLPA and the mixing parameter in Figure 4, similar to that observed by Lancichinetti et al. [30].

8.4 What Is a Good Seeding Parameter τ Value?

To find the \(\tau\) setting that yields the best performance (both runtime and quality of solution), Figure 7 summarizes, separately for the graphs LL, LH, HL, and HH and for the graphs B0–B9, the number of \(\tau\) settings that produce the best runtime and the best F-Score results. The figure shows that \(\tau =1e-6\) is clearly the winner for producing the best runtime. However, the F-Scores at \(\tau =1e-6\) are too low to be useful; we therefore recommend a larger \(\tau\) for a reasonable combination of runtime and F-Score.


Fig. 7. We count the number of cases a specific \(\tau\) value produces the best runtime results (left pane) and best F-Score results (right pane) as \(\tau\) is varied. Results for the graphs LL, LH, HL, and HH (top row) and the graph B0–B9 (bottom row) are plotted separately. The setting \(\tau =1e-6\) produced 16 of the best runtime results out of 36 cases on graphchallenge data, and 36 of the best runtime results out of 54 cases on the LFR graphs. The setting \(\tau =\frac{2}{\overline{\rm d}_G}\) produced 21 (out of 36) of the best F-Scores for the graphs LL, LH, HL, and HH, while the setting \(\tau =1.0\) produced 22 (out of 54) of the best F-Score results for the LFR graphs.

In Figure 7, we notice that the best F-Score results for the LFR graphs and for the graphchallenge graphs are achieved at two very different \(\tau\) values. Most of the best F-Scores of the graphchallenge graphs cluster around \(\tau =\frac{2}{\overline{\rm d}_G}\), a small \(\tau\). In contrast, most of the best F-Scores of the LFR graphs are achieved at \(\tau =1.0\), a large \(\tau\). However, the runtimes at \(\tau =1.0\) are extremely high due to redundant label updates.

To conclude, a small \(\tau\) such as \(\tau =\frac{2}{\overline{\rm d}_G}\) gives the best runtime with reasonable (sometimes even the best) F-Score results, while a large \(\tau\) such as \(\tau =1.0\) provides the best quality of solution at the cost of slow convergence. With this conclusion, we can adjust \(\tau\) to balance runtime and quality of solution.


9 EXPERIMENTAL EVALUATION: COMMUNITY DETECTION RUNTIME AND SOLUTION QUALITY

In this final experimental section, we evaluate the quality of solution and runtime obtained by DOLPA in comparison with other community detection methods using the synthetic graphs and real-world graphs listed in Table 5.

Table 6 lists the variants of parallel LPA we study (PLP, PU, and DO) and the Louvain method (LV). The methods PUR and DOR employ the Random seeding strategy, PUH and DOH the High-degree seeding strategy, and PUL and DOL the Low-degree seeding strategy. In both PU and DO, we set \(\tau =\frac{2}{\overline{\rm d}_G}\) to achieve good runtime with reasonable quality of solution. We use a switch threshold \(\omega\) of 1 for PU and 2 for DO. Recall that when \(\omega\) is 1, no push operation is applied.

Table 6.
Version  Description
PLP      Parallel Label Propagation algorithm [56]
PUR      DOLPA using pull only and with Random seeding
PUH      DOLPA using pull only and with High-degree seeding
PUL      DOLPA using pull only and with Low-degree seeding
DOR      DOLPA using push & pull and with Random seeding
DOH      DOLPA using push & pull and with High-degree seeding
DOL      DOLPA using push & pull and with Low-degree seeding
LV       Parallel Louvain method implementation [40]

Table 6. Community Detection Algorithms Studied

Each experiment is run 10 times and the average of the results is reported. We omit the scaling and runtime results in this section due to space limits; for details, please refer to [39].

9.1 Runtime and Quality of Solution Results

Table 7 shows the F-Score and runtime results for the eight methods listed in Table 6. The results show that DOLPA outperforms both PLP and the Louvain method in most of the runtime and F-Score comparisons. In particular, DOLPA achieves 8 of the 11 best runtime results and 9 of the 11 best F-Score results. DOLPA with the High-degree seeding strategy (DOH and PUH) accounts for five of the eight best runtime results achieved by DOLPA. DOLPA with the Random seeding strategy (PUR) accounts for three of the nine best F-Score results, while DOLPA with the Low-degree seeding strategy (DOL and PUL) accounts for six of the nine. Compared with PLP, DOLPA achieves at least 2 times the F-Score while maintaining similar runtime for the LFR graphs; with the Random seeding strategy, it achieves up to 48 times the F-Score on the graph HL. Compared with the Louvain method on the same graphs, the best results achieved by DOLPA have on average three times the F-Score at a tenth of the runtime. We provide further analysis of the results summarized in Table 7 in the remainder of this subsection.

Table 7.

Input          B1        S1        B3        S3        LL        LH        HL        HH        fnbt      dblp      cond
PLP  Time      0.093s*   0.110s*   0.108s    0.234s    0.573s    0.291s    0.165s*   0.231s    2.668s    0.248s    0.044s
     Fscore    0.8867    0.4264    0.824     0.3871    0.1432    0.166     0.0201    0.0559    0.1038    0.0751    0.034
     Prec.     0.7965    0.271     0.7007    0.2401    0.0772    0.0905    0.0102    0.0288    0.0565    0.0392    0.0196
     Recall    1         0.9999    1         0.9999    1         0.9994    0.9999    1         0.9277    0.9064    0.431
PUH  Time      0.121s    0.158s    0.116s    0.166s    0.246s*   0.248s    0.337s    0.232s    1.857s    0.217s    0.037s
     Fscore    0.3103    0.1285    0.285     0.0817    0.0255    0.0348    0.0414    0.0269    0.0686    0.0034    0.0117
     Prec.     0.1837    0.0687    0.1662    0.0426    0.0129    0.0177    0.0211    0.0136    0.0355    0.0017    0.0059
     Recall    0.9994    0.9999    0.9997    0.9999    1         1         0.9999    1         0.9971    0.9008    0.4348
DOH  Time      0.094s    0.256s    0.113s    0.164s    0.316s    0.200s*   0.231s    0.173s*   0.998s*   0.299s    0.024s*
     Fscore    0.161     0.06      0.1304    0.0293    0.0877    0.05      0.0241    0.0348    0.0673    0.0029    0.0068
     Prec.     0.0876    0.0309    0.0697    0.0149    0.0462    0.0258    0.0122    0.0177    0.0348    0.0014    0.0034
     Recall    0.9999    0.9998    1         0.9998    1         1         0.9999    0.9999    0.9956    0.9052    0.458
PUR  Time      0.158s    0.179s    0.138s    0.172s    0.370s    0.479s    0.504s    0.417s    2.619s    0.193s*   0.027s
     Fscore    0.8813    0.7465    0.8621    0.7101    0.997*    0.8277    0.8095*   0.127     0.1574    0.3531*   0.0436
     Prec.     0.8028    0.5965    0.7873    0.5522    0.9941    0.7072    0.6954    0.0681    0.0977    0.2212    0.0257
     Recall    0.9768    0.9983    0.9525    0.9945    1         0.9996    1         0.9999    0.8947    0.8749    0.4001
DOR  Time      0.106s    0.168s    0.106s*   0.181s    0.279s    0.312s    0.319s    0.278s    2.186s    0.301s    0.029s
     Fscore    0.2651    0.096     0.2594    0.0518    0.7898    0.6069    0.0812    0.0613    0.0648    0.3505    0.0507
     Prec.     0.1528    0.0504    0.149     0.0266    0.6529    0.4437    0.0429    0.0317    0.0335    0.2193    0.0305
     Recall    1         0.9998    1         0.9999    1         0.9997    0.9999    0.9999    0.9872    0.8757    0.4125
PUL  Time      0.126s    0.214s    0.127s    0.166s    0.418s    1.066s    0.630s    0.416s    2.485s    0.463s    0.031s
     Fscore    0.999*    0.9946*   0.9966*   0.9909*   0.3979    0.2494    0.0713    0.029     0.0679    0.2072    0.0558*
     Prec.     0.998     0.9894    0.9939    0.9821    0.271     0.1466    0.0381    0.0147    0.0352    0.1171    0.0348
     Recall    1         0.9998    0.9994    0.9999    1         0.9997    0.9969    1         0.9822    0.9285    0.4153
DOL  Time      0.106s    0.168s    0.178s    0.162s*   0.348s    0.297s    0.326s    0.285s    2.118s    0.436s    0.026s
     Fscore    0.3223    0.1211    0.3815    0.0764    0.9198    0.9003*   0.2135    0.0984    0.134     0.2155    0.0425
     Prec.     0.1922    0.0645    0.2358    0.0397    0.8519    0.8218    0.1318    0.0519    0.0775    0.1219    0.024
     Recall    0.9999    0.9999    1         0.9999    1         0.9997    0.9998    0.9996    0.8829    0.9281    0.4131
LV   Time      4.224s    4.135s    5.698s    5.038s    15.980s   21.904s   21.748s   17.098s   13.671s   1.002s    0.084s
     Fscore    0.1519    0.0772    0.0043    0.0014    0.3449    0.4205    0.1876    0.1759*   0.2547*   0.0105    0.0378
     Prec.     0.0822    0.0402    0.0022    0.0007    0.2084    0.2843    0.108     0.0999    0.1545    0.0053    0.0193
     Recall    0.9982    0.9984    0.9816    0.9865    0.9995    0.8071    0.7139    0.7331    0.7237    0.9776    0.935

  • The real-world graphs use as “ground truth” the results obtained from the FTR method [17]. All results are obtained under 64 threads.

  • The input graphs in this table are listed from easy to hard for community detection algorithms. Among the eight methods, the best runtime and the best F-Score result for each graph are marked with an asterisk (*). The runtime includes pre-processing steps such as degree sorting and frontier insertion in PU/DO/LV; there are no such steps in the PLP algorithm.

Table 7. Runtime and F-score Results of the Eight Methods Listed in Table 6 on Eight Synthetic Graphs with Ground-truth Information and Three Real-world Graphs


(i) Frontier. The use of frontiers in DOLPA is beneficial. A frontier allows the vertices to be processed in any desired order. During the initialization of DOLPA, the seeds are added to the initial frontier so that their labels propagate first; this gives the seeds' labels a higher probability of forming a strong community core without being eliminated. As the experiments show, PLP and PU have similar numbers of label updates, propagation steps, and processed edges. The higher Precision of PU shows that PU is more efficient and accurate in finding the “right” maximum label. In Table 7, the variants PUL and PUR respectively have 2.3 and 2.5 times the F-Score of PLP on the graphs B3 and S3. The method PUR obtains more than 10 times the F-Score of PLP on the graphs LL and HL. The method PUL has runtime comparable to that of PLP, even though the runtime of PUL includes steps such as degree sorting and frontier insertion.

In addition, in later iterations the frontier guarantees that DO processes only nodes that were activated in the previous iteration. In contrast, PLP may also process nodes activated in the current iteration if they are visited after being activated. This harms the quality of the solution, because the importance of the seeds' labels is overlooked. Further, PLP deactivates only nodes whose labels did not change, while DO always deactivates a processed node.
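The frontier-driven pull iteration described above can be sketched as follows. This is a simplified sequential sketch in Python (the paper's implementation is parallel OpenMP/C++); as a further simplification, ties among maximum labels are broken deterministically by smallest label rather than randomly:

```python
from collections import Counter

def lpa_with_frontier(adj, seeds, max_iters=100):
    """Frontier-driven label propagation (pull only).

    adj   : dict mapping vertex -> list of neighbor vertices
    seeds : vertices placed on the initial frontier so that their
            labels propagate first
    """
    labels = {v: v for v in adj}  # each vertex starts with its own id
    frontier = list(seeds)
    for _ in range(max_iters):
        if not frontier:
            break
        next_frontier = set()
        for v in frontier:
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue  # isolated vertex: nothing to pull
            best = max(counts.values())
            # pull: adopt a maximum label (smallest id on ties, a
            # deterministic simplification of random tie-breaking)
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                # activate neighbors for the NEXT iteration only;
                # a processed vertex is always deactivated
                next_frontier.update(adj[v])
        frontier = list(next_frontier)
    return labels
```

On a graph with two disjoint triangles and one seed per triangle, the seeds' labels spread to their whole components and the updates then stop, illustrating how seed-first processing forms stable cores.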

(ii) Seed Vertices. The “right” seeds improve accuracy. Seeds propagate their labels before other vertices in the first iteration. Since every label is initially unique, each label is a maximum label in the first iteration. When a seed applies pull for the first time, it abandons its own label by randomly selecting a maximum label in its neighborhood. This promotes the chosen label toward becoming the “true” maximum label, as it now appears twice in the neighborhood. If the chosen label survives the next few iterations, a community core is formed. With the “right” choice of seeds, the likelihood that the seeds form a strong and stable core is higher than for other vertices. This shows that updating the more important vertices in the network earlier than others is effective [64].

Conversely, the “wrong” seeds decrease accuracy. The method DOH has the worst F-Score in most instances in Table 7: pushing the labels of high-degree vertices to their neighbors hurts the quality of the solution, even though it almost guarantees a good runtime. With random seeds instead of high-degree seeds, DOR has less chance to “poison” other communities' labels; DOL behaves similarly. This is reflected in the much higher F-Scores of DOR and DOL compared to DOH.
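The three seeding strategies compared here (Random, High-degree, and Low-degree) can be sketched as below. This is a hypothetical helper, and it assumes \(\tau\) denotes the fraction of vertices chosen as seeds; the paper defines \(\tau\) precisely elsewhere, so treat that semantics as an assumption of this sketch:

```python
import math
import random

def select_seeds(adj, strategy="random", tau=0.1, rng=random):
    """Pick ceil(tau * n) seed vertices under one of three strategies.

    Assumption: tau is interpreted as the fraction of vertices used
    as seeds (an illustration only, not the paper's exact definition).
    """
    k = max(1, math.ceil(tau * len(adj)))
    verts = list(adj)
    if strategy == "random":
        rng.shuffle(verts)                                  # Random seeding
    elif strategy == "high-degree":
        verts.sort(key=lambda v: len(adj[v]), reverse=True)  # High-degree seeding
    elif strategy == "low-degree":
        verts.sort(key=lambda v: len(adj[v]))                # Low-degree seeding
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return verts[:k]
```

On a star graph, for example, High-degree seeding selects the hub first, whereas Low-degree seeding selects a leaf, which is exactly the difference driving the DOH-versus-DOL accuracy gap discussed above.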

(iii) Direction Optimization. Direction optimization provides a tradeoff between time and accuracy: adjusting the switch threshold \(\omega\) balances push and pull operations. With the same seeding strategy, DO has shorter runtime than PU: DOH is faster than PUH, and DOR is faster than PUR. Compared with PLP, DOR decreases runtime by 50% on average on the LFR benchmarks, and DOH decreases runtime by 15% on average on the graphchallenge graphs. With better seed selection, DOR achieves an average of 14 times the F-Score of PLP on the graphchallenge datasets.

Hence, we can adjust \(\omega\) for a good balance in different computation scenarios. The method DO fits best in time-sensitive scenarios: an appropriate switch threshold reduces work by applying a certain number of push operations on seeds, thus providing higher performance. However, push generally harms the quality of the solution since it forces an undesirable label choice on all neighbors. The amortized cost per label update is constant for push, while the cost of pull is \(O(d(v))\). The method PU is well suited for precision-demanding scenarios, where the switch threshold can be set as low as 1 so that no push operations are applied, in pursuit of higher accuracy.
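The two label update operations and their cost asymmetry can be illustrated with the following hypothetical Python helpers (not the paper's OpenMP code; `labels` maps vertex to label, `adj` maps vertex to neighbor list, and pull ties are broken deterministically by smallest label as a simplification):

```python
from collections import Counter

def pull(labels, adj, v):
    """Pull: v scans its whole neighborhood and adopts a maximum
    label, costing O(d(v)) per update."""
    counts = Counter(labels[u] for u in adj[v])
    if not counts:
        return labels[v]  # isolated vertex keeps its label
    best = max(counts.values())
    # deterministic tie-break: smallest label among the maxima
    return min(l for l, c in counts.items() if c == best)

def push(labels, adj, v):
    """Push: v overwrites every neighbor's label with its own,
    without inspecting the neighbors' neighborhoods; the amortized
    cost per label update is constant."""
    for u in adj[v]:
        labels[u] = labels[v]
```

The asymmetry is visible in the access pattern: push writes \(d(v)\) labels directly, whereas pull must first read and count \(d(v)\) labels to make one update, which is why early push iterations save time and later pull iterations recover accuracy.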


10 RELATED WORK

Community detection. Since LPA was first introduced for community detection, many works that improve or extend it have been proposed; we discuss only a few here. Xie and Szymanski introduce a method to “stabilize” LPA by eliminating the need for tie breaking [63]. Other works alleviate the randomness of tie breaking with node-preference or edge-preference methods instead of treating every node or edge equally. Preference measures studied include the \(k\)-shell value [64], local cycles [70], and modularity [5, 37]. Leung et al. use hop weights to prevent the emergence of a “monster” community [34]. Other common measures for node preference include degree centrality and the clustering coefficient [54]. The works using node preferences provably produce deterministic solutions, but the approaches have high computational cost and/or evaluation-metric bias.

The label propagation algorithm can be made faster by maintaining all label information in memory instead of computing it on the fly [13, 18], but this approach requires synchronization of the label information and is not scalable. The state-of-the-art parallel LPA is the PLP algorithm implemented on multi-core architectures [56]. Other works parallelize LPA on multi-core [28], on GPUs [26, 55], in the Map-Reduce model [69], and in distributed memory [3]. Liu et al. [38] also combine LPA with the direction-optimization technique, but their work is implemented in distributed memory with an active-message-based runtime system.

The quality of solution produced by LPA can be improved by selecting influential nodes [72] or community kernels [35] in the pre-processing phase and then growing community structures from them. This approach is similar to a standard method in overlapping community detection called seed set expansion. Some works select maximal cliques as seeds [45, 49], others select high-degree vertices [42, 61], and yet others select random vertices [31]. Stoica et al. [58] conducted a comprehensive study of seed set selection in the context of social influence maximization.

Direction optimization has been applied to other graph algorithms, including PageRank [62], Betweenness Centrality [41], Connected Components [19], and Single Source Shortest Path [10], and in graph frameworks such as Polymer [68] and Ligra [51]. Besta et al. [7] study the push-pull dichotomy in graph computations in terms of performance, speed of convergence, and code complexity. Tithi et al. [60] propose a push-pull-based Louvain method that prunes a significant number of edges to improve performance.

Connected component decomposition. Shiloach and Vishkin [50] introduced the first parallel connected components algorithm in the PRAM model. The algorithm relies on the disjoint-set (union-find) data structure. Afforest [59] is a variant of this approach focused on small-world graphs: it identifies the dominant component ID via subgraph sampling and then skips vertices that already reside in the component with that ID. LACC [1, 71] is the latest distributed implementation of fast SV using linear-algebra operations in GraphBLAS.
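A minimal sequential sketch of the union-find machinery these algorithms build on (not the parallel SV or Afforest code itself) looks as follows:

```python
def find(parent, x):
    """Find with path compression: after locating the root, point
    every traversed vertex directly at it."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union_find_components(n, edges):
    """Union the endpoints of every edge, then report each vertex's
    root as its component ID."""
    parent = list(range(n))
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            # link the larger root id under the smaller one
            parent[max(ru, rv)] = min(ru, rv)
    return [find(parent, v) for v in range(n)]
```

Parallel variants such as SV and Afforest differ mainly in how unions are ordered and how aggressively trees are flattened, but the component abstraction is the same.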

The minimum label propagation approach is another parallel method [44, 65]. Each vertex's label is initialized to its vertex ID and serves as its component ID. Repeatedly, each vertex updates its label to the smallest label among its neighbors until no further update is possible. In the end, the smallest label in each component serves as the unique component ID.
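A sequential sketch of this minimum-label iteration makes the convergence behavior concrete; note that the number of rounds grows with the graph diameter, which is why plain label propagation is slow on large-diameter graphs:

```python
def cc_min_label(adj):
    """Minimum-label propagation for connected components: each
    vertex starts with its own id and repeatedly adopts the smallest
    label in its closed neighborhood until a fixed point is reached."""
    label = {v: v for v in adj}
    changed = True
    while changed:
        changed = False
        for v in adj:
            # smallest label among v and its neighbors
            m = min([label[v]] + [label[u] for u in adj[v]])
            if m < label[v]:
                label[v] = m
                changed = True
    return label
```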

Given a graph \(G\), let \(D\) denote its diameter, i.e., the maximum distance in \(G\), where distance is the length of a shortest path between two vertices. Burkhardt [9] introduces a label propagation method in the PRAM model that runs in \(O(\log D)\) steps and \(O((m+n)\log D)\) work with \(O(m+n)\) processors and requires no pointer operations. Andoni et al. [2] proposed an \(O(\log D \log\log_{m/n} n)\)-time connectivity algorithm for diameter-\(D\) graphs using \(\Theta(m)\) total memory in the massively parallel computation (MPC) model. Liu and Tarjan [36] introduced a method that runs in \(O(\log n)\) steps and sends \(O(m \log n)\) total messages in the MPC model. Stergiou et al. [57] implement this approach with shortcutting in the bulk synchronous parallel (BSP) model. Label-propagation-based methods are implemented in graph processing frameworks such as Ligra [51], PEGASUS [22], and GraphChi [29].

Shun et al. [52] implement a breadth-first search approach for finding connected components in the PRAM model: a parallel BFS is started from each unprocessed vertex and marks all reached vertices as belonging to the same component. This approach runs in \(O(m)\) time. ConnectIt [12] studies all of the above connected component decomposition algorithms in depth and summarizes several subgraph sampling strategies in its framework.
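A sequential sketch of this BFS approach (the cited work runs the searches in parallel) can be written as:

```python
from collections import deque

def cc_bfs(adj):
    """BFS-based connected components: start a search from each
    still-unlabeled vertex and mark everything reached with that
    vertex's id; O(m + n) total work."""
    comp = {}
    for s in adj:
        if s in comp:
            continue
        comp[s] = s
        q = deque([s])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in comp:
                    comp[u] = s
                    q.append(u)
    return comp
```

Unlike minimum-label propagation, the number of rounds here does not depend on how labels race across the graph: each component is swept exactly once.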


11 CONCLUSION

We presented a new label propagation algorithm, the Direction-optimizing Label Propagation Algorithm (DOLPA), for graph structure detection, and showed its efficacy as a method for community detection and connected component decomposition in networks. We introduced a new label update heuristic called push and abstracted the previously known label update operation as pull. The algorithm applies push in early iterations and switches to pull in later iterations. We incorporated several heuristics for connected component decomposition and combined them with push and pull. Using a carefully designed microbenchmark, we analyzed the characteristics of push and pull. We proposed a total of nine seeding strategies and studied their performance extensively.

We validated our implementation on benchmarks with known ground truth and demonstrated increased accuracy and decreased runtime compared to state-of-the-art parallel implementations. The time-to-solution/quality-of-solution tradeoff that our algorithm provides (through the combination of the seeding parameter \(\tau\) and the switching threshold \(\omega\)) enables it to address many community detection scenarios effectively. For fast community detection, push saves time, but too many push operations can harm precision; likewise, a small \(\tau\) yields the best runtime. For accurate community detection, pull is precise but can be costly; a large \(\tau\) will most likely produce reasonable quality of solution.

We investigated our algorithm for finding connected components on datasets from various domains and demonstrated orders-of-magnitude speedup over the basic LP-based algorithm, up to 13.2× speedup over the SV algorithm, and competitive performance with the Afforest algorithm.


REFERENCES

[1] Azad A. and Buluç A. 2019. LACC: A linear-algebraic algorithm for finding connected components in distributed memory. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS'19).
[2] Andoni A., Song Z., Stein C., Wang Z., and Zhong P. 2018. Parallel graph connectivity in log diameter rounds. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS'18). 674–685.
[3] Attal J., Malek M., and Zolghadri M. 2019. Parallel and distributed core label propagation with graph coloring. Concurrency and Computation: Practice and Experience 31, 2 (2019), e4355.
[4] Bader D. A., Kappes A., Meyerhenke H., Sanders P., Schulz C., and Wagner D. 2017. Benchmarking for Graph Clustering and Partitioning. Springer New York, New York, NY, 1–11.
[5] Barber M. J. and Clark J. W. 2009. Detecting network communities by propagating labels under constraints. Phys. Rev. E 80, 2 (Sept. 2009), 026129.
[6] Beamer S., Asanović K., and Patterson D. 2013. Direction-optimizing breadth-first search. Scientific Programming 21, 3–4 (2013), 137–148.
[7] Besta M., Podstawski M., Groner L., Solomonik E., and Hoefler T. 2017. To push or to pull: On reducing communication and synchronization in graph computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'17). ACM, 93–104.
[8] Boldi P. and Vigna S. 2004. The WebGraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web (WWW'04). ACM, New York, NY, 595–602.
[9] Burkhardt P. 2021. Graph connectivity in log steps using label propagation. Parallel Processing Letters 31, 4 (2021), 2150021.
[10] Chakaravarthy V. T., Checconi F., Murali P., Petrini F., and Sabharwal Y. 2017. Scalable single source shortest path algorithms for massively parallel systems. IEEE Transactions on Parallel and Distributed Systems 28, 7 (2017), 2031–2045.
[11] Davis T. A. and Hu Y. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1, Article 1 (2011), 25 pages.
[12] Dhulipala L., Hong C., and Shun J. 2020. ConnectIt: A framework for static and incremental parallel graph connectivity algorithms. Proceedings of the VLDB Endowment 14, 4 (2020), 653–667.
[13] Fiscarelli A. M., Brust M. R., Danoy G., and Bouvry P. 2019. A memory-based label propagation algorithm for community detection. In Complex Networks and Their Applications VII (Studies in Computational Intelligence). Springer International Publishing, 171–182.
[14] Fortunato S. 2010. Community detection in graphs. Physics Reports 486, 3 (Feb. 2010), 75–174.
[15] Gao Y., Yu X., and Zhang H. 2020. Uncovering overlapping community structure in static and dynamic networks. Knowledge-Based Systems 201–202 (2020), 106060.
[16] Garza S. E. and Schaeffer S. E. 2019. Community detection with the label propagation algorithm: A survey. Physica A: Statistical Mechanics and its Applications 534 (2019), 122058.
[17] Granell C., Gómez S., and Arenas A. 2012. Hierarchical multiresolution method to overcome the resolution limit in complex networks. International Journal of Bifurcation and Chaos 22, 7 (2012), 1250171.
[18] Hosseini R. and Azmi R. 2015. Memory-based label propagation algorithm for community detection in social networks. In 2015 International Symposium on Artificial Intelligence and Signal Processing (AISP'15). IEEE, 256–260.
[19] Jaiganesh J. and Burtscher M. 2018. A high-performance connected components implementation for GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'18). ACM, 92–104.
[20] JáJá J. 1992. An Introduction to Parallel Algorithms. Addison-Wesley.
[21] Jayanti S. V. and Tarjan R. E. 2016. A randomized concurrent algorithm for disjoint set union. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC'16). ACM, 75–82.
[22] Kang U., Tsourakakis C. E., and Faloutsos C. 2009. PEGASUS: A peta-scale graph mining system implementation and observations. In Proceedings of the 2009 9th IEEE International Conference on Data Mining (ICDM'09). IEEE Computer Society, 229–238.
[23] Kao E., Gadepally V., Hurley M., Jones M., Kepner J., Mohindra S., Monticciolo P., Reuther A., Samsi S., Song W., Staheli D., and Smith S. 2017. Streaming graph challenge: Stochastic block partition. In 2017 IEEE High Performance Extreme Computing Conference (HPEC'17). IEEE, 1–12.
[24] Kloumann I. M. and Kleinberg J. M. 2014. Community membership identification from small seed sets. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'14). ACM, 1366–1375.
[25] Klusowski J. M. and Wu Y. 2018. Counting motifs with graph sampling. In Proceedings of the 31st Conference on Learning Theory (Proceedings of Machine Learning Research, Vol. 75), Bubeck S., Perchet V., and Rigollet P. (Eds.). PMLR, 1966–2011. http://proceedings.mlr.press/v75/klusowski18a.html.
[26] Kozawa Y., Amagasa T., and Kitagawa H. 2017. GPU-accelerated graph clustering via parallel label propagation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM'17). ACM, 567–576.
[27] Kunegis J. 2013. KONECT: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (WWW'13 Companion). ACM, New York, NY, 1343–1350.
[28] Kuzmin K., Chen M., and Szymanski B. K. 2015. Parallelizing SLPA for scalable overlapping community detection. Scientific Programming 2015 (2015), 4:4.
[29] Kyrola A., Blelloch G., and Guestrin C. 2012. GraphChi: Large-scale graph computation on just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI'12). USENIX Association, Hollywood, CA, 31–46.
[30] Lancichinetti A., Fortunato S., and Radicchi F. 2008. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 4 (Oct. 2008), 046110.
[31] Lancichinetti A., Radicchi F., Ramasco J. J., and Fortunato S. 2011. Finding statistically significant communities in networks. PLOS ONE 6, 4 (2011), e18961.
[32] Leiserson C. E. and Schardl T. B. 2010. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'10). ACM, 303–314.
[33] Leskovec J. and Krevl A. 2014. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
[34] Leung I. X. Y., Hui P., Liò P., and Crowcroft J. 2009. Towards real-time community detection in large networks. Phys. Rev. E 79, 6 (2009), 066107.
[35] Lin Z., Zheng X., Xin N., and Chen D. 2014. CK-LPA: Efficient community detection algorithm based on label propagation with community kernel. Physica A: Statistical Mechanics and its Applications 416 (2014), 386–399.
[36] Liu S. and Tarjan R. E. 2018. Simple concurrent labeling algorithms for connected components. In 2nd Symposium on Simplicity in Algorithms (SOSA'19) (OpenAccess Series in Informatics (OASIcs), Vol. 69), Fineman J. T. and Mitzenmacher M. (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 3:1–3:20.
[37] Liu X. and Murata T. 2010. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A: Statistical Mechanics and its Applications 389, 7 (2010), 1493–1500.
[38] Liu X. T., Firoz J. S., Zalewski M., Halappanavar M., Barker K. J., Lumsdaine A., and Gebremedhin A. H. 2019. Distributed direction-optimizing label propagation for community detection. In 2019 IEEE High Performance Extreme Computing Conference (HPEC'19). IEEE, 1–6.
[39] Liu X. T., Halappanavar M., Barker K. J., Lumsdaine A., and Gebremedhin A. H. 2020. Direction-optimizing label propagation and its application to community detection. In Proceedings of the 17th ACM International Conference on Computing Frontiers (CF'20). ACM, New York, NY, 192–201.
[40] Lu H., Halappanavar M., and Kalyanaraman A. 2015. Parallel heuristics for scalable community detection. Parallel Comput. 47 (2015), 19–37.
[41] Madduri K., Ediger D., Jiang K., Bader D. A., and Chavarria-Miranda D. 2009. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In 2009 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 1–8.
[42] McDaid A. and Hurley N. 2010. Detecting highly overlapping communities with model-based overlapping seed expansion. In 2010 International Conference on Advances in Social Networks Analysis and Mining. IEEE, 112–119.
[43] Merrill D., Garland M., and Grimshaw A. 2012. Scalable GPU graph traversal. SIGPLAN Not. 47, 8 (2012), 117–128.
[44] Orzan S. M. 2004. On Distributed Verification and Verified Distribution. Ph.D. thesis. Vrije Universiteit. http://dare.ubvu.vu.nl/handle/1871/10338.
[45] Palla G., Derényi I., Farkas I., and Vicsek T. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 7043 (2005), 814–818.
[46] Raghavan U. N., Albert R., and Kumara S. 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 3 (2007), 036106.
[47] Rossi R. A. and Ahmed N. K. 2015. The network data repository with interactive graph analytics and visualization. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI'15). AAAI Press, 4292–4293.
[48] Scripps J., Tan P., and Esfahanian A. 2007. Exploration of link structure and community-based node roles in network analysis. In 7th IEEE International Conference on Data Mining (ICDM'07). IEEE, 649–654.
[49] Shen H., Cheng X., Cai K., and Hu M. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications 388, 8 (2009), 1706–1712.
[50] Shiloach Y. and Vishkin U. 1982. An O(log n) parallel connectivity algorithm. Journal of Algorithms 3, 1 (1982), 57–67.
[51] Shun J. and Blelloch G. E. 2013. Ligra: A lightweight graph processing framework for shared memory. SIGPLAN Notices 48, 8 (2013), 135–146.
[52] Shun J., Dhulipala L., and Blelloch G. 2014. A simple and practical linear-work parallel algorithm for connectivity. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'14). ACM, 143–153.
[53] Slota G. M., Rajamanickam S., and Madduri K. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium. 550–559.
[54] Soffer S. N. and Vàzquez A. 2005. Network clustering coefficient without degree-correlation biases. Phys. Rev. E 71, 5 (2005), 057101.
[55] Soman J. and Narang A. 2011. Fast community detection algorithm with GPUs and multicore architectures. In 2011 IEEE International Parallel & Distributed Processing Symposium. IEEE, 568–579.
[56] Staudt C. L. and Meyerhenke H. 2016. Engineering parallel algorithms for community detection in massive networks. IEEE Transactions on Parallel and Distributed Systems 27, 1 (2016), 171–184.
[57] Stergiou S., Rughwani D., and Tsioutsiouliklis K. 2018. Shortcutting label propagation for distributed connected components. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM'18). ACM, 540–546.
[58] Stoica A., Han J. X., and Chaintreau A. 2020. Seeding network influence in biased networks and the benefits of diversity. In Proceedings of The Web Conference 2020 (WWW'20). ACM, 2089–2098.
[59] Sutton M., Ben-Nun T., and Barak A. 2018. Optimizing parallel graph connectivity computation via subgraph sampling. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS'18). IEEE, 12–21.
[60] Tithi J. J., Stasiak A., Aananthakrishnan S., and Petrini F. 2020. Prune the unnecessary: Parallel pull-push Louvain algorithms with automatic edge pruning. In 49th International Conference on Parallel Processing (ICPP'20). ACM, 1–11.
[61] Whang J. J., Gleich D. F., and Dhillon I. S. 2016. Overlapping community detection using neighborhood-inflated seed expansion. IEEE Transactions on Knowledge and Data Engineering 28, 5 (2016), 1272–1284.
[62] Whang J. J., Lenharth A., Dhillon I. S., and Pingali K. 2015. Scalable data-driven PageRank: Algorithms, system issues, and lessons learned. In Euro-Par 2015: Parallel Processing, Träff J. L., Hunold S., and Versaci F. (Eds.). Springer, Berlin, 438–450.
[63] Xie J. and Szymanski B. K. 2013. LabelRank: A stabilized label propagation algorithm for community detection in networks. In 2013 IEEE 2nd Network Science Workshop (NSW'13). IEEE, 138–143.
[64] Xing Y., Meng F., Zhou Y., Zhu M., Shi M., and Sun G. 2014. A node influence based label propagation algorithm for community detection in networks. The Scientific World Journal 2014 (2014).
[65] Yan D., Cheng J., Xing K., Lu Y., Ng W., and Bu Y. 2014. Pregel algorithms for graph connectivity problems with performance guarantees. Proc. VLDB Endow. 7, 14 (2014), 1821–1832.
[66] Yang J. and Leskovec J. 2015. Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42, 1 (2015), 181–213.
  67. [67] Zhang H., Gao Y., and Zhang Y.. 2018. Overlapping communities from dense disjoint and high total degree clusters. Physica A: Statistical Mechanics and its Applications 496 (2018), 286298. Google ScholarGoogle ScholarCross RefCross Ref
  68. [68] Zhang K., Chen R., and Chen H.. 2015. NUMA-aware graph-structured analytics. SIGPLAN Not. 50, 8 (Jan.2015), 183193. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. [69] Zhang Q., Qiu Q., Guo W., Guo K., and Xiong N.. 2016. A social community detection algorithm based on parallel grey label propagation. Computer Networks 107 (2016), 133143. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. [70] Zhang X., Song F., Chen S., Tian X., and Ao Y.. 2015. Label propagation algorithm based on local cycles for community detection. International Journal of Modern Physics B 29, 5 (2015), 1550029. Google ScholarGoogle ScholarCross RefCross Ref
  71. [71] Zhang Y., Azad A., and Buluç A.. 2020. Parallel algorithms for finding connected components using linear algebra. J. Parallel and Distrib. Comput. 144 (2020), 1427. Google ScholarGoogle ScholarCross RefCross Ref
  72. [72] Zhao Y., Li S., and Jin F.. 2016. Identification of influential nodes in social networks with community structure based on label propagation. Neurocomputing 210 (2016), 3444. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. [73] Zhu X. and Ghahramani Z.. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report. CMU.Google ScholarGoogle Scholar


• Published in

  ACM Journal of Experimental Algorithmics, Volume 27
  December 2022, 776 pages
  ISSN: 1084-6654
  EISSN: 1084-6654
  DOI: 10.1145/3505192

  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Received: 3 June 2021
  • Revised: 8 August 2022
  • Accepted: 12 September 2022
  • Online AM: 27 October 2022
  • Published: 13 December 2022

  Qualifiers

  • research-article
  • Refereed
