
1 Introduction

Dynamic graphs are becoming a ubiquitous format for representing relational datasets such as social, collaboration, communication and computer networks. One of the vital tasks for gaining insight into the behavioral patterns of such datasets is anomaly detection. Anomaly detection in time-evolving graphs is the task of finding timestamps that correspond to an unusual event in a sequence of graphs [2]. For instance, a social network anomaly may correspond to the merging or splitting of its communities. Anomaly detection plays an important role in numerous applications, such as network intrusion detection, credit card fraud [9] and discontinuity detection in social networks [3].

However, there are many challenges associated with event detection in dynamic graphs. Networks such as Facebook or Twitter comprise billions of interacting users, and the structure of the network is constantly updated. Moreover, there is often a lack of labels for normal and anomalous graph instances, which requires learning to be unsupervised. Due to these challenges, graph anomaly detection has attracted growing interest over time.

To address these challenges, many anomaly detection techniques use a pre-processing phase in which they extract structural features from graph representations. These features may include node centrality [14], ego-nets [3] and eigenvalues [10]. Well-known similarity measures are then applied to compare graph changes over a period of time. In this scenario, the graphs are converted into feature sets, which removes the complexities associated with node inter-dependencies and considerably decreases the time and space requirements of the anomaly detection scheme.

However, generating structure-aware features for graphs can be challenging in itself. For instance, the eigenvalues of a graph can suitably represent its patterns of connectivity, but they have high storage and time requirements. A common shortcoming among these approaches is the need to perform matrix inversions, which fails when the graphs are too sparse to be invertible. Another desirable property of graph summarization techniques is interpretability: revealing structural information such as communities, node roles or maximum independent sets can be very useful in further analysis of graphs.

To address these issues we propose an approach for detecting graph anomalies based on the ranking of the nodes. The novelty of our method lies in a scalable pre-processing scheme that produces stable results. Our matrix re-ordering approach efficiently assigns a rank to each node in the graph, and the resulting ranks can be used directly as a basis for comparing consecutive graph snapshots. Our re-ordering approach reduces the input dimension of a graph from \(O(n^{2})\) to \(O(n)\), so a rank correlation coefficient can be used directly as a similarity measure over pairs of graphs. Another advantage of our approach is its capability to produce interpretable results that identify large independent sets. The compact representation of the graphs yields faster and simpler anomaly detection schemes.

We review some of the algorithms previously introduced in the domain of graph anomaly detection in Sect. 2. We then define our notation and outline the problem statement in Sect. 3. The details of the proposed method and its properties are summarized in Sect. 4. The benchmark datasets in addition to the baseline algorithms for comparison are discussed in Sect. 5. We then show the results of anomaly detection and discuss the scalability and stability of our algorithm in Sect. 6. Finally, we conclude the paper and present future directions for research in Sect. 7.

2 Related Work

One of the most valuable tasks in data analysis is to recognize what stands out in a dataset. This type of analysis provides actionable information and improves our knowledge of the underlying data generation scheme. Various approaches have been developed for the detection of such abnormalities [4]; however, many of these techniques disregard relational datasets, where data instances exhibit complex inter-dependencies. Due to the abundance and cross-disciplinary nature of relational datasets, graph-based anomaly detection techniques have received growing attention in social networks, web graphs, road map networks and so forth [3].

We review some of the dominant techniques for the detection of anomalies. We focus on plain graphs, where nodes and/or edges are not associated with attributes and the nodes are consistently labeled over time.

2.1 Graph-Based Anomaly Detection

Several approaches to pattern mining in graphs stem from distance-based techniques, which utilize a distance measure to distinguish abnormal from normal structures. An example of such an approach is the k-medians algorithm [8], which employs graph edit distance as a measure of graph similarity. Other approaches take advantage of graph kernels [15], where kernel-based algorithms are applied to graphs to compare them based on common sequences of nodes or subgraphs. However, the computational complexity of these kernels can become problematic when applied to large graphs.

Other graph similarity metrics use the intuition of information flow when comparing graphs. The first step in these approaches is to compute the pairwise node affinity matrix of each graph and then determine the distance between these matrices. There are several approaches for determining node affinities in a graph, such as PageRank and various extensions of random walks [6]. Another recent approach in this category, DeltaCon, calculates graph distance by comparing node affinities [16] and can be used for anomaly detection; it measures differences in the immediate and second-hop neighborhoods of graphs. These approaches also suffer from the curse of dimensionality in large graphs.

Moreover, there are approaches that extract properties such as graph-centric features before performing anomaly detection. These features can be computed from combinations of two, three or more nodes, i.e., dyads, triads and communities; they can also be extracted from the combination of all nodes in a more general manner [1]. Many anomaly detection approaches [12] have utilized graph-centric features in their detection process. Since the graph is summarized as a vector of features, the problem of graph-based anomaly detection is transformed into the well-known problem of spotting outliers in an n-dimensional space. Therefore, standard unsupervised anomaly detection schemes such as ellipsoidal cluster-based approaches can be employed [19]. A thorough survey of such techniques can be found in [4]. It is worth noting that the extracted features cause information loss that can affect the performance of the anomaly detection scheme.

Another approach for graph mining is tensor decomposition. These techniques represent the time-evolving graphs as a tensor that can be considered as a multidimensional array, and perform tensor factorization. Tensor factorization approximates the input graph, where the reconstruction error can highlight anomalous events, subgraphs and/or vertices [20].

Although this field of research has received growing attention in recent years, the problems of scalability and interpretability of results still remain. Graph-centric features can reduce the dimensionality of the input graphs, but they may not be able to provide visually interpretable results. On the other hand, decomposition-based methods provide meaningful representations of graphs but suffer from the curse of dimensionality. The trade-off between these two issues has motivated us to find a compact representation of graphs that preserves the structural properties of networks, making further analysis of the data computationally efficient. Specifically for the task of anomaly detection, we provide experiments that demonstrate the efficiency and utility of our approach.

3 Preliminaries and Problem Statement

We start by describing the basic notation and assumptions of our anomaly detection task. A graph \(G=(V,E)\) is defined as a set of nodes V and edges \(E \subseteq V\times V\), where an edge \(e\in E\) denotes a relationship between its corresponding nodes \(v_{i},v_{j}\). The degree \(d_{i}\) of a vertex \(v_{i}\) is defined as the sum of the number of its incoming (in-degree) and outgoing (out-degree) edges. A Maximum Independent Set (MIS) is the largest subset of vertices \(V_{MIS} \subseteq V \) such that there is no edge between any pair of vertices in \(V_{MIS}\).

The maximum independent set problem is closely related to common graph theoretical problems such as maximum common induced subgraphs, minimum vertex covers, graph coloring, and maximum common edge subgraphs. Finding MISs in a graph can be considered a sub-problem of indexing for shortest path and distance queries, automated labeling of maps, information coding, and signal transmission analysis [18].

Graphs are often represented by binary adjacency matrices, \(A_{n\times n}\), where \(n=|V|\) denotes the number of nodes. An element of the adjacency matrix \(a_{ij}=1\) if there is an edge from \(v_{i}\) to \(v_{j}\). The simultaneous re-ordering of rows and columns of the adjacency matrix is called matrix permutation.
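
For concreteness, here is a two-line NumPy illustration of matrix permutation; the array A and the permutation p are toy values of our own, not taken from the paper:

```python
import numpy as np

n = 4
A = np.random.randint(0, 2, size=(n, n))  # toy adjacency matrix
p = np.random.permutation(n)              # a new vertex ordering
A_perm = A[np.ix_(p, p)]                  # simultaneous row/column re-ordering
```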

We formulate the problem of anomaly detection as follows: Given a sequence of graphs \(\{G\}_{1...m}\), where m is the number of input graphs, we want to determine the time stamp(s), \(i \in \{1...m\}\), when an event has occurred and changed the structural properties of the graph \(G_{i}\). We consider the following assumptions about the input graphs:

  • The vertices and edges in the graph are unweighted.

  • There is no external vertex ordering.

  • The input graphs are plain, i.e., no attributes are assigned to edges or vertices.

  • The number of nodes remains the same throughout the graph sequence.

  • The labeling of nodes between graphs is consistent.

An important issue in the design of a scalable anomaly detection scheme is the number of input features or dimensions that must be processed. If a graph-based anomaly detection scheme uses a raw adjacency matrix as input, the input dimensionality is \(O(n^{2})\), which is impractical for large graphs. In order to address the issue of scalability, we need to find a compact representation for each graph. We propose a pre-processing algorithm that extracts, for each node, a rank feature that is associated with the maximum independent sets in each graph. Therefore, instead of storing and processing an adjacency matrix of size \(n\times n\), we reduce the input dimensionality and computational requirements of our anomaly detector to n.

For each graph in the sequence \(\{G_{1}=(V_{1},E_{1}), G_{2}=(V_{2}, E_{2}),..., G_{m}=(V_{m}, E_{m})\}\), we determine the new matrix re-ordering vector \(\{{V_{1}}^{'}, {V_{2}}^{'}, ..., {V_{m}}^{'}\}\). We then compute the rank correlation coefficient between every two consecutive rank vectors, \(({V_{i}}^{'}, {V_{i+1}}^{'})\). We employ the Spearman rank correlation coefficient, shown in Eq. 1, between two input rank vectors \(\overrightarrow{V}_{i}^{'}\) and \(\overrightarrow{V}_{i+1}^{'}\), where \(d_{j}\) is the difference between the ranks assigned to node j by the two vectors:

$$\begin{aligned} \rho =1-\frac{6\sum _{j}{d_{j}}^{2}}{n(n^{2}-1)} \end{aligned}$$
(1)
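
For illustration, a minimal Python sketch of Eq. 1 (a helper of our own, assuming rank vectors without ties; the tie-corrected Spearman coefficient differs slightly):

```python
import numpy as np

def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation of Eq. 1 between two rank vectors.

    Assumes both vectors assign a distinct rank to every node (no ties).
    """
    ranks_a = np.asarray(ranks_a, dtype=float)
    ranks_b = np.asarray(ranks_b, dtype=float)
    n = len(ranks_a)
    d = ranks_a - ranks_b  # per-node rank differences d_j
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```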

The computational complexity of Eq. 1 is \(O(n)\), where n is the length of the input vectors. The intuition behind our approach is to design a stable and scalable algorithm for determining the significance of each node and revealing structural information by manipulating the adjacency matrix \(A_{n\times n}\). We need to find a matrix permutation that satisfies the following properties:

  • Locality: Non-zero elements of the matrix should be in close vicinity in the ordering after the permutation.

  • Stability: The initial ordering of the rows and columns should have no effect on the final outcome of the re-ordering.

  • Scalability: The algorithm should have low computational complexity in order to handle large scale graphs.

  • Interpretability: The permuted matrix should reveal structural information such as MISs about the graph.

4 Our Approach: Amplay

In order to achieve the above objectives, we propose an approach called Amplay (Adjacency matrix permutation based on layers). In each iteration, Amplay sorts vertices according to their total degree and picks the vertex with the highest degree; ties are resolved according to the ordering in the previous iteration. We then remove the vertex and its incident edges, and recursively apply the algorithm. The outline of the re-ordering approach is given in Algorithm 1. To clarify the procedure, we provide an example of Amplay in operation in Figs. 1a and 1b.

Fig. 1. Examples of Amplay algorithm operation.

Algorithm 1. Amplay: adjacency matrix permutation based on layers.
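
Since the pseudocode figure is not reproduced here, the following Python sketch reconstructs Amplay from the textual description above and the lemmas below; the function name, the adjacency-dict input format, and the implementation details are our own assumptions, not the authors' code:

```python
def amplay(adj, n, k=1):
    """Sketch of the Amplay re-ordering, reconstructed from the paper.

    adj: dict mapping each vertex in 0..n-1 to the set of its neighbours
         (an undirected view, so total degree = len(adj[v])).
    k:   number of vertices placed per iteration (k=1 is the basic form).
    Returns the ordering (order[i] = vertex at position i) and the
    tail-block sizes {a_i} that define the front line.
    """
    remaining = set(range(n))
    deg = {v: len(adj[v]) for v in remaining}
    prev_rank = {v: v for v in remaining}  # ties resolved by previous order
    order = [None] * n
    head, tail = 0, n - 1                  # n_head and n_tail pointers
    a_seq = []
    while remaining:
        # Sort by descending total degree; ties fall back on the ordering
        # from the previous iteration via prev_rank.
        ranked = sorted(remaining, key=lambda v: (-deg[v], prev_rank[v]))
        prev_rank = {v: i for i, v in enumerate(ranked)}
        for v_x in ranked[:k]:             # place the top-k at the head
            order[head] = v_x
            head += 1
            remaining.discard(v_x)
            for u in adj[v_x]:
                if u in remaining:
                    deg[u] -= 1
        # A_x: vertices now incident only to already-placed vertices;
        # they are placed at the tail of the ordering.
        A_x = [u for u in remaining if deg[u] == 0]
        a_seq.append(len(A_x))
        for u in A_x:
            order[tail] = u
            tail -= 1
            remaining.discard(u)
    return order, a_seq
```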

One of the interesting properties of Amplay is its capability to reveal MISs associated with each input graph. Figure 2 shows the permuted adjacency matrix of the Enron email dataset, where the MISs are denoted as \(S_{1},S_{2},\ldots\). The grouping of nodes into MISs indicates that Amplay can be used as a heuristic to determine the MISs of a graph in various problem domains. A prominent feature of the matrices produced by the Amplay method is a front line such that all non-zero matrix elements are located above the line. Indeed, we can consider an adjacency matrix as a grid with integer coordinates, where the first coordinate spans rows from top to bottom and the second coordinate spans columns from left to right. We define the front line as follows: \((1, n), (1, n-a_{1}+1), (2, n-a_{1}), (2, n-a_{1}-a_{2}+1), \ldots, (s, s), \ldots, (n-a_{1}+1, 1), (n, 1)\), where \(\{a_{i}\}\) is the sequence produced by Algorithm 1 and s is the number of iterations of the algorithm.

Lemma 1

Every matrix element below the front line is zero.

Proof

The front line spans the intersections of the vertex sets \(A_{x}\) with their respective \(v_{x}\). By definition, \(A_{x}\) contains the vertices that are incident only to \(v_{x}\) or to vertices placed before \(v_{x}\), which implies that the matrix elements below and to the right of the intersections of \(A_{x}\) and \(v_{x}\) are zero.

As we explain below, the front line is important in visualization, because it allows us to grasp (1) the degree distribution of the graph, and (2) the relative size of the largest independent set revealed by Amplay. Note that the shape of the front line is defined by the sequence \(\{a_{i}\}\), where \(a_{i}\) is closely related to the degree of the vertex placed at position i. As a consequence, the front line reflects the degree distribution in a graph.

A key property of Amplay is multiple vertex sorting. Recall that at each iteration, vertices are sorted according to the total degree of the remaining graph, and ties are resolved using the ordering from the previous iteration. Such a sorting has two consequences. First, the resulting index of each vertex depends not only on the vertex degree, but also on a vertex connectivity pattern (e.g., the number of connections to high-degree nodes). This pattern is reflected in the positions of the vertex in subsequent sorting rounds. While many vertices can have the same degree, the vertices tend to differ in their connectivity patterns. As such, Amplay tends to produce a relatively deterministic ordering. This in turn results in a relatively small variance in the behavior of subsequent graph processing algorithms. Second, vertices that have a similar connectivity pattern will have similar positions during sorting across subsequent iterations, and thus have similar positions in the resulting Amplay ordering. This explains why Amplay tends to produce matrices with a smooth visual appearance.

Lemma 2

Graph \(G=(V,E)\) contains an independent set with at least \(n-n_{tail}\) vertices, where \(n_{tail}\) is the value of the tail pointer in Amplay at the moment of termination.

Proof

At the end of each iteration of Amplay, vertices assigned to indices larger than or equal to \(n_{tail}\) are incident only to vertices assigned to indices smaller than \(n_{head}\). At the point of termination, \(n_{head}=n_{tail}\). Hence, the vertices assigned to indices larger than \(n_{tail}\) are pairwise non-adjacent and form an independent set.
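
Lemma 2 can be checked empirically on the output of the amplay() sketch above; the helper below is hypothetical (our own, reusing the adj and n conventions of that sketch), not part of the paper:

```python
def is_independent_set(adj, vertices):
    # True iff no edge connects any pair of the given vertices.
    s = set(vertices)
    return all(not (adj[v] & s) for v in s)

# Positions >= n_tail hold the independent set promised by Lemma 2.
order, a_seq = amplay(adj, n)
n_tail = n - sum(a_seq)
assert is_independent_set(adj, order[n_tail:])
```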

Fig. 2. The Amplay re-ordered adjacency matrix of the Enron email dataset.

In addition to revealing structural properties of the graph, Amplay proves to be scalable. We describe the computational complexity of this re-ordering approach in Lemma 3.

Lemma 3

The complexity of Amplay is \(O(\sum _{i=0}^{s} n_{i}\log n_{i})\), where \(n_{i}=|V_{i}|\) is the number of vertices remaining at iteration i of Amplay and \(s\le |V|\) is the number of iterations.

Proof

Each iteration of the algorithm operates on a subgraph with \(n_{i}\) vertices, and involves sorting (which can be performed in \(O(n_{i}\log n_{i})\) time), finding the neighbors of the chosen vertex \(v_{x}\) (linear in \(n_{i}\)), and removing incident edges (linear in \(n_{i}\)). As such, the overall complexity of one iteration is bounded by \(O(n_{i}\log n_{i})\) and the total complexity is bounded by \(O(\sum _{i=0}^{s} n_{i}\log n_{i})\).

It is worth mentioning that in many real-world graphs \(n_{i}\) decreases rapidly, which reduces the total running time. Moreover, we can improve the scalability of Amplay further by choosing the k vertices with the largest total degrees, placing them, and advancing the \(n_{head}\) pointer by k at each iteration (line 4 in Algorithm 1). Furthermore, in line 6 of Algorithm 1, we can define \(A_{x}\) as the set of vertices incident only to the chosen k vertices. The front line is now defined as \((k, n), (k, n-a_{1}+1), (2k, n-a_{1}), (2k, n-a_{1}-a_{2}+1), \ldots, (sk, sk), \ldots, (n-a_{1}+1, k), (n, k)\), and it is easy to verify that Lemmas 1 and 2 hold. If we increase k, the prominent structural features of the graph are preserved. Moreover, the computational complexity of Amplay when \(k > 1\) is \(O(\sum _{i=0}^{s^{'}}{n_{i}}^{'}\times r_{i})\), where \(r_{i} =\max (\log {n_{i}}^{'},k)\). Using \(k > 1\) is beneficial because it reduces the number of iterations \(s^{'}\), and the sequence \({n_{i}}^{'}\) decreases faster than \(n_{i}\).
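
As a usage note, the batching parameter k in the amplay() sketch after Algorithm 1 mirrors this variant; a rough timing experiment of our own (assuming adj and n as before) would look like:

```python
import time

for k in (1, 10, 100):
    t0 = time.perf_counter()
    order, a_seq = amplay(adj, n, k=k)
    elapsed = time.perf_counter() - t0
    print(f"k={k:4d}: {elapsed:.3f}s, tail independent-set size {sum(a_seq)}")
```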

5 Evaluation Methodology

In this section, we describe each dataset used in our experiments and elaborate on the baseline algorithms for comparison.

5.1 Benchmark Datasets

For the purpose of anomaly detection, we selected a representative sample of sparse real-world datasets. The first real dataset is the Facebook wall-posts data collected from September 26th, 2006 to January 22nd, 2009 from users in the New Orleans network [22]. The number of users is 90,269; however, only 60,290 exhibited activity.

The next real dataset is the Autonomous Systems (AS) data [17]. The graphs comprising the AS dataset represent snapshots of the backbone Internet routing topology, where each node corresponds to a subnetwork in the Internet. The edges represent the traffic flows exchanged between neighbors. The dataset is collected daily from November 8, 1997 to January 2, 2000 with nodes being added or deleted.

Another real dataset is the Enron email network, which gathers the email communications within the Enron corporation from January 1999 to January 2003 [7]. There are 36,692 nodes in this network, where each node corresponds to an email address. We kept only the nodes with a minimum level of activity, reducing the graph to 184 nodes.

The final real dataset is the DBLP dataset, which consists of co-authorship information in computer science. The number of nodes is 1,631,698 and the data covers the years 1954 to 2010. The description of these datasets is summarized in Table 1. The DBLP graphs are used to test the scalability of our approach.

Table 1. Benchmark description where * denotes undirected graphs.
Table 2. Computational complexity for baseline and proposed approaches.

5.2 Baseline Algorithm

For the purpose of comparison, we used a recent approach for computing graph similarity with applications in anomaly detection as our baseline. This algorithm, DeltaCon (delta connectivity) [16], calculates the node affinity matrix of each graph using the belief propagation strategy shown in Eq. 2. The approach considers first-hop and second-hop neighborhoods when calculating the influence of nodes on each other and has been proven to converge.

$$\begin{aligned} S = [s_{ij}] =[I+\eta ^{2}D -\eta A]^{-1} \end{aligned}$$
(2)

After determining the node affinity matrices, consecutive graphs are compared by calculating the root Euclidean distance shown in Eq. 3, from which a similarity score in the range [0, 1] is derived. We empirically chose \(\eta =0.1\) in our experiments.

$$\begin{aligned} d (S_{1},S_{2})=\sqrt{\sum _{i=1}^{n}\sum _{j=1}^{n}(\sqrt{S_{1,ij}}-\sqrt{S_{2,ij}})^{2}} \end{aligned}$$
(3)

The computational complexity of this algorithm is reported to be linear in the number of edges of each graph, O(|E|).
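
To make the baseline concrete, here is a naive dense-matrix sketch of Eqs. 2 and 3; this is our own illustration, not the authors' implementation: the published DeltaCon approximates the affinity matrix with fast belief propagation rather than an explicit \(O(n^{3})\) inverse, and maps the distance to a similarity as \(1/(1+d)\):

```python
import numpy as np

def deltacon_similarity(A1, A2, eta=0.1):
    """Naive DeltaCon-style comparison; only suitable for small graphs."""
    def affinity(A):
        n = A.shape[0]
        D = np.diag(A.sum(axis=1))  # degree matrix
        return np.linalg.inv(np.eye(n) + eta ** 2 * D - eta * A)  # Eq. 2
    S1 = affinity(A1.astype(float))
    S2 = affinity(A2.astype(float))
    # Root Euclidean distance of Eq. 3 (affinities are non-negative for
    # small eta; the clip guards against tiny negative numerical noise).
    d = np.sqrt(np.sum((np.sqrt(np.clip(S1, 0, None))
                        - np.sqrt(np.clip(S2, 0, None))) ** 2))
    return 1.0 / (1.0 + d)  # similarity in [0, 1]
```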

Another baseline algorithm is Random Projection (RP), which has been shown to be effective in determining anomalous graphs in block-structured networks [21]. The intuition behind RP comes from the Johnson-Lindenstrauss lemma [11], presented as Lemma 4. This lemma asserts that a set of points in Euclidean space, \(P^{1...n} \in \mathbb {R}^{n\times m}\), can be embedded into a d-dimensional Euclidean space, \(P^{\prime 1...n} \in \mathbb {R}^{n\times d}\), while preserving all pairwise distances within a small factor \(\epsilon \) with high probability.

Lemma 4

Given an integer n and \(\epsilon >0\), let d be a positive integer such that \(d \ge d_{0}=O(\epsilon ^{-2}\log n)\). For every set P of n points in \(\mathbb {R}^{m}\), there exists \(f: \mathbb {R}^{m}\rightarrow \mathbb {R}^{d}\) such that with probability \(1-n^{-\beta }\), \(\beta >0\), for all \(u, v\in P\)

$$\begin{aligned} (1-\epsilon )||u-v||^{2} \le ||f(u)-f(v)||^{2}\le (1+\epsilon )||u-v||^{2} \end{aligned}$$
(4)

One of the algorithms for generating a random projection matrix that has been shown to preserve pairwise distances [11] is presented in Eq. 5:

$$\begin{aligned} r_{ij}= \sqrt{3}{\left\{ \begin{array}{ll} +1 &{} \text {with probability } 1/6\\ 0 &{} \text {with probability } 2/3\\ -1 &{} \text {with probability } 1/6\\ \end{array}\right. } \end{aligned}$$
(5)
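
A small sketch of a projection step using the matrix of Eq. 5; the function and the conventional \(1/\sqrt{d}\) scaling are our additions:

```python
import numpy as np

def sparse_projection(P, d, seed=None):
    """Project the n rows of P from R^m down to R^d using the sparse
    random matrix of Eq. 5, scaled by 1/sqrt(d) so that pairwise
    distances are preserved in expectation."""
    rng = np.random.default_rng(seed)
    m = P.shape[1]
    R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(m, d),
                                p=[1 / 6, 2 / 3, 1 / 6])
    return P @ R / np.sqrt(d)
```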

6 Results and Discussion

In this section, we outline our experimental setup in four experiments and report the observed results. We first demonstrate the effectiveness of Amplay and rank correlation in prioritizing the nodes that contribute most to structural change in consecutive graphs. We then investigate the capability of our algorithm to detect anomalous graphs based on the produced similarity score. Thereafter, we discuss the scalability of our approach empirically by varying the parameter k. Finally, we present empirical studies of the stability of the Amplay algorithm on static graphs.

Experiment I: Gradual Change Detection. The effectiveness of Amplay lies in its ability to reveal maximum independent sets. The nodes that comprise each set can be considered the most influential nodes collected from every community in the graph. Figure 3 shows the gradual change in the graph structure caused by removing the edge \(e_{3,10}\) connecting \(v_{3}\) and \(v_{10}\). \(e_{3,10}\) is the bridge connecting two of the communities present in the graph, and its elimination may disconnect the entire graph structure. As can be seen, \(v_{3}\) is the node that contributes the most to the dissimilarity between \(G_{1}\) and \(G_{2}\).

Fig. 3. Example of gradual change in the structure of the graph and the importance of each node in the overall similarity score.

Initial Node Ordering for \(G_{1}\), \(G_{2}\): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Amplay and Rank Correlation Node Importance: 3, 5, 13, 15, 6, 7, 1, 2, 4, 8, 9, 10, 11, 12, 14, 16

DeltaCon Node Importance: 3, 10, 14, 16, 12, 13, 2, 5, 15, 4, 6, 7, 11, 1, 9, 8

Experiment II: Anomaly Detection. We applied the proposed approach (with parameter \(k=1\)) and the baseline algorithms to the benchmark datasets, and compared their computed similarity scores between consecutive days. The implementations were run in Matlab on a machine with a 3 GHz processor and 8 GB RAM. Due to the computational complexity of the random projection approach, we only use this algorithm as a baseline for comparing scalability.

Our proposed method and DeltaCon generate scores in the range [0, 1]. Figures 4, 5 and 6 demonstrate the graph similarity scores for the Autonomous Systems, Facebook and Enron datasets, respectively. As can be seen, the trend of similarity scores is the same for DeltaCon and our proposed method.

Fig. 4. Comparison of graph similarity scores based on the correlation score of the Amplay-permuted adjacency matrix and DeltaCon on the Autonomous Systems dataset.

Fig. 5. Comparison of graph similarity scores based on the correlation score of the Amplay-permuted adjacency matrix and DeltaCon on the Facebook dataset.

Fig. 6. Comparison of graph similarity scores based on the correlation score of the Amplay-permuted adjacency matrix and DeltaCon on the Enron dataset.

Experiment III: Computational Scalability. The reported anomaly detection results were achieved by setting parameter \(k=1\), where k was defined at the end of Sect. 4 as the number of vertices that are processed and removed from the graph in a single iteration. We then increased k to investigate the performance of our anomaly detection scheme. It is worth recalling that we use only a subset of nodes for the purpose of anomaly detection: we consider the top l elements in the rank vectors, where \(l=n_{head}\) after the termination of Amplay.

Increasing parameter k leads to an exponential decrease in computation time. This observation can be explained by the sparsity of real-world graphs, i.e., the small proportion of fully-connected cliques. Since k is the number of vertices that are processed and removed from the graph within a single iteration, increasing k leads to a more rapid graph reduction. However, at some value of k, all highly connected vertices are processed within a single iteration, and the remaining graph contains only vertices with low degrees. Therefore, subsequent increases of k do not lead to a significant performance improvement. Figure 7 demonstrates the effect of parameter k on the processing time of Amplay for the Enron dataset. Even when the parameter k is increased to 100, we can still observe the maximum independent sets \(S_{1},S_{2},\ldots,S_{n}\) demonstrated in Fig. 2.

Another attractive property of our scheme is the compact representation of the graph produced by Amplay, which scales linearly in the number of input nodes n. Real-world graphs mainly consist of dense cores and sparsely connected peripheral nodes. Therefore, the number of nodes to consider for graph similarity computation is only a fraction of the total number of nodes in a graph: Amplay discards the peripheral nodes that are connected to only a few vertices from the core, and the influential nodes usually appear as \(V^{'}_{1},V^{'}_{2},...,V^{'}_{n_{head}}\), where \(n_{head} \ll n\). The upper bound of n corresponds to the worst case, where the input graph is fully connected. Table 3 reports the computation time and the number of nodes considered in calculating graph similarity. The upper bounds on the time complexity of the embedding approaches are shown in Table 2. As can be seen, our proposed method and DeltaCon outperform random projection, and both are scalable when the adjacency matrices are sparse. The advantage of our approach lies in its ability to generate an interpretable result where structural features of a graph, such as MISs, are revealed as shown in Fig. 2.

Fig. 7. Amplay computation time as the parameter k is increased on the Enron dataset, where k is the number of vertices that are processed and removed from the graph in a single iteration.

Table 3. Computation time of Amplay on different datasets.

Experiment IV: Amplay Stability. We compare Amplay with other ordering methods, namely random permutation, RCM [5], and SlashBurn [13]. Random permutation serves as a naive baseline; RCM is a classical bandwidth reduction algorithm; and SlashBurn is a recent method shown to produce adjacency matrices with localized non-zero elements, and is considered one of the best state-of-the-art methods.

We use a representative sample of sparse real-world graphs of different sizes for quantitative evaluation (Table 4), all downloaded from the Stanford Large Network Dataset Collection. The table shows graph names as they appear in the collection; however, in the paper we use simplified names (e.g., gnutella instead of p2p-Gnutella08).

We first load each graph as an adjacency matrix S and produce \(N + 1\) random permutations of the graph vertices, \(RND_{i}(S), i = 0,1,...,N\). We then take each random permutation as input and either leave it as it is (method Random) or apply the RCM, SlashBurn or Amplay permutation, yielding \(RCM(RND_{i}(S))\), \(SlashBurn(RND_{i}(S))\) and \(Amplay(RND_{i}(S))\), respectively.

We then evaluate ordering stability by selecting one of the random permutations as a reference (e.g., \(i_{ref} = 0\)) and comparing the vertex ordering between each of the other permutations and the reference (e.g., comparing \(RND_{0}(S)\) with \(RND_{j}(S)\)). In this section, we use both Amplay and SlashBurn with \(k = 1\); that is, we evaluate the basic forms of these algorithms, as opposed to their coarser, more scalable versions.
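
The paper does not include the experiment code; the sketch below is our reading of this protocol, with hypothetical helper names (order_fn is expected to return the position assigned to each vertex, e.g. a thin wrapper around the amplay() sketch from Sect. 4):

```python
import numpy as np
from scipy.stats import kendalltau

def stability_scores(order_fn, A, n_runs=5, ignore_frac=0.9, seed=0):
    """Compare orderings of randomly permuted copies of A to a reference
    run via Kendall's tau, keeping only the highest-degree vertices."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=0)).ravel() + np.asarray(A.sum(axis=1)).ravel()
    keep = np.argsort(-deg)[: max(1, int(round((1 - ignore_frac) * n)))]
    runs = []
    for _ in range(n_runs + 1):
        perm = rng.permutation(n)                    # RND_i(S)
        pos = np.asarray(order_fn(A[np.ix_(perm, perm)]))
        pos_of_original = np.empty(n, dtype=int)
        pos_of_original[perm] = pos                  # map back to original ids
        runs.append(pos_of_original[keep])
    ref = runs[0]                                    # reference (i_ref = 0)
    return [kendalltau(ref, r)[0] for r in runs[1:]]
```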

Table 4. Real-world graphs used in our stability analysis. * marks undirected graphs.

We compare two vertex orderings using the Kendall correlation coefficient. This coefficient takes values in \([-1, 1]\), where 1 is reached when the orderings are equivalent; if the two orderings are independent, one would expect the coefficient to be approximately 0. Intuitively, vertices with higher degrees tend to have a higher impact on matrix operations and visual appearance. Therefore, we also separately look at ordering stability for higher-degree vertices only. Specifically, we compute the Kendall correlation while ignoring a certain proportion (\(0, 80, 90, 95\,\%\)) of vertices with low degrees. Here \(0\,\%\) means that we compare orderings for all graph vertices, whereas \(95\,\%\) means that we only consider the ordering of the top \(5\,\%\) of vertices with the highest degrees. We present our results in Fig. 8 and Table 5 (permutations with \(k = 1\) were slow for large graphs; therefore, we have fewer runs for them). Overall, Amplay outperforms the other methods by a large margin (\(p < 0.01\), Wilcoxon signed rank test); in other words, Amplay tends to be less dependent on the input ordering.

Table 5. Stability measured with Kendall Tau at \(90\,\%\) for large graphs. The table shows the means for three comparisons.
Fig. 8. Amplay stability in comparison to the rival approaches, SlashBurn [13], RCM [5] and random ordering, as we vary the percentage of ignored low-degree vertices.

7 Conclusion and Future Work

In this paper, we presented an unsupervised approach for detecting anomalous graphs in time-evolving networks. We created a compact yet structure-aware feature set for each graph using a matrix permutation technique called Amplay. The resulting feature set contains the rank of each node in a graph, and this rank ordering was used with rank correlation to compare pairs of graphs. This simple yet effective approach overcomes the issues of scalability when handling large-scale graphs. We showed the low time complexity and structure-aware property of our re-ordering approach both empirically and theoretically. Moreover, we designed experiments for the purpose of anomaly detection on four real datasets, where our approach was compared against an effective graph similarity method and proved successful in highlighting abnormal events. In future work, we will explore reducing the dimensionality of the graph even further by using a random projection approach. Since we reduce the dimensionality from \(O(n^{2})\) to \(O(n)\), we can consider the rank vectors of each graph as a data stream. Thereafter, we will investigate a window-based approach for determining anomalous graphs given a history of past normal instances.