1 Introduction

1.1 Semidefinite problems on random graphs

In this paper we present a simple and general method to prove consistency of various semidefinite optimization problems on random graphs.

Suppose we observe one instance of an \(n \times n\) symmetric random matrix A with unknown expectation \(\bar{A} := \mathbb {E}A\). We would like to estimate the solution of the discrete optimization problem

$$\begin{aligned} \text {maximize}\, x^\mathsf {T}\bar{A} x \quad \text {subject to} \quad x \in \{-1,1\}^n. \end{aligned}$$
(1.1)

A motivating example of A is the adjacency matrix of a random graph; the Boolean vector x can represent a partition of vertices of the graph into two classes. Such Boolean problems can be encountered in the context of community detection in networks which we will discuss shortly. For now, let us keep working with the general class of problems (1.1).

Since \(\bar{A}\) is unknown, one might hope to estimate the solution \(\bar{x}\) of (1.1) by solving the random instance of this problem, that is

$$\begin{aligned} \text {maximize}\, x^\mathsf {T}A x \quad \text {subject to} \quad x \in \{-1,1\}^n. \end{aligned}$$
(1.2)

The integer quadratic problem (1.2) is NP-hard for general (non-random) matrices A. Semidefinite relaxations of many problems of this type have been proposed; see [6, 34, 49, 57] and the references therein. Such relaxations are known to have constant relative accuracy. For example, a semidefinite relaxation in [6] computes, for any given positive semidefinite matrix A, a vector \(x_0 \in \{-1,1\}^n\) such that \(x_0^\mathsf {T}A x_0 \ge 0.56 \, \max _{x \in \{-1,1\}^n} x^\mathsf {T}A x\).

In this paper we demonstrate how semidefinite relaxations of (1.2) can recover a solution of (1.1) with any given relative accuracy. Like several previously known methods, our approach is based on Grothendieck’s inequality. We refer the reader to the surveys [41, 61] for many reformulations and applications of this inequality in mathematics, computer science, optimization and other fields. In contrast to the previous methods, we are going to apply Grothendieck’s inequality to the (random) error \(A - \bar{A}\) rather than to the original matrix A, and this will be responsible for the arbitrary accuracy.

We will describe the general method in Sect. 2. It is simple and flexible, and it can be used for showing consistency of a variety of semidefinite programs, which may or may not be related to Boolean problems like (1.1). But before describing the method, we would like to pause and give some concrete examples of results it yields for community detection.

For simplicity, we will first focus on the classical stochastic block model, which is a random network whose nodes are split into two equal-sized clusters. In Sect. 1.3 we will extend our discussion to broader models of networks with almost no extra effort.

1.2 Community detection: the classical stochastic block model

It is now customary to model networks as inhomogeneous random graphs [13], which generalize the classical Erdős-Rényi model G(n, p). A benchmark example is the stochastic block model [40]. In this section we focus on the basic model with two communities of equal sizes; in Sect. 1.3 we will consider a more general situation.

We define a random graph on vertices \(\{1,\ldots ,n\}\) as follows. Partition the set of vertices into two communities \(\mathcal {C}_1\) and \(\mathcal {C}_2\) of size n/2 each. For each pair of distinct vertices, we draw an edge independently with probability p if both vertices belong to the same community, and q (with \(q \le p\)) if they belong to different communities. For convenience we include loops, so each vertex has an edge connecting it to itself with probability 1. This defines a distribution on random graphs which is denoted G(n, p, q) and called the (classical) stochastic block model. When \(p=q\), we recover the classical Erdős-Rényi model of random graphs G(n, p).

The community detection problem asks to recover the communities \(\mathcal {C}_1\) and \(\mathcal {C}_2\) by observing one instance of a random graph drawn from G(n, p, q). As we will discuss in detail in Sect. 1.4, an array of algorithms is known to succeed for this problem for relatively dense graphs, those whose expected average degree (which is of order pn) is \(\Omega (\log n)\), while less is known for totally sparse graphs—those with bounded average degrees, i.e. with \(pn = O(1)\). Our paper focuses on this sparse regime.

Recovery of the communities \(\mathcal {C}_1\) and \(\mathcal {C}_2\) is equivalent to estimating the community membership vector, which we can define as

$$\begin{aligned} \bar{x}\in \{-1,1\}^n, \quad \bar{x}_i = {\left\{ \begin{array}{ll} 1, &{} i \in \mathcal {C}_1 \\ -1, &{} i \in \mathcal {C}_2. \end{array}\right. } \end{aligned}$$
(1.3)

We will estimate \(\bar{x}\) using the following semidefinite optimization problem:

$$\begin{aligned} \begin{aligned}&\text {maximize}\, \langle A,Z\rangle - \lambda \langle E_n,Z\rangle \\&\text {subject to}\, Z \succeq 0, \; \hbox {diag}(Z) \preceq \mathbf{I }_n. \end{aligned} \end{aligned}$$
(1.4)

Here the inner product of matrices is defined in the usual way, that is \(\langle A,B\rangle = \hbox {tr}(AB) = \sum _{i,j} A_{ij} B_{ij}\), \(\mathbf{I }_n\) denotes the identity matrix, the matrix \(E_n\) has all entries equal to 1, and \(A \succeq B\) means that \(A-B\) is positive semidefinite. Observe that \(E_n = \mathbf{1}_n \mathbf{1}_n^\mathsf {T}\) where \(\mathbf{1}_n \in \mathbb {R}^n\) is the vector all of whose coordinates equal 1. The constraint \(\hbox {diag}(Z) \preceq \mathbf{I }_n\) in (1.4) simply means that all diagonal entries of Z are bounded by 1.

For the value of \(\lambda \) we choose the average of the off-diagonal entries of A, i.e. the edge density of the graph with loops removed, which is

$$\begin{aligned} \lambda = \frac{2}{n(n-1)} \sum _{i < j} a_{ij} \end{aligned}$$
(1.5)

where \(a_{ij} \in \{0,1\}\) denote the entries of the adjacency matrix A.
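For readers who wish to experiment with (1.4), the following is a minimal sketch of the program in Python. It assumes the cvxpy package with an SDP-capable solver (such as SCS) is installed; the function and variable names are illustrative and not part of the paper.

```python
import numpy as np
import cvxpy as cp  # assumption: cvxpy with an SDP-capable solver is available

def solve_sdp_1_4(A):
    """Sketch of (1.4): maximize <A, Z> - lambda <E_n, Z> over PSD Z with diag(Z) <= 1."""
    n = A.shape[0]
    # lambda as in (1.5): average of the off-diagonal entries of A
    lam = (A.sum() - np.trace(A)) / (n * (n - 1))
    Z = cp.Variable((n, n), symmetric=True)
    constraints = [Z >> 0,            # Z is positive semidefinite
                   cp.diag(Z) <= 1]   # all diagonal entries bounded by 1
    objective = cp.Maximize(cp.sum(cp.multiply(A, Z)) - lam * cp.sum(Z))
    cp.Problem(objective, constraints).solve()
    return Z.value
```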

Theorem 1.1

(Community detection in classical stochastic block model) Let \(\varepsilon \in (0,1)\) and \(n \ge 10^4 \varepsilon ^{-2}\). Let A be the adjacency matrix of the random graph drawn from the stochastic block model G(n, p, q) with \(\max \{p(1-p), q(1-q)\} \ge \frac{20}{n}\). Assume that \(p=\frac{a}{n} > q=\frac{b}{n}\), and

$$\begin{aligned} (a-b)^2 \ge 10^4 \, \varepsilon ^{-2} (a+b). \end{aligned}$$
(1.6)

Let \(\widehat{Z}\) be a solution of the semidefinite program (1.4). Then, with probability at least \(1- e^3 5^{-n}\), we have

$$\begin{aligned} \Vert \widehat{Z}- \bar{x}\bar{x}^\mathsf {T}\Vert _2^2 \le \varepsilon n^2 = \varepsilon \Vert \bar{x}\bar{x}^\mathsf {T}\Vert _2^2. \end{aligned}$$
(1.7)

Here and in the rest of this paper, \(\Vert \cdot \Vert _2\) denotes the Frobenius norm of matrices and the Euclidean norm of vectors.

Once we have estimated the rank-one matrix \(\bar{x} \bar{x}^\mathsf {T}\) using Theorem 1.1, we can also estimate the community membership vector \(\bar{x}\) itself in a standard way, namely by computing the leading eigenvector.

Corollary 1.2

(Community detection with o(n) misclassified vertices) In the setting of Theorem 1.1, let \(\widehat{x}\) denote an eigenvector of \(\widehat{Z}\) corresponding to the largest eigenvalue, and with \(\Vert \widehat{x}\Vert _2 = \sqrt{n}\). Then

$$\begin{aligned} \min _{\alpha = \pm 1} \Vert \alpha \widehat{x}- \bar{x}\Vert _2^2 \le \varepsilon n = \varepsilon \Vert \bar{x}\Vert _2^2. \end{aligned}$$

In particular, the signs of the coefficients of \(\widehat{x}\) correctly estimate the partition of the vertices into the two communities, up to at most \(\varepsilon n\) misclassified vertices.
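In computational terms, the rounding step of Corollary 1.2 is a single leading-eigenvector computation followed by taking signs. Below is a minimal numpy sketch, under the assumption that Z_hat is the solution of (1.4) returned by a solver; the function name is illustrative.

```python
import numpy as np

def round_to_membership(Z_hat):
    """Extract the estimated membership vector from the SDP solution,
    as in Corollary 1.2 (illustrative sketch, not from the paper)."""
    n = Z_hat.shape[0]
    # symmetrize to guard against small numerical asymmetry from the solver
    eigvals, eigvecs = np.linalg.eigh((Z_hat + Z_hat.T) / 2)
    v = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    x_hat = np.sqrt(n) * v            # normalized so that ||x_hat||_2 = sqrt(n)
    labels = np.sign(x_hat)           # signs estimate the two communities, up to a global sign
    labels[labels == 0] = 1
    return x_hat, labels
```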

As we will discuss in Sect. 1.4.2 in more detail, there are previously known algorithms for recovery of two communities under conditions similar to (1.6). These include a spectral clustering algorithm based on truncating the high degree vertices (whose analysis can be derived from [31, 32]), combinatorial algorithms of [54, 59] based on path counting, and an algorithm [52] based on belief propagation, which minimizes the fraction of misclassified vertices.

An array of simple semidefinite programs like (1.4) and (1.10) has been proposed in the networks community. Such programs have been analyzed for relatively dense graphs; see [9] for a review. It has been unknown whether they could succeed for totally sparse graphs, where the expected degree is of constant order. Theorem 1.1 provides a positive answer to this question. Moreover, the method of this paper is flexible enough to analyze many semidefinite programs, and it can be applied to more general models of sparse networks than any previous results. To illustrate this point, we will now choose a different semidefinite program and show that it succeeds for a large class of stochastic models of networks.

1.3 Community detection: general stochastic block models

Let us describe a model of networks where one can have multiple communities of arbitrary sizes, arbitrarily many outliers, and unequal edge probabilities.

To define such a general stochastic block model, we assume that the set of vertices \(\{1,\ldots ,n\}\) is partitioned into communities \(\mathcal {C}_1,\ldots ,\mathcal {C}_K\) of arbitrary sizes. We do not restrict the sizes of the communities, so in particular this model can automatically handle outliers, the vertices that form communities of size 1. For each pair of distinct vertices (i, j), we draw an edge between i and j independently with a certain fixed probability \(p_{ij}\). As in the classical stochastic block model, we include loops for convenience, so \(p_{ii} = 1\). To promote more edges within than across the communities, we assume that there exist numbers \(p > q\) (thresholds) such that

$$\begin{aligned} \begin{aligned}&p_{ij} \ge p \quad \text {if}\, i\, \text {and}\, j\, \text {belong to the same community};\\&p_{ij} \le q \quad \text {if}\, i\, \text {and}\, j\, \text {belong to different communities}. \end{aligned} \end{aligned}$$
(1.8)

The community structure of such a network is captured by the cluster matrix \(\bar{Z} \in \{0,1\}^{n \times n}\) defined as

$$\begin{aligned} \bar{Z}_{ij} = {\left\{ \begin{array}{ll} 1 &{} \text {if}\, i\, \text {and}\, j\, \text {belong to the same community}; \\ 0 &{} \text {if}\, i\, \text {and}\, j\, \text {belong to different communities}. \end{array}\right. } \end{aligned}$$
(1.9)

We will estimate \(\bar{Z}\) using the following semidefinite optimization program:

$$\begin{aligned} \begin{aligned}&\text {maximize}\, \langle A,Z\rangle \\&\text {subject to}\, Z \succeq 0, \; Z \ge 0, \; \hbox {diag}(Z) \preceq \mathbf{I }_n, \; \textstyle {\sum _{i,j=1}^n Z_{ij} = \lambda }. \end{aligned} \end{aligned}$$
(1.10)

Here as usual \(Z \succeq 0\) means that Z is positive semidefinite, and \(Z \ge 0\) means that all entries of Z are non-negative. We choose the value of \(\lambda \) to be the sum of the entries of the cluster matrix, that is

$$\begin{aligned} \lambda = \sum _{i,j=1}^n \bar{Z}_{ij} = \sum _{k=1}^K |\mathcal {C}_k|^2. \end{aligned}$$
(1.11)

If all communities have the same size s, then \(\lambda = K s^2 = n s\).
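As with (1.4), the program (1.10) can be passed to an off-the-shelf SDP solver. A minimal sketch follows, again assuming cvxpy is available; the value lam must be supplied by the user, e.g. from the community sizes as in (1.11) or chosen as discussed in Remark 1.5 below.

```python
import cvxpy as cp  # assumption: cvxpy with an SDP-capable solver is available

def solve_sdp_1_10(A, lam):
    """Sketch of (1.10): maximize <A, Z> over PSD, entrywise non-negative Z
    with diag(Z) <= 1 and total sum of entries equal to lam."""
    n = A.shape[0]
    Z = cp.Variable((n, n), symmetric=True)
    constraints = [Z >> 0,             # positive semidefinite
                   Z >= 0,             # entrywise non-negative
                   cp.diag(Z) <= 1,    # diagonal entries bounded by 1
                   cp.sum(Z) == lam]   # sum of all entries fixed to lambda
    cp.Problem(cp.Maximize(cp.sum(cp.multiply(A, Z))), constraints).solve()
    return Z.value
```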

Theorem 1.3

(Community detection in general stochastic block model) Let \(\varepsilon \in (0,1)\). Let A be the adjacency matrix of the random graph drawn from the general stochastic block model described above. Denote by \(\bar{p}\) the average variance of the edges, that is \(\bar{p}= \frac{2}{n(n-1)} \sum _{i<j} p_{ij} (1 - p_{ij})\). Assume that \(p=\frac{a}{n} > q=\frac{b}{n}\), \(\bar{p}= \frac{g}{n}\), \(g \ge 9\) and

$$\begin{aligned} (a-b)^2 \ge 484\, \varepsilon ^{-2} g. \end{aligned}$$
(1.12)

Let \(\widehat{Z}\) be a solution of the semidefinite program (1.10). Then, with probability at least \(1-e^3 5^{-n}\), we have

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert _2^2 \le \Vert \widehat{Z}- \bar{Z}\Vert _1 \le \varepsilon n^2. \end{aligned}$$
(1.13)

Here as usual \(\Vert \cdot \Vert _2\) denotes the Frobenius norm of matrices, and \(\Vert \cdot \Vert _1\) denotes the \(\ell _1\) norm of the matrices considered as vectors, that is \(\Vert (a_{ij})\Vert _1 = \sum _{i,j} |a_{ij}|\).

Remark 1.4

(General community structure) The conclusion of Theorem 1.3 does not depend on the community structure, i.e. on the number and sizes of the communities. This seemingly surprising observation can be explained by the fact that small communities, those with sizes o(n), can get absorbed in the error term in (1.13), so they will not be recovered.

Remark 1.5

(If the sizes of communities are not known) Our choice of the parameter \(\lambda \) in (1.11) assumes that we know the sizes of the communities. What if they are not known? From the proof of Theorem 1.3 it will be clear what happens when \(\lambda >0\) is chosen arbitrarily. Assume that we choose \(\lambda \) so that \(\lambda \le \lambda _0 := \sum _k |\mathcal {C}_k|^2\). Then instead of estimating the full cluster graph (described in Remark 1.6), the solution \(\widehat{Z}\) will only estimate a certain subgraph of the cluster graph, which may miss at most \(\lambda _0 - \lambda \) edges. On the other hand, if we choose \(\lambda \) so that \(\lambda \ge \lambda _0\), then the solution \(\widehat{Z}\) will estimate a certain supergraph of the cluster graph, which may have at most \(\lambda - \lambda _0\) extra edges. In either case, such a solution can still be meaningful in practice.

Remark 1.6

(Cluster graph) It may be convenient to view the cluster matrix \(\bar{Z}\) as the adjacency matrix of the cluster graph, in which all vertices within each community are connected and there are no connections across the communities. This way, the semidefinite program (1.10) takes a sparse graph as an input, and it returns an estimate of the cluster graph as an output. The effect of the program is thus to “densify” the network inside the communities and “sparsify” it across the communities.

Remark 1.7

(Other semidefinite programs) There is nothing special about the semidefinite programs (1.4) and (1.10). For example, one can tighten the constraints and instead of \(\hbox {diag}(Z) \preceq \mathbf{I }_n\) require that \(\hbox {diag}(Z) = \mathbf{I }_n\) in both programs. Similarly, instead of placing in (1.10) the constraint on the sum of all entries of Z, one can place constraints on the sums of each row. In a similar fashion, one should be able to analyze other semidefinite relaxations, both new and those proposed in the previous literature on community detection, see [9].

For one more illustration of the method described here, we refer the reader to Section 7 of the extended version of this paper [36]. There we consider a minor modification of the semidefinite program (1.4), and we show that it succeeds in the presence of multiple communities of equal sizes (the so-called balanced planted partition model). The sufficient condition for that is \((a-b)^2 \ge 50^2 \varepsilon ^{-2} (a + b(K-1))\) where K is the number of communities, s is the size of each community, and \(p=a/s\), \(q=b/s\).

1.4 Related work

Community detection in stochastic block models is a fundamental problem that has been extensively studied in theoretical computer science and statistics. A plethora of algorithmic approaches have been proposed, in particular those based on combinatorial techniques [18, 30], spectral clustering [4, 5, 14, 22, 38, 43, 50, 56, 58, 62, 63], likelihood maximization [8, 10, 64], variational methods [3, 11, 20], Markov chain Monte Carlo [29, 64], belief propagation [29], and convex optimization including semidefinite programming [2, 7, 9, 18, 19, 23–25, 37, 60].

1.4.1 Relatively dense networks: average degrees are \(\Omega (\log n)\)

Most known rigorous results on community detection are proved for relatively dense networks whose expected degrees go to infinity with n. If the degrees grow no slower than \(\log n\), it may be possible to recover the community structure perfectly, without any misclassified vertices. A variety of community detection methods are known to succeed in this regime, including those based on spectral clustering, likelihood maximization and convex optimization mentioned above; see e.g. [18, 50] and the references therein.

The semidefinite programs (1.4) and (1.10) are similar to those proposed in the recent literature, most notably in [9, 18, 19, 23, 25]. The semidefinite relaxations discussed in [19, 25] can perfectly recover the community structure if \((a-b)^2 \ge C (a \log n+b)\) for a sufficiently large constant C; see [9] for a review of these results.

1.4.2 Totally sparse networks: bounded average degrees

The problem becomes more difficult for sparser networks, whose expected average degrees grow to infinity arbitrarily slowly or even remain bounded in n. Although studying such networks is well motivated from the practical perspective [45, 65], little has been known on the theoretical level.

If the degrees grow slower than \(\log n\), it is impossible to correctly classify all vertices, since with high probability some of the vertices will be isolated. Still, the fraction of isolated vertices is small, so we can hope to correctly classify a majority of the vertices in this regime.

The spectral method developed by J. Kahn and E. Szemerédi for random regular graphs [32] can be adapted for Erdős-Rényi random graphs [5, 31] and, more generally, for the stochastic block model \(G(n, \frac{a}{n}, \frac{b}{n})\). If one truncates the graph by removing all vertices with too large degrees (say, larger than \(10(a+b)\)), then the argument of [31, 32] can be adapted to conclude that with some positive probability, the truncated adjacency matrix concentrates near its expectation in the spectral norm. The communities can then be approximately recovered using spectral clustering, which is based on the signs of the coefficients of the second eigenvector. Working out the details, one finds that a sufficient condition for this method to succeed is similar to (1.6), that is

$$\begin{aligned} (a-b)^2 \ge C_\varepsilon (a+b) \end{aligned}$$
(1.14)

where \(C_\varepsilon \) depends only on the desired accuracy \(\varepsilon \) of recovery. However, for real networks it is usually impractical to remove high degree vertices, and the probabilistic estimate from [31] is not sharp.

A. Coja-Oghlan [27] proposed a different, complicated adaptive spectral algorithm that can approximately recover communities under the condition \((a-b)^2 \ge C_\varepsilon (a+b) \log (a+b)\). Recently, L. Massoulié [59] and E. Mossel, J. Neeman and A. Sly [54] came up with combinatorial algorithms based on path counting, which can approximately recover communities under the condition (1.14). These results are stated in the asymptotic regime for \(n \rightarrow \infty \) and without explicit dependence of \(C_\varepsilon \) on the desired accuracy \(\varepsilon \). Furthermore, E. Mossel, J. Neeman and A. Sly developed an algorithm based on belief propagation [52], which minimizes the fraction of misclassified vertices.

Condition (1.14) has the optimal form. Indeed, it was shown in [55] that the lower bound (1.14) is required for any algorithm to be able to recover communities with at most \(\varepsilon n\) misclassified vertices, where \(C_\varepsilon \rightarrow \infty \) as \(\varepsilon \rightarrow 0\). A conjecture of A. Decelle, F. Krzakala, C. Moore and L. Zdeborová, proved recently by E. Mossel, J. Neeman and A. Sly [53, 54] and L. Massoulié [59], states that one can find a partition correlated with the true community partition (i.e. with the fraction of misclassified vertices bounded away from \(50~\%\) as \(n \rightarrow \infty \)) if \((a-b)^2 \ge C(a+b)\) with some constant \(C>2\). Moreover, this result achieves the information-theoretic limit: no algorithm can succeed if \(C \le 2\).

It remains an open question whether semidefinite programming can achieve similar information-theoretic limits. Theorem 1.1 does not achieve them; addressing this problem will require tightening the absolute constant and the dependence on \(\varepsilon \) in (1.6).

1.4.3 The new results in historical perspective

A variety of simple semidefinite programs like (1.4) and (1.10) have been proposed in the network literature. Such programs have been analyzed only for dense networks where the degrees grow as \(\Omega (\log n)\), in which case perfect community detection is possible. The present paper shows that the same semidefinite programs succeed for totally sparse networks as well, producing a small number of misclassified vertices; moreover, the sufficient condition (1.14) is optimal up to an absolute constant.

Furthermore, the method of the present paper generalizes smoothly to a broad class of sparse networks. We saw in Sect. 1.3 that semidefinite programming succeeds for networks with variable edge probabilities \(p_{ij}\); community detection in such networks seems to be out of reach for known spectral methods.

We also saw how networks with multiple communities can be handled with the semidefinite approach. This has been studied in the statistical literature before; the semidefinite relaxations proposed in [9, 19, 23, 25] were designed for multiple communities and outliers. However, previous theoretical results for multiple communities were only available in the dense regime where the degrees grow as \(\Omega (\log n)\), in which case perfect community detection is possible.

1.4.4 Follow up work

After this paper had been submitted, several new results appeared on community detection in stochastic block models. We will mention here only results that apply to totally sparse networks. The initial discovery of [54, 59] mentioned in Sect. 1.4.2 was followed by the work [16]. Semidefinite programs on random graphs were further analyzed in [51] using higher-rank Grothendieck inequalities and insights from mathematical physics. Stochastic block models with labeled edges were addressed in [44] using truncated spectral clustering (with high degree vertices removed, based on [31]) and semidefinite programming (whose analysis is based on the method of the present paper). A two-stage algorithm based on truncated spectral clustering and swapping vertices (like e.g. in [55]) was analyzed in [26]; the swapping stage leads to the sufficient condition (1.14) with an optimal dependence on the accuracy, \(C_\varepsilon \sim \log (1/\varepsilon )\). A different combinatorial method was proposed and analyzed in [1]; regularized spectral clustering was shown to succeed in [46, 47]; and a computationally feasible likelihood-based algorithm that minimizes the risk for the misclassification proportion was found in [33]. Some of the mentioned work can be used for networks with multiple communities, see [1, 26, 33, 46, 47].

1.5 Plan of the paper

We discuss the method in general terms in Sect. 2. We explain how Grothendieck’s inequality can be used to show tightness of various semidefinite programs on random graphs. Section 3 is devoted to Grothendieck’s inequality and its implications for semidefinite programming. In Sect. 4 we prove a simple concentration inequality for random matrices in the cut norm. In Sect. 5 we specialize to the community detection problem for the classical stochastic block model, and we prove Theorem 1.1 and Corollary 1.2 there. In Sect. 6 we consider the general stochastic block model, and we prove Theorem 1.3 there.

2 Semidefinite optimization on random graphs: the method in a nutshell

In this section we explain the general method of this paper, which can be applied to a variety of optimization problems. To be specific, let us return to the problem we described in Sect. 1.1, which is to estimate the solution \(\bar{x}\) of the optimization problem (1.1) from a single observation of the random matrix A. We suggested there to approximate \(\bar{x}\) by the solution of the (random) program (1.2), which we can rewrite as follows:

$$\begin{aligned} \text {maximize } \langle A,x x^\mathsf {T}\rangle \quad \text {subject to} \quad x \in \{-1,1\}^n. \end{aligned}$$
(2.1)

Note that if we maximized \(\langle A,x x^\mathsf {T}\rangle \) over the Euclidean ball \(B(0,\sqrt{n})\), then the problem would be simple – the solution x would be a multiple of the eigenvector of A corresponding to its largest eigenvalue. This simpler problem underlies the most basic algorithm for community detection called spectral clustering, where the communities are recovered based on the signs of an eigenvector of the adjacency matrix (going back to [14, 39, 50], see [63]). The optimization problem (2.1) is harder and more subtle; the replacement of the Euclidean ball by the cube introduces a strong restriction on the coordinates of x. This restriction rules out localized solutions x where most of the mass of x is concentrated on a small fraction of coordinates. Since eigenvectors of sparse matrices tend to be localized (see [15]), basic spectral clustering is often unsuccessful for sparse networks.
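For comparison, here is a minimal sketch of the basic spectral clustering baseline mentioned above, in one common variant (signs of the eigenvector associated with the second largest eigenvalue of A). This is not the method of the paper, and for sparse graphs it can fail precisely because of the localization phenomenon just described.

```python
import numpy as np

def basic_spectral_clustering(A):
    """Toy baseline: split the vertices by the signs of the eigenvector of the
    adjacency matrix A corresponding to its second largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    v = eigvecs[:, -2]                     # second leading eigenvector
    labels = np.sign(v)
    labels[labels == 0] = 1
    return labels
```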

Let us choose a convex subset \(\mathcal {M}_{\mathrm {opt}}\) of the set of positive semidefinite matrices all of whose entries are bounded by 1 in absolute value. (For now, it can be any subset.) Note that the matrices \(x x^\mathsf {T}\) appearing in (2.1) are examples of such matrices. We consider the following semidefinite relaxation of (2.1):

$$\begin{aligned} \text {maximize}\, \langle A,Z\rangle \quad \text {subject to} \quad Z \in \mathcal {M}_{\mathrm {opt}}. \end{aligned}$$
(2.2)

We might hope that the solution \(\widehat{Z}\) of this program would enable us to estimate the solution \(\bar{x}\) of (1.1).

To realize this hope, one needs to check a few things, which may or may not be true depending on the application. First, one needs to design the feasible set \(\mathcal {M}_{\mathrm {opt}}\) in such a way that the semidefinite relaxation of the expected problem (1.1) is tight. This means that the solution \(\bar{Z}\) of the program

$$\begin{aligned} \text {maximize}\, \langle \bar{A},Z\rangle \quad \text {subject to} \quad Z \in \mathcal {M}_{\mathrm {opt}}\end{aligned}$$
(2.3)

satisfies

$$\begin{aligned} \bar{Z} = \bar{x} \bar{x}^\mathsf {T}. \end{aligned}$$
(2.4)

This condition can be arranged for in various applications. In particular, this is the case in the setting of Theorem 1.1; we show this in Lemma 5.1.

Second, one needs a uniform deviation inequality, which would guarantee with high probability that

$$\begin{aligned} \max _{x,y \in \{-1,1\}^n} |\langle A-\bar{A},xy^\mathsf {T}\rangle | \le \varepsilon . \end{aligned}$$
(2.5)

This can often be proved by applying standard deviation inequalities for a fixed pair (x, y), followed by a union bound over all such pairs. We prove such a deviation inequality in Sect. 4.

Now we make the crucial step, which is an application of Grothendieck’s inequality. A reformulation of this remarkable inequality, which we explain in Sect. 3, states that (2.5) automatically implies that

$$\begin{aligned} \max _{Z \in \mathcal {M}_{\mathrm {opt}}} |\langle A-\bar{A},Z\rangle | \le C\varepsilon . \end{aligned}$$
(2.6)

This will allow us to conclude that the solution \(\widehat{Z}\) of (2.2) approximates the solution \(\bar{Z}\) of (2.3). To see this, let us compare the value of the expected objective function \(\langle \bar{A},Z\rangle \) at these two matrices. We have

$$\begin{aligned} \langle \bar{A},\widehat{Z}\rangle&\ge \langle A,\widehat{Z}\rangle - C\varepsilon \quad \text {(replacing}\, \bar{A}\, \text {by}\, A\, \text {using (2.6))} \nonumber \\&\ge \langle A,\bar{Z}\rangle - C\varepsilon \quad \text {(since}\, \widehat{Z}\,\text {is the maximizer in (2.2))} \nonumber \\&\ge \langle \bar{A},\bar{Z}\rangle - 2C\varepsilon \quad \text {(replacing}\, A\, \text {by}\, \bar{A}\, \text {back using (2.6)).} \end{aligned}$$
(2.7)

This means that \(\widehat{Z}\) almost maximizes the objective function \(\langle \bar{A},Z\rangle \) in (2.3).

The final piece of information we require is that the expected objective function \(\langle \bar{A},Z\rangle \) distinguishes points near its maximizer \(\bar{Z}\). This would allow one to automatically conclude from (2.7) that the almost maximizer \(\widehat{Z}\) is close to the true maximizer, i.e. that

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert \le \text {something small} \end{aligned}$$
(2.8)

where \(\Vert \cdot \Vert \) can be the Frobenius or operator norm. Intuitively, the requirement that the objective function distinguishes points amounts to a non-trivial curvature of the feasible set \(\mathcal {M}_{\mathrm {opt}}\) at the maximizer \(\bar{Z}\). In many situations, this property is easy to verify. In the setting of Theorems 1.1 and 1.3, we check it in Lemma 5.2 and Lemmas 6.2 and 6.3 respectively.

Finally, we can recall from (2.4) that \(\bar{Z} = \bar{x} \bar{x}^\mathsf {T}\). Together with (2.8), this yields that \(\widehat{Z}\) is approximately a rank-one matrix, and its leading eigenvector \(\widehat{x}\) satisfies

$$\begin{aligned} \Vert \widehat{x}- \bar{x}\Vert _2 \le \text {something small}. \end{aligned}$$

Thus we estimated the solution \(\bar{x}\) of the problem (1.1) as desired.

Remark 2.1

(General semidefinite programs) For this method to work, it is not crucial that the semidefinite program be a relaxation of any vector optimization problem. Indeed, one can analyze semidefinite programs of the type (2.2) without any vector optimization problem (2.1) in the background. In such cases, the requirement (2.4) of tightness of relaxation can be dropped. The solution \(\bar{Z}\) may itself be informative. An example of such a situation is Theorem 1.3, where the cluster matrix \(\bar{Z}\) is important by itself. However, \(\bar{Z}\) cannot be represented as \(\bar{x} \bar{x}^\mathsf {T}\) for any \(\bar{x}\), since \(\bar{Z}\) is not a rank-one matrix.

3 Grothendieck’s inequality and semidefinite programming

Grothendieck’s inequality is a remarkable result proved originally in the functional analytic context [35] and reformulated in [48] in the form we are going to describe below. This inequality has found applications in several areas [41, 61]. It has already been used to analyze semidefinite relaxations of hard combinatorial optimization problems [6, 57], although previous relaxations led to constant (rather than arbitrary) accuracy.

Theorem 3.1

(Grothendieck’s inequality) Consider an \(n \times n\) matrix of real numbers \(B = (b_{ij})\). Assume that

$$\begin{aligned} \Big | \sum _{i,j} b_{ij} s_i t_j \Big | \le 1 \end{aligned}$$

for all numbers \(s_i, t_i \in \{-1,1\}\). Then

$$\begin{aligned} \Big | \sum _{i,j} b_{ij} \langle X_i,Y_j\rangle \Big | \le K_\mathrm {G}\end{aligned}$$

for all vectors \(X_i,Y_i \in B_2^n\).

Here \(B_2^n = \{ x \in \mathbb {R}^n : \Vert x\Vert _2 \le 1\}\) is the unit ball for the Euclidean norm, and \(K_\mathrm {G}\) is an absolute constant referred to as Grothendieck’s constant. The best value of \(K_\mathrm {G}\) is still unknown, and the best known bound [17] is

$$\begin{aligned} K_\mathrm {G}< \frac{\pi }{2 \ln (1+\sqrt{2})} \le 1.783. \end{aligned}$$
(3.1)

3.1 Grothendieck’s inequality in matrix form

To restate Grothendieck’s inequality in a matrix form, observe that \(\sum _{i,j} b_{ij} s_i t_j = \langle B,s t^\mathsf {T}\rangle \) where s and t are the vectors in \(\mathbb {R}^n\) with coordinates \(s_i\) and \(t_j\) respectively. Similarly, \(\sum _{i,j} b_{ij} \langle X_i,Y_j\rangle = \langle B,X Y^\mathsf {T}\rangle \) where X and Y are the \(n \times n\) matrices with rows \(X_i^\mathsf {T}\) and \(Y_j^\mathsf {T}\) respectively. This motivates us to consider the following two sets of matrices:

$$\begin{aligned} \mathcal {M}_1 := \left\{ s t^\mathsf {T}:\; s, t \in \{-1,1\}^n \right\} , \quad \mathcal {M}_\mathrm {G}:= \left\{ XY^\mathsf {T}:\; \text {all rows}\, X_i, Y_j \in B_2^n \right\} . \end{aligned}$$

Clearly, \(\mathcal {M}_1 \subset \mathcal {M}_\mathrm {G}\). Grothendieck’s inequality can be stated as follows:

$$\begin{aligned} \forall B \in \mathbb {R}^{n \times n}, \quad \max _{Z \in \mathcal {M}_\mathrm {G}} \left| \langle B,Z\rangle \right| \le K_\mathrm {G}\max _{Z \in \mathcal {M}_1} \left| \langle B,Z\rangle \right| . \end{aligned}$$
(3.2)

We can view this inequality as a relation between two matrix norms. The right side of (3.2) defines the \(\ell _\infty \rightarrow \ell _1\) norm of \(B = (b_{ij})\), which is

$$\begin{aligned} \Vert B\Vert _{\infty \rightarrow 1}&= \max _{\Vert s\Vert _\infty \le 1} \Vert Bs\Vert _1 = \max _{s,t \in \{-1,1\}^n} \langle B,s t^\mathsf {T}\rangle = \max _{s,t \in \{-1,1\}^n} \sum _{i,j=1}^n b_{ij} s_i t_j \nonumber \\&= \max _{Z \in \mathcal {M}_1} \left| \langle B,Z\rangle \right| . \end{aligned}$$
(3.3)

We note in passing that this norm is equivalent to the so-called cut norm, whose importance in algorithmic problems is well understood in the theoretical computer science community, see e.g. [6, 41].
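For small matrices, the norm (3.3) can be computed by brute force, since for a fixed sign vector s the optimal t simply matches the signs of the coordinates of \(B^\mathsf{T} s\). A short Python sketch (illustrative only; the search is exponential in n):

```python
import itertools
import numpy as np

def norm_inf_to_1(B):
    """Brute-force the ell_infty -> ell_1 norm from (3.3); feasible for small n only."""
    n = B.shape[0]
    best = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        s = np.array(signs)
        # for fixed s, the optimal t takes the sign of each coordinate of B^T s
        best = max(best, np.abs(B.T @ s).sum())
    return best
```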

Let us restrict our attention to the part of Grothendieck’s set \(\mathcal {M}_\mathrm {G}\) consisting of positive semidefinite matrices. To do so, we consider the following set of \(n \times n\) matrices:

$$\begin{aligned} \mathcal {M}_\mathrm {G}^+ := \left\{ Z:\; Z \succeq 0, \; \hbox {diag}(Z) \preceq \mathbf{I }_n \right\} \subset \mathcal {M}_\mathrm {G}\subset [-1,1]^{n \times n}. \end{aligned}$$
(3.4)

To check the first inclusion in (3.4), let \(Z \in \mathcal {M}_\mathrm {G}^+\). Since \(Z \succeq 0\), there exists a matrix X such that \(Z = X^2\). The rows \(X_i^\mathsf {T}\) of X satisfy \( \Vert X_i\Vert _2^2 = \langle X_i,X_i\rangle = (X^\mathsf {T}X)_{ii} = Z_{ii} \le 1, \) where the last inequality follows from the assumption \(\hbox {diag}(Z) \preceq \mathbf{I }_n\). Choosing \(Y=X\) in the definition of \(\mathcal {M}_\mathrm {G}\), we conclude that \(Z \in \mathcal {M}_\mathrm {G}\). To check the second inclusion in (3.4), note that for every matrix \(X Y^\mathsf {T}\in \mathcal {M}_\mathrm {G}\), we have \((XY^\mathsf {T})_{ij} = \langle X_i,Y_j\rangle \le \Vert X_i\Vert _2 \; \Vert Y_j\Vert _2 \le 1.\)

Combining (3.2) with (3.4) and the identity (3.3), we obtain the following form of Grothendieck’s inequality for positive semidefinite matrices.

Fact 3.2

(Grothendieck’s inequality, PSD) Every matrix \(B \in \mathbb {R}^{n \times n}\) satisfies

$$\begin{aligned} \max _{Z \in \mathcal {M}_\mathrm {G}^+} \left| \langle B,Z\rangle \right| \le K_\mathrm {G}\, \Vert B\Vert _{\infty \rightarrow 1}. \end{aligned}$$
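The following toy check illustrates Fact 3.2 numerically: it samples a random matrix B and a random element of \(\mathcal{M}_\mathrm{G}^+\) (built as \(XX^\mathsf{T}\) with rows of X in the unit ball) and compares \(|\langle B,Z\rangle |\) with \(1.783\,\Vert B\Vert _{\infty \rightarrow 1}\), reusing the brute-force helper norm_inf_to_1 from the previous sketch. It is purely illustrative and assumes numpy.

```python
import numpy as np

def check_fact_3_2(n=10, seed=1):
    """Sample B and Z in M_G^+ and compare |<B, Z>| with 1.783 * ||B||_{infty->1} (Fact 3.2)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    X = rng.standard_normal((n, n))
    # rescale rows to lie in the unit Euclidean ball, so Z = X X^T has diag(Z) <= 1
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
    Z = X @ X.T
    lhs = abs(np.sum(B * Z))
    rhs = 1.783 * norm_inf_to_1(B)
    return lhs, rhs   # lhs should not exceed rhs
```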

3.2 Semidefinite programming

To keep the discussion sufficiently general, let us consider the following class of optimization programs:

$$\begin{aligned} \text {maximize}\, \langle B,Z\rangle \quad \text {subject to} \quad Z \in \mathcal {M}_{\mathrm {opt}}. \end{aligned}$$
(3.5)

Here \(\mathcal {M}_{\mathrm {opt}}\) can be any subset of Grothendieck’s set \(\mathcal {M}_\mathrm {G}^+\) defined in (3.4). A good example is where B is the adjacency matrix of a random graph, possibly shifted by a constant matrix. For example, the semidefinite program (1.4) is of the form (3.5) with \(\mathcal {M}_{\mathrm {opt}}= \mathcal {M}_\mathrm {G}^+\) and \(B = A - \lambda E_n\).

Imagine that there is a similar but simpler problem where B is replaced by a certain reference matrix R, that is

$$\begin{aligned} \text {maximize}\, \langle R,Z\rangle \quad \text {subject to} \quad Z \in \mathcal {M}_{\mathrm {opt}}. \end{aligned}$$
(3.6)

A good example is where B is a random matrix and \(R = \mathbb {E}B\); this will be the case in the proof of Theorem 1.3. Let \(\widehat{Z}\) and \(Z_R\) be the solutions of the original problem (3.5) and the reference problem (3.6) respectively, thus

$$\begin{aligned} \widehat{Z}:= \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle B,Z\rangle , \quad Z_R := \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle R,Z\rangle . \end{aligned}$$

The next lemma shows that \(\widehat{Z}\) provides an almost optimal solution to the reference problem if the original and reference matrices B and R are close.

Lemma 3.3

(\(\widehat{Z}\) almost maximizes the reference objective function) We have

$$\begin{aligned} \langle R,Z_R\rangle - 2 K_\mathrm {G}\Vert B-R\Vert _{\infty \rightarrow 1} \le \langle R,\widehat{Z}\rangle \le \langle R,Z_R\rangle . \end{aligned}$$
(3.7)

Proof

The upper bound is trivial by definition of \(Z_R\). The lower bound is based on Fact 3.2, which implies that for every \(Z \in \mathcal {M}_{\mathrm {opt}}\), one has

$$\begin{aligned} |\langle B-R,Z\rangle | \le K_\mathrm {G}\Vert B-R\Vert _{\infty \rightarrow 1} =: \varepsilon . \end{aligned}$$
(3.8)

Now, to prove the lower bound in (3.7), we will first replace R by B using (3.8), then replace \(\widehat{Z}\) by \(Z_R\) using the fact that \(\widehat{Z}\) is a maximizer for \(\langle B,Z\rangle \), and finally replace back B by R using (3.8) again. This way we obtain

$$\begin{aligned} \langle R,\widehat{Z}\rangle \ge \langle B,\widehat{Z}\rangle - \varepsilon \ge \langle B,Z_R\rangle - \varepsilon \ge \langle R,Z_R\rangle - 2\varepsilon . \end{aligned}$$

This completes the proof of Lemma 3.3. \(\square \)

4 Deviation in the cut norm

To be able to effectively use Lemma 3.3, we will now show how to bound the cut norm of random matrices.

Lemma 4.1

(Deviation in \(\ell _\infty \rightarrow \ell _1\) norm) Let \(A = (a_{ij}) \in \mathbb {R}^{n \times n}\) be a symmetric matrix whose diagonal entries equal 1, whose entries above the diagonal are independent random variables satisfying \(0 \le a_{ij} \le 1\). Assume that

$$\begin{aligned} \bar{p}:= \frac{2}{n(n-1)} \sum _{i<j} \hbox {Var}(a_{ij})\ge \frac{9}{n}. \end{aligned}$$
(4.1)

Then, with probability at least \(1- e^3 5^{-n}\), we have

$$\begin{aligned} \Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1} \le 3 \, \bar{p}^{1/2} n^{3/2}. \end{aligned}$$

We will shortly deduce Lemma 4.1 from Bernstein’s inequality followed by a union bound over \(x,y \in \{-1,1\}^n\); arguments of this type are standard in the analysis of random graphs (see e.g. [12, Section 2.3]). But before we do this, let us pause to explain the conclusion of Lemma 4.1.

Remark 4.2

(Regularization effect of \(\ell _\infty \rightarrow \ell _1\) norm) Let us test Lemma 4.1 on the simple example where A is the adjacency matrix of a sparse Erdős-Rényi random graph G(n, p) with \(p=a/n\), \(a \ge 1\). Here we have \(\bar{p}= p(1-p) \le p = a/n\). Lemma 4.1 states that \(\Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1} \le 3 a^{1/2} n\). This can be compared with \(\Vert \mathbb {E}A\Vert _{\infty \rightarrow 1} = (1 + p(n-1)) n \ge an\). So we obtain

$$\begin{aligned} \Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1} \le 3 a^{-1/2} \, \Vert \mathbb {E}A\Vert _{\infty \rightarrow 1}. \end{aligned}$$

This deviation inequality is good when a exceeds a sufficiently large absolute constant. Since \(a = pn\) is the expected average degree of the graph, it follows that we can handle graphs with bounded expected degrees.

This is a good place to note the importance of the \(\ell _\infty \rightarrow \ell _1\) norm. Indeed, for the spectral norm a similar concentration inequality would fail. As is well known and easy to check, for \(a =O(1)\) one would have \(\Vert A - \mathbb {E}A\Vert \gg \Vert \mathbb {E}A\Vert \) due to contributions from high degree vertices. In fact, those are the only obstructions to concentration. Indeed, according to a result of U. Feige and E. Ofek [31], the removal of high-degree vertices forces a non-trivial concentration inequality to hold in the spectral norm. In contrast to this, the \(\ell _\infty \rightarrow \ell _1\) norm does not feel the vertices with high degrees. It has an automatic regularization effect, which averages the contributions of all vertices, and in particular the few high degree vertices.
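A small simulation can illustrate the quantities in Lemma 4.1 and Remark 4.2: generate a sparse G(n, p) graph with loops, compute \(\Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1}\) by brute force (using the helper norm_inf_to_1 from the sketch in Sect. 3.1), and compare it with \(3 \, \bar{p}^{1/2} n^{3/2}\) and with \(\Vert \mathbb {E}A\Vert _{\infty \rightarrow 1}\). The values of n and a below are illustrative and too small for assumption (4.1) to hold, so this is only a sanity check of the quantities involved, not a verification of the lemma.

```python
import numpy as np

def deviation_experiment(n=14, a=4.0, seed=0):
    """Toy comparison of ||A - EA||_{infty->1} with 3 * pbar^{1/2} * n^{3/2}
    and with ||EA||_{infty->1} for a small G(n, p) graph, p = a/n (sketch)."""
    rng = np.random.default_rng(seed)
    p = a / n
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)  # edges above the diagonal
    A = upper + upper.T + np.eye(n)                             # symmetrize, add loops
    EA = p * (np.ones((n, n)) - np.eye(n)) + np.eye(n)
    pbar = p * (1 - p)
    deviation = norm_inf_to_1(A - EA)     # brute-force helper from Sect. 3.1's sketch
    # ||EA||_{infty->1} equals the sum of all entries since EA is non-negative
    return deviation, 3.0 * np.sqrt(pbar) * n ** 1.5, np.abs(EA).sum()
```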

The proof of Lemma 4.1 will be based on Bernstein’s inequality, which we quote here (see, for example, Theorem 1.2.6 in [21]).

Theorem 4.3

(Bernstein’s inequality) Let \(Y_1,\ldots ,Y_N\) be independent random variables such that \(\mathbb {E}Y_k = 0\) and \(|Y_k| \le M\). Denote \(\sigma ^2 = \frac{1}{N} \sum _{k=1}^N \hbox {Var}(Y_k)\). Then for any \(t \ge 0\), one has

$$\begin{aligned} \mathbb {P}_{} \left\{ \frac{1}{N} \sum _{k=1}^N Y_k > t \right\} \le \exp \left( - \frac{N t^2/2}{\sigma ^2 + Mt/3} \right) . \end{aligned}$$

Proof of Lemma 4.1

Recalling the definition (3.3) of the \(\ell _\infty \rightarrow \ell _1\) norm, we see that we need to bound

$$\begin{aligned} \Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1} = \max _{x,y \in \{-1,1\}^n} \sum _{i,j=1}^n (a_{ij} - \mathbb {E}a_{ij}) x_i y_j. \end{aligned}$$
(4.2)

Let us fix \(x,y \in \{-1,1\}^n\). Using the symmetry of \(A - \mathbb {E}A\), the fact that diagonal entries of \(A - \mathbb {E}A\) vanish and collecting the identical terms, we can express the sum in (4.2) as a sum of independent random variables

$$\begin{aligned} \sum _{i < j} X_{ij}, \quad \text {where} \quad X_{ij} = 2(a_{ij} - \mathbb {E}a_{ij}) x_i y_j. \end{aligned}$$

To control the sum \(\sum _{i < j} X_{ij}\) we can use Bernstein’s inequality, Theorem 4.3. There are \(N = \frac{n(n-1)}{2}\) terms in this sum. Since \(|x_i| = |y_i| = 1\) for all i, the average variance \(\sigma ^2\) of all terms \(X_{ij}\) is at most \(2^2\) times the average variance of all \(a_{ij}\), which is \(\bar{p}\). In other words, \(\sigma ^2 \le 4 \bar{p}\). Furthermore, \(|X_{ij}| \le 2 |a_{ij} - \mathbb {E}a_{ij}| \le 2\) since \(0 \le a_{ij} \le 1\) by assumption. Hence \(M \le 2\). It follows that

$$\begin{aligned} \mathbb {P}_{} \left\{ \frac{1}{N} \sum _{i < j} X_{ij} > t \right\} \le \exp \left( - \frac{N t^2/2}{4 \bar{p}+ 2t/3} \right) . \end{aligned}$$
(4.3)

Let us substitute \(t = 6 \, (\bar{p}/n)^{1/2}\) here. Rearranging the terms and using that \(N = \frac{n(n-1)}{2}\) and \(\bar{p}> 9/n\) (so that \(t < 2 \bar{p}\)), we conclude that the probability in (4.3) is bounded by \(\exp (-3(n -1))\).

Summarizing, we have proved that for every \({x,y \in \{-1,1\}^n}\)

$$\begin{aligned} \mathbb {P}_{} \left\{ \frac{2}{n(n-1)} \sum _{i,j=1}^n (a_{ij} - \mathbb {E}a_{ij}) x_i y_j > 6 \Big ( \frac{\bar{p}}{n} \Big )^{1/2} \right\} \le e^{-3(n-1)}. \end{aligned}$$

Taking a union bound over all \(2^{2n}\) pairs (x, y), we conclude that

$$\begin{aligned} \mathbb {P}_{} \left\{ \max _{x,y \in \{-1,1\}^n} \frac{2}{n(n-1)} \sum _{i,j=1}^n (a_{ij} - \mathbb {E}a_{ij}) x_i y_j > 6 \Big ( \frac{\bar{p}}{n} \Big )^{1/2} \right\}&\le 2^{2n} \cdot e^{-3(n-1)} \\&\le e^3 \cdot 5^{-n}. \end{aligned}$$

Rearranging the terms and using the definition (4.2) of the \(\ell _\infty \rightarrow \ell _1\) norm, we conclude the proof of Lemma 4.1. \(\square \)

Remark 4.4

(The sum of entries) Note that by definition, the quantity \(\big | \sum _{i,j=1}^n (a_{ij} - \mathbb {E}a_{ij}) \big |\) is bounded by \(\Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1}\), and thus it can be controlled by Lemma 4.1. Alternatively, a bound on this quantity follows directly from the last line of the proof of Lemma 4.1. For future reference, we express it in the following way:

$$\begin{aligned} \frac{2}{n(n-1)} \Big | \sum _{i < j} (a_{ij} - \mathbb {E}a_{ij}) \Big | \le 3 \, \bar{p}^{1/2} n^{-1/2}. \end{aligned}$$

5 Stochastic block model: proof of Theorem 1.1

So far our discussion has been general, and the results could be applied to a variety of semidefinite programs on random graphs. In this section, we specialize to the community detection problem considered in Theorem 1.1. Thus we are going to analyze the optimization problem (1.4), where A is the adjacency matrix of a random graph distributed according to the classical stochastic block model G(n, p, q).

As we already noticed, this is a particular case of the class of problems (3.5) that we analyzed in Sect. 3.2. In our case,

$$\begin{aligned} B := A - \lambda E_n \end{aligned}$$

with \(\lambda \) defined in (1.5), and the feasible set is

$$\begin{aligned} \mathcal {M}_{\mathrm {opt}}:= \mathcal {M}_\mathrm {G}^+ = \left\{ Z :\; Z \succeq 0, \; \hbox {diag}(Z) \preceq \mathbf{I }_n \right\} . \end{aligned}$$

5.1 The maximizer of the reference objective function

In order to successfully apply Lemma 3.3, we will now choose a reference matrix R so that it is close to (but also conveniently simpler than) the expectation of B. To do so, we can assume without loss of generality that \(\mathcal {C}_1 = \{1,\ldots , n/2\}\) and \(\mathcal {C}_2 = \{ n/2+1,\ldots ,n\}\). Then we define R as a block matrix

$$\begin{aligned} R = \frac{p-q}{2} \begin{bmatrix} E_{n/2}&-E_{n/2} \\ -E_{n/2}&E_{n/2} \end{bmatrix} \end{aligned}$$
(5.1)

where as usual \(E_{n/2}\) denotes the \(n/2 \times n/2\) matrix all of whose entries equal 1.

Let us compute the expected value \(\mathbb {E}B = \mathbb {E}A - (\mathbb {E}\lambda ) E_n\) and compare it to R. To do so, note that the expected value of A has the form

$$\begin{aligned} \mathbb {E}A = \begin{bmatrix} p E_{n/2}&\quad q E_{n/2} \\ q E_{n/2}&\quad p E_{n/2} \end{bmatrix} +(1-p) I_n. \end{aligned}$$

(The contribution of the identity matrix \(I_n\) is required here since the diagonal entries of A and thus of \(\mathbb {E}A\) equal 1 due to the self-loops.) Furthermore, the definition of \(\lambda \) in (1.5) easily implies that

$$\begin{aligned} \mathbb {E}\lambda = \frac{1}{n(n-1)} \sum _{i \ne j} \mathbb {E}a_{ij} = \frac{p+q}{2} \frac{n^2}{n(n-1)} - \frac{p}{n-1} = \frac{p+q}{2} - \frac{p-q}{n-1}. \end{aligned}$$
(5.2)

Thus

$$\begin{aligned} \mathbb {E}B = \mathbb {E}A - (\mathbb {E}\lambda ) E_n = R + (1-p) \mathbf{I }_n - \frac{p-q}{n-1} E_n. \end{aligned}$$
(5.3)

In the near future we will think of R as the leading term and of the other two terms as negligible, so (5.3) intuitively states that \(R \approx \mathbb {E}B\). We save this fact for later.

Using the simple form of R, we can easily determine the form of the solution \(Z_R\) of the reference problem (3.6).

Lemma 5.1

(The maximizer of the reference objective function) We have

$$\begin{aligned} Z_R := \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle R,Z\rangle = \begin{bmatrix} E_{n/2}&-E_{n/2} \\ -E_{n/2}&E_{n/2} \end{bmatrix}. \end{aligned}$$

Proof

Let us first evaluate the maximizer of \(\langle R,Z\rangle \) on the larger set \([-1,1]^{n \times n}\), which contains the feasible set \(\mathcal {M}_{\mathrm {opt}}\) according to (3.4). Taking into account the form of R in (5.1), one can quickly check that the maximizer of \(\langle R,Z\rangle \) on \([-1,1]^{n \times n}\) is \(Z_R\). Since \(Z_R\) belongs to the smaller set \(\mathcal {M}_{\mathrm {opt}}\), it must be the maximizer on that set as well. \(\square \)

5.2 Bounding the error

We are going to conclude from Lemma 4.1 and Lemma 3.3 that the maximizer of the actual objective function,

$$\begin{aligned} \widehat{Z}= \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle B,Z\rangle , \end{aligned}$$

must be close to \(Z_R\), the maximizer of the reference objective function.

Lemma 5.2

(Maximizers of random and reference functions are close) Assume that \(\bar{p}\) satisfies (4.1). Then, with probability at least \(1- e^3 5^{-n}\), we have

$$\begin{aligned} \Vert \widehat{Z}- Z_R\Vert _2^2 \le \frac{116 \, \bar{p}^{1/2} n^{3/2}}{p-q}. \end{aligned}$$

Proof

We expand

$$\begin{aligned} \Vert \widehat{Z}- Z_R\Vert _2^2 = \Vert \widehat{Z}\Vert _2^2 + \Vert Z_R\Vert _2^2 - 2 \langle \widehat{Z},Z_R\rangle \end{aligned}$$
(5.4)

and control the three terms separately.

Note that \(\Vert \widehat{Z}\Vert _2^2 \le n^2\) since \(\widehat{Z}\in \mathcal {M}_{\mathrm {opt}}\subset [-1,1]^{n \times n}\) according to (3.4). Next, we have \(\Vert Z_R\Vert _2^2 = n^2\) by Lemma 5.1. Thus

$$\begin{aligned} \Vert \widehat{Z}\Vert _2^2 \le \Vert Z_R\Vert _2^2. \end{aligned}$$
(5.5)

Finally, we use Lemma 3.3 to control the cross term in (5.4). To do this, notice that (5.1) and Lemma 5.1 imply that \(R = \frac{p-q}{2} \cdot Z_R\). Then, by homogeneity, the conclusion of Lemma 3.3 implies that

$$\begin{aligned} \langle Z_R,\widehat{Z}\rangle \ge \langle Z_R,Z_R\rangle -\frac{4 K_\mathrm {G}}{p-q} \Vert R-B \Vert _{\infty \rightarrow 1}. \end{aligned}$$
(5.6)

To bound the norm of \(R-B\), let us express this matrix as

$$\begin{aligned} B - R = (B - \mathbb {E}B) + (\mathbb {E}B - R) = (A - \mathbb {E}A) - (\lambda - \mathbb {E}\lambda ) E_n + (\mathbb {E}B - R)\quad \end{aligned}$$
(5.7)

and bound each of the three terms separately. According to Lemma 4.1 and Remark 4.4, we obtain that with probability larger than \(1 - e^3 5^{-n}\),

$$\begin{aligned} \Vert A - \mathbb {E}A\Vert _{\infty \rightarrow 1} \le 3 \bar{p}^{1/2} n^{3/2} \quad \text {and} \quad |\lambda - \mathbb {E}\lambda | \le 3 \bar{p}^{1/2} n^{-1/2}. \end{aligned}$$

Moreover, according to (5.3),

$$\begin{aligned} \mathbb {E}B - R = (1-p) \mathbf{I }_n - \frac{p-q}{n-1} E_n. \end{aligned}$$

Substituting these bounds into (5.7) and using the triangle inequality along with the facts that \(\Vert E_n\Vert _{\infty \rightarrow 1} = n^2\), \(\Vert \mathbf{I }_n\Vert _{\infty \rightarrow 1} = n\), we obtain

$$\begin{aligned} \Vert B - R \Vert _{\infty \rightarrow 1} \le 6 \bar{p}^{1/2} n^{3/2} + (1-p)n + \frac{(p-q) n^2}{n-1}. \end{aligned}$$

Since \(\bar{p}\ge 9/n\), one can check that each of the last two terms is bounded by \(\bar{p}^{1/2} n^{3/2}\). Thus we obtain \(\Vert B - R \Vert _{\infty \rightarrow 1} \le 8 \bar{p}^{1/2} n^{3/2}\). Substituting into (5.6), we conclude that

$$\begin{aligned} \langle Z_R,\widehat{Z}\rangle \ge \langle Z_R,Z_R\rangle - 8 \bar{p}^{1/2} n^{3/2} \cdot \frac{4 K_\mathrm {G}}{p-q}. \end{aligned}$$

Recalling from (3.1) that Grothendieck’s constant \(K_\mathrm {G}\) is bounded by 1.783, we can replace \(8 \cdot 4 K_\mathrm {G}\) by 58 in this bound. Substituting this and (5.5) into (5.4), we conclude that

$$\begin{aligned} \Vert \widehat{Z}- Z_R\Vert _2^2 \le 2 \Vert Z_R\Vert _2^2 - 2 \langle \widehat{Z},Z_R\rangle \le \frac{116 \, \bar{p}^{1/2} n^{3/2}}{p-q}. \end{aligned}$$

The proof of Lemma 5.2 is complete. \(\square \)

Proof of Theorem 1.1

The conclusion of the theorem will quickly follow from Lemma 5.2. Let us check the lemma’s assumption (4.1) on \(\bar{p}\). A quick computation yields

$$\begin{aligned} \bar{p}= \frac{2}{n(n-1)} \sum _{i <j} \hbox {Var}(a_{ij}) = \frac{p(1-p)(n-2)}{2(n-1)} + \frac{q(1-q)n}{2(n-1)}. \end{aligned}$$
(5.8)

Since \(p(1-p) \le 1/4\), we get

$$\begin{aligned} \bar{p}\ge \frac{1}{2} \max \left\{ p(1-p), q(1-q) \right\} - \frac{1}{8(n-1)} > \frac{9}{n} \end{aligned}$$

where the last inequality follows from an assumption of Theorem 1.1. Thus the assumption (4.1) holds, and we can apply Lemma 5.2. It states that

$$\begin{aligned} \Vert \widehat{Z}- Z_R\Vert _2^2 \le \frac{116 \, \bar{p}^{1/2} n^{3/2}}{p-q} \end{aligned}$$
(5.9)

with probability at least \(1- e^3 5^{-n}\). From (5.8), it is not difficult to see that \(\bar{p}\le \frac{p+q}{2}\). Substituting this into (5.9) and expressing \(p = a/n\) and \(q=b/n\), we conclude that

$$\begin{aligned} \Vert \widehat{Z}- Z_R\Vert _2^2 \le \frac{116 \sqrt{(a+b)/2}}{a-b} \cdot n^2. \end{aligned}$$

Rearranging the terms, we can see that this expression is bounded by \(\varepsilon n^2\) if

$$\begin{aligned} (a-b)^2 \ge 7 \cdot 10^3 \varepsilon ^{-2} (a+b). \end{aligned}$$

But this inequality follows from the assumption (1.6).

It remains to recall that according to Lemma 5.1, we have \(Z_R = \bar{x}\bar{x}^\mathsf {T}\) where \(\bar{x}= [\mathbf{1}_{n/2} \; -\mathbf{1}_{n/2}] \in \mathbb {R}^n\) is the community membership vector defined in (1.3). Theorem 1.1 is proved. \(\square \)

Proof of Corollary 1.2

The result follows from the Davis-Kahan theorem [28] about the stability of eigenvectors under matrix perturbations. The largest eigenvalue of \( \bar{x}\bar{x}^\mathsf {T}\) is n while all the others are 0, so the spectral gap equals n. Expressing \(\widehat{Z}= (\widehat{Z}- \bar{x}\bar{x}^\mathsf {T})+ \bar{x}\bar{x}^\mathsf {T}\) and using that \(\Vert \widehat{Z}- \bar{x}\bar{x}^\mathsf {T}\Vert _2 \le \sqrt{\varepsilon }n\), we obtain from the Davis-Kahan theorem (see for example Corollary 3 in [66]) that

$$\begin{aligned} \Vert \widehat{v} - \bar{v} \Vert _2 = 2 | \sin (\theta /2) | \le C \sqrt{\varepsilon }. \end{aligned}$$

Here \(\hat{v}\) and \(\bar{v}\) denote the unit-norm eigenvectors associated to the largest eigenvalues of \(\widehat{Z}\) and \(\bar{x}\bar{x}^\mathsf {T}\) respectively, and \(\theta \in [0, \pi /2]\) is the angle between these two vectors. By definition, \(\widehat{x}= \sqrt{n} \widehat{v}\) and \(\bar{x}= \sqrt{n} \bar{v}\). This concludes the proof. \(\square \)

6 General stochastic block model: proof of Theorem 1.3

In this section we focus on the community detection problem for the general stochastic block-model considered in Theorem 1.3. The semidefinite program (1.10) is a particular case of the class of problems (3.5) that we analyzed in Sect. 3.2. In our case, we set \(B:=A\), choose the reference matrix to be

$$\begin{aligned} R := \bar{A} = \mathbb {E}A, \end{aligned}$$

and consider the feasible set

$$\begin{aligned} \mathcal {M}_{\mathrm {opt}}:= \Big \{ Z :\; Z \succeq 0, \; Z \ge 0, \; \hbox {diag}(Z) \preceq \mathbf{I }_n, \; \sum _{i,j=1}^n Z_{ij} = \lambda \Big \}. \end{aligned}$$

Then \(\mathcal {M}_{\mathrm {opt}}\) is a subset of Grothendieck’s set \(\mathcal {M}_\mathrm {G}^+\) defined in (3.4). Using (3.4), we see that

$$\begin{aligned} \mathcal {M}_{\mathrm {opt}}\subset \mathcal {M}_{\mathrm {opt}}' := \Big \{ Z :\; 0 \le Z_{ij} \le 1 \text { for all } i,j; \; \; \sum _{i,j=1}^n Z_{ij} = \lambda \Big \}. \end{aligned}$$
(6.1)

6.1 The maximizer of the expected objective function

Unlike before, the reference matrix \(R = \bar{A} = \mathbb {E}A = (p_{ij})_{i,j=1}^n\) is not necessarily a block matrix like in (5.1), since the edge probabilities \(p_{ij}\) may be different for all \(i<j\). However, we will observe that the solution \(Z_R\) of the reference problem (3.6) is a block matrix, and it is in fact the cluster matrix \(\bar{Z}\) defined in (1.9).

Lemma 6.1

(The maximizer of the expected objective function) We have

$$\begin{aligned} Z_R := \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle \bar{A},Z\rangle = \bar{Z}. \end{aligned}$$
(6.2)

Proof

Let us first compute the maximizer on the larger set \(\mathcal {M}_{\mathrm {opt}}'\), which contains the feasible set \(\mathcal {M}_{\mathrm {opt}}\) according to (6.1). The maximum of the linear form \(\langle \bar{A},Z\rangle \) on the convex set \(\mathcal {M}_{\mathrm {opt}}'\) is attained at an extreme point. These extreme points are 0/1 matrices with \(\lambda \) ones. Thus the maximizer of \(\langle \bar{A},Z\rangle \) has the ones at the locations of the \(\lambda \) largest entries of \(\bar{A}\).

From the definition of the general stochastic block model we can recall that \(\bar{A} = (p_{ij})\) has two types of entries. The entries that are at least p occupy the community blocks \(\mathcal {C}_k \times \mathcal {C}_k\), \(k=1,\ldots ,K\). The number of such large entries is the same as the number of ones in the cluster matrix \(\bar{Z}\), which in turn equals \(\lambda \) by the choice we made in Theorem 1.3. All other entries of \(\bar{A}\) are at most q. Thus the \(\lambda \) largest entries of \(\bar{A}\) form the community blocks \(\mathcal {C}_k \times \mathcal {C}_k\), \(k=1,\ldots ,K\).

Summarizing, we have shown that the maximizer of \(\langle \bar{A},Z\rangle \) on the set \(\mathcal {M}_{\mathrm {opt}}'\) is a 0/1 matrix with ones forming the community blocks \(\mathcal {C}_k \times \mathcal {C}_k\), \(k=1,\ldots ,K\). Thus the maximizer is the cluster matrix \(\bar{Z}\) from (1.9). Since \(\bar{Z}\) belongs to the smaller set \(\mathcal {M}_{\mathrm {opt}}\), it must be the maximizer on that set as well. \(\square \)

6.2 Bounding the error

We are going to conclude from Lemma 4.1 and Lemma 3.3 that the maximizer of the actual objective function,

$$\begin{aligned} \widehat{Z}= \arg \max _{Z \in \mathcal {M}_{\mathrm {opt}}} \langle A,Z\rangle , \end{aligned}$$

must be close to \(\bar{Z}\), the maximizer of the reference objective function. We will first show that the reference objective function \(\langle \bar{A},Z\rangle \) distinguishes points near its maximizer \(\bar{Z}\).

Lemma 6.2

(Expected objective function distinguishes points) Every \(Z \in \mathcal {M}_{\mathrm {opt}}\) satisfies

$$\begin{aligned} \langle \bar{A},\bar{Z} - Z\rangle \ge \frac{p-q}{2} \, \Vert \bar{Z} - Z\Vert _1. \end{aligned}$$
(6.3)

Proof

We will prove that the conclusion holds for every Z in the larger set \(\mathcal {M}_{\mathrm {opt}}'\), which contains the feasible set \(\mathcal {M}_{\mathrm {opt}}\) according to (6.1). Expanding the inner product, we can represent it as

$$\begin{aligned} \langle \bar{A},\bar{Z} - Z\rangle = \sum _{i,j=1}^n p_{ij} (\bar{Z}-Z)_{ij} = \sum _{(i,j) \in \mathrm {In}} p_{ij} (\bar{Z}-Z)_{ij} - \sum _{(i,j) \in \mathrm {Out}} p_{ij} (Z - \bar{Z})_{ij} \end{aligned}$$

where \(\mathrm {In}\) and \(\mathrm {Out}\) denote the set of edges that run within and across the communities, respectively. Formally, \(\mathrm {In}= \cup _{k=1}^K (\mathcal {C}_k \times \mathcal {C}_k)\) and \(\mathrm {Out}=\{1,\ldots , n\}^2{\setminus }\mathrm {In}\).

For the edges \((i,j) \in \mathrm {In}\), we have \(p_{ij} \ge p\) and \((\bar{Z}-Z)_{ij} \ge 0\) since \(\bar{Z}_{ij}=1\) and \(Z_{ij} \le 1\). Similarly, for the edges \((i,j) \in \mathrm {Out}\), we have \(p_{ij} \le q\) and \((Z - \bar{Z})_{ij} \ge 0\) since \(\bar{Z}_{ij}=0\) and \(Z_{ij} \ge 0\). It follows that

$$\begin{aligned} \langle \bar{A},\bar{Z} - Z\rangle \ge p S_{\mathrm {In}} - q S_{\mathrm {Out}} \end{aligned}$$
(6.4)

where

$$\begin{aligned} S_{\mathrm {In}} = \sum _{(i,j) \in \mathrm {In}} (\bar{Z}-Z)_{ij} \quad \text {and} \quad S_{\mathrm {Out}} = \sum _{(i,j) \in \mathrm {Out}} (Z - \bar{Z})_{ij}. \end{aligned}$$

Since both \(\bar{Z}\) and Z belong to \(\mathcal {M}_{\mathrm {opt}}\), the sum of all entries of both these matrices is the same (namely \(\lambda \)), so we have

$$\begin{aligned} S_{\mathrm {In}} - S_{\mathrm {Out}} = \sum _{i,j=1}^n \bar{Z}_{ij} - \sum _{i,j=1}^n Z_{ij} =0. \end{aligned}$$
(6.5)

On the other hand, as we already noticed, the terms in the sums that make up \(S_{\mathrm {In}}\) and \(S_{\mathrm {Out}}\) are all non-negative. Therefore

$$\begin{aligned} S_{\mathrm {In}} + S_{\mathrm {Out}} = \sum _{i,j=1}^n |(\bar{Z}-Z)_{ij}| = \Vert \bar{Z}-Z\Vert _1. \end{aligned}$$
(6.6)

Substituting (6.5) and (6.6) into (6.4), we obtain the conclusion (6.3). \(\square \)

Now we are ready to conclude that \(\widehat{Z}\approx \bar{Z}\).

Lemma 6.3

(Maximizers of random and expected functions are close) Assume that \(\bar{p}\) satisfies (4.1). With probability at least \(1- e^3 5^{-n}\), we have

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert _1 \le \frac{12 \,K_\mathrm {G}\, \bar{p}^{1/2} n^{3/2}}{p-q}. \end{aligned}$$

Proof

Using first Lemma 6.2, Lemma 3.3 (with \(R=\bar{A}\) and \(Z_R=\bar{Z}\) as before) and then Lemma 4.1, we obtain

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert _1 \le \frac{2}{p-q} \, \langle \bar{A},\bar{Z} - \widehat{Z}\rangle \le \frac{4 K_\mathrm {G}}{p-q} \Vert A - \bar{A}\Vert _{\infty \rightarrow 1} \le \frac{12 K_\mathrm {G}}{p-q} \bar{p}^{1/2} n^{3/2} \end{aligned}$$

with probability at least \(1- e^3 5^{-n}\). Lemma 6.3 is proved. \(\square \)

Proof of Theorem 1.3

The conclusion follows from Lemma 6.3. Indeed, substituting \(p=a/n\), \(q=b/n\) and \(\bar{p}= g/n\) and rearranging the terms, we obtain

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert _1 \le \frac{12 K_\mathrm {G}g^{1/2}}{a-b} \cdot n^2 \le \frac{22 g^{1/2}}{a-b} \cdot n^2 \end{aligned}$$

since we know from (3.1) that Grothendieck’s constant \(K_\mathrm {G}\) is bounded by 1.783. Rearranging the terms, we can see that this expression is bounded by \(\varepsilon n^2\) if \((a-b)^2 \ge 484 \, \varepsilon ^{-2} g\), which is our assumption (1.12). This proves the required bound for the \(\Vert \cdot \Vert _1\) norm.

Since for any matrix \((b_{ij})\) we have \(\sum _{i,j} |b_{ij}|^2 \le \max _{i,j} |b_{ij}| \cdot \sum _{i,j} |b_{ij}|\), we get

$$\begin{aligned} \Vert \widehat{Z}- \bar{Z}\Vert _2^2 \le \Vert \widehat{Z}- \bar{Z}\Vert _\infty \cdot \Vert \widehat{Z}- \bar{Z}\Vert _1. \end{aligned}$$

As we noted in (6.1), all entries of \(\widehat{Z}\) and \(\bar{Z}\) belong to [0, 1] hence \(\Vert \widehat{Z}- \bar{Z}\Vert _\infty \le 1\). The bound for the Frobenius norm follows and Theorem 1.3 is proved. \(\square \)