1 Introduction

The vertex separator problem (VSP) for a graph is to find a subset of vertices (called a vertex separator) whose removal disconnects the graph into two components of roughly equal size. The VSP is NP-hard. Some families of graphs are known to have small vertex separators. Lipton and Tarjan [1] provide a polynomial-time algorithm which determines a vertex separator of size \(O(\sqrt{n})\) in n-vertex planar graphs. Their result was extended to some other families of graphs, such as graphs of fixed genus [2]. It is also known that trees, 3D-grids and meshes have small separators. However, there are graphs that do not have small separators.

The VSP problem arises in many different fields such as VLSI design [3] and bioinformatics [4]. Finding vertex separators of minimal size is an important issue in communication networks [5] and finite element methods [6]. The VSP also plays a role in divide-and-conquer algorithms for minimizing the work involved in solving systems of equations, see e.g., [2, 7].

The vertex separator problem is related to the following graph partition problem. Let \(A=(a_{ij})\) be the adjacency matrix of a graph G with vertex set \(V(G)=\{ 1,\ldots , n\}\) and edge set E(G). Thus A is a symmetric zero-one matrix of order n with zero diagonal. We are interested in 3-partitions \((S_{1},S_{2},S_{3})\) of V(G) with the property that

$$\begin{aligned} |S_{i}|=m_{i} \ge 1. \end{aligned}$$

Given A and \(m=(m_{1},m_{2},m_{3})^{{\mathrm { T}}}\) we consider the following minimum cut (MC) problem:

$$\begin{aligned} (\mathrm{MC}) \quad \mathrm{OPT}_\mathrm{MC} := \min \left\{ \sum \limits _{i\in S_1,j \in S_2} \!\!\! a_{ij} : (S_1, S_2, S_3) ~\text{ partitions }~ V, ~|S_i|=m_i, \forall i \right\} . \end{aligned}$$

It asks to find a vertex partition \((S_{1},S_{2},S_{3})\) with specified cardinalities, such that the number of edges joining vertices in \(S_{1}\) and \(S_{2}\) is minimized. We remark that this min-cut problem is known to be NP-hard [8]. It is clear that if \(\mathrm{OPT}_\mathrm{MC}=0\) for some \(m=(m_1,m_2,m_3)^{{\mathrm { T}}}\) then \(S_3\) separates \(S_1\) and \(S_2\). On the other hand, \(\mathrm{OPT}_\mathrm{MC}>0\) shows that no separator \(S_{3}\) for the cardinalities specified in m exists.

A natural way to model this problem in 0–1 variables consists in representing the partition \((S_{1},S_{2},S_{3})\) by the characteristic vectors \(x_{i}\) corresponding to \(S_{i}\). Thus \(x_{i} \in \{0,1\}^{n}\) and \((x_{i})_{j}=1\) exactly if \(j \in S_{i}\). Hence partitions with prescribed cardinalities are in one-to-one correspondence with \(n \times 3\) zero-one matrices \(X=(x_{1}, x_{2},x_{3})\) such that \(X^{{\mathrm { T}}}e=m\) and \(Xe=e\) (throughout e denotes the vector of all ones of appropriate size). The first condition takes care of the cardinalities and the second one ensures that each vertex is in exactly one partition block. The number of edges between \(S_{1}\) and \(S_{2}\) is now given by \( x_{1}^{{\mathrm { T}}} Ax_{2}\). Thus (MC) is equivalent to

$$\begin{aligned} \min \{ x_{1}^{{\mathrm { T}}} Ax_{2}: X=(x_{1},x_{2},x_{3}) \in \{0,1\}^{n \times 3}, X^{{\mathrm { T}}} e=m, ~Xe=e \}. \end{aligned}$$
(1)
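To make the formulation concrete, the following minimal Python/NumPy sketch (the function name is ours and purely illustrative) evaluates the objective of (1) for a given 0–1 partition matrix X and checks the two linear conditions.

```python
import numpy as np

def mincut_value(A, X, m):
    """Evaluate the objective of (1) for a 0-1 partition matrix X = (x1, x2, x3)."""
    assert np.array_equal(X.sum(axis=0), m)                 # X^T e = m (cardinalities)
    assert np.array_equal(X.sum(axis=1), np.ones(len(A)))   # X e = e (partition)
    x1, x2 = X[:, 0], X[:, 1]
    return x1 @ A @ x2                                      # edges between S1 and S2
```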

It is the main purpose of this paper to explore computationally efficient ways to get tight approximations of \(\mathrm{OPT}_\mathrm{MC}\). These will be used to find vertex separators of small size.

In Sect. 2 we provide an overview of various techniques from the literature to get lower bounds for the min-cut problem. Section 3 contains a series of new relaxations based on semidefinite programming (SDP). We also consider convexification techniques suitable for Branch-and-Bound methods based on convex quadratic optimization, see Sect. 4. In Sect. 5.1 we investigate reformulations of our SDP relaxations where strictly feasible points exist. This is crucial for algorithms based on interior-point methods. We also show equivalence of some of the SDP relaxations introduced here with SDP relaxations from the literature, see Sect. 5.2. In particular, we prove that the SDP relaxations with matrices of order \(2n+1\) introduced here are equivalent to the SDP relaxations with matrices of order \(3n+1\) from the literature. This reduction in the size of the matrix variable enables us to further improve and compute SDP bounds by adding the facet defining inequalities of the boolean quadric polytope. Symmetry \((m_{1}=m_{2})\) is investigated in Sect. 6. We also address the problem of getting feasible 0–1 solutions by standard rounding heuristics, see Sect. 7. Section 8 provides computational results on various classes of graphs taken from the literature, and Sect. 9 gives final conclusions.

Example 1

The following graph will be used to illustrate the various bounding techniques discussed in this paper. The vertices are selected from a \(17 \times 17\) grid using the following MATLAB commands to make them reproducible.

$$\begin{aligned} \mathtt{rand('seed',27072015), Q=rand(17)<0.33} \end{aligned}$$

This results in \(n=93\) vertices which correspond to the nonzero entries in Q. These are located at the grid points (i, j) for which \(Q_{ij} \not =0\). Two vertices are joined by an edge whenever their distance is at most \(\sqrt{10}\). The resulting graph with \(|E|=470\) is displayed in Fig. 1. For \(m=(44,~43,~ 6)^{{\mathrm { T}}}\) we find a partition which leaves 7 edges between \(S_{1}\) and \(S_{2}\). We will in fact see later on that this partition is optimal for this specific choice of m. Vertices in \(S_{3}\) are marked by '*', the edges in the cut between \(S_{1}\) and \(S_{2}\) are plotted with the thickest lines, and those with one endpoint in \(S_{3}\) are dashed.
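For readers working outside MATLAB, a sketch of the same construction in Python/NumPy follows; note that NumPy's random number generator will not reproduce the MATLAB instance above, so this only illustrates the scheme (grid selection with probability 0.33, edges between points at distance at most \(\sqrt{10}\)).

```python
import numpy as np

rng = np.random.default_rng(27072015)             # seed chosen to mirror the MATLAB call
Q = rng.random((17, 17)) < 0.33                   # select grid points
pts = np.argwhere(Q)                              # vertex coordinates (i, j)
n = len(pts)
A = np.zeros((n, n), dtype=int)                   # adjacency matrix
for a in range(n):
    for b in range(a + 1, n):
        if np.sum((pts[a] - pts[b]) ** 2) <= 10:  # distance at most sqrt(10)
            A[a, b] = A[b, a] = 1
```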

Fig. 1

The example graph: vertices in \(S_{3}\) are marked by ‘*’, the edges in the cut are plotted with the thickest lines

Notation

The space of \(k\times k\) symmetric matrices is denoted by \(\mathcal {S}_k\), and the space of \(k\times k\) symmetric positive semidefinite matrices by \(\mathcal {S}^+_k\). The space of symmetric matrices is considered with the trace inner product \(\langle A,B\rangle =\mathrm{tr}(AB)\). We will sometimes also use the notation \(X \succeq 0\) instead of \(X \in \mathcal {S}^+_k\), if the order of the matrix is clear from the context.

We will use matrices having a block structure. We denote a sub-block of a matrix Y such as in Eq. (6) by \(Y_{ij}\) or \(Y_{i}\). In contrast we indicate the (i, j) entry of a matrix Y by \(Y_{i,j}\). For two matrices \(X,Y\in {\mathbf {R}}^{p\times q}\), \(X\ge Y\) means \(X_{i,j}\ge Y_{i,j}\) for all i, j.

To denote the ith column of the matrix X we write \(X_{:,i}\). J and e denote the all-ones matrix and all-ones vector respectively. The size of these objects will be clear from the context. We set \(E_{ij}=e_i e_j^{\mathrm {T}}\) where \(e_{i}\) denotes column i of the identity matrix I.

The ‘\(\mathrm{vec}\)’ operator stacks the columns of a matrix, while the ‘\(\mathrm{diag}\)’ operator maps an \(n\times n\) matrix to the n-vector given by its diagonal. The adjoint operator of ‘diag’ is denoted by ‘Diag’.

The Kronecker product \(A \otimes B\) of matrices \(A \in {\mathbf {R}}^{p \times q}\) and \(B\in {\mathbf {R}}^{r\times s}\) is defined as the \(pr \times qs\) matrix composed of pq blocks of size \(r\times s\), with block (i, j) given by \(A_{i,j}B\), \(i = 1,\ldots ,p\), \(j = 1,\ldots ,q\), see e.g., [9]. The Hadamard product of two matrices A and B of the same size is denoted by \(A\circ B\) and defined as \((A\circ B)_{i,j} = A_{i,j}\cdot B_{i,j}\) for all i, j.
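A short NumPy illustration of these two products (purely illustrative, not part of the paper's computations):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
K = np.kron(A, B)   # 4x4 Kronecker product: block (i, j) equals A[i, j] * B
H = A * B           # Hadamard (elementwise) product
```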

2 Overview of relaxations for (MC)

Before we present our new relaxations for (MC) we find it useful to give a short overview of existing relaxation techniques. This allows us to set the stage for our own results and also to describe the rich structure of the problem which gives rise to a variety of relaxations.

2.1 Orthogonal relaxations based on the Hoffman–Wielandt inequality

The problem (MC) can be viewed as optimizing a quadratic objective function over the set \(P_{m}\) of partition matrices where

$$\begin{aligned} P_{m} := \{ X \in \{0,1\}^{n \times 3}: X^{{\mathrm { T}}} e=m, Xe=e\}. \end{aligned}$$

Historically the first relaxations exploit the fact that the columns of \(X \in P_{m}\) are pairwise orthogonal, more precisely \(X^{{\mathrm { T}}} X= \mathrm{Diag}(m).\) The objective function \(x_{1}^{{\mathrm { T}}} Ax_{2}\) can be expressed as \(\frac{1}{2}\langle A , XBX^{{\mathrm { T}}} \rangle \) with

$$\begin{aligned} B= \left( \begin{array}{c@{\quad }c@{\quad }c} 0 &{} 1 &{} 0\\ 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 \end{array} \right) . \end{aligned}$$
(2)

We recall the Hoffman–Wielandt theorem which provides a closed form solution to the following type of problem.

Theorem 1

(Hoffman–Wielandt Theorem) If \(C, ~ D \in \mathcal {S}_{n}\) with eigenvalues \(\lambda _{i}(C), \lambda _{j}(D)\), then

$$\begin{aligned} \min \{ \langle C, XDX^{{\mathrm { T}}} \rangle : ~ X^{{\mathrm { T}}} X=I \} = \min \left\{ \sum _{i} \lambda _{i}(C) \lambda _{\phi (i)}(D): ~\phi \text{ permutation }\right\} . \end{aligned}$$

The minimum on the right hand side above is attained for the permutation which recursively maps the largest eigenvalue of C to the smallest eigenvalue of D.

Donath and Hoffman [10] use this result to bound (MC) from below,

$$\begin{aligned} \mathrm{OPT}_\mathrm{MC} \ge \frac{1}{2}\min \{ \langle A, XBX^{{\mathrm { T}}} \rangle : ~ X^{{\mathrm { T}}} X=\mathrm{Diag}(m) \}. \end{aligned}$$

The fact that in this case A and B do not have the same size can easily be overcome, see for instance [11].

To get a further tightening, we introduce the Laplacian L, associated to the adjacency matrix A, which is defined as

$$\begin{aligned} L := \mathrm{Diag}(Ae)-A. \end{aligned}$$
(3)

By definition, we have \(-L =A\) outside the main diagonal, and moreover \(\mathrm{diag}(XBX^{{\mathrm { T}}})=0\) for partition matrices X. Therefore the objective function of our problem satisfies \(\langle -L, XBX^{{\mathrm { T}}} \rangle =\langle A, XBX^{{\mathrm { T}}} \rangle \). The vector e is an eigenvector of L, in fact \(Le=0\), which is used in [11] to investigate the following relaxation

$$\begin{aligned} \mathrm{OPT}_\mathrm{HW} := \frac{1}{2} \min \{ \langle -L,XBX^{{\mathrm { T}}} \rangle : ~X^{{\mathrm { T}}} X=\mathrm{Diag}(m), ~Xe=e, ~ X^{{\mathrm { T}}} e = m \}. \end{aligned}$$
(4)

This relaxation also has a closed form solution based on the Hoffman–Wielandt theorem. To describe it, we need some more notation. Let \(\lambda _{2}\) and \(\lambda _{n}\) denote the second smallest and the largest eigenvalue of L, with normalized eigenvectors \(v_{2}\) and \(v_{n}\). Further, set \(\tilde{m}=(\sqrt{m_{1}}, ~ \sqrt{m_{2}}, ~ \sqrt{m_{3}})^{{\mathrm { T}}}\) and let the columns of W form an orthonormal basis of the orthogonal complement of \(\tilde{m}\), i.e., \(W^{{\mathrm { T}}} W=I_{2}, ~ W^{{\mathrm { T}}} \tilde{m}=0\). Finally, define \(\tilde{B} = \sqrt{m_{1}m_{2}}W^{{\mathrm { T}}}BW\) with factorization

$$\begin{aligned} Q^{{\mathrm { T}}} \tilde{B} Q = \mathrm{Diag}([\tau _{1}~ \tau _{2}]), \end{aligned}$$

and eigenvalues \(\tau _{1}=\frac{1}{n}(-m_{1}m_{2} - \sqrt{m_{1}m_{2}(n-m_{1})(n-m_{2})} )\) and

\(\tau _{2}=\frac{1}{n}(-m_{1}m_{2} + \sqrt{m_{1}m_{2}(n-m_{1})(n-m_{2})} )\).

Theorem 2

([12]) In the notation above we have \(\mathrm{OPT}_\mathrm{HW} = -\frac{1}{2}(\lambda _{2} \tau _{1} + \lambda _{n}\tau _{2})\) and the optimum is attained at

$$\begin{aligned} X = \frac{1}{n}e m^{{\mathrm { T}}} + (v_{2}, ~v_{n}) Q^{{\mathrm { T}}}W^{{\mathrm { T}}}\mathrm{Diag}(\tilde{m}). \end{aligned}$$
(5)

This approach has been investigated for general graph partition with specified sizes \(m_{1}, \ldots , m_{k}\). We refer to [11, 13] for further details. More recently, Pong et al. [14] explore and extend this approach for the generalized min-cut problem.

The closed form solution in terms of the eigenvalues of the input matrices makes this approach attractive for large-scale instances, see [14]. The drawback lies in the fact that it is difficult to introduce additional constraints into the model while maintaining the applicability of the Hoffman–Wielandt theorem. This can be overcome by moving to relaxations based on semidefinite optimization.
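For illustration, the bound of Theorem 2 can be computed directly from the spectrum of the Laplacian; the following short Python/NumPy sketch (the function name is ours) implements the formula \(\mathrm{OPT}_\mathrm{HW} = -\frac{1}{2}(\lambda_{2}\tau_{1} + \lambda_{n}\tau_{2})\).

```python
import numpy as np

def hoffman_wielandt_bound(A, m1, m2):
    """Eigenvalue lower bound OPT_HW of Theorem 2 for the min-cut problem."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A        # Laplacian, Eq. (3)
    lam = np.linalg.eigvalsh(L)           # eigenvalues in ascending order
    lam2, lamn = lam[1], lam[-1]          # second smallest and largest
    s = np.sqrt(m1 * m2 * (n - m1) * (n - m2))
    tau1 = (-m1 * m2 - s) / n
    tau2 = (-m1 * m2 + s) / n
    return -0.5 * (lam2 * tau1 + lamn * tau2)
```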

2.2 Relaxations using semidefinite optimization

The relaxations underlying the Hoffman–Wielandt theorem can equivalently be expressed using semidefinite optimization. We briefly describe this connection and then we consider more general models based on semidefinite optimization. The key tool here is the following theorem of Anstreicher and Wolkowicz [15], which can be viewed as an extension of the Hoffman–Wielandt theorem.

Theorem 3

([15]) Let \(C,D \in \mathcal {S}_{n}\). Then

$$\begin{aligned} \min \{ \langle C,XDX^{{\mathrm { T}}} \rangle : X^{{\mathrm { T}}}X=I \} = \max \{ \mathrm{tr}(S)+\mathrm{tr}(T): D \otimes C - S \otimes I -I \otimes T \succeq 0 \}. \end{aligned}$$

Based on this theorem, Povh and Rendl [16] show that the optimal value of (4) can equivalently be expressed as the optimal solution of the following semidefinite program with matrix Y of order 3n.

Theorem 4

[16] We have

$$\begin{aligned} \mathrm{OPT}_\mathrm{HW} = \min&~\frac{1}{2} \mathrm{tr} (-L)(Y_{12}+Y_{12}^{{\mathrm { T}}})\nonumber \\ \mathrm{s.t.} ~&\mathrm{tr} Y_{i} = m_{i}, ~ \mathrm{tr} (J Y_{i}) = m_{i}^{2}, ~ i=1,2,3\nonumber \\&\mathrm{tr}(Y_{ij}+Y_{ij}^{{\mathrm { T}}})=0, ~\mathrm{tr} J(Y_{ij}+Y_{ij}^{{\mathrm { T}}})= 2m_{i}m_{j},~j>i \nonumber \\&Y = \left( \begin{array}{ccc} Y_{1} &{} Y_{12} &{} Y_{13} \\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} &{} Y_{23} \\ Y_{13}^{{\mathrm { T}}} &{} Y_{23}^{{\mathrm { T}}} &{} Y_{3} \end{array} \right) \succeq 0. \end{aligned}$$
(6)

A proof of this result is given in [16]. The proof implicitly shows that the following also holds.

Theorem 5

The following problem also has the optimal value \(OPT_{HW}\):

$$\begin{aligned} \mathrm{OPT}_\mathrm{HW} = \min ~&\frac{1}{2} \mathrm{tr} (-L)(Y_{12}+Y_{12}^{{\mathrm { T}}}) \\ \mathrm{s.t.} ~&\mathrm{tr}(Y_{i})=m_{i}, ~\mathrm{tr} (J Y_{i}) = m_{i}^{2}, ~i=1,2 \\&\mathrm{tr}(Y_{12}+ Y_{12}^{{\mathrm { T}}})=0,~ \mathrm{tr}(J (Y_{12}+Y_{12}^{{\mathrm { T}}}))=2m_{1}m_{2} \\&Y=\left( \begin{array}{cc} Y_{1} &{} Y_{12} \\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} \end{array} \right) \succeq 0. \end{aligned}$$

We provide an independent proof of this theorem, which simplifies the arguments from [16]. To maintain readability, we postpone the proof to Sect. 10. The significance of these two results lies in the fact that we can compute optimal solutions for the respective semidefinite programs by simple eigenvalue computations.

The SDP relaxation from Theorem 4 can be viewed as moving from \(X \in P_{m}\) to \(Y =xx^{{\mathrm { T}}} \in \mathcal {S}_{3n}\) with \(x=\mathrm{vec}(X) \in {\mathbf {R}}^{3n}\) and replacing the quadratic terms in x by the corresponding entries in Y. The constraint \(\mathrm{tr}(Y_{1})=m_{1}\) follows from \(\mathrm{tr}(Y_{1})= \mathrm{tr}(x_{1}x_{1}^{{\mathrm { T}}})= x_{1}^{{\mathrm { T}}}x_{1}=m_{1}\). Similarly, \(\mathrm{tr}(Y_{12})=x_{1}^{{\mathrm { T}}}x_{2}=0\) and \(\mathrm{tr}(JY_{1})= (e^{{\mathrm { T}}}x_{1})^{2}= m_{1}^{2}\). Thus these constraints simply translate orthogonality of X into linear constraints on Y.

In order to derive stronger SDP relaxations than the one from Theorem 4, one can exploit the fact that for \(X \in P_{m}\) it follows that \(\mathrm{diag}(Y)=\mathrm{diag}(xx^{{\mathrm { T}}})= x\) with \(x= \mathrm{vec}(X)\). Now, the constraint \(Y- \mathrm{diag}(Y) \mathrm{diag}(Y)^{{\mathrm { T}}}=0\) may be weakened to \(Y - \mathrm{diag}(Y) \mathrm{diag}(Y)^{{\mathrm { T}}} \succeq 0\) which is well known to be equivalent to the following convex constraint

$$\begin{aligned} \begin{pmatrix} Y &{} \mathrm{diag}(Y) \\ \mathrm{diag}(Y)^{{\mathrm { T}}} &{} 1 \end{pmatrix} \succeq 0. \end{aligned}$$

The general case of \(k-\)partition leads to SDP relaxations with matrices of order \((nk+1)\), see for instance Zhao et al. [17] and Wolkowicz and Zhao [18]. In our notation, the model (4.1) from [18] has the following form:

$$\begin{aligned} \begin{array}{rl} \min &{} \frac{1}{2} \mathrm{tr} A(Y_{12}+Y_{12}^{{\mathrm { T}}}) \\ \mathrm{s.t.} &{} \mathrm{tr} Y_{i} = m_{i}, ~\mathrm{tr} (J Y_{i}) = m_{i}^{2}, ~~i=1,2,3 \\ &{}\mathrm{diag}(Y_{ij})=0, ~\mathrm{tr} J(Y_{ij}+Y_{ij}^{{\mathrm { T}}})= 2m_{i}m_{j}, ~~j>i\\ &{} Y = \left( \begin{array}{ccc} Y_{1} &{} Y_{12} &{} Y_{13} \\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} &{} Y_{23} \\ Y_{13}^{{\mathrm { T}}} &{} Y_{23}^{{\mathrm { T}}} &{} Y_{3} \end{array} \right) , ~~y=\mathrm{diag}(Y), ~~ Y-yy^{{\mathrm { T}}} \succeq 0. \end{array} \end{aligned}$$
(7)

Strictly speaking, the model (4.1) from [18] does not include the equations involving the \(m_{i}\) above, but uses information from the barycenter of the feasible region to eliminate these constraints by reducing the dimension of the matrix variable Y. We make this more precise in Sect. 5 below.

Further strengthening is obtained by additionally requiring \(Y \ge 0\), leading to the strongest bound contained in [18].

The min-cut problem can also be seen as a special case of the quadratic assignment problem (QAP), as noted already by Helmberg et al. [12]. This idea is further exploited by van Dam and Sotirov [19] where the authors use the well known SDP relaxation for the QAP [17], as the SDP relaxation for the min-cut problem. The resulting QAP-based SDP relaxation for the min-cut problem is proven to be equivalent to (7), see [19].

2.3 Linear and quadratic programming relaxations

The model (MC) starts with specified sizes \(m=(m_{1},m_{2},m_{3})^{{\mathrm { T}}}\) and tries to separate V(G) into \(S_{1}, S_{2}\) and \(S_{3}\) so that the number of edges between \(S_{1}\) and \(S_{2}\) is minimized. This by itself does not yield a vertex separator, but it can be used to experiment with different choices of m to eventually produce a separator.

Several papers consider the separator problem directly as a linear integer problem of the following form

$$\begin{aligned} (VS) \quad \min \{ e^{{\mathrm { T}}}x_{3}: ~X=(x_{1},x_{2},x_{3}) \in \{0,1\}^{n\times 3}, ~Xe=e, ~ (x_{1})_{i} + (x_{2})_{j} \le 1 ~ \forall [i,j] \in E, ~l_{i} \le e^{{\mathrm { T}}}x_{i} \le u_{i}, ~ i=1,2 \}. \end{aligned}$$

The constraint \(Xe=e\) makes sure that X represents a vertex partition, the inequalities on the edges enforce that there are no edges joining \(S_{1}\) and \(S_{2}\), and the last constraints are cardinality conditions on \(S_{1}\) and \(S_{2}\). The objective function looks for a separator of smallest size. We refer to Balas and de Souza [20, 21] who exploit the above integer formulation within Branch and Bound settings with additional cutting planes to find vertex separators in small graphs. Biha and Meurs [22] introduced new classes of valid inequalities for the vertex separator polyhedron and solved instances from [21] to optimality.

Hager et al. [23, 24] investigate continuous bilinear versions and show that

$$\begin{aligned} \max \{ e^{{\mathrm { T}}}x_{1} + e^{{\mathrm { T}}}x_{2} - \langle x_{1}, (A+I)x_{2} \rangle : ~0 \le x_{i} \le e,~l_{i } \le e^{{\mathrm { T}}}x_{i} \le u_{i} ~ i=1,2\} \end{aligned}$$

is equivalent to (VS). Even though this problem is intractable, as the objective function is indefinite, it is shown in [24] that this model can be used to produce heuristic solutions of good quality even for very large graphs.

A quadratic programming (QP) relaxation for the min-cut problem is derived in [14]. That convex QP relaxation is based on the QP relaxation for the QAP, see [15, 25, 26]. Numerical results in [14] show that QP bounds are weaker, but cheaper to compute than the strongest SDP bounds, see also Sect. 4.

Armbruster et al. [27] compared branch-and-cut frameworks for linear and semidefinite relaxations of the minimum graph bisection problem on large and sparse instances. Extensive numerical experiments show that the semidefinite branch-and-cut approach is a superior choice to the simplex approach. In the sequel we mostly consider SDP bounds for the min-cut problem.

3 The new SDP relaxations

In this section we derive several SDP relaxations with matrix variables of order \(2n+1\) and increasing complexity. We also show that our strongest SDP relaxation provides tight min-cut bounds on a graph with 93 vertices.

Motivated by Theorem 5 and also in view of the objective function \(x_{1}^{{\mathrm { T}}}Ax_{2}\) of (MC), which makes explicit use only of the first two columns of \(X \in P_{m}\), we propose to investigate SDP relaxations of (MC) with matrices of order 2n, obtained by moving from \({x_{1} \atopwithdelims ()x_{2}}\) to \({x_{1} \atopwithdelims ()x_{2}}{x_{1} \atopwithdelims ()x_{2}}^{{\mathrm { T}}}\).

An integer programming formulation of (MC) using only \(x_{1}\) and \(x_{2}\) amounts to the following

$$\begin{aligned} \mathrm{OPT}_\mathrm{MC} = \min \{ x_{1}^{{\mathrm { T}}}Ax_{2}:~ x_{1}, x_{2} \in \{0,1\}^{n}, ~ x_{1}^{{\mathrm { T}}}e= m_{1}, ~ x_{2}^{{\mathrm { T}}}e = m_{2}, ~ x_{1}+ x_{2}\le e \}. \end{aligned}$$
(8)

This formulation has the disadvantage that its linear relaxation (\(0 \le x_{i} \le 1\)) is intractable, as the objective function \(x_{1}^{{\mathrm { T}}}Ax_{2}\) is indefinite. An integer linear version is obtained by linearizing the terms \((x_{1})_{i} (x_{2})_{j}\) in the objective function. We get

$$\begin{aligned} \mathrm{OPT}_\mathrm{MC} = \min \Bigg \{ \sum _{[ij] \in E(G)} z_{ij}+ z_{ji}: u,v \in \{0,1\}^{n}, z \ge 0, e^{{\mathrm { T}}}u=m_{1}, e^{{\mathrm { T}}}v=m_{2},\\ u+v\le e, z_{ij} \ge u_{i} + v_{j} -1, z_{ji} \ge u_{j} + v_{i} -1 ~\forall [ij] \in E(G) \Bigg \}. \end{aligned}$$

This is a binary LP with 2n binary and 2|E| continuous variables. Unfortunately, its linear relaxation gives a value of 0 (by setting an appropriate number of the nonzero entries in u and v to \(\frac{1}{2}\)). Even the use of advanced ILP technology, as for instance provided by GUROBI or CPLEX or similar packages, is only moderately successful on this formulation. We will argue below that some SDP models in contrast yield tight approximations to the optimal value of the integer problem.
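For concreteness, a minimal sketch of this linearized ILP using the open-source PuLP modeller is given below; the function and variable names are ours, the default CBC solver is assumed, and edges are given as pairs (i, j) with \(i<j\).

```python
import pulp

def mincut_ilp(edges, n, m1, m2):
    """Sketch of the linearized ILP for (MC); returns its optimal value."""
    prob = pulp.LpProblem("mincut", pulp.LpMinimize)
    u = pulp.LpVariable.dicts("u", range(n), cat="Binary")
    v = pulp.LpVariable.dicts("v", range(n), cat="Binary")
    arcs = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    z = pulp.LpVariable.dicts("z", arcs, lowBound=0)
    prob += pulp.lpSum(z[i, j] + z[j, i] for i, j in edges)      # objective
    prob += pulp.lpSum(u[i] for i in range(n)) == m1
    prob += pulp.lpSum(v[i] for i in range(n)) == m2
    for i in range(n):
        prob += u[i] + v[i] <= 1
    for i, j in edges:
        prob += z[i, j] >= u[i] + v[j] - 1
        prob += z[j, i] >= u[j] + v[i] - 1
    prob.solve()
    return pulp.value(prob.objective)
```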

Moving to the matrix space, we consider

$$\begin{aligned} Y=\left( \begin{array}{cc} Y_{1} &{} Y_{12} \\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} \end{array} \right) ,\end{aligned}$$

where \(Y_{i}\) corresponds to \(x_{i}x_{i}^{{\mathrm { T}}}\) and \(Y_{12}\) represents \(x_{1}x_{2}^{{\mathrm { T}}}\). The objective function \(x_{1}^{{\mathrm { T}}}Ax_{2}\) becomes

$$\begin{aligned} \langle A, Y_{12} \rangle = \langle M, Y \rangle , \end{aligned}$$

where we set

$$\begin{aligned} M := \frac{1}{2} \left( \begin{matrix} 0 &{} A \\ A &{} 0 \end{matrix} \right) . \end{aligned}$$
(9)

Looking at Theorem 5 we consider the following simple SDP as our starting relaxation.

$$\begin{aligned} (SDP_{0})&\min \langle M, Y \rangle \nonumber \\&\mathrm{s.t.} \,\mathrm{tr}(Y_{i})=m_{i}, ~\mathrm{tr} (J Y_{i}) = m_{i}^{2},~ i=1,2 \end{aligned}$$
(10)
$$\begin{aligned}&\qquad \mathrm{tr}(Y_{12}+ Y_{12}^{{\mathrm { T}}})=0, ~\mathrm{tr}(J (Y_{12}+Y_{12}^{{\mathrm { T}}}))=2m_{1}m_{2} \end{aligned}$$
(11)
$$\begin{aligned}&\qquad Y=\begin{pmatrix} Y_{1} &{} Y_{12} \\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} \end{pmatrix}, ~y = \mathrm{diag}(Y),\end{aligned}$$
(12)
$$\begin{aligned}&\quad \quad \begin{pmatrix} Y &{} y \\ y^{{\mathrm { T}}} &{} 1 \end{pmatrix} \succeq 0 \end{aligned}$$
(13)

This SDP captures the constraints from Theorem 5 and has \((2n+1)+6\) linear equality constraints. We have replaced \(Y \succeq 0\) by the stronger condition \(Y-yy^{{\mathrm { T}}} \succeq 0\), and we also replaced \(-L\) by A in the objective function.

There is an immediate improvement by exploiting the fact that \(x_{1} \circ x_{2} =0\) (the elementwise products \((x_{1})_{i} \cdot (x_{2})_{i}\) are zero). Thus we also impose

$$\begin{aligned} \mathrm{diag}(Y_{12}) =0, \end{aligned}$$
(14)

which adds another n equations and makes \(\mathrm{tr} (Y_{12}+Y_{12}^{{\mathrm { T}}})=0\) redundant. We call \(SDP_{1}\) the relaxation obtained from \(SDP_{0}\) by replacing \(\mathrm{tr} (Y_{12}+Y_{12}^{{\mathrm { T}}})=0\) with \(\mathrm{diag}(Y_{12}) =0\). The equations in (14) are captured by the ‘gangster operator’ in [18]. Moreover, once these constraints are added, it will make no difference whether the adjacency matrix A or \(-L\) is used in the objective function.
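To indicate how such relaxations can be set up in practice, the following sketch models \(SDP_{1}\) with CVXPY; names are ours, the constraint \(\mathrm{tr}(J(Y_{12}+Y_{12}^{{\mathrm { T}}}))=2m_{1}m_{2}\) is written as the equivalent condition that the entries of \(Y_{12}\) sum to \(m_{1}m_{2}\), and an SDP-capable solver (e.g. SCS) is assumed to be installed.

```python
import cvxpy as cp
import numpy as np

def sdp1_bound(A, m1, m2):
    """Sketch of relaxation SDP_1: constraints (10)-(13) together with (14)."""
    n = A.shape[0]
    M = np.zeros((2 * n, 2 * n))
    M[:n, n:] = A / 2.0
    M[n:, :n] = A / 2.0                                       # M as in (9)
    Z = cp.Variable((2 * n + 1, 2 * n + 1), symmetric=True)   # [[Y, y], [y^T, 1]]
    Y, y = Z[:2 * n, :2 * n], Z[:2 * n, 2 * n]
    Y1, Y2, Y12 = Y[:n, :n], Y[n:, n:], Y[:n, n:]
    cons = [Z >> 0, Z[2 * n, 2 * n] == 1,
            cp.diag(Y) == y,                                  # y = diag(Y), Eq. (12)
            cp.trace(Y1) == m1, cp.trace(Y2) == m2,
            cp.sum(Y1) == m1 ** 2, cp.sum(Y2) == m2 ** 2,     # tr(J Y_i) = m_i^2
            cp.sum(Y12) == m1 * m2,
            cp.diag(Y12) == 0]                                # Eq. (14)
    prob = cp.Problem(cp.Minimize(cp.trace(M @ Y)), cons)
    prob.solve()
    return prob.value
```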

Up to now we have not yet considered the inequality

$$\begin{aligned} e -x_{1} - x_{2} \ge 0, \end{aligned}$$
(15)

where \(x_1\) (resp. \(x_2\)) represents the first n (resp. last n) coordinates of y.

Lemma 6

Let Y satisfy (12), (13) and (14). Then \(\mathrm{diag}(Y_1) + \mathrm{diag}(Y_2) \le e\).

Proof

For each \(i\in \{1,\ldots , n\}\), the submatrix of (13) indexed by \((i,n+i, 2n+1)\) is positive semidefinite, i.e.,

$$\begin{aligned} \left( \begin{matrix} y_{i} &{} 0 &{} y_{i} \\ 0 &{} y_{i+n} &{} y_{i+n}\\ y_{i} &{} y_{i+n} &{} 1 \end{matrix} \right) \succeq 0. \end{aligned}$$

The lemma now follows from the inequality

$$\begin{aligned} (-1,-1,1) \left( \begin{matrix} y_{i} &{} 0 &{} y_{i} \\ 0 &{} y_{i+n} &{} y_{i+n}\\ y_{i} &{} y_{i+n} &{} 1 \end{matrix} \right) (-1,-1,1)^{{\mathrm { T}}} = 1 - y_{i} - y_{i+n} \ge 0. \end{aligned}$$

\(\square \)

In order to obtain additional linear constraints for our SDP model, we consider (15) and

$$\begin{aligned}y \ge 0, ~e-y \ge 0, \end{aligned}$$

which we multiply pairwise and apply linearization. A pairwise multiplication of individual inequalities from (15) yields

$$\begin{aligned}&1- (y_{i}+y_{j}+y_{n+i}+y_{n+j})+Y_{i,j}+Y_{i,n+j}+Y_{j,n+i}+Y_{n+i,n+j} \ge 0\nonumber \\&\quad \forall i<j \in V(G). \end{aligned}$$
(16)

We also get

$$\begin{aligned} y_{j} - Y_{i,j} -Y_{n+i,j} \ge 0, \quad y_{n+j} -Y_{i,n+j} -Y_{n+i,n+j} \ge 0~~\forall i,j \in V(G),~ i\not =j\nonumber \\ \end{aligned}$$
(17)

by multiplying individual constraints from (15) and \(y\ge 0\). Finally we get in a similar way by multiplying with \(e-y\ge 0\)

$$\begin{aligned} 1 -y_{i}- y_{n+i} - y_{j} + Y_{i,j} + Y_{n+i,j} \ge 0 \end{aligned}$$
(18)
$$\begin{aligned} 1 -y_{i}- y_{n+i} - y_{n+j} + Y_{i,n+j} + Y_{n+i,n+j} \ge 0 ~~\forall i,j \in V(G),~ i \not =j. \end{aligned}$$
(19)

The inequalities (16)–(19) are based on a technique known as the reformulation-linearization technique (RLT) that was introduced by Sherali and Adams [28].

In order to strengthen our SDP relaxation further, one can add the following facet defining inequalities of the boolean quadric polytope (BQP), see e.g., [29],

$$\begin{aligned} \begin{array}{c} 0 \le Y_{i,j} \le Y_{i,i} \\ Y_{i,i} + Y_{j,j} \le 1 + Y_{i,j} \\ Y_{i,k} + Y_{j,k} \le Y_{k,k} + Y_{i,j} \\ Y_{i,i} + Y_{j,j} +Y_{k,k} \le Y_{i,j}+Y_{i,k} + Y_{j,k} +1. \end{array} \end{aligned}$$
(20)

In our numerical experiments we will consider the following relaxations which we order according to their computational effort.

Name | Constraints | Complexity
\(SDP_0\) | (10)–(13) | O(n)
\(SDP_1\) | (10)–(13), (14) | O(n)
\(SDP_2\) | (10)–(13), (14), \(Y_{i,j}\ge 0\) if \(M_{i,j}>0\) | \(O(n + |E|)\)
\(SDP_3\) | (10)–(13), (14), \(Y_{i,j}\ge 0\) if \(M_{i,j}>0\), (16)–(19) | \(O(n^2)\)
\(SDP_4\) | (10)–(13), (14), (16)–(19), (20) | \(O(n^3)\)

The first two relaxations can potentially produce negative lower bounds, which would make them useless. The remaining relaxations yield nonnegative bounds due to the nonnegativity condition on Y corresponding to the nonzero entries in M.

Example 2

We continue with the example from the introduction and provide the various lower bounds introduced so far, see Table 1. We also include the value of the best known feasible solution, which we found using a rounding heuristic described in Sect. 7 below. In most of these cases the \(OPT_\mathrm{eig}\), \(SDP_0\) and \(SDP_1\) bounds are negative but we know that \(\mathrm{OPT}_\mathrm{MC} \ge 0\). In contrast, the strongest bound \(SDP_{4}\) proves optimality of all the solutions found by our heuristic. Here we do not solve \(SDP_{3}\) and \(SDP_{4}\) exactly. The \(SDP_{3}\) (resp. \(SDP_{4}\)) bounds are obtained by adding the most violated inequalities of type (16)–(19) (resp. (16)–(19) and (20)) to \(SDP_2\). The cutting plane scheme adds at most 2n violated valid constraints in each iteration and performs at most 15 iterations. It takes about 6 min to compute the \(SDP_{4}\) bound for fixed m. We compute the \(SDP_{2}\) (resp. \(SDP_{3}\)) bound in about 5 s (resp. 2 min) for fixed m.

For comparison purposes, we also computed linear programming (LP) bounds. The LP bound \(RLT_3\) incorporates all constraints from \(SDP_{3}\) except the SDP constraint, including of course the standard linearization constraints. The \(RLT_3\) bound for \(m=(45,44,4)^{{\mathrm { T}}}\) is zero. Similarly, we derive the linear programming bound \(RLT_4\) that includes all constraints from \(SDP_{4}\) except the SDP constraint. We solve \(RLT_4\) approximately by a cutting plane scheme that first adds all violated constraints (16)–(19) and then, in each iteration of the algorithm, at most 4n violated constraints of type (20). After 100 such iterations the bound \(RLT_4\) was still zero.

We find it remarkable that even the rather expensive model \(SDP_{3}\) is not able to approximate the optimal solution value within 'reasonable' limits. On the other hand, the 'full' model \(SDP_{4}\) is strong enough to actually solve these problems. We will see in the computational section that only the full model is strong enough to obtain good approximations also on instances from the literature.

Table 1 Lower bounds for the example graph

4 Bounds based on convex quadratic optimization

Convex quadratic programming bounds are based on the fact that minimizing a convex quadratic function over a polyhedron is tractable and moreover there are several standard solvers available for this type of problem.

In our case the objective function f(x) is not convex. It can however be reformulated as a convex quadratic L(x) such that \(f(x)= L(x)\) for all feasible 0–1 solutions by exploiting the fact that \(x \circ x=x\) for 0–1 valued x. This convexification is based on Lagrangian duality and has a long history in nonlinear optimization, see for instance Hammer and Rubin [30] and Shor [31, 32]. Lemarechal and Oustry [33], Faye and Roupin [34] and Billionet et al. [35] consider convexification of quadratic problems and the connection to semidefinite optimization. We briefly summarize the theoretical background behind this approach.

4.1 Convexification of 0–1 problems

We first recall the following well-known facts from convex analysis. Let \(f(x):= x^{{\mathrm { T}}}Qx + 2q^{{\mathrm { T}}}x + q_{0}\) for \(q_{0} \in {\mathbf {R}}, q \in {\mathbf {R}}^{n}\) and \(Q \in S_{n}\). Then

$$\begin{aligned} \inf _{x} f(x) > -\infty ~ \Longleftrightarrow ~ Q \succeq 0 \text{ and } \exists \xi \in {\mathbf {R}}^{n} \text{ such } \text{ that } q=Q\xi , \end{aligned}$$

due to the first order (\(Qx+q=0\)) and second order (\(Q \succeq 0\)) necessary optimality conditions. The following proposition summarizes what we need later for convexification.

Proposition 7

Let \(Q \succeq 0\) and \(q=Q\xi \) for some \(\xi \in {\mathbf {R}}^{n}\) and \(q_{0} \in {\mathbf {R}}\), and \(f(x)= x^{{\mathrm { T}}}Qx + 2q^{{\mathrm { T}}}x + q_{0}\). Set \(f^{*}:= q_{0} - \xi ^{{\mathrm { T}}}Q\xi \). Then

  1. \(\inf \{f(x): ~ x \in {\mathbf {R}}^{n}\} = f^{*},\)

  2. \(\inf \{\langle Q, X \rangle + 2q^{{\mathrm { T}}}x + q_{0}: ~ X-xx^{{\mathrm { T}}} \succeq 0\} = f^{*},\)

  3. \(\sup \{q_{0} +\sigma : ~ \left( \begin{array}{cc} Q &{} q \\ q^{{\mathrm { T}}} &{} - \sigma \end{array} \right) \succeq 0 \} = f^{*}. \)

Proof

For completeness we include the following short arguments. The first statement follows from \(\nabla f(x)=0\). To see the last statement we use the factorization

$$\begin{aligned} \left( \begin{array}{cc} Q &{} Q\xi \\ (Q\xi )^{{\mathrm { T}}} &{} - \sigma \end{array} \right) = \left( \begin{array}{cc} Q^{\frac{1}{2}} &{} 0 \\ 0 &{} 1 \end{array} \right) \left( \begin{array}{cc} I &{} Q^{\frac{1}{2}}\xi \\ (Q^{\frac{1}{2}}\xi )^{{\mathrm { T}}} &{} -\sigma \end{array} \right) \left( \begin{array}{cc} Q^{\frac{1}{2}} &{} 0 \\ 0 &{} 1 \end{array} \right) . \end{aligned}$$

Using a Schur-complement argument, this implies

$$\begin{aligned} \left( \begin{array}{cc} I &{} Q^{\frac{1}{2}}\xi \\ (Q^{\frac{1}{2}}\xi )^{{\mathrm { T}}} &{} -\sigma \end{array} \right) \succeq 0 \Longleftrightarrow -\sigma - \xi ^{{\mathrm { T}}}Q\xi \ge 0, \end{aligned}$$

which shows that the supremum is attained at \(-\xi ^{{\mathrm { T}}}Q\xi \) with optimal value \(q_{0} - \xi ^{{\mathrm { T}}}Q\xi \). Finally, the second problem is the dual of the third, with strictly feasible solution \(X=I, ~ x=0\). \(\square \)

We are going to describe the convexification procedure for a general problem of the form

$$\begin{aligned} \min \{ f(x): ~ x \in \{0,1\}^{n}, ~ Dx=d, ~ Cx \ge c \} \end{aligned}$$
(21)

for suitable data DdCc. In case of (MC) we have \(x={x_{1} \atopwithdelims ()x_{2}} \in {\mathbf {R}}^{2n}\) with \(n+2\) constraints \(e^{{\mathrm { T}}}x_{1}= m_{1}, ~ e^{{\mathrm { T}}}x_{2} = m_{2}, ~ x_{1} + x_{2} \le e\).

The key idea is to consider a relaxation of the problem where integrality of x is expressed by the quadratic equation \(x \circ x = x\). Let us consider the following simple relaxation

$$\begin{aligned} \inf \{ f(x): ~ x \circ x= x, ~ Dx=d\} \end{aligned}$$
(22)

to explain the details. Its Lagrangian is given by

$$\begin{aligned} L(x; u, \alpha ) = f(x) + \langle u, x \circ x-x \rangle + \langle \alpha , d -Dx \rangle . \end{aligned}$$

The associated Lagrangian dual reads

$$\begin{aligned} \sup _{u, \alpha } ~ \inf _{x}~ L(x; u,\alpha ). \end{aligned}$$

Ignoring values \((u, \alpha )\) where the infimum is \(-\infty \), this is by Proposition 7 equivalent to

$$\begin{aligned} \sup _{u, \alpha } ~ \sup _{\sigma } \{ q_{0} + \langle \alpha , d \rangle + \sigma : ~ \left( \begin{array}{cc} Q + \mathrm{Diag}(u)&{} q - \frac{1}{2}D^{{\mathrm { T}}}\alpha - \frac{1}{2}u\\ (.)^{{\mathrm { T}}} &{} - \sigma \end{array} \right) \succeq 0 \}. \end{aligned}$$
(23)

This is a semidefinite program with strictly feasible points (by selecting u and \(-\sigma \) large enough and \(\alpha =0\)). Hence its optimal value is equal to the value of the dual problem, which reads

$$\begin{aligned} \inf \{ \langle Q,X \rangle + 2q^{{\mathrm { T}}}x + q_{0}: ~ \left( \begin{array}{cc} X &{} x \\ x^{{\mathrm { T}}} &{} 1 \end{array} \right) \succeq 0, ~\mathrm{diag}(X)=x, ~ Dx=d \}. \end{aligned}$$
(24)

Let \((u^{*}, \alpha ^{*}, \sigma ^{*})\) be an optimal solution of (23). Then we have

$$\begin{aligned}&q_{0} + \langle \alpha ^{*},d\rangle + \sigma ^{*}\\&\quad = \sup _{\sigma } \{ q_{0} + \langle \alpha ^{*},d\rangle + \sigma : ~ \left( \begin{array}{cc} Q + \mathrm{Diag}(u^{*})&{} q - \frac{1}{2}D^{{\mathrm { T}}}\alpha ^{*} - \frac{1}{2}u^{*}\\ (.)^{{\mathrm { T}}} &{} - \sigma \end{array} \right) \succeq 0 \}. \end{aligned}$$

Using Proposition 7 again we get the following equality

$$\begin{aligned} \inf _{x} L(x; u^{*}, \alpha ^{*}) = q_{0}+ \langle \alpha ^{*},d\rangle + \sigma ^{*}. \end{aligned}$$
(25)

The proposition also shows that \(L(x; u^{*}, \alpha ^{*})\) is convex (in x) and moreover \(L(x; u^{*}, \alpha ^{*})=f(x)\) for all integer feasible solutions x. The convex quadratic programming relaxation of problem (21), obtained from (22), consists in minimizing \(L(x; u^{*}, \alpha ^{*})\) over the polyhedron

$$\begin{aligned} 0 \le x \le e, ~ Dx=d, ~Cx\ge c. \end{aligned}$$

We close the general description of convexification with the following observation which will be used later on.

Proposition 8

Let \((X^{*},x^{*})\) be optimal for (24) and \((u^{*}, \alpha ^{*}, \sigma ^{*})\) be optimal for (23). Then

$$\begin{aligned} \inf _{x} L(x;u^{*},\alpha ^{*}) = L(x^{*};u^{*},\alpha ^{*}) = q_{0} + \langle \alpha ^{*},d \rangle + \sigma ^{*}. \end{aligned}$$

Thus the main diagonal \(x^{*}\) from the semidefinite program (24) is also a minimizer for the unconstrained minimization of L(x; .).

Proof

From \(X^{*} - x^{*}x^{*{{\mathrm { T}}}} \succeq 0\) and \(Q+\mathrm{Diag}(u^{*}) \succeq 0\) we get

$$\begin{aligned} x^{*{{\mathrm { T}}}}(Q+ \mathrm{Diag}(u^{*}))x^{*} \le \langle Q+\mathrm{Diag}(u^{*}),X^{*} \rangle . \end{aligned}$$
(26)

We also have, using (25) and strong duality

$$\begin{aligned} \inf _{x}L(x; u^{*}, \alpha ^{*}) = q_{0}+\langle \alpha ^{*},d \rangle + \sigma ^{*} = \langle Q, X^{*} \rangle + 2q^{{\mathrm { T}}}x^{*} + q_{0}. \end{aligned}$$

Feasibility of \((X^{*},x^{*})\) for (24) shows that the last term is equal to

$$\begin{aligned} \langle Q + \mathrm{Diag}(u^{*}), X^{*} \rangle +2q^{{\mathrm { T}}}x^{*} + q_{0} - u^{*{{\mathrm { T}}}}x^{*} + \alpha ^{*{{\mathrm { T}}}}(d-Dx^{*}). \end{aligned}$$

Finally, using (26), this term is lower bounded by \(L(x^{*}; u^{*},\alpha ^{*})\). \(\square \)

The relaxation (22) is rather simple. In [35] it is suggested to include all equations obtained by multiplying the constraints \(e^{{\mathrm { T}}}x_{1}=m_{1}, e^{{\mathrm { T}}}x_{2}=m_{2}\) with \(x_{(i)}\) and \(1-x_{(i)}\), where \(x_{(i)}\) denotes the \(i-\)th component of x. The inclusion of quadratic constraints is particularly useful, as their multipliers provide additional degrees of freedom for the Hessian of the Lagrangian function.

The main insight from the analysis so far can be summarized as follows. Given a relaxation of the original problem, such as (22) above, form the associated semidefinite relaxation obtained from relaxing \(X-xx^{{\mathrm { T}}}=0\) to \(X-xx^{{\mathrm { T}}} \succeq 0\). Then the optimal solution of the dual problem yields the desired convexification.

4.2 Convexifying (8)

Let us now return to (MC) given in (8). We have \(x={x_{1} \atopwithdelims ()x_{2}} \in {\mathbf {R}}^{2n}\) and \(f(x)= x^{{\mathrm { T}}}Mx\) where M is given in (9). The following models are natural candidates for convexification. We list them in increasing order of computational effort to determine the convexification.

  • Starting with \(\inf \{ f(x): x \circ x = x \}\) we get the Lagrangian relaxation

    $$\begin{aligned}&\min \{ \langle M, Y \rangle :~ Y -yy^{{\mathrm { T}}} \succeq 0, y = \mathrm{diag}(Y) \}\\&\quad = \max _{\sigma , u} \{\sigma : \left( \begin{array}{cc} M+\mathrm{diag}(u) &{} -\frac{1}{2} u \\ -\frac{1}{2}u^{T} &{} -\sigma \end{array} \right) \succeq 0 \} .\end{aligned}$$

    Let \(\sigma ^{*}, u^{*}\) be an optimal solution to the last problem above. The Lagrangian \(L(x;u^{*})= x^{{\mathrm { T}}}Mx + \langle u^{*}, x \circ x -x\rangle = x^{{\mathrm { T}}}(M+\mathrm{Diag}(u^{*}))x- u^{*{{\mathrm { T}}}}x\) is convex and \(f(x)=L(x; u^{*})\) for all \(x \in \{0,1\}^{2n}\). The convex QP bound based on this convexification (a sketch of its computation is given after this list) is therefore given by

    $$\begin{aligned} \min \{ L(x; u^{*}): x= {x_{1} \atopwithdelims ()x_{2}},~ e^{{\mathrm { T}}}x_{1}=m_{1},~ e^{{\mathrm { T}}}x_{2}=m_{2},~ x_{1}+x_{2} \le e,~ x \ge 0 \}. \end{aligned}$$
    (27)

    The unconstrained minimization of L would result in

    $$\begin{aligned} \inf \{ L(x;u^{*}): x \in {\mathbf {R}}^{2n} \}. \end{aligned}$$
    (28)
  • In the previous model, no information of the \(m_{i}\) is used in the convexification. We next include also the equality constraints

    $$\begin{aligned} e^{{\mathrm { T}}}x_{i}= m_{i},~ (e^{{\mathrm { T}}}x_{i})^{2}= m_{i}^{2}, ~(i=1,2), ~~ (e^{{\mathrm { T}}}x_{1})(e^{{\mathrm { T}}}x_{2}) =m_{1}m_{2},~ x_{1}^{{\mathrm { T}}}x_{2} = 0. \end{aligned}$$

    The Lagrangian relaxation corresponds to \(SDP_{0}\). The dual solution of \(SDP_{0}\) yields again the desired convexification.

  • Finally, we replace \(x_{1}^{{\mathrm { T}}}x_{2}\) by \(x_{1} \circ x_{2} = 0\) above and get \(SDP_{1}\) as Lagrangian relaxation. In this case we know from Lemma 6 and Proposition 8 that the convex QP bound is equal to the value of \(SDP_{1}\) which in turn is equal to the unconstrained minimum of the Lagrangian L.
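The first of these convexifications, i.e., the convex QP bound (27), can be sketched as follows with CVXPY; the function name is ours, and \(u^{*}\) is assumed to make \(M+\mathrm{Diag}(u^{*})\) positive semidefinite (e.g. the optimal \(u^{*}\) from the first item above, or simply \(u^{*}=\max(0,-\lambda_{\min}(M))\,e\), which yields a valid but weaker convexification).

```python
import cvxpy as cp
import numpy as np

def convex_qp_bound(M, u_star, m1, m2):
    """Sketch of the convex QP bound (27): minimize L(x; u*) over the polytope."""
    n = M.shape[0] // 2
    Q = M + np.diag(u_star)                      # convexified Hessian, assumed PSD
    x = cp.Variable(2 * n)
    obj = cp.quad_form(x, Q) - u_star @ x        # L(x; u*) = x'(M + Diag(u*))x - u*'x
    cons = [x >= 0,
            x[:n] + x[n:] <= 1,                  # x1 + x2 <= e
            cp.sum(x[:n]) == m1,                 # e'x1 = m1
            cp.sum(x[n:]) == m2]                 # e'x2 = m2
    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    return prob.value
```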

Example 3

We apply the convexification as explained above to the example graph from the introduction, see Table 2. In the first two cases we provide the unconstrained minimum along with the resulting convex quadratic programming bound. In case of \(SDP_{1}\) we know from the previous proposition that the unconstrained minimum agrees with the optimal value of \(SDP_{1}\). These bounds are not very useful, as we know a trivial lower bound of zero in all cases. On the other hand, the convex QP bound is computationally cheap compared to solving SDP, and may be useful in a Branch-and-Bound process.

Table 2 Convexification for the example graph

Convex quadratic programming bounds may play a crucial role in solving non-convex quadratic 0–1 problems to optimality, see for instance the work of Anstreicher et al. [25] on the quadratic assignment problem. Here we presented a general framework for obtaining convexifications, partly iterating the approach from [35]. Compared to Table 1, we note that the bounds based on convexification and using convex QP are not competitive with the strongest SDP bounds. On the other hand, these bounds are much cheaper to compute, so their use in a Branch and Bound code may still be valuable. Implementing such bounds within a Branch and Bound framework is outside the scope of the present paper, so we leave this for future research.

5 The Slater feasible versions of the SDP relaxations

In this section we present the Slater feasible versions of the SDP relaxations introduced here. In particular, in Sect. 5.1 we derive the Slater feasible version of the relaxation \(SDP_1\), and in Sect. 5.2 we present the Slater feasible version of the SDP relaxation (7) from [18]. In Sect. 5.2 we prove that the SDP relaxations \(SDP_1\) and (7) are equivalent, and that \(SDP_3\) with additional nonnegativity constraints on all matrix elements is equivalent to the strongest SDP relaxation from [18]. We actually show here that our strongest SDP relaxation with matrix variable of order \(2n+1\), i.e., \(SDP_4\), dominates the currently strongest SDP relaxation with matrix variable of order \(3n+1\).

5.1 The projected new relaxations

In this section we take a closer look at the feasible region of our basic new relaxation \(SDP_{0}\). The following lemma will be useful.

Lemma 9

Suppose \(X-xx^{{\mathrm { T}}} \succeq 0, ~x =\mathrm{diag}(X),\) and there exists \(a \not =0\) such that \(a^{{\mathrm { T}}}(X-xx^{{\mathrm { T}}})a=0\). Set \(t:=a^{{\mathrm { T}}}x\). Then \((a^{{\mathrm { T}}},-t)^{{\mathrm { T}}}\) is an eigenvector of \(\left( \begin{array}{cc} X &{} x \\ x^{{\mathrm { T}}} &{} 1 \end{array} \right) \) for the eigenvalue 0.

Proof

If \(X-xx^{{\mathrm { T}}} \succeq 0\) and \(a^{{\mathrm { T}}}(X-xx^{{\mathrm { T}}})a=0\), then \((X-xx^{{\mathrm { T}}})a=0\), hence \(Xa=tx.\) From this the claim follows.\(\square \)

We introduce some notation to describe the feasible region of \(SDP_{0}\). Let Z be a symmetric \((2n+1) \times (2n+1)\) matrix with the block form

$$\begin{aligned} Z = \begin{pmatrix} Y_{1} &{} Y_{12} &{} y_1\\ Y_{12}^{{\mathrm { T}}} &{} Y_{2} &{} y_2 \\ y_1^{{\mathrm { T}}} &{} y_2^{{\mathrm { T}}} &{} 1 \end{pmatrix} \end{aligned}$$
(29)

as in the definition of \(SDP_{0}\). We define

$$\begin{aligned} F_{1} := \{Z: Z \text{ satisfies } (29), (10), (12), (13) \text{ and } \mathrm{tr}(J(Y_{12}+ Y_{12}^{{\mathrm { T}}})) = 2m_{1}m_{2} \}. \end{aligned}$$
(30)

The set \(F_{1}\) differs from the feasible region of \(SDP_{0}\) only in the constraint \(\mathrm{tr}(Y_{12} + Y_{12}^{{\mathrm { T}}})=0\) which is not included in \(F_{1}\).

Lemma 10

Let \(Z \in F_{1}\). Then the columns of \(T= \left( \begin{array}{cc} e &{} 0 \\ 0 &{} e \\ -m_{1} &{} -m_{2} \end{array} \right) \) are contained in the nullspace of Z.

Proof

The vector \(a^{{\mathrm { T}}}:=(e^{{\mathrm { T}}}, 0,\ldots , 0)\) satisfies \(a^{{\mathrm { T}}}y=m_{1}\) and \(a^{{\mathrm { T}}}(Y -yy^{{\mathrm { T}}})a=0\). Therefore, using the previous lemma, the first column of T is an eigenvector of Z for the eigenvalue 0. A similar argument applies to \(a^{{\mathrm { T}}} :=(0,\ldots , 0,e^{{\mathrm { T}}})\). \(\square \)

Let \(V_{n}\) denote a basis of \(e^{\bot }\), for instance

$$\begin{aligned} V_n := \begin{pmatrix} I \\ -e^{{\mathrm { T}}} \end{pmatrix} \in {\mathbf {R}}^{n\times (n-1)}. \end{aligned}$$
(31)

Then the matrix

$$\begin{aligned} W := \left( \begin{array}{ccc} V_n &{} 0 &{} \frac{m_1}{n} e \\ 0 &{} V_n &{} \frac{m_2}{n} e \\ 0 &{} 0 &{} 1 \end{array} \right) \end{aligned}$$
(32)

forms a basis of the orthogonal complement to T, \(W^{{\mathrm { T}}}T = 0\). Using the previous lemma, we conclude that \(Z \in F_{1}\) implies that \(Z=WUW^{{\mathrm { T}}}\) for some \(U \in {\mathcal S}^+_{2n-1}\). Let us also introduce the set

$$\begin{aligned} F_{2} := \{ WUW^{{\mathrm { T}}}: U \in {\mathcal S}^+_{2n-1},~ U_{2n-1,2n-1} = 1,~ \mathrm{diag}(WUW^{{\mathrm { T}}}) = WUW^{{\mathrm { T}}}e_{2n+1} \}. \end{aligned}$$

Here \(e_{2n+1}\) is the last column of the identity matrix of order \(2n+1\). In the following theorem we prove that the sets \(F_1\) and \(F_2\) are equal. Similar results are also shown in connection with the quadratic assignment problem, see e.g., [17].

Theorem 11

\(F_{1} = F_{2}.\)

Proof

We first show that \(F_{1} \subseteq F_{2}\) and take \(Z \in F_{1}\). The previous lemma implies that Z is of the form \(Z=WUW^{{\mathrm { T}}}\) and \(U \succeq 0\). \(Z_{2n+1,2n+1}=1\) implies that \(U_{2n-1,2n-1}=1\) due to the way W is defined in (32). The main diagonal of Z is equal to its last column, which translates into \(\mathrm{diag}(Z)=Ze_{2n+1}\), so \(Z \in F_{2}\).

Conversely, consider \(Z =WUW^{{\mathrm { T}}}\in F_{2}\) and let it be partitioned as in (29). We have \(W^{{\mathrm { T}}}T=0,\) so \(ZT=0\). Multiplying out columnwise and using the block form of Z we get

$$\begin{aligned} Y_{i}e = m_{i}y_{i},~ y_{i}^{{\mathrm { T}}}e = m_{i} \text{ and } Y_{12}^{{\mathrm { T}}}e=m_{1}y_{2},~ Y_{12}e = m_{2}y_{1}. \end{aligned}$$

From this we conclude that \(\mathrm{tr} (J Y_{i}) = e^{{\mathrm { T}}}Y_{i}e = m_{i}e^{{\mathrm { T}}}y_{i} = m_{i}^{2}\) and \(\mathrm{tr}(J(Y_{12}+ Y_{12}^{{\mathrm { T}}}))= e^{{\mathrm { T}}}(Y_{12} + Y_{12}^{{\mathrm { T}}})e = 2m_{1}m_{2}\). Finally, \(\mathrm{diag}(Z)=Ze_{2n+1}\) yields \(\mathrm{diag}(Y_{i})= y_{i}\) and we have \(\mathrm{tr} Y_{i} = e^{{\mathrm { T}}}\mathrm{diag}(Y_{i})= e^{{\mathrm { T}}}y_{i} = m_{i}\), hence \(Z \in F_{1}\). \(\square \)

We conclude by arguing that \(F_{2}\) contains matrices where \(U\succ 0\). To see this we note that the barycenter of the feasible set is

$$\begin{aligned} \hat{Z} = \left( \begin{array}{ccc} \frac{m_1}{n} I + \frac{m_1(m_1-1)}{n(n-1)}(E-I) &{} \frac{m_1m_2}{n(n-1)}(E-I) &{} \frac{m_1}{n} e \\ \frac{m_1m_2}{n(n-1)}(E-I) &{} \frac{m_2}{n} I + \frac{m_2(m_2-1)}{n(n-1)}(E-I) &{} \frac{m_2}{n} e \\ \frac{m_1}{n} e^{{\mathrm { T}}} &{} \frac{m_2}{n} e^{{\mathrm { T}}} &{} 1 \end{array} \right) . \end{aligned}$$

Since \(\hat{Z} \in F_{2}\) it has the form \(\hat{Z}= W\hat{U}W^{{\mathrm { T}}}\). It can be derived from the results in Wolkowicz and Zhao [18, Theorem 3.1] that \(\hat{Z}\) has a two-dimensional nullspace, so \(\hat{U}\succ 0\).

This puts us in a position to rewrite our relaxations as SDPs having Slater points. In case of \(SDP_{1}\), we only need to add the condition \(\mathrm{diag}(Y_{12})=0\) to the constraints defining \(F_{2}\). It can be expressed in terms of Z as \(e_{i}^{{\mathrm { T}}}Z e_{n+i} = 0\) for \(i=1, \ldots, n\). Here \(e_{i}\) and \(e_{n+i}\) are the appropriate columns of the identity matrix of order \(2n+1\). We extend the matrix in the objective function by a row and column of zeros,

$$\begin{aligned} \hat{M} = \begin{pmatrix} M &{} 0 \\ 0 &{} 0 \end{pmatrix} \in {\mathcal S}_{2n+1} \end{aligned}$$

and get the following Slater feasible version of \(SDP_{1}\) in matrices \(U \in {\mathcal S}_{2n-1} \)

$$\begin{aligned} (SDP_{1\mathrm{project}})&\min \langle W^{{\mathrm { T}}} \hat{M}W, U \rangle \\&\text{ s.t. } ~e_{i}^{{\mathrm { T}}}(WUW^{{\mathrm { T}}})e_{n+i} = 0, ~~i=1,\ldots , n \\&\quad \quad \mathrm{diag}(WUW^{{\mathrm { T}}}) = (WUW^{{\mathrm { T}}})e_{2n+1}\\&\quad \quad U_{2n-1,2n-1}=1, ~~ U\succeq 0. \end{aligned}$$

5.2 The projected Wolkowicz–Zhao relaxation and equivalent relaxations

The Slater feasible version of the SDP relaxation (7) is derived in [18] and further exploited in [14]. The matrix variable Z in (7) is of order \(3n+1\) and has the following structure

$$\begin{aligned} \bar{Z} = \begin{pmatrix} Y_{1} &{} Y_{12} &{} Y_{13} &{} y_1\\ Y_{21} &{} Y_{2} &{} Y_{23} &{} y_2\\ Y_{31} &{} Y_{32} &{} Y_{3} &{} y_3\\ y_1^{{\mathrm { T}}} &{} y_2^{{\mathrm { T}}} &{} y_3^{{\mathrm { T}}} &{} 1 \end{pmatrix}. \end{aligned}$$
(33)

As before we can identify a nullspace common to all feasible matrices. In this case it is given by the columns of \(\bar{T}\), see [18], where

$$\begin{aligned}\bar{T}= \left( \begin{array}{rrrr} e &{} 0 &{} 0 &{} I\\ 0 &{} e &{} 0 &{} I\\ 0 &{} 0 &{} e &{} I\\ -m_{1} &{} -m_{2} &{} -m_{3} &{} -e^{{\mathrm { T}}} \end{array} \right) .\end{aligned}$$

Note that this is a \((3n+1) \times (n+3)\) matrix. It has rank \(n+2\), as the sum of the first three columns is equal to the sum of the last n columns, see also [18]. A basis of the orthogonal complement to \(\bar{T}\) is given by

$$\begin{aligned} \bar{W} := \left( \begin{array}{rrr} V_n &{} 0 &{} \frac{m_1}{n} e \\ 0 &{} V_n &{} \frac{m_2}{n} e \\ -V_{n} &{} -V_{n} &{} \frac{m_{3}}{n}e \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(34)

As before, we argue that feasible \(\bar{Z}\) are of the form \(\bar{Z} = \bar{W} U \bar{W}^{{\mathrm { T}}}\) with additional suitable constraints on \(U \in {\mathcal S}_{2n-1}\). It is instructive to look at the last n columns of \(\bar{Z} \bar{T}\) which, due to the block structure of \(\bar{Z}\) translate into the following equations:

$$\begin{aligned} Y_{1} + Y_{12} + Y_{13} = y_{1}e^{{\mathrm { T}}}, ~ Y_{21} + Y_{2} + Y_{23} = y_{2}e^{{\mathrm { T}}}, ~ Y_{31} + Y_{32} +Y_{3} = y_{3}e^{{\mathrm { T}}}, ~ y_{1}+y_{2}+y_{3}=e. \end{aligned}$$
(35)

Given \(Y_{1}, Y_{2}, y_{1}, y_{2}\) and \(Y_{12}\), these equations uniquely determine \(y_{3}, Y_{13}, Y_{23}\), and \(Y_{3}\) and produce the n-dimensional part of the nullspace of \(\bar{Z}\) given by the last n columns of \(\bar{T}\). We can therefore drop this linearly dependent part of \(\bar{Z}\) without losing any information. Mathematically, this is achieved as follows. Let us introduce the \((2n+1) \times (3n+1)\) matrix

$$\begin{aligned} P = \left( \begin{array}{rrrr} I_n &{} 0 &{} 0 &{} 0 \\ 0 &{} I_{n} &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$

It satisfies

$$\begin{aligned} W = P \bar{W} \end{aligned}$$

and gives us a handle to relate the relaxations in matrices of order \(3n+1\) to our models.
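For concreteness, the matrices \(V_n\), W, \(\bar{W}\) and P from (31), (32), (34) and above can be assembled as follows (a NumPy sketch, names ours); the asserts numerically check \(W=P\bar{W}\) and \(W^{{\mathrm { T}}}T=0\).

```python
import numpy as np

def projection_matrices(n, m1, m2, m3):
    """Build V_n (31), W (32), Wbar (34) and P, and verify W = P Wbar, W^T T = 0."""
    Vn = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])       # basis of e-perp
    e = np.ones((n, 1))
    z = np.zeros((n, n - 1))
    W = np.block([[Vn, z, m1 / n * e],
                  [z, Vn, m2 / n * e],
                  [np.zeros((1, 2 * (n - 1))), np.ones((1, 1))]])
    Wbar = np.block([[Vn, z, m1 / n * e],
                     [z, Vn, m2 / n * e],
                     [-Vn, -Vn, m3 / n * e],
                     [np.zeros((1, 2 * (n - 1))), np.ones((1, 1))]])
    P = np.block([[np.eye(n), np.zeros((n, n)), np.zeros((n, n)), np.zeros((n, 1))],
                  [np.zeros((n, n)), np.eye(n), np.zeros((n, n)), np.zeros((n, 1))],
                  [np.zeros((1, 3 * n)), np.ones((1, 1))]])
    T = np.block([[e, np.zeros((n, 1))],
                  [np.zeros((n, 1)), e],
                  [-m1 * np.ones((1, 1)), -m2 * np.ones((1, 1))]])
    assert np.allclose(W, P @ Wbar) and np.allclose(W.T @ T, 0)
    return W, Wbar, P
```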

We recall the Slater feasible version of (7) from [18]:

$$\begin{aligned} \begin{array}{rl} &{}\min \mathrm{tr} (\bar{W}^{{\mathrm { T}}} \bar{M} \bar{W}) R \\ &{}\mathrm{s.t.} e_i^{{\mathrm { T}}}(\bar{W} R \bar{W}^{{\mathrm { T}}})e_{n+i} = 0, ~ e_i^{{\mathrm { T}}}(\bar{W} R \bar{W}^{{\mathrm { T}}})e_{2n+i} = 0,\\ &{} \quad e_{n+i}^{{\mathrm { T}}}(\bar{W} R \bar{W}^{{\mathrm { T}}})e_{2n+i} = 0, \qquad i=1,\ldots ,n\\ &{}\quad R_{ 2n-1, 2n-1} = 1, ~ R\succeq 0, ~R\in {\mathcal S}_{2n-1}, \end{array} \end{aligned}$$
(36)

where

$$\begin{aligned} \bar{M} := \begin{pmatrix} M &{} 0\\ 0 &{} 0 \end{pmatrix} \in {\mathcal S}_{3n+1}. \end{aligned}$$

In [19] it is proven that the SDP relaxation (36) is equivalent to the QAP-based SDP relaxation with matrices of order \(n^2\times n^2\). Below we prove that (36) is equivalent to the SDP relaxation introduced here.

Theorem 12

The SDP relaxation (36) is equivalent to \(SDP_{1\mathrm{project}}\).

Proof

The feasible sets of the two problems are related by pre- and postmultiplication of the input matrices by P, for instance \(P\bar{Z}P^{T}\) yields the \((2n+1) \times (2n+1)\) matrix variable of our relaxations. This operation basically removes the block of rows and columns corresponding to \(x_{3}\). Both models contain the constraint \(\mathrm{diag}(Y_{12})=0\). From (35) we conclude that

$$\begin{aligned} \mathrm{diag}(Y_{1}) + \mathrm{diag}( Y_{12}) + \mathrm{diag}( Y_{13}) = y_{1}. \end{aligned}$$

Thus \(\mathrm{diag}(Y_{13})=0\) in (36) is equivalent to \(\mathrm{diag}(Y_{1})=y_{1}\) in \(SDP_{1\mathrm{project}}\). Similarly, \(\mathrm{diag}(Y_{23})=0\) is equivalent to \(\mathrm{diag}(Y_{2})=y_{2}\). The objective function is nonzero only on the part of \(\bar{Z}\) corresponding to \(Y_{12}\). Thus the two problems are equivalent. \(\square \)

The SDP relaxation (36) with additional nonnegativity constraints \(\bar{W} R \bar{W}^{{\mathrm { T}}}\) \(\ge 0\) is investigated by Pong et al. [14] on small graphs. The results show that the resulting bound outperforms other bounding approaches described in [14] for graphs with up to 41 vertices.

It is instructive to look at the nonnegativity condition \(\bar{Z}_{ij} \ge 0\), where \(\bar{Z}\) has the block form (33), in connection with (35). From \(Y_{1} + Y_{21} + Y_{31}= e y_{1}^{{\mathrm { T}}}\) we get

$$\begin{aligned} Y_{2n+i,j} = y_{j} - Y_{i,j} - Y_{n+i,j} \ge 0, \end{aligned}$$

which is (17). In a similar way we get from \(\bar{Z} \ge 0\) all constraints (16)–(19). This is summarized as follows.

Theorem 13

The SDP relaxation (36) with additional nonnegativity constraints is equivalent to \(SDP_{1\mathrm{project}}\) with additional constraints \(WUW^{{\mathrm { T}}}\ge 0\) and (16)–(19).

Note that \(SDP_{1\mathrm{project}}\) with additional constraints \(WUW^{{\mathrm { T}}}\ge 0\), (16)–(19) is actually \(SDP_3\) with additional nonnegativity constraints.

6 Symmetry reduction

It is possible to reduce the number of variables in \(SDP_2\) when subsets \(S_1\) and \(S_2\) have the same cardinality. Therefore, let us suppose in this section that \(m_1=m_2\). Now, we apply the general theory of symmetry reduction to \(SDP_2\) (see e.g., [36, 37]) and obtain the following SDP relaxation:

$$\begin{aligned} \begin{array}{rl} &{}\min \langle A, Y_2 \rangle \\ &{}\text{ s.t. } \mathrm{tr}(Y_1) = m_1, ~\mathrm{diag}(Y_2)=0 \\ &{}\quad \mathrm{tr}(J Y_1)=m_1^2,~\mathrm{tr}(J Y_2)=m_1^2 \\ &{} \quad Y = \begin{pmatrix} Y_1 &{} Y_2 \\ Y_2 &{} Y_1 \end{pmatrix}, ~ \begin{pmatrix} Y &{} e \otimes \mathrm{diag}(Y_1) \\ (e \otimes \mathrm{diag}(Y_1))^{{\mathrm { T}}} &{} 1 \end{pmatrix} \succeq 0\\ &{} \quad Y\ge 0~~\text{ on } \text{ support }. \end{array} \end{aligned}$$
(37)

In order to obtain the SDP relaxation (37) one should exploit the fact that for \(m_1=m_2\) the matrix variable is of the following form

$$\begin{aligned} Y= I\otimes Y_1 + (J-I)\otimes Y_2, \end{aligned}$$

for details see [37], Sect. 5.1. In particular, the above equation follows from Eq. (20) on page 264 in [37], and the fact that the basis elements \(A_t\) (\(t=1,2\)) in our case are I and \(J-I\). Now, (37) follows by direct verification. In case the graph under consideration is highly symmetric, the size of the above SDP can be further reduced by block diagonalizing the data matrices, see e.g., [36, 37].

In order to break symmetry we may assume without loss of generality that a vertex of the graph is not in the first partition set. This can be achieved by adding a constraint which assigns zero value to the variable that corresponds to that vertex in the first set. In general, we can perform n such fixings and obtain n different valid bounds. A similar approach is exploited in, e.g., [19, 37]. If the graph under consideration has a nontrivial automorphism group, then there might be fewer than n different lower bounds. It is not difficult to show that each of the bounds obtained in the way described above dominates \(SDP_2\). For the numerical results on the bounds after breaking symmetry see Sect. 8.

7 Feasible solutions

Up to now our focus was on finding lower bounds on \(\mathrm{OPT}_\mathrm{MC}\). A byproduct of all our relaxations is the vector \(y= {y_{1} \atopwithdelims ()y_{2}} \in {\mathbf {R}}^{2n}\) such that \(y_{1},~y_{2} \ge 0, ~e^{{\mathrm { T}}}y_{1}=m_{1}, e^{{\mathrm { T}}} y_{2} =m_{2}\) and possibly \(y_{1} + y_{2} \le e\). We now try to generate 0–1 solutions \(x_{1}\) and \(x_{2}\) with \(x_{1}+x_{2} \le e, ~e^{{\mathrm { T}}}x_{1}=m_{1}, ~e^{{\mathrm { T}}}x_{2}=m_{2}\) such that \(x_{1}^{{\mathrm { T}}} Ax_{2}\) is small.

The hyperplane rounding idea can be applied in our setting. Feige and Langberg [38] propose random projections followed by randomized rounding (\(RPR^{2}\)) to obtain 0–1 vectors \(x_{1}\) and \(x_{2}\). In our case this approach has to be modified to insure that \(x_{1}\) and \(x_{2}\) represent partition blocks of the requested cardinalities.

It has also been suggested (see for instance [11]) to take \(y_{1}\) and \(y_{2}\) and find the closest feasible 0–1 solution \(x_{1}\), \(x_{2}\), which amounts to solving a simple transportation problem.
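A minimal sketch of this rounding step is given below; the exact transportation problem is replaced by a cheap greedy surrogate, and the function name and greedy rule are ours, not taken from [11].

```python
import numpy as np

def round_to_cardinalities(y1, y2, m1, m2):
    """Greedy surrogate for the 'closest feasible 0-1 solution' step: vertices
    are assigned to S1 or S2 in order of decreasing fractional value, while
    respecting |S1| = m1, |S2| = m2 and x1 + x2 <= e.  (Solving the
    transportation problem instead would give the exact closest point.)"""
    n = len(y1)
    x1 = np.zeros(n, dtype=int)
    x2 = np.zeros(n, dtype=int)
    # Candidate (value, vertex, block) triples, largest fractional values first.
    candidates = sorted([(y1[i], i, 1) for i in range(n)] +
                        [(y2[i], i, 2) for i in range(n)], reverse=True)
    for _, i, block in candidates:
        if x1[i] or x2[i]:
            continue                          # vertex already assigned
        if block == 1 and x1.sum() < m1:
            x1[i] = 1
        elif block == 2 and x2.sum() < m2:
            x2[i] = 1
    return x1, x2
```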

It is also common practice to improve a given feasible solution by local exchange operations. In our situation we proceed as follows. Fixing the set \(S_{3}\) given by \(x_{3} := e - x_{1} - x_{2}\), we apply the Kernighan–Lin local improvement [39] to \(S_{1}\) and \(S_{2}\) in order to (possibly) reduce the number of edges between \(S_{1}\) and \(S_{2}\). After that we fix \(S_{1}\) and try swapping single vertices between \(S_{2}\) and \(S_{3}\) to further reduce the objective function.

Table 3 \(SDP_4\) bounds for the min-cut and corresponding bounds for separators

It turns out that carrying out these local improvement steps by cyclically fixing \(S_{i}\) until no more improvement is found leads to satisfactory feasible solutions. In fact, all the feasible solutions reported in the computational section were found by this simple heuristic.
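The following sketch illustrates one ingredient of this heuristic, namely the single-vertex swaps between \(S_2\) and \(S_3\) with \(S_1\) fixed; the Kernighan–Lin moves and the cyclic fixing of the \(S_i\) are omitted for brevity.

```python
import numpy as np

def swap_s2_s3(A, x1, x2):
    """With S1 fixed, repeatedly swap a single vertex of S2 with a vertex of
    S3 (the complement of S1 and S2) whenever this decreases the cut value
    x1^T A x2.  Returns the improved x2 and the corresponding cut value."""
    x2 = x2.copy()
    best = x1 @ A @ x2
    improved = True
    while improved:
        improved = False
        x3 = 1 - x1 - x2
        for i in np.flatnonzero(x2):
            for j in np.flatnonzero(x3):
                cand = x2.copy()
                cand[i], cand[j] = 0, 1       # move i out of S2, j into S2
                val = x1 @ A @ cand
                if val < best:
                    x2, best, improved = cand, val, True
                    break                     # recompute S3 and restart the scan
            if improved:
                break
    return x2, best
```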

8 Computational results

In this section we compare several SDP bounds on graphs from the literature. All bounds were computed on an Intel Xeon, E5-1620, 3.70 GHz with 32 GB memory. All relaxations were solved with SDPT3 [40].

We select the partition vector m such that \(|m_{1}-m_{2}| \le 1\). For a given graph on n vertices, the optimal value of the min-cut decreases monotonically as \(m_{3}\) increases. We select \(m_{3}\) small enough that \(\mathrm{OPT}_\mathrm{MC}>0\) and that we can still provide nontrivial (i.e., positive) lower bounds on \(\mathrm{OPT}_\mathrm{MC}\). Thus for given m we provide lower bounds (based on our relaxations) and upper bounds (using the rounding heuristic from the previous section) on \(\mathrm{OPT}_\mathrm{MC}\). Whenever the lower bound is positive we also obtain a lower bound (of \(m_{3}+1\)) on the size of a strongly balanced (\(|m_{1} -m_{2}| \le 1\)) vertex separator. Finally, we use our rounding heuristic and vary \(m_{3}\) to actually find vertex separators, which also yields upper bounds on their cardinality. The results are summarized in Table 3.

Matrices can-xx and bcspwr03 are from the Matrix Market library [41], grid3dt matrices are 3D cubical meshes, gridt matrices are 2D triangular meshes, and Smallmesh is a 2D finite-element mesh. The lower bounds for the min-cut problem presented in the table are obtained by approximately solving \(SDP_4\), i.e., by iteratively adding the most violated inequalities of type (16)–(19) and (20) to \(SDP_2\). In particular, we perform at most 25 iterations and add in each iteration at most 2n of the most violated valid constraints. Computing the bound takes 59 minutes for grid3dt(5) and 170 minutes for grid3dt(6).
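The cutting-plane loop can be summarized in the following sketch; `solve_relaxation` and `find_violated` are hypothetical callbacks standing for the SDP solver (SDPT3 in our experiments) and the separation routine for (16)–(19) and (20), not actual routines of those packages.

```python
def approximate_sdp4(solve_relaxation, find_violated, max_rounds=25, max_new_cuts=None):
    """Schematic cutting-plane loop for approximating SDP_4: start from SDP_2
    and repeatedly add the currently most violated inequalities of type
    (16)-(19) and (20).  Both arguments are hypothetical callbacks:
    `solve_relaxation(cuts)` returns (bound, solution) of SDP_2 plus `cuts`,
    and `find_violated(solution, k)` returns up to k violated inequalities."""
    cuts = []
    bound, solution = solve_relaxation(cuts)              # plain SDP_2
    for _ in range(max_rounds):                           # at most 25 rounds in our runs
        new_cuts = find_violated(solution, max_new_cuts)  # at most 2n cuts per round
        if not new_cuts:
            break                                         # no violated inequalities left
        cuts += new_cuts
        bound, solution = solve_relaxation(cuts)          # re-solve with the enlarged cut set
    return bound
```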

One can download our test instances from the following link:

https://sites.google.com/site/sotirovr/the-vertex-separator.

All the instances in Table 3 have a lower bound of 0 for \(SDP_{2}\) and \(SDP_{3}\), while the \(SDP_{4}\) bound is positive. This is a clear indication of the superiority of \(SDP_{4}\).

Table 4 provides a further comparison of our bounds. In particular, we list \(SDP_1\), \(SDP_2\), \(SDP_3\), \(SDP_4\), \(SDP_\mathrm{fix}\), and upper bounds for several graphs. The \(SDP_\mathrm{fix}\) bound is obtained after breaking symmetry as described in Sect. 6. We choose m such that \(SDP_2>0\) and \(m_1=m_2\), evaluate all bounds obtained by fixing a single vertex, and report the best among them. All bounds in Table 4 are rounded up to the closest integer. The results further confirm the quality of \(SDP_4\), and also show that breaking symmetry improves \(SDP_2\), though not significantly.

Table 4 Different bounds for the min-cut problem

9 Conclusion

In this paper we derive several SDP relaxations for the min-cut problem and compare them with relaxations from the literature. Our SDP relaxations have matrix variables of order 2n, while other SDP relaxations have matrix variables of order 3n.

We prove that the eigenvalue bound from [12] equals the optimal value of the SDP relaxation from Theorem 5, which has a matrix variable of order 2n. In [16] it is proven that the same eigenvalue bound equals the optimal value of an SDP relaxation with a matrix variable of order 3n. Further, we prove that the SDP relaxation \(SDP_1\) is equivalent to the SDP relaxation (36) from [18], see Theorem 12. We also prove that the SDP relaxation obtained after adding all remaining nonnegativity constraints to \(SDP_3\) is equivalent to the strongest SDP relaxation from [18], see Theorem 13. Thus, we have shown that for the min-cut problem one should consider SDP relaxations with matrix variables of order \(2n+1\) instead of the traditionally considered SDP relaxations with matrices of order \(3n+1\). Consequently, our strongest SDP relaxation \(SDP_4\) also has a matrix variable of order \(2n+1\) and \(O(n^3)\) constraints. The \(SDP_4\) relaxation can be solved approximately by a cutting-plane scheme for graphs of medium size. The numerical results verify the superiority of \(SDP_{4}\). We further exploit the resulting strong SDP bounds for the min-cut to obtain strong bounds on the size of vertex separators.

Finally, our general framework for convexifying non-convex quadratic problems (see Sect. 4) yields convex quadratic programming bounds that are cheap to compute. Since convex quadratic programming bounds have in the past played a crucial role in solving several non-convex problems to optimality, we plan to exploit their potential in future research.

10 Proof of Theorem 5

We prove this theorem by providing feasible solutions to the primal and the dual SDP which have the same objective function value.

For the primal solution we take the optimizer \(X=(x_{1} ~x_{2} ~x_{3})\) from (5) with objective value \(\mathrm{OPT}_\mathrm{HW}\) and define \(Y= \left( \begin{array}{c} x_{1} \\ x_{2} \end{array} \right) \left( \begin{array}{c} x_{1} \\ x_{2} \end{array} \right) ^{{\mathrm { T}}}. \) Feasibility of X with respect to (4) shows that Y is feasible for the SDP (for instance \(\mathrm{tr} Y_{1}= x_{1}^{{\mathrm { T}}}x_{1}=m_{1}\) and \(\mathrm{tr} J Y_{1}= (e^{{\mathrm { T}}}x_{1})^{2}= m_{1}^{2}\)).
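As a small illustration (a toy example, not part of the proof), the two feasibility computations above can be checked numerically:

```python
import numpy as np

# Toy 0-1 partition matrix X = (x1 x2 x3) on n = 5 vertices with m = (2, 2, 1).
x1 = np.array([1, 1, 0, 0, 0])
x2 = np.array([0, 0, 1, 1, 0])
m1, n = int(x1.sum()), len(x1)

z = np.concatenate([x1, x2])
Y = np.outer(z, z)                       # Y = (x1; x2)(x1; x2)^T
Y1 = Y[:n, :n]                           # upper-left n x n block
J = np.ones((n, n))

assert np.trace(Y1) == m1                # tr(Y1) = x1^T x1 = m1
assert np.trace(J @ Y1) == m1 ** 2       # tr(J Y1) = (e^T x1)^2 = m1^2
```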

Next we construct a dual solution with objective value \(\mathrm{OPT}_\mathrm{HW}\). Let

$$\begin{aligned} E_{1} = \left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 0 \end{array} \right) , ~ E_{2} = \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} 1 \end{array} \right) , ~ E_{12} = \left( \begin{array}{cc} 0 &{} 1 \\ 1 &{} 0 \end{array} \right) . \end{aligned}$$

With this notation, the primal constraints become

$$\begin{aligned} \langle E_{1} \otimes I,Y \rangle = m_{1}, ~ \langle E_{2} \otimes I,Y \rangle = m_{2}, ~ \langle E_{12} \otimes I,Y \rangle =0, \\ \langle E_{1} \otimes J,Y \rangle = m_{1}^{2}, ~ \langle E_{2} \otimes J,Y \rangle = m_{2}^{2}, ~ \langle E_{12} \otimes J,Y \rangle = 2m_{1}m_{2}, \end{aligned}$$

and the objective function takes the form \(\langle E_{12} \otimes (-\frac{1}{2} L),Y \rangle .\) Thus the dual has the following form

$$\begin{aligned} \max ~ m_{1} \alpha _{1} + m_{2} \alpha _{2} + m_{1}^{2} \beta _{1} + m_{2}^{2} \beta _{2} +2m_{1}m_{2} \beta _{12} \quad \text{ such that } \end{aligned}$$
$$\begin{aligned} E_{12} \otimes \left( -\frac{1}{2}L\right) - (\alpha _{1} E_{1} + \alpha _{2}E_{2} + \alpha _{12}E_{12}) \otimes I - (\beta _{1} E_{1} + \beta _{2}E_{2} + \beta _{12}E_{12}) \otimes J \succeq 0. \end{aligned}$$

We recall that \(Le=0\), hence we can select an eigenvalue decomposition of L as \(L=P\mathrm{Diag}(\lambda ) P^{{\mathrm { T}}}\), where \(P=( \frac{1}{\sqrt{n}}e ~V)\) with \(V^{{\mathrm { T}}}V=I_{n-1}\), \(V^{{\mathrm { T}}}e=0\), and \(\lambda =(0, \lambda _{2}, \ldots , \lambda _{n})^{{\mathrm { T}}}\) contains the eigenvalues \(\lambda _{i}\) of L in nondecreasing order; we write \(\Lambda := \mathrm{Diag}(\lambda )\). The matrix P also diagonalizes J, \(J=P \mathrm{Diag}( (n,0, \ldots , 0))P^{{\mathrm { T}}}\). We use this to rewrite \(E_{12} \otimes L\) as

$$\begin{aligned} E_{12} \otimes L = (I_{2}E_{12}I_{2}) \otimes (P \Lambda P^{{\mathrm { T}}})= (I_{2} \otimes P)(E_{12} \otimes \Lambda )(I_{2} \otimes P)^{{\mathrm { T}}}. \end{aligned}$$
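This is the mixed-product property of the Kronecker product; a quick numerical check on a random Laplacian (purely illustrative, not part of the argument) reads:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Random graph Laplacian L = Diag(Ae) - A.
A = rng.integers(0, 2, size=(n, n)); A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A

lam, P = np.linalg.eigh(L)          # L = P Diag(lam) P^T, eigenvalues nondecreasing
Lam = np.diag(lam)

E12 = np.array([[0., 1.], [1., 0.]])
I2 = np.eye(2)

lhs = np.kron(E12, L)
rhs = np.kron(I2, P) @ np.kron(E12, Lam) @ np.kron(I2, P).T
assert np.allclose(lhs, rhs)
print("E12 (x) L = (I2 (x) P)(E12 (x) Lam)(I2 (x) P)^T verified")
```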

The other terms are rewritten similarly by pulling out \(I_{2} \otimes P\) from left and right. Let

$$\begin{aligned} s_{1} := (n \beta _{1}+ \alpha _{1}, \alpha _{1}, \ldots , \alpha _{1})^{{\mathrm { T}}}, ~ s_{2} := (n\beta _{2}+ \alpha _{2}, \alpha _{2}, \ldots , \alpha _{2})^{{\mathrm { T}}}, \\ s_{12} :=(\alpha _{12} + n\beta _{12}, \alpha _{12} + \frac{1}{2}\lambda _{2}, \ldots , \alpha _{12} + \frac{1}{2} \lambda _{n})^{{\mathrm { T}}} \end{aligned}$$

be n-vectors and consider the \(2n \times 2n\) matrix with diagonal blocks

$$\begin{aligned} S = \left( \begin{array}{cc} - \mathrm{Diag}(s_{1}) &{} -\mathrm{Diag}(s_{12}) \\ - \mathrm{Diag}(s_{12}) &{} -\mathrm{Diag}(s_{2}) \end{array} \right) . \end{aligned}$$

Dual feasibility can be expressed as \( (I_{2} \otimes P)S (I_{2} \otimes P)^{{\mathrm { T}}} \succeq 0\), which holds if and only if \(S \succeq 0\). The special form of S shows that \(S \succeq 0\) breaks down into n semidefiniteness conditions on matrices of order two:

$$\begin{aligned} \left( \begin{array}{cc} -\alpha _{1}-n\beta _{1} &{} -\alpha _{12} - n\beta _{12} \\ -\alpha _{12} - n\beta _{12} &{} -\alpha _{2} - n\beta _{2} \end{array} \right) \succeq 0, \end{aligned}$$
(38)
$$\begin{aligned} \left( \begin{array}{cc} -\alpha _{1} &{} -\alpha _{12} -\frac{1}{2}\lambda _{2} \\ -\alpha _{12} -\frac{1}{2}\lambda _{2} &{} -\alpha _{2} \end{array} \right) \succeq 0,~ \ldots , \left( \begin{array}{cc} -\alpha _{1} &{} -\alpha _{12} -\frac{1}{2}\lambda _{n} \\ -\alpha _{12} -\frac{1}{2}\lambda _{n} &{} -\alpha _{2} \end{array} \right) \succeq 0. \end{aligned}$$
(39)

We now select the dual variables in the following way. We set \(\alpha _{12}:= -\frac{1}{4}(\lambda _{n}+ \lambda _{2})\) which insures that

$$\begin{aligned} -\alpha _{12}-\frac{1}{2}\lambda _{2}=\frac{1}{4}(\lambda _{n} -\lambda _{2}) \ge \ldots \ge -\alpha _{12} - \frac{1}{2}\lambda _{n} = -\frac{1}{4}(\lambda _{n} - \lambda _{2}). \end{aligned}$$

Thus all \(n-1\) constraints in (39) are satisfied if we select \(t<0\) and set

$$\begin{aligned} \alpha _{1} := \frac{1}{16t}(\lambda _{n} - \lambda _{2})<0, ~ \alpha _{2} :=t(\lambda _{n}-\lambda _{2})<0. \end{aligned}$$

Finally, setting

$$\begin{aligned} \beta _{1}=-\frac{1}{n}\alpha _{1}, ~ \beta _{2}=-\frac{1}{n}\alpha _{2}, ~ \beta _{12}=-\frac{1}{n}\alpha _{12} \end{aligned}$$

insures that the matrix in (38) is 0 and hence (38) also holds. We now have a dual feasible solution, and we conclude the proof by selecting \(t<0\) in such a way that the objective function has value \(\mathrm{OPT}_\mathrm{HW}\).

Let \(D:= {m_{1}m_{2}(n-m_{1})(n-m_{2})}\). We recall that

$$\begin{aligned}&\mathrm{OPT}_\mathrm{HW}=(m_{1}m_{2}+ \sqrt{D})\frac{\lambda _{2}}{2n} + (m_{1}m_{2}-\sqrt{D})\frac{\lambda _{n}}{2n} \\&\quad = -\frac{1}{2n}\sqrt{D}(\lambda _{n}-\lambda _{2}) + \frac{m_{1}m_{2}}{2n}(\lambda _{n}+ \lambda _{2}). \end{aligned}$$

The dual solution defined above has value

$$\begin{aligned} m_{1}\frac{\lambda _{n} -\lambda _{2}}{16t} +m_{2}t(\lambda _{n}-\lambda _2) - \frac{\lambda _{n} -\lambda _{2}}{16tn}m_{1}^{2}- \frac{\lambda _{n}-\lambda _{2}}{n}tm_{2}^{2} + \frac{\lambda _{n}+\lambda _{2}}{4n}2m_{1}m_{2}. \end{aligned}$$

Comparing the coefficients of \(\lambda _{n}-\lambda _{2}\) and \(\lambda _{n}+ \lambda _{2}\) we note that the values agree if

$$\begin{aligned} - \frac{\sqrt{D}}{2n} = \frac{t}{n}m_{2}(n-m_{2})+ \frac{1}{16tn}m_{1}(n-m_{1}). \end{aligned}$$

This equation holds for

$$\begin{aligned} t=-\frac{1}{4}\sqrt{\frac{m_{1}(n-m_{1})}{m_{2}(n-m_{2})}}. \end{aligned}$$

\(\square \)
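As a sanity check, not part of the proof, the following numpy sketch builds the above dual solution for a random graph and given cardinalities, verifies the semidefiniteness conditions (38)–(39), and compares the dual objective with the closed-form value \(\mathrm{OPT}_\mathrm{HW}\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m1, m2 = 8, 3, 3

# Random graph Laplacian and its spectrum.
A = rng.integers(0, 2, size=(n, n)); A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam = np.linalg.eigvalsh(L)                      # lam[0] = 0 <= lam[1] <= ... <= lam[n-1]
lam2, lamn = lam[1], lam[-1]

# Dual variables as chosen in the proof.
t = -0.25 * np.sqrt(m1 * (n - m1) / (m2 * (n - m2)))
a12 = -0.25 * (lamn + lam2)
a1 = (lamn - lam2) / (16 * t)
a2 = t * (lamn - lam2)
b1, b2, b12 = -a1 / n, -a2 / n, -a12 / n

# Conditions (38)-(39): all 2x2 blocks must be positive semidefinite.
blocks = [np.array([[-a1 - n * b1, -a12 - n * b12],
                    [-a12 - n * b12, -a2 - n * b2]])]
blocks += [np.array([[-a1, -a12 - 0.5 * l], [-a12 - 0.5 * l, -a2]]) for l in lam[1:]]
assert all(np.linalg.eigvalsh(B)[0] >= -1e-9 for B in blocks)

# Dual objective versus the closed-form eigenvalue bound OPT_HW.
dual = m1 * a1 + m2 * a2 + m1**2 * b1 + m2**2 * b2 + 2 * m1 * m2 * b12
D = m1 * m2 * (n - m1) * (n - m2)
opt_hw = (m1 * m2 + np.sqrt(D)) * lam2 / (2 * n) + (m1 * m2 - np.sqrt(D)) * lamn / (2 * n)
assert abs(dual - opt_hw) < 1e-9
print("dual feasible and dual value equals OPT_HW")
```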