Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

Unique Games and Small Set Expansion

The Unique Games Conjecture (UGC) (Khot 2002) states that for every \(\epsilon>0\) there is some finite \(\Sigma\) such that the \(1-\epsilon\) vs. \(\epsilon\) problem for the unique games problem with alphabet \(\Sigma\) is NP hard. An instance to this problem can be viewed as a set of equations on variables \(x_1,\ldots,x_n \in \Sigma\) where each equation has the form \(x_i = \pi_{i,j}(x_j)\) where \(\pi_{i,j}\) is a permutation over \(\Sigma\).

The order of quantifiers here is crucial. Clearly there is a propagation type algorithm that can find a perfectly satisfying assignment if one exist, and as we mentioned before, it turns out that there is also an algorithm nearly satisfying assignment (e.g., satisfying \(0.99\) fraction of constraints) if \(\epsilon\) tends to zero independently of \(\Sigma\). However, when the order is changed, the simple propagation type algorithms fail. That is, if we guess an assignment to the variable \(x_1\), and use the permutation constraints to propagate this guess to an assignment to \(x_1\)’s neighbors and so on, then once we take more than \(O(1/\epsilon)\) steps we are virtually guaranteed to make a mistake, in which case our propagated values are essentially meaningless. One way to view the unique games conjecture is as stating this situation is inherent, and there is no way for us to patch up the local pairwise correlations that the constraints induce into a global picture. While the jury is still out on whether the conjecture is true as originally stated, we will see that the sos algorithm allows us in some circumstances to do just that. The small set expansion (SSE) problem (Raghavendra and Steurer 2010) turned out to be a clean abstraction that seems to capture the “combinatorial heart” of the unique games problem, and has been helpful in both designing algorithms for unique games as well as constructing candidate hard instances for it. We now describe the SSE problem and outline the known relations between it and the unique games problem.

For every undirected \(d\)-regular graph \(G=(V,E)\), and \(\delta>0\) the expansion profile of \(G\) at \(\delta\) is \[ \varphi_\delta(G) = \min_{S\subseteq V , |S|\leq \delta |V|} |E(S,\overline{S})|/(d|S|) \;. \] For \(0 < c < s <1\), the \(c\) vs \(s\) \(\delta\)-SSE problem is to distinguish, given a regular undirected graph \(G\), between the case that \(varphi_\delta(G) \leq c\) and the case that \(\varphi_\delta(G) \geq s\).

The small set expansion hypothesis (SSEH) is that for every \(\epsilon>0\) there is some \(\delta\) such that the \(\epsilon\) vs \(1-\epsilon\) \(\delta\)-SSE problem is NP hard.

We will see that every instance of unique games on alphabet \(\Sigma\) gives rise naturally to an instance of \(\tfrac{1}{|\Sigma|}\)-SSE. But the value of the resulting instance is not always the same as the original one, and hence this natural mapping does not yield a valid reduction from the unique games to the SSE problem. Indeed, no such reduction is known. However, Raghavendra and Steurer (2010) gave a highly non trivial reduction in the other direction (i.e., reducing SSE to UG) hance showing that the SSEH implies the UGC. Hence, assuming the SSEH yields all the hardness results that are known from the UGC, and there are also some other problems such as sparsest cut where we only know how to establish hardness based on the SSEH and not the UGC (Raghavendra, Steurer, and Tulsiani 2012).

Embedding unique games into small set expansion

Given a unique games instance \(I\) on \(n\) variables and alphabet \(\Sigma\), we can construct a graph \(G(I)\) (known as the label extended graph corresponding to \(I\)) on \(n|\Sigma|\) vertices where for every equation of the form \(x_i = \pi_{i,j}(x_j)\) we connect the vertex \((i,\sigma)\) with \((j,\tau)\) if \(\pi_{i,j}(\tau)=\sigma\). Note that every assignment \(x\in\Sigma^n\) is naturally mapped into a subset \(S\) of size \(n\) (i.e. \(1/|\Sigma|\) fraction) of the vertices of \(G(I)\) corresponding to the vertices \(\{ (i,x_i) : i\in [n] \}\). Let’s assume for simplicity that in the unique games instance every variable participated in exactly \(d\) of the constraints (and hence there is a total of \(nd/2\) constraints). Then the graph \(G\) will have degree \(d\) and every constraint \(x_i \neq \pi_{i,j}(x_j)\) that is violated by \(x\) contributes exactly two edges to \(E(S,\overline{S})\) (the edges \(\{ (i,x_i),(j,\pi_{i,j}^{-1}(x_i))\}\) and \(\{ (i,\pi_{i,j}(x_j)),(j,x_j)\}\)). Hence the fraction of violated constraints by \(x\) is equal to \(E(S,\overline{S})/(d|S|)\).

However, this is not a valid reduction, since we cannot do this mapping in reverse. That is, the graph \(G\) might have a set \(S\) of \(n\) vertices that does not expand that does not correspond to any assignment for the unique games. It can be shown that this reduction is valid if the constraint graph of the unique games instance is a small set expander in the sense that every set of \(o(1)\) fraction of the vertices has expansion \(1-o(1)\).

The known algorithmic results for small set expansion are identical to the ones known for unique games:

The basic SDP algorithm can solve the \(1-\epsilon\) vs \(1-O(\sqrt{\epsilon\log(1/\delta)})\) \(\delta\)-SSE problem (Raghavendra, Steurer, and Tetali 2010) which matches the \(1-\epsilon\) vs \(1-O(\sqrt{\epsilon \log |\Sigma|})\) algorithm for the \(\Sigma\)-UG problem of Charikar, Makarychev, and Makarychev (2006) (which itself generalizes the \(1-\epsilon\) vs \(1-O(\sqrt{\epsilon})\) Goemans and Williamson (1995) for Max Cut).
When \(\delta \ll \epsilon\) (specially \(\delta < \exp(\Omega(1/\epsilon))\)) then no polynomial time algorithm is known. However, Arora, Barak, and Steurer (2015) showed a \(\exp(\poly(1/\delta)n^{\poly(1/\epsilon)})\) time algorithm for the \(1-\epsilon\) vs \(1/2\) \(\delta\)-SSE problem and similarly a \(\exp(\poly(|\Sigma|)n^{\poly(1/\epsilon)})\) time algorithm for the \(1-\epsilon\) vs \(1/2\) \(\Sigma\)-UG problem.

Thus it is consistent with current knowledge that the \(1-\epsilon\) vs \(\epsilon\) \(\Sigma\)-UG computational problem has complexity equivalent (up to polynomial factors) to the \(1-\epsilon\) vs \(\epsilon\) \(\tfrac{1}{|\Sigma|}\)-SSE problem. At the moment one can think of the small set expansion problem as the “easiest” “UG like” problem and the Max-Cut (or Max \(2LIN(2)\)) problem as the “hardest” “UG like” problem where by “UG like” we mean problems that in the perfect case can be solved by propagation like algorithms and that we suspect might have a subexponential time algorithm.
(Such algorithms are known for unique games and small set expansion but not for Max Cut.) Unique Games itself can be thought of as lying somewhere between these two problems. The following figure represents are current (very partial) knowledge of the landscape of complexity for CSPs and related graph problems such as small set expansion and sparsest cut

Small set expansion as finding “analytically sparse” vectors.

We have explained that the defining feature of “unique games like” problems is the fact that they exhibit strong pairwise correlations. One way to see it is that these problems have the form of trying to optimize a quadratic polynomial \(x^\top A x\) over some “structured” set \(x\):

The maximum cut problem can be thought as the task of maximizing \(x^\top L x\) over \(x\in \{ \pm 1 \}^n\) where \(L\) is the Laplacian matrix of the graph.
The sparsest cut problem can be thought of as the task of minimizing the same polynomial \(x^\top L x\) over unit vectors (in \(\ell_2\) norm) of the form \(x\in\{0,\alpha\}^n\) for some \(\alpha>0\).
The small set expansion problem can be thought of as the task of minimizing the same polynomial \(x^\top L x\) over unit vectors of the form \(x\in \{0,\alpha\}^n\) subject to the condition that \(\norm{x}_1/\sqrt{n} \leq \sqrt{\delta}\) (which corresponds to the condition that \(\alpha \geq 1/\sqrt{n\delta}\)).
The unique games problem can be thought of as the task of minimizing the polynomial \(x^\top L x\) where \(L\) is the Laplacian matrix corresponding to the label extended graph of the game and \(x\) ranges over all vectors in \(\zo^{n|\Sigma|}\) such that for every \(i\in [n]\) there exists exactly one \(\sigma\in\Sigma\) such that \(x_{i,\sigma}=1\).

In all of these cases the computational question we are faced with amounts to what kind of global information can we extract about the unknown assignment \(x\) from the the local pairwise information we get by knowing the expected value of \((x_i-x_j)^2\) over an average edge \(i\sim j\) in the graph.

Proxies for sparsity

We see that all these problems amount to optimizing some quadratic polynomial \(q(x)\) over unit vector \(x\)’s that are in some structured set. By “guessing” the optimum value \(\lambda\) that \(q(x)\) can achieve, we can move to a “dual” version of these problems where we want to maximize \(p(x)\) over all \(x\in\R^n\) such that \(q(x)=\lambda\) where \(p(x)\) is some “reward function” that is larger the closer that \(x\) is to our structured set (equivalently, one can look at \(-p(x)\) as a function that penalizes \(x\)’s that are far from our structured set).

In fact, in our setting this \(\lambda\) will often be (close to) the minimum or maximum of \(q(\cdot)\) which means that when \(x\) achieves \(q(x)=\lambda\) then all the directional derivatives of the quadratic \(q\) (which are linear functions) vanish. Another way to say it is that the condition \(q(x)=\lambda\) will correspond to \(x\) being (almost) inside some linear subspace \(V\subseteq \R^n\) (which will be the eigenspace of \(q\), looked at as a matrix, corresponding to the value \(\lambda\) and other nearby values). This means that we can reduce our problem to the task of approximating the value \[ \max_{x\in V, \norm{x}=1} p(x) \label{eq:reward} \] where \(p(\cdot)\) is a problem-specific reward function and \(V\) is a linear subspace of \(\R^n\) that is derived from the input. This formulation does not necessarily make the problem easier as \(p(\cdot)\) can (and will) be a non convex function, but, when \(p\) is a low degree polynomial, it does make it potentially easier to analyze the performance of the sos algorithm. That being said, in none of these cases so far we have found the “right” reward function \(p(\cdot)\) that would yield a tight analysis for the sos algorithm. But for the small set expansion question, there is a particularly natural and simple reward function that shows promise for better analysis, and this is what we will focus on next.

Choosing such a reward function might seem to go against the “sos philosophy” where we are not supposed to use creativity in designing the algorithm, but rather formulate a problem such as small set expansion as an sos program in the most straightforward way, without having to guess a clever reward function. We believe that in the cases we discussed, it is possible to consign the choice of the reward function to the analysis of the standard sos program. That is, if one is able to show that the sos relaxation for \eqref{eq:reward} achieves certain guarantees for the original small set expansion problem, then it should be possible to show that the standard sos program for small set expansion would achieve the same guarantees.

Norm ratios as proxies for sparsity

In the small set expansion we are trying to minimize the quadratic \(x^\top L x\) subject to \(x\in \{0,\alpha\}\) being sparse. It turns out that the sparsity condition is more significant than the Booleanity. For example, we can show the following result:

Suppose that \(x\in \R^n\) has at most \(\delta n\) nonzero coordinates, and satisfies \(x^\top L x \leq \epsilon \norm{x}^2\) where \(L\) is the Laplacian of an \(n\)-vertex \(d\)-regular graph \(G\). Then there exists a set \(S\) of at most \(\delta n\) of \(G\)’s vertices such that \(|E(S,\overline{S})| \leq O(\sqrt{\epsilon} dn)\).

It turns out that this theorem can be proven in much the same way that Cheeger itself is proven, showing that a random level set of \(x\) (chosen such that it will have the support be a subset of the nonzero coordinates of \(x\)) will satisfy the needed condition. Note that if \(x\) has at most \(\delta n\) nonzero coordinates then \(\norm{x}_1 \leq \sqrt{n\delta}\norm{x}_2\). It turns out this condition is sufficient for the above theorem:

Suppose that \(x\in \R^n\) satisfies \(\norm{x}_1 \leq \sqrt{\delta n}\norm{x}_2\), and satisfies \(x^\top L x \leq \epsilon \norm{x}^2\) where \(L\) is the Laplacian of an \(n\)-vertex \(d\)-regular graph \(G\). Then there exists a set \(S\) of at most \(O(\sqrt{\delta} n)\) of \(G\)’s vertices such that \(|E(S,\overline{S})| \leq O(\sqrt{\epsilon} dn)\).

If \(x\) satisfies the theorem’s assumptions, then we can write \(x = x' + x''\) where for every \(i\), if \(|x_i| > \norm{x}_2/(2\sqrt{n})\) then \(x'_i = x_i\) and \(x''_i = 0\) and else \(x''_i = x_i\) and \(x'_i=0\). By this choice, \(\norm{x''}^2_2 \leq norm{x}^2/2\) and so (since \(x'\) and \(x''\) are orthogonal) \(\norm{x'}^2 = \norm{x}^2 - \norm{x''}^2 \geq \norm{x}^2/2\). By direct inspection, zeroing out coordinates will not increase the value of \(x^\top L x\) and so \[ {x'}^\top L x' \leq x^\top L x \leq \epsilon\norm{x}^2 \leq 2\epsilon \norm{x'}^2 \;. \]

But the number of nonzero coordinates in \(x'\) is at most \(2\sqrt{\delta} n\). Indeed, if there were more than \(2\sqrt{\delta} n\) coordinates in which \(|x_i| \geq norm{x}_2/(2\sqrt{n})\) then \(\norm{x}_1 \geq 2\sqrt{\delta} n \norm{x}_2/(2\sqrt{n}) = 2\sqrt{\delta n} \norm{x}_2\).

The above suggests the following can serve as a smooth proxy for the small set expansion problem: \[ \max \norm{x}_2^2 \] over \(x\in \R^n\) such that \(\norm{x}_1=1\) and \(x^\top L x \leq \epsilon\norm{x}^2\). Indeed the \(\ell_1\) to \(\ell_2\) norm ratio is an excellent proxy for sparsity, but unfortunately one where we don’t know how to prove algorithmic results for.

As we say in the sparsest vector problem, the \(\ell_2\) to \(\ell_4\) ratio (and more generally the \(\ell_2\) to \(\ell_p\) ratio for \(p>2\)) is a quantity we find easier to analyze, and indeed it turns out we can prove the following theorem:

Let \(G\) be a \(d\) regular and \(V_\lambda\) be the subspace of vectors corresponnding to eigenvalues smaller than \(\lambda\) in the Laplacian.

If \(\max_{\norm{x}_2=1, x\in V_\lambda} \norm{x}_p \leq Cn^{1/p-1/2}\) then for every subset \(S\) of at most \(\delta n\) vertices, \(|E(S,\overline{S})| \geq \lambda(1-C^2\delta^{1-2/p})\)
If there is a vector \(x\in V_\lambda\) with \(\norm{x}_p > C\norm{x}_2\) then there exists a set \(S\) of at most \(O(n/\sqrt{C})\) vertices such that \(|E(S,\overline{S})| \leq 1-(1-\lambda)^{2p}2^{-O(p)}\) where the \(O(\cdot)\) factor hide absolute factors independent of \(C\).

Statement 1. (a bound on the \(\ell_p/\ell_2\) ratio implies expansion) has been somewhat of “folklore” (e.g., see Chapter 10 in O’Donnell (2014)) . Statement 2. is due to (???) .

The proof for 1. is not hard and uses Hölder’s inequality in a straightforward matter. In particular (and this has been found useful) this proof can be embedded in the degree \(O(p)\) sos program.

The proof for 2. is more involved. One way to get intuition for this is that if \(x^\top L x\) is small then one could perhaps hope that \(y^\top L y\) is also not too large where \(y\in \R^n\) is defined as \((x_1^{p/2},\ldots,x_n^{p/2})\). That is, \(x^\top L x \leq \epsilon\norm{x}^2\) means that on average over a random edge \(\{i,j\}\), \((x_i-x_j)^2 \leq O(\epsilon(x_i^2+x_j^2))\). So one can perhaps hope that for a “typical” edge \(i,j\), \(x_i \in (1 \pm O(\epsilon))x_j\) but if that holds then \(y_i \in (1 \pm O(p\epsilon))y_j\) since \(y_i=x_i^{p/2}\) and \(y_j=x_j^{p/2}\). But if \(\E x_i^p > C(\E x_i^2)^{p/2}\) we can show using Hölder that \(\E x_i^p > C'(\E x_i^{p/2})^2\) for some \(C'\) that goes to infinity with \(C\). Hence this means that \(y\) is sparse in the \(\ell_2\) vs \(\ell_1\) sense, and we can apply Reference:#thm-local-cheeger. The actual proof breaks \(x\) into “level sets” that of values that equal one another up \(1\pm \epsilon\) multiplicative factors, and then analyzes the interaction between these level sets. This turns out to be quite involved but ultimately doable.
We should note that there is a qualitative difference between the known proof of 2. and the proof of Reference:#thm-local-cheeger and it is that the known proof does not supply a simple procedure to “round” any vector \(x\) that satisfies \(\norm{x}_p \geq C\norm{x}_2\) to a small set \(S\) that doesn’t expand, but rather transforms such a vector to either such a set or a different vector \(\norm{x'}_p \geq 2C\norm{x'}_2\), which is a procedure that eventually converges.

Subexponential time algorithm for small set expansion

At the heart of the sub-exponential time algorithm for small set expansion is the following result:

Suppose that \(G\) is a \(d\) regular graph and let \(A\) be the matrix corresponding to the lazy random walk on \(G\) (i.e., \(A_{i,i}=1/2\) and \(A_{i,j}=1/(2d)\) for \(i\sim j\)), with eigenvalues \(1=\lambda_1 \geq \lambda_2 \cdots \geq \lambda_n \geq 0\). If \(|\{ i : \lambda_i \geq 1-\epsilon \}| \geq n^\beta\) then there exists a vector \(x\in \R^n\) such that \(x^\top A x \geq (1-O(\epsilon/\beta))\norm{x}^2\) and \(\norm{x}_1 \leq \norm{x}_2\sqrt{n^{1-\beta/2}}\).

Let \(\Pi\) be the projector matrix to the span of the eigenvectors of \(A\) corresponding to eigenvalues over \(1-\epsilon\). Under our assumptions \(\Tr(\Pi) \geq n^\beta\) and so in particular there exists \(i\) such that \(e_i^\top \Pi e_i = \norm{\Pi e_i}^2 \geq n^{\beta-1}\). Define \(v_j = A^je_i\) to be the vector corresponding to the probability distribution of taking \(j\) steps according to \(A\) from the vertex \(i\). Since the eigenvalues of \(A^j\) are simply the eigenvalues of \(A\) raised to the \(j^{th}\) power, and \(e_i\) has at least \(n^{1-\beta}\) norm-squared projection to the span of eigenvalues of \(A\) larger than \(1-\epsilon\), \(\norm{v_j}_2^2 \geq n^{1-\beta}(1-\epsilon)^j\) which will be at least \(n^{1-\beta/2}\) if \(j \leq \beta \log n / (3\epsilon)\). Since \(v_j\) is a probability distribution \(\norm{v_j}_1=1\) which means that \(v_j\) satisfies the \(\ell_1\) to \(\ell_2\) ratio condition of the conclusion. Hence all that is left to show is that there is some \(j\leq \beta \log n / (3\epsilon)\) such that \(v_j^\top A v_j \geq (1-O(\epsilon/\beta))\norm{v_j}^2\). Note that (exercise!) it enough to show this for \(A^2\) instead of \(A\) (i.e. that \(v_j^\top A^2 v_j \geq (1-O(\epsilon/\beta))\norm{v_j}^2\)) and so, since \(Av_j =v_{j+1}\) it is enough to show that \(v_{j+1}^\top v_{j+1} = \norm{v_{j+1}}^2 \geq (1-O(\epsilon/\beta))\norm{v_j}^2\).
But since \(\norm{v_0}_2^2=1\), if \(\norm{v_{j+1}}^2 \leq (1-10\epsilon/\beta)\norm{v_j}^2\) for all such \(j\), then we would get that \(\norm{v_{\beta \log n / (3\epsilon)}}^2\) would be less than \((1-10\epsilon/\beta)^{\beta \log n / (3\epsilon)} \leq 1/n^2\) which is a contradiction as \(\norm{v}^2_2 \geq 1/n\) for every \(n\) dimensional vector \(v\) with \(\norm{v}_1=1\).

Reference:thm-struct-sse leads to a subexponential time algorithm for sse quite directly. Given a graph \(G\), if its top eigenspace has dimension at least \(n^{\beta}\) then it is not a small set expander by combining Reference:thm-struct-sse with the local Cheeger inequality. Otherwise we can do a “brute force search” in time \(\exp(\tilde{O}(n^\beta))\) over this subspace to search for vector that is (close to) the characteristic vector of a sparse non expanding set. Using the results of “fixing a vector in a subspace” we saw before, we can also do this via a \(\tilde{O}(n^\beta)\) degree sos. Getting a subexponential algorithm for unique games is more involved but uses the same ideas. Namely we obtain a refined structure theorem that states that for any graph \(G\), by removing \(o(1)\) fraction of the edges we can decompose \(G\) into a collection of disjoint subgraphs each with (essentially) at most \(n^\beta\) large eigenvalues. See chapter 5 of (Steurer 2010) for more details.

References

Arora, Sanjeev, Boaz Barak, and David Steurer. 2015. “Subexponential Algorithms for Unique Games and Related Problems.” J. ACM 62 (5): 42:1–42:25.

Charikar, Moses, Konstantin Makarychev, and Yury Makarychev. 2006. “Near-Optimal Algorithms for Unique Games.” In STOC, 205–14. ACM.

Goemans, Michel X., and David P. Williamson. 1995. “Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming.” J. ACM 42 (6): 1115–45.

Khot, Subhash. 2002. “On the Power of Unique 2-Prover 1-Round Games.” In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, 767–75. ACM, New York. doi:10.1145/509907.510017.

O’Donnell, Ryan. 2014. Analysis of Boolean Functions. Cambridge University Press.

Raghavendra, Prasad, and David Steurer. 2010. “Graph Expansion and the Unique Games Conjecture.” In STOC, 755–64. ACM.

Raghavendra, Prasad, David Steurer, and Prasad Tetali. 2010. “Approximations for the Isoperimetric and Spectral Profile of Graphs and Related Parameters.” In STOC, 631–40. ACM.

Raghavendra, Prasad, David Steurer, and Madhur Tulsiani. 2012. “Reductions Between Expansion Problems.” In 2012 IEEE 27th Conference on Computational Complexity—CCC 2012, 64–73. IEEE Computer Soc., Los Alamitos, CA. doi:10.1109/CCC.2012.43.

Steurer, David. 2010. “On the Complexity of Unique Games and Graph Expansion.” PhD thesis, Princeton University.