Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

From integrality gaps to hardness

We have seen how we can transform computational hardness results into integrality gaps. In a surprising work Raghavendra (2008) showed a transformation in the other direction. Namely, he showed how to take every constant degree integrality gap for a constraint satisfaction problem, and obtain a hardness of approximation result for the same problem with (essentially) the same parameters. Alas, there is one major fly in this ointment: the result is based on Khot’s Unique Games Conjecture, on whose veracity there is no consensus.

We will discuss the Unique Games Conjecture, and the evidence for and against it, later in this course, but regardless of whether the conjecture is true or not, the techniques and ideas behind Raghavendra’s result are beautiful, and have already found additional applications. One promising sign is a result of Chan (2013) , who gave a hardness of approximation result based merely on \(P\neq NP\) which matches the parameters (and is inspired by) the degree \(\Omega(n)\) integrality gap for “nice subspace predicates” we saw before. However, at the moment we still don’t know of a generic transformation along these lines.

The Max Cut problem we saw before is an example of a Constraint Satisfaction Problem (CSP). In such a problem, the instance \(I\) is given a list of functions \(f_1,\ldots,f_m\from\Sigma^n\to\bits\), where \(\Sigma\) is some finite set, and the goal is to find the assignment \(x^*\in\Sigma^n\) that maximizes the fraction of \(i\)’s such that \(f_i(x^*)=1\). This fraction is known as the value of the instance \(I\), and is denoted as \(\val(I)\).We will not make much distinction between algorithms whose goal is to compute the value and algorithms whose goal is to find the actual maximizing assignment. In practice most algorithms that are aimed at the former also achieve the latter. For every subset \(\cC\) of functions that that map finite sequences of elements in \(\Sigma\) to \(\bits\), we define \(CSP(\cC)\) to be the class of CSP’s where all constraints are in \(\cC\). One particular case of interest is when \(\cC\) the set \(\cC(\cP)\) where \(\cP\) is a finite set of functions mapping \(\bits^k\) to \(\bits\), and a function \(f\from\Sigma^n\to\bits\) is in \(\cC(\cP)\) if \(f\) is obtained by applying some function \(P\in\cP\) to \(k\) of its input symbols. The corresponding class of CSP’s is denoted as \(CSP(\cP)\).

For \(1\geq c>s \geq 0\), a \((c,s)\)-approximation algorithm for a class \(CSP(\cC)\) is an algorithm that outputs \(1\) on every \(I\in CSP(\cC)\) such that \(\val(I)\geq c\) and outputs \(0\) on every \(I\) such that \(val(I) \leq s\).The reason for calling these parameters \(c\) and \(s\) is that \(c\) stands for “completeness” and \(s\) stands for “soundness”. These names make the most sense in the context of hardness reductions, as we’ll see below. A \((c,s)\) basic integrality gap for a class \(CSP(\cC)\) is an instance \(I=(f_1,\ldots,f_m)\in CSP(\cC)\) such that \(\val(I)\leq c\) but there is a pseudo-distribution \(\mu\) of degree the maximum of \(\deg f_i\) over \(\bits^n\) such that \(\pE_\mu \sum_{i=1}^m f_i = m\). Raghavendra (2008) proved the following remarkable result:

Assuming Khot’s Unique Games Conjecture and \(P\neq NP\), for every \(c>s\) and set \(\cP\) of functions mapping \(\Sigma^k\) to \(\bits\). If there exists a \((c,s)\) degree \(2|\Sigma|k\) integrality gap for \(CSP(\cP)\) then for every \(\epsilon>0\) there is no polynomial time \((c-\epsilon,s+\epsilon)\) approximation algorithm for \(CSP(\cP)\).

Note: The actual semidefinite program that Raghavendra considered, which he called Basic SDP is weaker (i.e., contains fewer constraints) than degree \(2|\Sigma|k\) sos, but stronger than degree \(2\) sos, though is arguably “morally” closer to degree \(2\) than degree \(2|\Sigma|k\). Indeed, Basic SDP is a degree two sos relaxation of the maximization problem phrased in a somewhat different “constraint vs variable” formulation. In particular for the Max Cut problem, the Basic SDP formulation and the standard degree 2 sos formulation we saw before are equivalent.

While it will not matter for our discussion beow, for the sake of completeness we describe the Basic SDP formulation for a general CSP \(I=(f_1,\ldots,f_m)\) over alphabet \(\Sigma\). It is the degree \(2\) sos relaxation for the problem of maximizing a polynomial \(F\) over \(\bits^{n|\Sigma|+m|\Sigma|^k}\) where we use the \(0/1\) variables \(\{ x_{i,\sigma} \}_{i\in [n],\sigma\in\Sigma}\) and \(\{ y_{\ell,\vec{\sigma}}\}_{\ell\in [m],\vec{\sigma}\in \Sigma^k}\). Intuitively, \(x_{i,\sigma}=1\) iff the \(i^{th}\) variable of the original assignment is \(\sigma\), and \(y_{\ell,\vec{\sigma}}=1\) iff the \(k\) variables involved in the \(\ell^{th}\) constraint have the assignment \(\vec{\sigma}\).

It is not too hard to come up with a quadratic polynomial \(F\) in these variables such that the maximum of \(F(x,y)\) over all \((x,y) \in \bits^{|\Sigma|n+m|\Sigma|^k}\) will equal the maximum fraction of satisfiable constraints to the original CSP by an assignment in \(\Sigma^n\). We leave verifying this as an exercise.

Since max cut is a particular instance of constraint satisfaction problem with degree two constraints, combining this with Feige-Schechtman’s result we saw in the last lecture, we get the following theorem as a corollary:

Let \(\alpha_{GW} \sim 0.878\) and \(x_{GW}\sim 0.845\) be the constants computed in the previous lecture. Assuming Khot’s Unique Games Conjecture and \(P\neq NP\), then for every \(\epsilon>0\) there is no polynomial time \((x_{GW}-\epsilon,\alpha_{GW}x_{GW}+\epsilon)\) approximation algorithm for \(CSP(\cP)\).

Reference:thm-ug-maxcut was actually proven by Khot et al. (2004) before Reference:thm-raghavendra and served as an inspiration to Raghavendra (2008) ’s result. We will only show the proof of Reference:thm-ug-maxcut, in fact, only sketch it at that while indicating how it can be further generalized.

The Unique Games Conjecture

The Unique Games Conjecture was proposed by Khot (2002) . It concerns the following constraint satisfaction problem:

For every \(1 \geq c^* >s^* \geq 0\) and \(\ell\in\N\), the \(UG_{c^*,s^*}(\ell)\) is the problem of distinguishing whether an instance of \(CSP(\cP_\ell)\) has value at least \(c^*\) or value at most \(s^*\), where \(\cP_\ell\) is the set of \(2\)-ary predicates on alphabet \([\ell]\) defined as: \[ \cP_\ell = \Set{ P\from[\ell]^2\to\bits \Mid \forall x\in [\ell] \exists \text{ unique } y\in [\ell] { s.t. } P(x,y)=1 }\,. \]

The conjecture is the following:

For every \(\epsilon>0\), there exists some \(\ell\) such that for \(UG_{1-\epsilon,\epsilon}(\ell)\) is NP hard.

The requirement of completeness less than \(1\) is inherent, as the following exercise shows:

For every \(\ell,s^*<1\), give a polynomial-time algorithm for the \(UG_{1,s^*}(\ell)\) problem.

Tight hardness of approximation for Max Cut

We will not show the full proof of Reference:thm-ug-maxcut (let alone Reference:thm-raghavendra), but we will illustrate some of the ideas behind it. A priori it seems very strange that a result like that could be proved. A \((c,s)\) integrality gap is some finite mathematical object with particular properties. How can the existence of such an object prevent the existence of an efficient algorithm?

The idea is that such an integrality gap can be used as a gadget in a reduction from the (conjectured to be) hard computational problem \(UG_{1-\epsilon,\epsilon}(\ell)\) into an instance of \(CSP(\cP)\). The Unique Games conjecture posits that for some particular values \(c^*=1-\epsilon\) and \(s^* =\epsilon\) it is computationally hard (specifically NP-hard) to distinguish, given an instance \(I\) of \(UG_{1-\epsilon,\epsilon}(\ell)\), between the case that \(val(I)\geq c^*\) and the case that \(\val(I)\leq s^*\). The reduction we are looking for is some efficient map \(\varphi\) mapping an instance \(I\) of \(UG_{1-\epsilon,\epsilon}(\ell)\) into an instance \(\varphi(I)\) of \(CSP(\cP)\) satisfying:

Completeness: If \(\val(I)\geq c^*\) then \(\val(\varphi(I))\geq c-\epsilon\).
Soundness: If \(\val(I)\leq s^*\) then \(\val(\varphi(I))\leq s+\epsilon\)

We will not give a full description of the reduction, but will only mention some of its key features. Recall that the alphabet of \(I\) in \(UG_{1-\epsilon,\epsilon}(\ell)\) is the set \([\ell]=\{1,\ldots,\ell\}\). The reduction will use as a gadget an error correcting code \(ECC\) mapping \([\ell]\) to \(\bits^L\) for some \(L\), and will map an instance \(I\) of \(\Pi\) that uses \(n\) variables into a max-cut instance \(I'\) on \(nL\) vertices that are divided into \(n\) \(L\)-sized blocks. If \(x\in [\ell]^n\) is an assignment that achieves value at least \(c^*\) for the instance \(I\), then we obtain a bipartition with cut value \(c-\epsilon\) for \(I'\) by cutting the \(L\) variables of the \(i^{th}\) block according to the \(L\)-length string \(ECC(x_i)\). The particular code we will use is known as the long code. It is the map \(ECC\from[\ell]\to\bits^L\) for \(L=2^\ell\), for every \(i\in [\ell]\) and \(w\in\bits^\ell\), we define \(ECC(i)_w=w_i\). It will be more convenient for us to think of (potentially corrupted) codewords of \(ECC\) as functions \(f\) mapping \(\bits^\ell\) to \(\bits\), where the actual (i.e., non corrupted) codewords correspond to the dictator functions of the form \(f_i(w)=w_i\).

The Max-Cut gadget desiderata

To show that our reduction is sound, we need to show that given any bipartition \(f\from\bits^{nL}\to\bits\) of the vertices of \(I'\) that cuts more than \(s+\epsilon\) fraction of the edges, we can decode it into an assignment \(x\in[\ell]^n\) that satisfies at least an \(s^*\) fraction of the original constraints of \(I\). It turns out that the key property is to show a way how to decode the restriction of the bipartition \(f\) to any particular block into a particular symbol in \(\Sigma\). Informally, the “gadget” we need for the reduction, is a graph \(H\) with vertex set \(V=\bits^\ell\) having the following properties:

Completeness: For every \(i\in[\ell]\), the cut value of the bipartition corresponding to the dictator \(f_i\) is at least \(c-\epsilon\).
Soundness: For every \(f\from\bits^\ell\to\bits\) that is “far” from a dictator function, the value of the cut corresponding to the bipartition \(f\) is at most \(s+\epsilon\).

Constructing a gadget from an integrality gap

Roughly speaking, the idea behind the construction is as follows. Recall that the \((x_{GW}-o(1),\alpha_{GW}x_{GW}+o(1))\) integrality gap was obtained by taking the graph \(G\) whose vertices are the vectors in the \(d\) dimensional unit sphere and we put an edge between two vertices \(u,v\in \R^d\) if \(\iprod{u,v} \leq \rho_{GW}+\epsilon\) for \(\rho_{GW} = 1-2x_{GW}\). Our gadget graph \(H\) over the Boolean cube \(\bits^\ell\) will be inspired by \(G\), in the sense that we connect \(w,z\in\bits^\ell\) if their correlation (i.e., the fraction of coordinates they agree on minus the fraction of coordinates they disagree on) is at most \(\rho_{GW}+\epsilon\). Let us now try to analyze this graph on an intuitive level.

The completeness property is fairly straightforward. Indeed, if \(f:\bits^\ell\to\bits\) is a “dictator” function of the form \(f(w)=w_i\) then for a random edge \((w,z)\), the probability that \(w_i\neq z_i\) is at least \(1/2 + (\rho_{GW}+\epsilon)/2 = x_{GW}+\epsilon/2\).

The soundness property is much subtler, not least because we did not even define what being “far from a dictator” means. Let us take one particular example. Suppose that \(f\from\bits^\ell\to\bits\) is a linear threshold function of the form \(f(w)=1\) if \(\sum \alpha_i w_i > \tau\) and \(f(w)=0\) otherwise, for some coefficients \(\alpha_1,\ldots,\alpha_\ell,\tau\). If all the \(\alpha\) coefficients but one are zero then \(f\) is a dictator, and so one way of saying that \(f\) is “far” from being a dictator is that all the coefficients are of roughly equal magnitude (say the same up to some constant).In Fourier analysis parlance, this is known as the property of \(f\) having small maximum influence. In this case we can use the central limit theorem to argue that \(\sum \alpha_i w_i\) is roughly the same as a normal variable with the same mean and variance. So, the probability that for a random edge \((w,z)\), the bipartition \(f\) will cut \((w,z)\) in the sense \(f(w)\neq f(z)\) is essentially the same as the probability that we can distinguish between two \(\rho\)-correlated normal variables, but by the same calculations that we’ve done before, this will be at most \(\arccos(\rho)/\pi\) which in our case would be at most \(\alpha_{GW}x_{GW}\).

More generally, to analyze the soundness of this gadget, Khot et al. (2004) used a powerful generalization of the central limit theorem known as the invariance principle (Mossel, O’Donnell, and Oleszkiewicz 2005).In fact the papers were in the reverse order. The motivation behind (Mossel, O’Donnell, and Oleszkiewicz 2005) was precisely to complete the soundness analysis of (Khot et al. 2004). Roughly speaking, the invariance principle means that if \(f\) is “far” from a dictator in some technical sense, then it cannot distinguish between the case that its input comes from the \(0/1\) Bernoulli distribution or the Gaussian distribution with the same moments. But then if \(f\) would have too good of a cut value, that would refute the isoperimteric result that underlies the integrality gap. The invariance principle can be thought of as an inverse theorem, saying that only “nice” functions (i.e., dictators or functions close to them) can and do distinguish between the sphere (or Gaussian space) and the cube. This completes our (admittedly quite partial and sketchy) outline of the proof.

What’s unique about unique games?

We have not seen the full reduction, and so can not at this point truly appreciate why it needs to rely on the unique games conjecture. Indeed, the label cover problem, which is superficially quite similar to unique games, is in fact NP-hard (see Reference:def-label-cover below). A priori one could perhaps hope that there is a “minor modification” of this reduction so it can handle the case where the original constraint satisfaction problem instance \(I\) is “non unique” and hence be based merely on \(P\neq NP\).

On a technical level, the current proof relies on the uniqueness property to reduce the task of verifying that the original constraints were satisfied to the “code checking” task of verifying that any assignment that has good value for the gadget is related to a true codeword. One could hope to get a more sophisticated gadget that would enable such “consistency checking” as well. Indeed, the current best candidate approaches to proving the unique games conjecture boil down to coming up with such gadgets. There are some obstacles to such an approach, showing that it would require more significant modifications. First, there is a sub-exponential time algorithm for unique games (Arora, Barak, and Steurer 2010) which implies that any such reduction based on the label cover problem (which is believed to be exponentially hard) would need to use some kind of “powering” step with a polynomial blow up the instance size in addition to any gadget. It also shows that there is a sense in which the Unique Games problem is qualitatively easier than Label Cover.

Also, the relation between gadgets and integrality gaps is not yet fully understood. For example, while the current gadgets are based on degree two integrality gaps, it turns out that they are inherently not integrality gaps for higher degree, as the invariance principle itself (or, more accurately, close variants of it) has a constant degree sum of squares proof (Barak et al. 2012). We do not know whether it means that a gadget whose soundness proof is based on the invariance principle cannot be used to obtain such NP hardness reductions.

We will return to the Unique Games Conjecture later in this course. Given current research, it seems that understanding its truth is closely coupled with the question of understanding the extent of the power of the sum of squares algorithm.

One variant of the PCP Theorem is that for every \(\epsilon>0\) there is some \(k\) and some family \(\cP\) of predicates on \(\bits^k\) such that it is NP hard to distinguish between \(CSP(\cP)\) instances of value at most \(\epsilon\) and \(CSP(\cP)\) instances of value at least \(1-\epsilon\). Show that you can reduce the arity \(k\) of the constraints to \(2\), at the cost of increasing the alphabet. That is, show that for every \(k\) and such family \(\cP\) of predicates over \(\bits^k\), there is some \(\ell\) and a family \(\cP'\) of predicates over \([\ell]^2\), and an efficient reduction \(R\) such that for every instance \(I\) of \(CSP(\cP)\), \(R(I)\) is an instance of \(CSP(\cP')\) satisfiying:

Completeness: If \(\val(I) \geq 1-\epsilon\) then \(\val(R(I)) \geq 1-10\epsilon\)
Soundness: If \(\val(I) \leq \epsilon\) then \(\val(R(I)) \leq 10\epsilon\).

This shows that if we drop the uniqueness constraints on the constraints on \([\ell]^2\) in the definition of \(UG_{1-\epsilon,\epsilon}(\ell)\) then the unique games conjecture becomes a corollary of the PCP Theorem.Hint: Given an instance \(I\) with \(n\) variables and \(m\) \(k\)-ary constraints, the reduction will create an instance \(I'\) with alphabet size \(\ell=2^k\) and \(n+m\) variables \(x_1,\ldots,x_n,y_1,\ldots,y_m\) where the variables \(x_1,\ldots,x_n\) are as in the original constraint and the variable \(y_i\) encodes the assignment to the \(k\) variables participating in the \(i^{th}\) constraint.

References

Arora, Sanjeev, Boaz Barak, and David Steurer. 2010. “Subexponential Algorithms for Unique Games and Related Problems.” In FOCS, 563–72. IEEE Computer Society.

Barak, Boaz, Fernando G. S. L. Brandão, Aram Wettroth Harrow, Jonathan A. Kelner, David Steurer, and Yuan Zhou. 2012. “Hypercontractivity, Sum-of-Squares Proofs, and Their Applications.” In STOC, 307–26. ACM.

Chan, Siu On. 2013. “Approximation Resistance from Pairwise Independent Subgroups.” In STOC, 447–56. ACM.

Khot, Subhash. 2002. “On the Power of Unique 2-Prover 1-Round Games.” In STOC, 767–75. ACM.

Khot, Subhash, Guy Kindler, Elchanan Mossel, and Ryan O’Donnell. 2004. “Optimal Inapproximability Results for Max-Cut and Other 2-Variable Csps?” In FOCS, 146–54. IEEE Computer Society.

Mossel, Elchanan, Ryan O’Donnell, and Krzysztof Oleszkiewicz. 2005. “Noise Stability of Functions with Low in.uences Invariance and Optimality.” In FOCS, 21–30. IEEE Computer Society.

Raghavendra, Prasad. 2008. “Optimal Algorithms and Inapproximability Results for Every Csp?” In STOC, 245–54. ACM.