Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

Integrality Gap for the Knapsack Problem

Note: These notes are still somewhat rough.

Suppose we have \(n\) items of unit size and we want to pack as many as possible in a knapsack of size \(r\). Then, we can pack at most \(\lfloor r \rfloor\) number of items in the knapsack. In this lecture, we show that this seemingly simple reasoning that one can’t pack a ``fractional" item is not captured by low-degree SoS algorithm. Specifically, we’ll prove Grigoriev’s theorem (Grigoriev 2001):

For any \(r \leq n/2\), there is a pseudodistribution of degree \(\Omega(r)\) supported on points \(x \in \bits^n\) such that \[ \sum_{i = 1}^n x_i = r. \label{eq:knapsack} \]

Observe that for non-integral \(r\) the above yields an integrality gap for the special case of knapsack presented above of degree \(\Omega(r)\).

Grigoriev’s original proof of this result was a elegant argument that analyzed a natural, maximally symmetric pseudodistribution that we will momentarily describe. This argument essentially invented several basic results about the spectrum of matrices from the Johnson Scheme studied in algebraic combinatorics. We highly recommend reading Grigoriev’s original proof - in the lecture however, we will rely on standard results from the theory of association schemes and get a shorter proof. This argument was first presented by Meka and Wigderson (Meka and Wigderson 2013) in a work that attempted to show the first SoS lower bound for the Planted Clique problem.

The Pseudodistribution

Fix \(d = \Theta(r)\) to be chosen later. The idea for constructing degree \(d\) pseudodistribution is very natural - the constraint \(\sum_{i}x_i = r\) is symmetric in \(r\). Thus, if we had an arbitrary degree \(2d\) pseudodistribution, we could average it over all permutations \(\sigma\) of \([n]\) and obtain another degree \(d\) pseudodistribution over the hypercube that is symmetric w.r.t. permutations of \([n]\) and satisfies \eqref{eq:knapsack}.

Thus, for any symmetric pseudodistribution \(\mu\), there exists a function \(f\) such that for every multilinear monomial \(x_S = \Pi_{i \in S} x_i\), \(\pE[x_S] = f(|S|).\) It turns out that there’s no choice in \(f\) either.

First, \(\sum_{i = 1}^n \pE_{\mu}[x_i] = n f(1)\). On the other hand, since \(\mu\) is supported on \(x\) satisfying \eqref{eq:knapsack}, \(\sum_{i = 1}^n\pE[x_i] = r\). Thus, \(f(1) = r/n\) for every \(i\). \[r f(1) = r \pE[x_i] = \pE[x_i(\sum_{j = 1}^n x_j)]= \pE[x_i] + (n-1) \pE[x_ix_j] = f(1) + (n-1) f(2),\] which implies that \(f(2) = (r-1)/(n-1)\). One can repeat this argument to obtain that \[ \pE[x_S] = f(|S|) = \frac{{r \choose |S|}}{{n \choose |S|}}, \label{eq:def-pE} \] for every \(S\) such that \(|S| \leq 2d\).

It is easy to check that \(\pE\) constructed here satisfies the constraint \eqref{eq:knapsack}. Thus, as usual, we are left with showing the positivity of \(\pE\).

Reduction to Positivity over Squared Homogenous Polynomials

Because of the linear equality constraint Reference:eq:knapsack, it turns out that it is enough to prove positivity of \(\pE[p^2]\) for any homogenous degree \(d\) polynomial.

Suppose \(M\) be any linear operator on degree \(d\) polynomials that is consistent with the constraints \(x_i^2 = x_i\) for every \(i \in [n]\) and \(\sum_{i = 1}^n x_i - r = 0.\) Suppose \(M(p^2) \geq 0\) for every degree \(d\) homogenous polynomial \(p\). Then, \(M(q^2) \geq 0\) for any degree \(d\) polynomial \(q\).

The idea is that any polynomial \(p\) of degree \(d\) can be written as \[ p_1 + \sum_{i = 1}^n (x_i^2 - x_i) p_i + q (\sum_{i = 1}^n x_i - r) \label{eq:homogenization} \] for a homogenous degree \(d\) polynomial \(p_1\), degree at most \(d-2\) polynomials \(p_i\) and degree at most \(d-1\) polynomial \(q\). This can be shown by polynomial division, for example.

Once we have the above representation, squaring the right hand side of Reference:eq:homogenization yields \(p_1^2\) plus terms that have either \((x_i^2-x_i)\) or \((\sum_{i = 1}^n x_i - r)\) as a factor. Applying \(M\) to the RHS, using linearity and the fact that \(M\) satisfies both Boolean and the knapsack constraints thus yields that \(M(p^2) = M(p_1^2) \geq 0\).

We can now go to the matrix view of things in order to show positivity of \(\pE\) on homogenous degree \(d\) polynomials - we have seen this argument several times in the lectures before.

Let \(\cM \in \R^{\nchoose{d} \times \nchoose{d}}\) be defined by \(\cM(S,T) = \pE[ x_{S \cup T}] = f(2d-|S\cap T|)\) for any \(S, T \in \nchoose{d}\). Show that \(\pE[p^2] \geq 0\) for every homogenous degree \(d\) polynomial if and only if \(\cM \succeq 0.\)

Note that \(\cM\) is the \({n \choose d} \times {n \choose d}\) dimensional principal sub-matrix of the usual moment-matrix of \(\pE\).

Johnson Scheme

The discussion here is based on (Meka and Wigderson 2013) which in turn is based on (Godsil 1993). Association schemes are well-studied objects in Algebraic Combinatorics. For our purposes, we can think of them as a commutative algebra of square matrices - i.e. adding or multiplying any two matrices from the set yields another matrix in the set and for any two matrices \(A,B\) in the set, \(AB = BA\). We are interested in one such well-studied scheme.

Let \(d < n/2 \in \N\) be parameters. The Johnson Scheme \(\cJ_{n,d}\) of order \(d\) on \([n]\) is the linear subspace of all matrices \(J\) in \(\R^{\nchoose{d}\times \nchoose{d}}\) that are set symmetric i.e. \(J(I,J) = h(|I \cap J|)\) for any \(I,J \in \nchoose{d}\). In other words, any entry of any matrix in the subspace depends only on the size of the intersection of the row and column index sets.

We now define two basis for \(\cJ_{n,d}\).

For \(0 \leq \ell \leq d \leq n\), let \(D_{\ell} \in \R^{\nchoose{d} \times \nchoose{d}}\) be the matrix defined by \[ D_{\ell}(I,J) = \begin{cases} 1 & \text{ if } |I \cap J| = \ell\\ 0 & \text{ otherwise.}. \end{cases} \]

\(D_0\) is then the well-studied Set Disjointness matrix from communication complexity. It is easy to check that \(D_{\ell}\) for \(\ell \leq d\) span \(\cJ_{n,d}\). Further, it’s also easy to verify that the \(D_{\ell}\)’s commute with each other - and thus every pair of matrices in \(\cJ_{n,d}\) commute establishing that \(\cJ_{n,d}\) is indeed a commutative algebra of matrices.

For the purposes of proving PSDness, another basis of \(\cJ_{n,d}\) is very useful - the \(P\)-Basis.

For \(0 \leq t \leq d\), let \(P_t \in \R^{\nchoose{d} \times \nchoose{d}}\) be defined by \[ P_t(I,J) = {{|I \cap J|} \choose t}, \] where it’s understood that if \(|I \cap J| = 0\), then \(P_t(I,J) = 0.\)

There’s an equivalent definition of \(P_t\)s that’s helpful in calculations.

Let \(R_T\) be the rank \(1\) matrix defined by \(R_T(I,J) = \1(I \supseteq T) \1(J \supseteq T)\). Show that \(P_t(I,J) = \sum_{T \subseteq [n], |T|= t} R_T.\)

The above exercise can be used to obtain explicit basis change coefficients between the \(D\) and the \(P\) basis.

Show that

For \(0\leq t \leq d\), we have \(P_t = \sum_{\ell = t}^d {\ell \choose t} D_{\ell}.\)
For \(0 \leq \ell \leq d\), we have \(D_{\ell} = \sum_{t \geq \ell} (-1)^{t-\ell} {t \choose \ell} P_t.\)

The main results from the theory of association schemes of interest to us is the characterization of eigenspaces and eigenvalues of the matrices in \(\cJ_{n,d}\). This is done using the fact that there’s a natural action (the relabeling action on elements of \([n]\)) of \(S_n\) that commutes with every matrix in \(\cJ_{n,d}\) (because of set-symmetry of \(\cJ_{n,d}\)-matrices). This implies that matrices in \(\cJ_{n,d}\) must share eigenspaces with those corresponding to the action of the symmetric group \(S_n\). The latter are well understood as representations of \(S_n\) and one can use this understanding to arrive at an explicit description of eigenvalues and eigenspaces of matrices in \(\cJ_{n,d}\).

For our purposes, we will just state the results required for us.

For \(P_t = P_{t,n,d}\) defined as above, there exist pairwise orthogonal subspaces \(V_0, V_1, V_2, \ldots, V_d\) such that

\(V_0, V_1, V_2,\ldots,V_d\) are eigenspaces for \(P_t\) for every \(0 \leq t \leq r\) and consequently for every matrix in \(\cJ_{n,d}\).
\(dim(V_j) = {n \choose j} - {n \choose {j-1}}\).
For any matrix \(J \in \cJ_{n,d}\), let \(\lambda_i(J)\) denote the eigenvalue of \(J\) on the subspace \(V_i\). Then, \[ \lambda_i(P_t) = \begin{cases} {{n-t-i} \choose {d-t}} \cdot {{d-i} \choose {t-i}} & \text{ if } i \leq t\\ 0 & \text{ otherwise.} \end{cases} \]

The above lemma helps us estimate the eigenvalues of any matrix that can be written as a linear combination of the matrices \(P_t\) or \(D_{\ell}\) matrices. In particular, we will use the following estimate on the eigenvalues of such matrices in our analysis of \(\cM\) from the previous section.

Let \(Q= \sum_{\ell} \alpha_{\ell} D_{\ell} \in \cJ_{n,d}\) and \(\beta_t = \sum_{t \leq \ell} {t \choose \ell} \alpha_{\ell}\) for \(\alpha_{\ell} \geq 0\). Then, for \(0 \leq j \leq r\), \[ \lambda_j(Q) \leq \sum_{t \geq j} \beta_t {{n-t-j} \choose {d-t}} {{d-j} \choose {t-j}}. \]

\[ \begin{aligned} \sum_{\ell} \alpha_{\ell} D_{\ell} &= \sum_{\ell} \alpha_{\ell} (\sum_{t \geq \ell}(-1)^{t-\ell}{t \choose \ell} P_t) \\ &= \sum_{t} P_t (\sum_{\ell \leq t} (-1)^{t-\ell} {t \choose \ell} \alpha_{\ell})\\ &\leq \sum_{t} P_t (\sum_{t \leq \ell} \alpha_{\ell}) & = \sum_{t } \beta_t P_t \end{aligned} \]

Using that \(Q\) and \(P_t\) share eigenspaces and applying Lemma Reference:eigenspaces-of-johnson-scheme, we obtain the estimate claimed.

PSDness of \(\pE\)

\(\cM \succeq 0\).

We are going to use Lemma Reference:eigenvalue-estimates-for-johnson-scheme. For any non-negative \(\alpha_1, \alpha_2, \ldots, \alpha_{d}\), note that: \[ 0 \preceq \sum_{t} \alpha_{t} P_t = \sum_{\ell= 0}^r ( \sum_{t = 0}^{\ell} \alpha_t) {\ell \choose t}) D_{\ell} \]

From the definition of \(\cM\), we know that \(\cM = \sum_{\ell = 0}^d f(\ell) D_{2d-\ell}.\) We will thus be done from above if we can find non-negative \(\alpha_{t}\) such that \(( \sum_{t = 0}^{\ell} \alpha_t) {\ell \choose t} = f( 2d-\ell)\) for every \(\ell\).

Observe that \(f(2d-\ell) = f(2d) \cdot \frac{{{n-2d+\ell} \choose \ell}}{{{r-2d+\ell} \choose \ell}}.\)

Choose \(\alpha_t = \alpha_t = \frac{f(d)}{{n \choose d}} \cdot \frac{{{n-r} \choose t}}{{{r-2d+t-1} \choose t}}.\)

We can now verify: \[ \begin{aligned} f(2d) {{n-2d+\ell} \choose \ell} &= f(2d) \sum_{t = 0}^{\ell} {{n-r} \choose t} \cdot {{r-2d+\ell-1} \choose {\ell-t}}\\ &= \sum_{t = 0}^{\ell} \alpha_t \cdot {{r-2d+t-1} \choose t} {{r-2d+\ell-1} \choose {\ell-t}}\\ &= \sum_{t = 0}^{\ell} \alpha_t {\ell \choose t} \cdot {{r-2d+\ell} \choose \ell}. \end{aligned} \]

The lemma now follows and in fact shows that the minimum eigenvalue of \(\cM\) is \(\alpha_d\).

References

Godsil, Chris D. 1993. Algebraic Combinatorics. Chapman and Hall Mathematics Series. Chapman; Hall.

Grigoriev, Dima. 2001. “Complexity of Positivstellensatz Proofs for the Knapsack.” Computational Complexity 10 (2): 139–54.

Meka, Raghu, and Avi Wigderson. 2013. “Association Schemes, Non-Commutative Polynomial Concentration, and Sum-of-Squares Lower Bounds for Planted Clique.” Electronic Colloquium on Computational Complexity (ECCC) 20: 105.