Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

Grothendieck-type inequalities

Suppose \(A\in\R^{n\times m}\) is a linear operator from \(\R^m\) to \(\R^n\) represented by an \(n\)-by-\(m\) matrix. An important parameter of \(A\) is its operator norm, the smallest number \(c\ge 0\) such that \(\norm{A x} \le c \cdot \norm{x}\) for all \(x\in\R^m\). This quantity depends on the choice of norms for the input and output spaces of the operator (\(\R^m\) and \(\R^n\) in our case). The most common choice is the Euclidean norm. In this case, the operator norm is simply the largest singular value of \(A\), which can be computed in polynomial time.
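As a small illustration of the Euclidean case, the following sketch estimates the largest singular value by power iteration on \(A^\top A\). This is a substitute technique chosen to keep the example dependency-free, not a full singular value decomposition; the function name `spectral_norm`, the starting vector, and the iteration count are assumptions made for illustration.

```python
import math

def spectral_norm(A, iters=500):
    """Estimate the largest singular value of A (a list of rows) by power iteration on A^T A."""
    n, m = len(A), len(A[0])
    # Arbitrary starting vector (an assumption): it must not be orthogonal
    # to the top right-singular vector of A for the iteration to converge.
    x = [1.0 + 0.1 * j for j in range(m)]
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(m)) for i in range(n)]  # y = A x
        z = [sum(A[i][j] * y[i] for i in range(n)) for j in range(m)]  # z = A^T y
        norm_z = math.sqrt(sum(v * v for v in z))
        if norm_z == 0.0:
            # x lies in the kernel of A; for the zero matrix this is the right answer.
            return 0.0
        x = [v / norm_z for v in z]
    # x is now a unit vector, so ||A x|| approximates the operator norm.
    y = [sum(A[i][j] * x[j] for j in range(m)) for i in range(n)]
    return math.sqrt(sum(v * v for v in y))
```

For example, `spectral_norm([[3.0, 0.0], [0.0, 1.0]])` approaches \(3\), the larger of the two singular values.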

Suppose that we wanted to bound the maximum operator norm of \(A\) over all choices of \(\ell_p\) norms for \(\R^m\) and \(\R^n\), fixing the norms of the coordinate basis vectors to be \(1\). It turns out that the worst choice of norms is \(\ell_\infty\) for the input space \(\R^m\) and \(\ell_1\) for the output space \(\R^n\). The reason is that \(\ell_\infty\) is the smallest \(\ell_p\) norm of \(\R^m\) and that \(\ell_1\) is the largest \(\ell_p\) norm of \(\R^n\) (when we fix the norm of coordinate basis vectors). We let \(\norm{A}_{\infty\to 1}\) denote the operator norm of \(A\) for this choice of norms for the input and output space, \[ \norm{A}_{\infty\to 1} = \max_{x\in \R^m \setminus \{0\}} \frac{\norm{A x}_1}{\norm {x}_\infty}\,. \] Unlike the largest singular value, computing this operator norm is NP-hard. However, as we will see, there exists a polynomial-time algorithm that approximates this norm within a constant factor (the constant is bigger than \(\tfrac12\)).

The following lemma shows that \(\norm{A}_{\infty \to 1}\) is the optimum value of a quadratic optimization problem over the hypercube. For convenience, we work with the set \(\sbits^n\) instead of \(\bits^n\).

For every matrix \(A\in \R^{n\times m}\), \[ \norm{A}_{\infty \to 1} = \max_{x\in \sbits^m, y\in \sbits^n} \iprod{Ax,y}\,. \]

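The lemma follows from two facts. First, the function \(x\mapsto \norm{Ax}_1\) is convex, so its maximum over the cube \(\{x\in\R^m : \norm{x}_\infty\le 1\}\) is attained at a vertex \(x\in\sbits^m\). Second, for every vector \(z\in\R^n\), the maximum value of \(\iprod{z,y}\) over all \(y\in\sbits^n\) is equal to \(\norm{z}_1\) (choose \(y_i\) to be the sign of \(z_i\)).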
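For very small matrices the lemma can be checked by brute force. The sketch below computes \(\max_{x\in\sbits^m}\norm{Ax}_1\) and, separately, \(\max_{x,y}\iprod{Ax,y}\) over all sign vectors, so the two can be compared. The function names are hypothetical, and the running time is exponential in \(m\) and \(n\), consistent with the problem being NP-hard in general.

```python
from itertools import product

def inf_to_one_norm(A):
    """Maximum of ||A x||_1 over all sign vectors x in {-1, +1}^m."""
    n, m = len(A), len(A[0])
    best = 0.0
    for x in product((-1, 1), repeat=m):
        Ax = [sum(A[i][j] * x[j] for j in range(m)) for i in range(n)]
        best = max(best, sum(abs(v) for v in Ax))
    return best

def bilinear_max(A):
    """Maximum of <A x, y> over all sign vectors x in {-1, +1}^m and y in {-1, +1}^n."""
    n, m = len(A), len(A[0])
    return max(
        sum(y[i] * A[i][j] * x[j] for i in range(n) for j in range(m))
        for x in product((-1, 1), repeat=m)
        for y in product((-1, 1), repeat=n)
    )
```

On the matrix with rows \((1,-1)\) and \((1,1)\), both functions return \(2\).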

One application of the \(\infty\)-to-\(1\) norm is to approximate the cut norm of a matrix \(A=(a_{ij})\in\R^{n\times m}\), which is the maximum of \(\abs{\sum_{i\in S,j\in T} a_{ij}}\) over all subsets \(S\subseteq [n],T\subseteq [m]\).

Prove that for every matrix \(A\), the cut norm of \(A\) is between \(\norm{A}_{\infty \to 1}/4\) and \(\norm{A}_{\infty \to 1}\).
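The two bounds can be sanity-checked numerically on small examples (this is not a proof). The sketch below takes the cut norm to be the maximum of \(\abs{\sum_{i\in S,j\in T} a_{ij}}\) over subsets \(S,T\), and compares it with a brute-force value of \(\norm{A}_{\infty\to 1}\). All function names are hypothetical and both routines take exponential time.

```python
from itertools import product

def inf_to_one_norm(A):
    """Brute-force maximum of ||A x||_1 over sign vectors x."""
    n, m = len(A), len(A[0])
    return max(
        sum(abs(sum(A[i][j] * x[j] for j in range(m))) for i in range(n))
        for x in product((-1, 1), repeat=m)
    )

def cut_norm(A):
    """Brute-force maximum of |sum_{i in S, j in T} a_ij| over subsets S, T."""
    n, m = len(A), len(A[0])
    best = 0.0  # empty S or T gives the value 0
    for T in product((0, 1), repeat=m):
        row_sums = [sum(A[i][j] for j in range(m) if T[j]) for i in range(n)]
        # For fixed T, the best S is either all rows with positive sum
        # or all rows with negative sum.
        positive = sum(r for r in row_sums if r > 0)
        negative = -sum(r for r in row_sums if r < 0)
        best = max(best, positive, negative)
    return best

def bounds_hold(A, eps=1e-9):
    """Check norm/4 <= cut norm <= norm, up to floating-point slack."""
    norm = inf_to_one_norm(A)
    cut = cut_norm(A)
    return norm / 4 - eps <= cut <= norm + eps
```

For the matrix with rows \((1,-2,3)\) and \((-4,5,-6)\), the cut norm is \(10\) (take \(T=\{1,3\}\) and \(S=\{2\}\)) while \(\norm{A}_{\infty\to 1}=21\).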

Alexander Grothendieck (1928–2014) was one of the leading mathematicians of the 20th century, transforming the field of algebraic geometry. One of his early works established a result he called “the fundamental theorem in the metric theory of tensor products” and is now known as Grothendieck’s inequality. This inequality has found applications in a diverse variety of fields including Banach spaces, \(C^*\)-algebras, quantum mechanics, and computer science. The surveys of Pisier (2012) and Khot and Naor (2012) are good sources for the amazing arrays of applications.

Grothendieck’s inequality is equivalent to the following theorem about degree-2 pseudo-distributions (see Alon and Naor (2004)).

There exists an absolute constant \(K_\mathrm{G}\) such that for every matrix \(A\in \R^{n\times m}\) and degree-\(2\) pseudo-distribution \(\mu\from \sbits^{m}\times \sbits^n\to \R\), \[ \pE_{\mu(x,y)}\iprod{Ax,y} \le K_{\mathrm{G}} \cdot \max_{x\in \sbits^m, y\in \sbits^n} \iprod{Ax,y}\,. \]

Up to now we have defined only pseudo-distributions over \(\bits^\ell\) for some \(\ell\in\N\), but here it is convenient to work with pseudo-distributions defined over \(\sbits^\ell\). We can simply use the linear map \(x \mapsto \Ind - 2x\) to map one set to the other, but it is also easy to define directly the notion of pseudo-distributions and pseudo-expectations over the signed Boolean cube \(\sbits^\ell\). The only difference is that when reducing a general polynomial to a multilinear one, in the \(\sbits\) case we use the identity \(x_i^2=1\) instead of \(x_i^2=x_i\) as we did in the \(\bits\) case.
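The difference between the two reduction rules can be made concrete with a small sketch. It represents a polynomial as a dictionary from exponent tuples to coefficients (a representation assumed here purely for illustration) and reduces it to a multilinear polynomial using \(x_i^2=1\) on the signed cube or \(x_i^2=x_i\) on the \(0/1\) cube. The names `multilinearize`, `evaluate`, and `to_signed` are hypothetical.

```python
from itertools import product

def multilinearize(poly, cube="signed"):
    """Reduce a polynomial {exponent tuple: coefficient} to multilinear form.

    cube="signed" uses x_i^2 = 1 (points in {-1, +1}^l);
    cube="bits" uses x_i^2 = x_i (points in {0, 1}^l).
    """
    reduced = {}
    for exps, coeff in poly.items():
        if cube == "signed":
            new = tuple(e % 2 for e in exps)      # even powers vanish, odd powers become 1
        else:
            new = tuple(min(e, 1) for e in exps)  # every positive power becomes 1
        reduced[new] = reduced.get(new, 0) + coeff
    return reduced

def evaluate(poly, point):
    """Evaluate the polynomial at a point (a tuple of numbers)."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for value, e in zip(point, exps):
            term *= value ** e
        total += term
    return total

def to_signed(x):
    """Apply the coordinate-wise map b -> 1 - 2b from {0, 1} to {+1, -1}."""
    return tuple(1 - 2 * b for b in x)
```

For instance, \(5x_1^2x_2^3 - x_1\) reduces to \(5x_2 - x_1\) on the signed cube and to \(5x_1x_2 - x_1\) on the \(0/1\) cube, and each reduced polynomial agrees with the original on every point of its cube.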

By the duality between pseudo-distributions and sos certificates, Grothendieck’s inequality is also equivalent to the statement that the polynomial \(K_{\mathrm G}\cdot\norm{A}_{\infty\to 1}-\iprod{Ax,y}\) has a degree-\(2\) sos certificate.

The smallest value of \(K_{\mathrm G}\) satisfying this inequality is known as Grothendieck’s constant. Computing the exact numerical value of this constant is a longstanding open problem, though we know that it is around \(1.7\). In 1977, Krivine proved that \(K_{\mathrm G} \leq \tfrac{\pi}{2\ln(1+\sqrt{2})} \approx 1.782\) and conjectured that this bound is tight. However, this conjecture was disproved by Braverman et al. (2011). Raghavendra and Steurer (2009) showed that one can compute \(K_{\mathrm G}\) up to accuracy \(\epsilon\) in time doubly exponential in \(1/\epsilon\).

We show a proof of Grothendieck’s inequality due to Krivine (see Alon and Naor (2004)).

As in the proof for Max Cut, we may assume that \(\mu\) has mean \(0\). We will show that there are jointly Gaussian vectors \(\xi,\zeta\) such that \[ \pE_{\mu(x,y)} x\transpose y = K_{\mathrm{Krivine}} \cdot \E_{\xi,\zeta} (\sign \circ \xi)\transpose {(\sign \circ \zeta)} \, \label{eq:krivine} \] where \(K_{\mathrm{Krivine}}\) is an absolute constant to be determined later. (Here, \((\sign\circ \xi)\in\sbits^m\) and \((\sign \circ\zeta)\in \sbits^n\) denote the vectors obtained by taking the signs coordinate-wise for \(\xi\) and \(\zeta\).) Equation \eqref{eq:krivine} implies the theorem because \[ \begin{aligned} \pE_\mu \iprod{Ax,y} & = \Tr A \pE_\mu x\transpose y \\ & = K_{\mathrm{Krivine}} \cdot \Tr A \E_{\xi,\zeta} (\sign \circ\xi)\transpose{ (\sign \circ \zeta)}\\ & = K_{\mathrm{Krivine}} \cdot \E_{\xi,\zeta} \bigiprod{A (\sign\circ \xi), (\sign\circ \zeta)}\\ & \le K_{\mathrm{Krivine}} \cdot \norm{A}_{\infty\to 1}\,. \end{aligned} \] It remains to show the existence of Gaussian vectors such that \eqref{eq:krivine} holds. We will choose the Gaussian vectors such that the diagonals of the covariances \(\E \dyad \xi\) and \(\E \dyad \zeta\) are all ones. Then, as in the proof for Max Cut (also see the exercise on the \(\arcsin\) identity below), \[ \E_{\xi,\zeta} (\sign \circ\xi)\transpose {(\sign\circ \zeta)} = \tfrac 2\pi\cdot \arcsin\circ \Paren{\E \xi \transpose \zeta}\,, \] where we apply the \(\arcsin\) function entry-wise to the matrix \(\E \xi\transpose \zeta\). Therefore, our goal is to choose the distribution of \(\xi,\zeta\) such that \[ \sin\circ\Paren{c \cdot\pE_\mu x\transpose y} = \E \xi\transpose \zeta\,, \] where \(c=\tfrac \pi {2 {K_{\mathrm{Krivine}}}}\) and we apply the \(\sin\) function again entry-wise. Indeed, the entries of \(\pE_\mu x\transpose y\) lie in \([-1,1]\), so as long as \(c\le \tfrac\pi 2\), applying \(\tfrac 2\pi \arcsin\) entry-wise to the left-hand side returns \(\tfrac{2c}{\pi}\cdot \pE_\mu x\transpose y = \tfrac 1{K_{\mathrm{Krivine}}}\cdot \pE_\mu x\transpose y\), which is \eqref{eq:krivine}.
By the exercise on \(\sin\) and \(\sinh\) applied to block psd matrices below, the following matrix is positive semidefinite: \[ \Paren{\begin{matrix} \sinh \circ \Paren{c \pE_\mu \dyad x} & \sin \circ \Paren{c \pE_{\mu} x\transpose y} \\ \sin \circ \Paren{c \pE_{\mu} y\transpose x} & \sinh \circ \Paren{c \pE_\mu \dyad y}\\ \end{matrix}}\,. \] It follows that we can choose \((\xi,\zeta)\) to be Gaussian vectors with the above matrix as covariance. Recall that we required the entries of \(\xi\) and \(\zeta\) to have variance \(1\). Since \(\pE_\mu \dyad x\) and \(\pE_\mu \dyad y\) are all ones on their diagonals, this requirement translates to the condition \(\sinh(c)=1\). The solution to this equation is \(c=\sinh^{-1}(1)=\ln(1+\sqrt 2)\). Therefore we can choose \({K_{\mathrm{Krivine}}}=\frac \pi{2 \ln(1+\sqrt 2)}\le 1.783\) for the conclusion of the theorem.
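The constants in this argument are easy to check numerically. The sketch below computes \(c=\sinh^{-1}(1)\) and \(K_{\mathrm{Krivine}}=\tfrac{\pi}{2c}\), and evaluates \(\tfrac 2\pi\arcsin(\sin(c\rho))\), the sign correlation produced by Gaussians with covariance \(\sin(c\rho)\), which should equal \(\rho/K_{\mathrm{Krivine}}\) for \(\rho\in[-1,1]\). The names `krivine_c`, `krivine_K`, and `rounded_correlation` are chosen here for illustration.

```python
import math

# c solves sinh(c) = 1, so that the Gaussian coordinates have variance 1.
krivine_c = math.asinh(1.0)
# K_Krivine = pi / (2 c), from the relation c = pi / (2 K_Krivine).
krivine_K = math.pi / (2.0 * krivine_c)

def rounded_correlation(rho):
    """(2/pi) * arcsin(sin(c * rho)) for a pseudo-correlation rho in [-1, 1]."""
    return (2.0 / math.pi) * math.asin(math.sin(krivine_c * rho))
```

Since \(c\approx 0.8814 < \tfrac\pi2\), the \(\arcsin\) undoes the \(\sin\) on the whole range, and `krivine_K * rounded_correlation(rho)` recovers `rho` up to floating-point error.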

Exercises to complete Krivine’s proof of Grothendieck’s inequality

The following exercises ask you to fill in some details for Krivine’s proof of Grothendieck’s inequality.

Show that for every \(\rho\in\R\) with \(-1\le \rho\le 1\) \[ \E_{s,t\sim \cN(0,1)} \sign s \cdot \sign \Paren{\rho \cdot s + \sqrt{1-\rho^2}\cdot t} = \tfrac 2 \pi \arcsin\rho\,. \]
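Before proving the identity, it can help to see it hold empirically. The following Monte Carlo sketch (an illustration, not a proof) samples independent standard Gaussians \(s,t\) and averages the product of signs. The function names, the sample count, and the fixed seed are arbitrary choices.

```python
import math
import random

def sign(v):
    """Sign of v, with the (probability-zero) value 0 mapped to +1."""
    return 1.0 if v >= 0 else -1.0

def sign_correlation(rho, samples=200_000, seed=0):
    """Monte Carlo estimate of E[sign(s) * sign(rho*s + sqrt(1-rho^2)*t)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        s = rng.gauss(0.0, 1.0)
        t = rng.gauss(0.0, 1.0)
        total += sign(s) * sign(rho * s + math.sqrt(1.0 - rho * rho) * t)
    return total / samples
```

With this many samples the estimate typically agrees with \(\tfrac 2\pi\arcsin\rho\) to about two decimal places.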

For every two matrices \(M,N\) of the same dimension we define the Hadamard product of \(M\) and \(N\), denoted as \(M\odot N\), as the matrix \(H\) where \(H_{i,j} = M_{i,j}N_{i,j}\) for all \(i,j\). Prove that if \(M\) and \(N\) are psd then so is \(M\odot N\).

Let \(p\) be a univariate polynomial with nonnegative coefficients in the monomial basis. Show that for every positive semidefinite matrix \(M\in\R^{n\times n}\), the matrix \(N=p\circ M\) with entries \(N_{i,j}=p(M_{i,j})\) is also positive semidefinite.

Let \(p=\sum_i p_i x^i\) be a univariate polynomial and let \(p_+ = \sum_i \abs{p_i} x^i\) be the corresponding polynomial with only nonnegative coefficients. Show that for every 2-by-2 block psd matrix \(\Paren{\begin{smallmatrix} A & B \\ \transpose B & D\end{smallmatrix}}\), the following matrix is also positive semidefinite, \[ \Paren{\begin{matrix} p_+ \circ A & p \circ B \\ p\circ \transpose B & p_+ \circ D \end{matrix}}\,. \]

Show that there exists a sequence of univariate polynomials \(\set{\super p k}_{k\in \N}\) that converges point-wise to the \(\sin\) function (i.e., \(\lim_{k\to \infty} \super p k(x)=\sin x\) for every \(x\in\R\)). Show that the corresponding polynomials \(\set{\super p k _+}_{k\in \N}\) with nonnegative coefficients in the monomial basis converge point-wise to the \(\sinh\) function. Hint: Look up the Taylor-series expansion of the \(\sin\) and \(\sinh\) functions.
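Following the hint, one natural candidate for \(\super p k\) is the degree-\((2k+1)\) Taylor polynomial of \(\sin\) at \(0\). The sketch below evaluates it together with the polynomial \(\super p k_+\) obtained by taking absolute values of the coefficients, so that the two limits can be observed numerically. This is an illustration under that assumed choice of polynomials, not a proof of convergence, and the function names are hypothetical.

```python
import math

def sin_taylor_coeffs(k):
    """Monomial coefficients of the degree-(2k+1) Taylor polynomial of sin at 0."""
    coeffs = [0.0] * (2 * k + 2)
    for j in range(k + 1):
        coeffs[2 * j + 1] = (-1) ** j / math.factorial(2 * j + 1)
    return coeffs

def evaluate(coeffs, x, absolute=False):
    """Evaluate p(x), or p_+(x) (absolute values of coefficients) if absolute=True."""
    return sum((abs(a) if absolute else a) * x ** i for i, a in enumerate(coeffs))
```

For \(k=10\) and \(x=2\), `evaluate(coeffs, 2.0)` is close to \(\sin 2\) and `evaluate(coeffs, 2.0, absolute=True)` is close to \(\sinh 2\).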

Show that for every 2-by-2 block psd matrix \(\Paren{\begin{smallmatrix} A & B \\ \transpose B & D\end{smallmatrix}}\), the following matrix is also positive semidefinite, \[ \Paren{\begin{matrix} \sinh \circ A & \sin \circ B \\ \sin\circ \transpose B & \sinh \circ D \end{matrix}}\,. \]

More general Grothendieck-type inequalities

The proof above crucially used the fact that we optimize over disjoint sets of variables \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\): we only needed to fix the two off-diagonal blocks of the covariance matrix of the Gaussian we used, and so had freedom to choose the two diagonal blocks in a way that makes this matrix psd. One can ask the more general question of maximizing forms \(x^\top A x\) where \(x\in\sbits^{2n}\) and \(A\) is an arbitrary matrix whose support (i.e., set of nonzero entries) is contained in some graph \(H\). The Grothendieck constant of \(H\) is the maximum over all such matrices of the ratio between the best value achieved by a degree-\(2\) pseudo-distribution and the best value achieved by an actual point of the cube. The standard Grothendieck constant corresponds to the case that \(H\) is bipartite, but one can study the question for other graphs as well. For some graphs \(H\), the Grothendieck constant corresponding to \(H\) might not be an absolute constant but can depend on \(H\). Specifically, Alon et al. (2005) show that there are some absolute constants \(c,C\) such that the Grothendieck constant of \(H\) always lies in \([c\log \omega(H),C\log \chi(H)]\) where \(\omega(H)\) denotes the clique number of \(H\) and \(\chi(H)\) denotes the chromatic number of \(H\).

References

Alon, Noga, and Assaf Naor. 2004. “Approximating the Cut-Norm via Grothendieck’s Inequality.” In STOC, 72–80. ACM.

Alon, Noga, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. 2005. “Quadratic Forms on Graphs.” In STOC, 486–93. ACM.

Braverman, Mark, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. 2011. “The Grothendieck Constant Is Strictly Smaller Than Krivine’s Bound.” In FOCS, 453–62. IEEE Computer Society.

Khot, Subhash, and Assaf Naor. 2012. “Grothendieck-Type Inequalities in Combinatorial Optimization.” Comm. Pure Appl. Math. 65 (7): 992–1035. doi:10.1002/cpa.21398.

Pisier, Gilles. 2012. “Grothendieck’s Theorem, Past and Present.” Bull. Amer. Math. Soc. (N.S.) 49 (2): 237–323. doi:10.1090/S0273-0979-2011-01348-9.

Raghavendra, Prasad, and David Steurer. 2009. “Towards Computing the Grothendieck Constant.” In SODA, 525–34. SIAM.