Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

The sos algorithm over general domains

So far our focus has been on the task of optimizing some \(n\)-variate polynomial \(f\) over the Boolean cube. But the sos algorithm is applicable in a much more general setting. In particular, we can replace the Boolean cube with any set \(\Omega \subseteq \R^n\) that is defined by polynomial inequalities. This is important to capture computational problems that go beyond those that we have considered so far. We now make the appropriate general definitions of sos proofs, pseudo-distributions, and state the more general theorem regarding the sos algorithm.

It turns out that in the most general setting some issues arise that do not come up for the hypercube. So far these issues seem to be pathological in the sense that one can construct bad examples but they don’t play a role when designing algorithms based on sum-of-squares. We will highlight an important special case that avoids those issues altogether and still significantly generalizes the hypercube.

General sum-of-squares proofs

Let \(\R[x]\) be the ring of polynomials with real coefficients in variables \(x=(x_1,\ldots,x_n)\). Let \(\cA=\set{f_1\ge 0,\dots,f_m\ge 0}\) be a system of polynomial constraints with \(f_1,\ldots,f_m\in\R[x]\). We will be interested in two kinds of problems related to \(\cA\):

decide if \(\cA\) has a solution or if it is infeasible,
given a polynomial \(g\in \R[x]\), decide if \(g\) is nonnegative over the set of solutions to \(\cA\).

We say that a polynomial \(p\in \R[x]\) is sum-of-squares (sos) if there are polynomials \(q_1,\ldots,q_r\in\R[x]\) such that \(p=q_1^2+\cdots+q_r^2\).s

A sum-of-squares (sos) proof that the system of polynomial constraints \(\cA\) implies the constraint \(\set{g\ge 0}\) consists of sum-of-squares polynomials \((p_S)_{S\subseteq [m]}\) in \(\R[x]\) such thatIn \eqref{eq:sos-proof-equation} we adopt the convention that empty products are \(1\), so that \(\prod_{i\in S}f_i=1\) for \(S=\emptyset\). \[ g = \sum_{S\subseteq [m]} p_S \cdot \prod_{i\in S} f_i\,.\label{eq:sos-proof-equation} \] We say that this proof has degree at most \(\ell\) if each summand has degree at most \(\ell\), i.e., every set \(S\subseteq [m]\) satisfies \(\deg(p_S\cdot \Pi_{i\in f_i})\le \ell\). If there exists such a proof of degree at most \(\ell\), we write \[ \cA \vdash_\ell \{g \ge 0\}\,.\label{eq:sos-proof} \]

The identity \eqref{eq:sos-proof-equation} implies that \(g\) is nonnegative for every point \(x\) that satisfies the system \(\cA\) because each summand on the right-hand side is nonnegative. Given \(\cA\), \(g\), and sos polynomials \((p_S)_{S\subseteq [m]}\) (together with their sos representations), we can efficiently verify that \eqref{eq:sos-proof-equation} holds by comparing coefficients on the left and on the right.

In order to emphasize the variables for the proofs, we sometimes write \eqref{eq:sos-proof} as \(\{f_1(x)\ge0,\ldots,f_m(x)\ge0\}\vdash_{x,\ell} \{g(x)\ge 0\}\).This convention is useful when the functions \(f_1,\ldots,f_m\) and \(g\) involve variables besides the “proof variables”. The distinction is important sometimes because non-proof variables do not count toward the degree of the proof and are treated as constants.

The following theorem (Krivine’64, Stengle’74) shows that sum-of-squares proofs are enough to decide the infeasibility of systems of polynomial constraints. However, the proof is highly non-constructive and certainly does not give any bounds on the degree (which we would need in order to use sum-of-squares proofs for the design of algorithms).

For every system of polynomial constraints \(\cA=\{f_1\ge 0,\ldots,f_m\ge 0\}\), either there exists a solution or there exists a sum-of-squares proof \(\cA \vdash_\ell \{ -1 \ge 0\}\) for some \(\ell\in \N\).

We refer to a sos proof of the form \(\cA\vdash_\ell\set{-1\ge 0}\) as a degree-\(d\) sum-of-squares refutation for \(\cA\).

It turns out that sum-of-squares proofs capture many kinds of “real-world mathematical proofs”. In particular, they obey the following intuitive inference rules, for all system of polynomial constraints \(\cA,\cB,\cC\), polynomials \(f,g\from \R^n \to \R\) and \(F\from \R^n \to \R^m, G\from \R^n \to \R^k, H\from \R^p\to \R^n\), \[ \begin{aligned} \text{addition: }&\frac{\cA \vdash _\ell \{ f\ge 0 , g \ge 0\}} {\cA \vdash_{\ell} \{ f+g\ge 0\}}, \\ \quad \text{multiplication: } &\frac{\cA \vdash _{\ell} \{ f\ge 0\},~ \cA \vdash_{\ell'} \{g \ge 0\}} {\cA \vdash_{\ell+\ell'} \{ f\cdot g \ge 0\}}\mcom \\[0.5em] \text{transitivity: }&\frac{\cA \vdash _\ell \cB,~ \cB\vdash _{\ell'} \cC} {\cA \vdash _{\ell\cdot \ell'} \cC}\mcom \\[0.5em] \text{substitution: }&\frac{\{F\ge 0\} \vdash_{\ell} \{ G \ge 0\}} {\{F(H))\ge 0\} \vdash_{\ell\cdot \deg(H)} \{ G(H)\ge 0\}} \mper \end{aligned} \]

Exercises

Let \(p\in \R[t]\) be a univariate degree-\(d\) polynomial such that \(p(t)\ge 0\) for all \(t\in \R\). Show that \[ \vdash_d \{ p \ge 0\}\,. \]

Let \(d\in\N\) and \(f\in \R[x]\) be a polynomial of degree at most \(d\). Show that there exists a scalar \(M >0\) such that \[ \{\norm{x}^2 \le 1\} \vdash_d \{f \le M\}\,. \]

Pseudo-distributions

We can represent a finitely supported probability distribution over \(\R^n\) by its probability mass function \(\mu\from \R^n\to \R_{\ge 0}\) so that \(\mu(x)\) is equal to the probability of the point \(x\in \R^n\) under the distribution. We define pseudo-distributions as generalizations of such probability mass functions. First, we define the formal expecation of a function \(f\from \R^n\to \R\) under a finitely supported function \(\mu\), \[ \pE_{\mu} f = \sum_{x\in \support(\mu)} f(x)\cdot \mu(x)\,. \] Note that since \(\mu\) is finitely supported there are no issues of measurability or convergence here.

A finitely supported function \(\mu\from \R^n\to \R\) is a degree-\(d\) pseudo-distribution if \(\pE_\mu 1 =1\) and \(\pE_\mu f^2\ge 0\) for all polynomials \(f\) on \(\R^n\) with \(\deg f\le d/2\).

Furthermore, if \(\support(\mu)\subseteq \Omega\), we say that \(\mu\) is a degree-\(d\) pseudo-distribution over \(\Omega\).

Note that these definitions are consistent with our previous definition for pseudo-distributions over the hypercube.

Unlike for actual probability distributions, many computational problems for pseudo-distributions admit efficient algorithms. The following characterization in terms of positive semidefinite matrices is the main structure that those algorithms exploit.

Let \(\mu\from\R^n\to \R\) be finitely supported and \(\pE_\mu 1=1\). Then, \(\mu\) is a degree-\(d\) pseudo-distribution if and only if formal degree-\(d\) moment matrix \(\pE_{\mu(x)}\dyad{\Paren{(1,x)^{\otimes d/2}}}\) is positive semidefinite.

Suppose that the degree-\(d\) moment matrix \(M\) is positive semidefinite. Let \(p\) be any polynomial of degree at most \(d/2\). Let \(v\) be a vector such that \(p(x)=\iprod{p,(1,x)^{\otimes d/2}}\). Then, \(\pE_{\mu} p^2=\iprod{v,M v}\ge 0\). It follows that \(\mu\) is a degree-\(d\) pseudo-distribution.

On the other hand, suppose that this moment matrix \(M\) is not positive semidefinite. Then, there exists a vector \(v\) such that \(\iprod{v,Mv}<0\). Let \(p\) be the polynomial \(p(x)=\iprod{v,(1,x)^{\otimes d/2}}\). Then, \(\pE_\mu p^2 = \iprod{v,M,v}<0\). It follows that \(\mu\) is not a degree-\(d\) pseudo-distribution.

The following definition formalizes what it means for a pseudo-distribution to satisfy a system of polynomial constraints.

Let \(\cA=\set{f_1\ge 0,\ldots,f_m\ge 0}\) be a set of polynomial constraints over \(\R^n\). Let \(\mu\from \R^n\to \R\) be a pseudo-distribution. We say that \(\mu\) satisfies \(\cA\) at degree \(\ell\), denoted \(\mu \models_\ell \cA\), if every set \(S\subseteq [m]\) and every sos polynomial \(p\) on \(\R^n\) with \(\deg h+\sum_{i\in S}\max\set{\deg f_i,\ell}\le d\) satisfiesThe parameter \(\ell\) here allows us some additional freedom which will be important for stating the interaction between sum-of-squares proof and pseudo-distributions. The idea behind the condition \(\sum_{i\in S}\max\set{\deg f_i,\ell}\) is that it bounds the degree of the polynomial \(h\cdot \prod_{i\in S}f_i\) by \(d\), even when we treat each \(f_i\) to have degree \(\max\set{\deg f_i,\ell}\). \[ \pE_{\mu} h\cdot \prod_{i\in S}f_i\ge 0\,. \label{eq:pseudo-distribution-constraint} \]

We write \(\mu \models \cA\) (without further specifying the degree) if \(\mu\models_0 \cA\).

Note that if a degree-\(d\) pseudo-distribution \(\mu\) satisfies \(\mu\models \set{f\ge 0}\) for a polynomial \(f\) with \(\deg f\le d\), then \(\pE_\mu f \ge 0\). However, if \(\mu\) only satisfies \(\mu\models_\ell \set{f\ge 0}\), then we can only conclude \(\pE_\mu f \ge 0\) if \(\ell\le d\).

Duality between pseudo-distributions and sum-of-squares proofs

The following theorem shows a duality between pseudo-distributions and sos proofs for systems of polynomial constraints that are explicitly bounded in the sense that they contain a constraint that implies that every variable is restricted to a finite interval.The technical term for this property of systems of polynomial constraints is Archimedian.

Let \(\cA\) be a system of polynomial constraints over \(\R[x]\) that contains a constraint of the form \(\norm{x}^2\le M\) for some scalar \(M\ge 0\). Then for every even \(d\in\N\) and every polynomial \(f\in\R[x]_{\le d}\), exactly one of the following conditions is satisfied:Note that the duality between pseudo-distributions and sos proofs for the Boolean hypercube is stronger because it is possible to choose \(\e=0\) below.

for every \(\e>0\), there exists a degree-\(d\) sos proof \(\cA \vdash_d \{f \ge -\e\}\),
there exists a degree-\(d\) pseudo-distribution \(\mu\from \R^n\to \R\) such that \(\mu \models \cA\) and \(\E_\mu f\le 0\).

A direct consequence of this theorem is that for all \(\cA\) and \(f\) as above, the supremum of the set \(\{c \in \R \mid \cA \vdash_d f \ge c\}\) is equal to the minimum value of \(\E_\mu f\) over all degree-\(d\) pseudo-distributions \(\mu\) such that \(\mu \models \cA\).

An important special case of the above theorem is that \(f=-1\). In this case the theorem says that either \(\cA\) has a degree-\(d\) sos refutation so that \(\cA \vdash_d \{-1\ge 0\}\) or there exists a degree-\(d\) pseudo-distribution \(\mu\) that satisfies the constraints \(D \models A\).

Let \(C\subseteq \R[x]_{\le d}\) be the cone of polynomials \(g\) such that \(\cA \vdash_d \{g\ge 0\}\). We will show that if \(f\) is in the closure of \(C\), then \(\cA \vdash_d \{ f \ge -\e\}\) for all \(\e >0\) and thatif \(f\) is not in the closure of \(C\), then there exists a degree-\(d\) pseudo-distribution \(\mu\) such that \(\mu\models \cA\) and \(\pE_\mu f <0\).

If \(f\) is in the closure of \(C\), then by convexity there exists a polynomial \(g\in C\) such that \((1-\e)f + \e g\in C\) for all \(\e >0\), which means that \(\cA \vdash_d \{ f \ge -\e (g-f)\}\) for all \(\e>0\). Since \(\cA\) contains a constraint of the form \(\norm{x}^2\le M\), it follows that \(\cA \vdash_d \{g-f \le M'\}\) for some scalar \(M'>0\).Exercise: Show that for every polynomial \(f\in \R[x]_{\le d}\), there exists a scalar \(M>0\) such that \(\{\norm{x}^2\le 1\}\vdash_d \{f \le M\}\). Putting these sos proofs together, we get \(\cA \vdash_d \{f \ge -\e\}\) for all \(\e>0\).

If \(f\) is not in the closure of \(C\), then there exists a separating linear functional \(\phi\) such that \(\phi[f]<0\) and \(\phi[g]\ge 0\) for every \(g\in C\). We claim that by rescaling we may assume \(\phi[1]=1\). It is enough to show that \(\phi[1]>0\). Indeed, since \(\cA\) contains a constraint of the form \(\norm{x}^2\le M\), we have \(\cA \vdash_d \{f \le M'\}\) for some scalar \(M'>0\). Therefore, \(M'-f\in C\) and \(0\le \phi[M'-f] = M'\cdot \phi[1]- \phi[f]<M'\cdot \phi[1]\), which means that \(\phi[1]>0\).

Using multivariate interpolation, we can represent the linear functional \(\phi\) as a linear combination of point evaluations, i.e., there exist points \(\super y 1,\ldots,\super y m\in \R^n\) and scalars \(\alpha_1,\ldots,\alpha_m\in \R\) such that \(\phi[g]=\sum_{i=1}^m \alpha_i \cdot g(\super y i)\) for all \(g\in \R[x]_{\le d}\). Let \(\mu\from \R^n\to \R\) be the finitely-supported function such that \(\mu(\super y i)=\alpha_i\) for all \(i\in [m]\) and \(\mu(y)=0\) for \(y\in \R^n\setminus \{\super y 1,\ldots,\super y m\}\). Then, \(\mu\) is a degree-\(d\) pseudo-distribution with \(\mu \models \cA\) and \(\pE_\mu f <0\).

Soundness and completeness

It turns out that the duality between pseudo-distributions and sum-of-squares proofs is related to the idea that sum-of-squares proofs are a sound and complete proof system when allowing pseudo-distributions as models.

We remark that the following lemmas about soundness and completeness do not contain significant new ideas beyond the duality of pseudo-distributions and sos proofs. However, they are useful in order to reason about pseudo-distributions and sos proofs in a composable way.

Let \(\mu\) be a pseudo-distribution and let \(\cA,\cB\) be systems of polynomial constraints. Suppose \(\mu\) satisfies \(\mu \models_\ell \cA\) and there exists a sum-of-squares proof \(\cA\vdash_{\ell'}\cB\). Then, \(\mu\) satisfies \(\mu\vdash_{\ell\cdot \ell'}\cB\).

This soundness lemma shows that sum-of-squares proofs allow us to reason about properties of pseudo-distributions.

The following completeness theorem shows that sum-of-squares proof allow us to reason about all properties of pseudo-distributions (under the mild technical assumption that the system of polynomial constraints is explicitly bounded).

Let \(d\ge \ell' \ge \ell\), and let \(\cA,\cB\subseteq \R[x]\) be systems of polynomial constraints such that \(\cA\) contains a constraint of the form \(M-\sum_{i=1}^nx_i^2\ge 0\) for some \(M\ge 0\). Suppose every degree-\(d\) pseudo-distribution \(\mu\) that satisfies \(\mu\models_\ell \cA\) also satisfies \(\mu \models_{\ell'} \cB\), then for every \(\e>0\) there exists a sum-of-squares proof \(\cA \vdash _d \cB_\e\), where \(\cB_\e\) is the system obtained from \(\cB\) by weakening each constraint by \(\e\).

General sum-of-squares algorithm

The following theorem shows that we can efficiently search through pseudo-distributions satisfying a system of polynomial constraints.

There exists an algorithm that given \(d\) and a satisfiable, explicitly bounded system of polynomial constraints \(\cA\) over \(\R^n\), outputs in time \(n^{O(d)}\) a degree-\(d\) pseudo-distribution that approximately satisfies \(\mu \models \cA\) up to error \(2^{-n}\).We have not defined what it means for a pseudo-distribution to approximately satisfy a system of polynomial constraints. The idea is that we allow a small slack for all of the equations \eqref{eq:pseudo-distribution-constraint}.

About approximation errors and bit complexity: We can still use the same kind of sum-of-squares proofs in order to argue about properties of pseudo-distributions that approximately satisfy a system of polynomial constraints. However, one caveat is that in order for approximation errors not to amplify we need to ensure that the bit complexity of the coefficients of the sum-of-squares proofs we apply are polynomially bounded. While there are examples that require sum-of-squares proofs with exponential bit complexity (O’Donnell 2016), the proofs that arise in the settings we consider for designing algorithms have small bit complexity.

Sum-of-squares certificates over instance-independent variety

In this section, we discuss properties of pseudo-distributions and sos proofs in a setting that generalizes the Boolean hypercube but avoids some of issues that arise in the general case.

Let \(\Omega\subseteq\R^n\) be an algebraic set (defined by a system of polynomial equations) such that \(f_1,\ldots,f_m\) are a linear basis for the set of polynomials of degree at most \(d\) that vanish over \(\Omega\), \[ \Span\Set{f_1,\dots,f_m} = \R[x]_{\le d} \cap I(\Omega)\,. \label{eq:nice-basis} \] (Here, \(I(\Omega)\) denotes the set of polynomials that vanish over \(\Omega\).) For many interesting choices of \(\Omega\) (e.g., the Boolean hypercube and the Euclidean sphere), we can construct such a basis \(f_1,\ldots,f_m\) in time \(n^{O(d)}\).

We will show that if we are given such a basis for the degree-\(d\) part of the ideal of \(\Omega\), then much about sum-of-squares proofs and pseudo-distributions works like for the hypercube.

Let \(\Omega\) and \(f_1,\ldots,f_m\in \R[x]_{\le d}\) be as in \eqref{eq:nice-basis}. Let \(\cA=\{f_1=0,\ldots,f_m=0\}\). Then, every \(f\in \R[x]_{\le d}\) satisfies exactly one of the following conditions:

there exists a degree-\(d\) sos proof \(\cA\vdash_d \{ f\ge 0\}\),
there exists a degree-\(d\) pseudo-distribution \(\mu\from \R^n\to \R\) such that \(\mu \models \cA\) and \(\pE_\mu f< 0\).

As for the hypercube, we can make this theorem algorithmic.

For every even \(d\in \N\), there exists an \(n^{O(d)}\)-time algorithm that given a basis \(f_1,\ldots,f_m\) for \(\R[x]_{\le d}\cap I(\Omega)\) as in \eqref{eq:nice-basis} and a polynomial \(f\in \R[x]_{\le d}\) outputs,

either a degree-\(d\) sos proof \(\cA\vdash_d \{f\ge 0\}\),
or a degree-\(d\) pseudo-distribution \(\mu: \R^n\to \R\) such that \(\mu\models \cA\) and \(\pE_\mu f \le 2^{-n}\).

We remark that under mild assumptions on \(\Omega\), for every degree-\(d\) pseudo-distribution \(\mu\from \R^n\to \R\) such that \(\mu\models \cA\), there exists a degree-\(d\) pseudo-distribution \(\mu'\from \Omega\to\R\) with the same first \(d\) moments, so that \(\pE_{\mu(x)}(1,x)^{\otimes d}=\pE_{\mu'(x)}(1,x)^{\otimes d}\).

References

O’Donnell, Ryan. 2016. “SOS Is Not Obviously Automatizable, Even Approximately.” Electronic Colloquium on Computational Complexity (ECCC) 23: 141.