Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

# Mathematical background and pre work

## Mathematical background

We will not assume a lot of mathematical background in this course, but we will use some basic notions from linear algebra, such as vector spaces (finite dimensional and almost always over the real numbers), matrices, and associated notions such as rank, eigenvalues, and eigenvectors. We will use the notion of convexity (of functions and sets) and some of its basic properties. We will also use basic notions from probability such as random variables, expectation, variance, and tail bounds, as well as properties of the normal (a.k.a. Gaussian) distribution. Though this will not be our main focus, we will assume some comfort with algorithms and notions such as order of growth ($$O(n)$$, $$2^{\Omega(n)}$$, etc.) and some notions from computational complexity such as the notion of a reduction and the classes P and NP.

Probably the most important mathematical background for this course is that ever-elusive notion of “mathematical maturity,” which basically means the ability to pick up on the needed notions as we go along. At any point, please do not hesitate to ask questions when you need clarifications or pointers to some references, either in class or on the Piazza forum.

Some references for some of this material (which include much more than what we need) are:

## Pre work (“homework 0”)

Please read the lecture notes for the introduction to this course and for definitions of sum of squares over the hypercube. You don’t have to do the exercises in the lecture notes, but you may find attempting them useful. (See here for all notation used in these lecture notes.)

### Exercises:

You do not need to submit these exercises, or even to write them down properly, and feel free to collaborate with others while working on them.

All matrices and vectors are over the reals. In all the exercises below you can use the fact that any $$n\times n$$ matrix $$A$$ has a singular value decomposition (SVD) $$A = \sum_{i=1}^r \sigma_i u_i \otimes v_i$$ with $$\sigma_i \in \R$$ and $$u_i,v_i \in \R^n$$, where for every $$i,j$$, $$\norm{u_i}=1$$ and $$\norm{v_j}=1$$ (with $$\norm{v} =\sqrt{\sum_i v_i^2}$$), and for all $$i\neq j$$, $$\iprod{u_i,u_j}=0$$ and $$\iprod{v_i,v_j}=0$$. (For vectors $$u,v$$, their tensor product $$u\otimes v$$ is defined as the matrix $$T = uv^\top$$ with $$T_{i,j} = u_iv_j$$.) Equivalently, $$A = U\Sigma V^\top$$ where $$\Sigma$$ is a diagonal matrix and $$U$$ and $$V$$ are orthogonal matrices (satisfying $$U^\top U = V^\top V = I$$). If $$A$$ is symmetric then there is such a decomposition with $$u_i=v_i$$ for all $$i$$ (i.e., $$U=V$$). In this case the values $$\sigma_1,\ldots,\sigma_r$$ are known as the eigenvalues of $$A$$ and the vectors $$v_1,\dots,v_r$$ are known as eigenvectors. (This decomposition is unique if $$r=n$$ and all the $$\sigma_i$$’s are distinct.) Moreover, the SVD of $$A$$ can be found in polynomial time. (You can ignore issues of numerical accuracy in all exercises.)
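As a quick numerical sanity check of these facts (a sketch using `numpy`, not part of the exercises), the following verifies the rank-one expansion $$A=\sum_i \sigma_i u_i\otimes v_i$$, the orthogonality of $$U$$ and $$V$$, and the symmetric case where $$U=V$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# Full SVD: A = U @ diag(s) @ Vt, with U, V orthogonal and s >= 0.
U, s, Vt = np.linalg.svd(A)

# Reconstruct A as a sum of rank-one terms sigma_i * (u_i tensor v_i).
A_rec = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(n))
assert np.allclose(A, A_rec)

# U and V are orthogonal: U^T U = V^T V = I.
assert np.allclose(U.T @ U, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(n))

# For a symmetric matrix, the eigendecomposition gives u_i = v_i,
# with the sigma_i now allowed to be negative (the eigenvalues).
S = (A + A.T) / 2
lam, V = np.linalg.eigh(S)
assert np.allclose(S, V @ np.diag(lam) @ V.T)
```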

For an $$n\times n$$ matrix $$A$$, the *spectral norm* of $$A$$, denoted $$\norm{A}$$, is defined as the maximum of $$\norm{Av}$$ over all vectors $$v\in\R^n$$ with $$\norm{v}=1$$.

• Prove that if $$A$$ is symmetric (i.e., $$A=A^\top$$), then $$\norm{A} \leq \max_i \sum_j |A_{i,j}|$$. (Hint: you can do this via the following stronger inequality: for any, not necessarily symmetric, matrix $$A$$, $$\norm{A} \leq \sqrt{\alpha\beta}$$ where $$\alpha = \max_i \sum_j |A_{i,j}|$$ and $$\beta = \max_j \sum_i |A_{i,j}|$$.)
• Show that if $$A$$ is the adjacency matrix of a $$d$$-regular graph then $$\norm{A} = d$$.
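Both claims are easy to check numerically (a sketch with `numpy`; the 5-vertex cycle used as the $$2$$-regular example is my illustrative choice, not from the exercise):

```python
import numpy as np

def spectral_norm(A):
    # Largest singular value = max of ||Av|| over unit vectors v.
    return np.linalg.norm(A, ord=2)

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
A = A + A.T  # make A symmetric

# ||A|| <= max_i sum_j |A_{i,j}| (maximum absolute row sum).
assert spectral_norm(A) <= np.max(np.abs(A).sum(axis=1)) + 1e-9

# Adjacency matrix of the 5-vertex cycle, a 2-regular graph: ||A|| = 2.
C = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
assert np.isclose(spectral_norm(C), 2.0)
```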

Let $$A$$ be a symmetric $$n\times n$$ matrix. The Frobenius norm of $$A$$, denoted by $$\norm{A}_F$$, is defined as $$\sqrt{\sum_{i,j} A_{i,j}^2}$$.

Prove that $$\norm{A} \leq \norm{A}_F \leq \sqrt{n}\norm{A}$$. Give examples where each of those inequalities is tight.
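A numerical sketch (using `numpy`) of both inequalities and of the two standard tight examples, a rank-one matrix for the left inequality and the identity for the right one:

```python
import numpy as np

def norms(A):
    spec = np.linalg.norm(A, ord=2)       # spectral norm
    frob = np.linalg.norm(A, ord='fro')   # Frobenius norm
    n = A.shape[0]
    # ||A|| <= ||A||_F <= sqrt(n) * ||A||
    assert spec <= frob + 1e-9
    assert frob <= np.sqrt(n) * spec + 1e-9
    return spec, frob

n = 4
# Rank-one matrix: ||A|| = ||A||_F, so the left inequality is tight.
v = np.ones(n)
spec, frob = norms(np.outer(v, v))
assert np.isclose(spec, frob)

# Identity: ||I||_F = sqrt(n) * ||I||, so the right inequality is tight.
spec, frob = norms(np.eye(n))
assert np.isclose(frob, np.sqrt(n) * spec)
```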

Let $$\Tr(A) = \sum A_{i,i}$$. Prove that for every even $$k$$, $$\norm{A} \leq \Tr(A^k)^{1/k} \leq n^{1/k}\norm{A}$$.
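The two-sided bound $$\norm{A} \leq \Tr(A^k)^{1/k} \leq n^{1/k}\norm{A}$$ follows by writing $$\Tr(A^k)=\sum_i \lambda_i^k$$, which for even $$k$$ is squeezed between $$\lambda_{\max}^k$$ and $$n\lambda_{\max}^k$$ in absolute value. A numerical sketch (with `numpy`, on a random symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2  # symmetric

spec = np.linalg.norm(A, ord=2)
for k in [2, 4, 8, 16]:
    # Tr(A^k) = sum_i lambda_i^k, which is non-negative for even k.
    t = np.trace(np.linalg.matrix_power(A, k)) ** (1 / k)
    # ||A|| <= Tr(A^k)^{1/k} <= n^{1/k} ||A||; the bound tightens as k grows.
    assert spec <= t + 1e-9
    assert t <= n ** (1 / k) * spec + 1e-9
```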

Let $$A$$ be a symmetric $$n\times n$$ matrix such that $$A_{i,i}=0$$ for all $$i$$ and, for every $$i<j$$, $$A_{i,j}=A_{j,i}$$ is chosen to be a uniformly random value in $$\{\pm 1\}$$, independently of all other entries.

• Prove that (for $$n$$ sufficiently large) with probability at least $$0.99$$, $$\norm{A} \leq n^{0.9}$$.
• (harder) Prove that with probability at least $$0.99$$, $$\norm{A} \leq n^{0.51}$$.
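An empirical sketch (using `numpy`; the sample size $$n=500$$ is my choice). In experiments the norm of such a matrix concentrates around $$2\sqrt{n}$$, so the $$n^{0.9}$$ bound holds with lots of room to spare; note that the sharper asymptotic bound $$n^{0.51}$$ only kicks in at astronomically large $$n$$, so we do not test it here:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Symmetric matrix: zero diagonal, independent +-1 entries above it,
# mirrored below the diagonal.
U = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
A = U + U.T

spec = np.linalg.norm(A, ord=2)
# Empirically spec is about 2*sqrt(n) ~ 45, far below n^{0.9} ~ 268.
assert spec <= n ** 0.9
```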

While $$\norm{A}$$ can be computed in polynomial time, both $$\max_i \sum_j |A_{i,j}|$$ and $$\norm{A}_F$$ give even simpler-to-compute upper bounds for $$\norm{A}$$. However, the examples in the previous exercise show that they are not always tight. It is often easier to compute $$\Tr(A^k)^{1/k}$$ than to compute $$\norm{A}$$ directly, and as $$k$$ grows this yields a better and better estimate.

Let $$A$$ be an $$n\times n$$ symmetric matrix. Prove that the following are equivalent:

1. $$A$$ is positive semi-definite. That is, for every vector $$v\in \R^n$$, $$v^\top A v \geq 0$$ (where we think of vectors as column vectors and so $$v^\top A v = \sum_{i,j} A_{i,j}v_iv_j$$).
2. All eigenvalues of $$A$$ are non-negative. That is, if $$Av = \lambda v$$ then $$\lambda \geq 0$$.
3. The quadratic polynomial $$P_A$$ defined as $$P_A(x) = \sum A_{i,j} x_ix_j$$ is a sum of squares. That is, there are linear functions $$L_1,\ldots,L_m$$ such that $$P_A = \sum_i (L_i)^2$$.
4. $$A = B^\top B$$ for some $$r\times n$$ matrix $$B$$ (for some $$r$$).
5. There exists a collection of correlated random variables $$(X_1,\ldots,X_n)$$ such that for every $$i,j$$, $$\E X_i X_j = A_{i,j}$$ and moreover, for every $$i$$, the random variable $$X_i$$ is distributed like a Normal variable with mean $$0$$ and variance $$A_{i,i}$$.
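Several of these equivalences can be sanity-checked numerically (a sketch with `numpy`; the dimensions and sample count are my choices). Starting from a matrix of the form $$B^\top B$$, which is PSD by construction, we check the quadratic-form condition, the eigenvalue condition, and the Gaussian-moment condition:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Condition 4: A = B^T B for an r x n matrix B (here r = 3), so A is PSD.
B = rng.standard_normal((3, n))
A = B.T @ B

# Condition 1: v^T A v >= 0 for every v (it equals ||Bv||^2).
v = rng.standard_normal(n)
assert v @ A @ v >= -1e-9

# Condition 2: all eigenvalues of A are non-negative.
assert np.all(np.linalg.eigvalsh(A) >= -1e-9)

# Condition 5: jointly Gaussian (X_1,...,X_n) with covariance A; the
# empirical second moments E[X_i X_j] approach A_{i,j} as samples grow.
X = rng.multivariate_normal(np.zeros(n), A, size=200_000)
emp = X.T @ X / X.shape[0]
assert np.max(np.abs(emp - A)) < 0.1
```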