Here I would like to summarize my understanding of Singular Value Decomposition (SVD).

Matrix multiplication is essentially rotation and stretching

Consider the following multiplication

\begin{align} Ax &= y \label{eq:matmul} \end{align}

e.g.

\[\begin{align} \begin{bmatrix}2 & 1\\ -1 & 1\end{bmatrix} \begin{bmatrix}1 \\3 \end{bmatrix} = \begin{bmatrix}5 \\2 \end{bmatrix} \end{align}\]

Thus, a 2D vector is mapped to another 2D vector, which amounts to a rotation combined with a stretching.
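A quick numerical check of the example above (a minimal sketch using NumPy; the `angle` helper is only for illustration):

```python
import numpy as np

# The 2x2 example from above: A maps x = (1, 3) to y = (5, 2).
A = np.array([[2.0, 1.0],
              [-1.0, 1.0]])
x = np.array([1.0, 3.0])
y = A @ x
print(y)                                       # [5. 2.]

# "Stretching": the length of the vector changes.
print(np.linalg.norm(x), np.linalg.norm(y))    # ~3.16 vs ~5.39

# "Rotation": the direction changes as well.
angle = lambda v: np.degrees(np.arctan2(v[1], v[0]))
print(angle(x), angle(y))                      # ~71.6 deg vs ~21.8 deg
```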

SVD comes into play when we want to see which part of \(A\) does the rotation and which part does the stretching.

Equivalently, we could rewrite Eq. \eqref{eq:matmul} into

\begin{align} Av &= \sigma u \end{align}

where \(v\) and \(u\) are unit vectors; think of both sides of Eq. \eqref{eq:matmul} as being divided by a factor so that \(x\) becomes the unit vector \(v\). \(\sigma\) is a scaling factor that does the stretching.
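Continuing the numerical example above (this is just Eq. \eqref{eq:matmul} rescaled, not necessarily a singular pair of \(A\)):

\[\begin{align} v = \frac{x}{\lVert x \rVert} = \frac{1}{\sqrt{10}}\begin{bmatrix}1 \\ 3\end{bmatrix}, \quad Av = \frac{1}{\sqrt{10}}\begin{bmatrix}5 \\ 2\end{bmatrix} = \underbrace{\sqrt{\tfrac{29}{10}}}_{\sigma}\ \underbrace{\frac{1}{\sqrt{29}}\begin{bmatrix}5 \\ 2\end{bmatrix}}_{u} \end{align}\]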

When considering \(n\) orthonormal \(v\) vectors, collected as the columns of \(V\), the above equation becomes

\begin{align} \underset{m \times n}{A} \underset{n \times n}{V} &= \underset{m \times n}{U} \underset{n \times n}{\Sigma} \end{align}

where \(\Sigma\) is diagonal. The reason \(n\) \(v\) vectors are used is that \(V\) could not be inverted otherwise. If they were not orthonormal, they could always be made orthonormal via the Gram–Schmidt process.
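To see this concretely, here is a small NumPy check of \(AV = U\Sigma\) (and the column-wise version \(Av_i = \sigma_i u_i\)) on the \(2 \times 2\) matrix from the earlier example, using `np.linalg.svd` to obtain \(U\), \(\Sigma\), \(V\):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [-1.0, 1.0]])

# Reduced SVD: U is m x n, s holds the singular values, Vt is V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V, Sigma = Vt.T, np.diag(s)

# Matrix form: A V = U Sigma (up to floating-point error).
print(np.allclose(A @ V, U @ Sigma))                 # True

# Column by column: A v_i = sigma_i u_i.
for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]))  # True, True
```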

Then,

\begin{align} \underset{m\times n}{A} &= \underset{m \times n}{U} \underset{n \times n}{\Sigma} \underset{n \times n}{V^{-1}} = U \Sigma V^T \end{align}

Since \(U\) is not square, this version of SVD is called reduced SVD. There are also full SVD and truncated SVD, see here.

Some experiments on SVD regarding matrix shapes are recorded here.
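In NumPy the full and reduced versions differ only in the `full_matrices` flag; a small shape experiment along those lines (a sketch with a random tall matrix):

```python
import numpy as np

# A tall matrix, m > n.
m, n = 5, 3
A = np.random.randn(m, n)

# Full SVD: U is m x m, V is n x n; Sigma would be m x n with zero rows at the bottom.
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U_full.shape, s.shape, Vt.shape)    # (5, 5) (3,) (3, 3)

# Reduced SVD: U is m x n, so an n x n Sigma suffices.
U_red, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U_red.shape, s.shape, Vt.shape)     # (5, 3) (3,) (3, 3)

# The reduced factors already reconstruct A.
print(np.allclose(A, U_red @ np.diag(s) @ Vt))   # True
```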

SVD theorem

Every matrix \(A \in \mathbb{C}^{m \times n}\) has a singular value decomposition.

  • The singular values \(\{\sigma_i\}\) are uniquely determined.
  • If \(A\) is square and the \(\sigma_i\) are distinct, then \(\{u_i\}\) and \(\{v_i\}\) are uniquely determined up to a complex sign.

This statement of the theorem is from https://youtu.be/EokL7E6o1AE?t=38m16s.
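A small illustration of the "up to a complex sign" part (a sketch with a real matrix, so the sign is just \(\pm 1\)): flipping the sign of a matched \((u_i, v_i)\) pair still reproduces \(A\).

```python
import numpy as np

A = np.random.randn(4, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of one (u_i, v_i) pair: the result is still a valid SVD of A.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1
print(np.allclose(A, U2 @ np.diag(s) @ Vt2))   # True
```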

SVD and PCA

SVD can be used to carry out principal component analysis (PCA).

Suppose the input is the matrix \(X\), whose columns are the observations, standardized so that each feature (row) has zero mean and unit variance. The principal components of \(X\) are then the eigenvectors of the corresponding covariance matrix \(X X^T\).

\begin{align} X X^T = Q \Lambda Q^T \end{align}

In terms of SVD, \(X X^T\) can also be written as

\[\begin{align} X X^T &= (U \Sigma V^T)(V \Sigma U^T) \\ &= U \Sigma^2 U^T \\ \end{align}\]

Hence,

\begin{align} Q &= U \\ \Lambda &= \Sigma^2 \end{align}
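A minimal sketch comparing the two routes in NumPy (assuming, as above, that the columns of \(X\) are observations so the relevant matrix is \(X X^T\); eigenvectors can only be compared up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: rows are features, columns are observations.
X = rng.standard_normal((3, 100))
X -= X.mean(axis=1, keepdims=True)   # zero mean per feature
X /= X.std(axis=1, keepdims=True)    # unit variance per feature

# Route 1: eigendecomposition of X X^T.
eigvals, Q = np.linalg.eigh(X @ X.T)
order = np.argsort(eigvals)[::-1]    # eigh returns ascending order; sort descending
eigvals, Q = eigvals[order], Q[:, order]

# Route 2: SVD of X directly.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(eigvals, s**2))           # Lambda = Sigma^2
print(np.allclose(np.abs(Q), np.abs(U)))    # Q = U, up to a sign per column
```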

Additional reference: https://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca, which notes that numerically SVD is much more stable than explicitly constructing the covariance matrix. SVD is one of the most common ways to conduct PCA.