Singular Value Decomposition

SVD Theorem

For each matrix $A \in \mathbb{R}^{n \times m}$, there exist orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ such that $A$ can be expressed as:
$$A = U \Sigma V^T, \quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_{\min\{n,m\}}), \quad \sigma_i \ge \sigma_{i+1} \ge 0 \;\; \forall i$$

  • The set of singular values is unique
  • Left/right singular vectors (columns of U and V) can have arbitrary sign
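
As a quick numerical sanity check, the following numpy sketch (matrix sizes and variable names are just illustrative) computes a full SVD and verifies the factorization together with the orthogonality of $U$ and $V$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, m))

# Full SVD: U is n x n, Vt is m x m, s holds the min(n, m) singular values
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((n, m))
Sigma[:len(s), :len(s)] = np.diag(s)      # singular values sorted in descending order

assert np.allclose(U @ Sigma @ Vt, A)     # A = U Σ V^T
assert np.allclose(U.T @ U, np.eye(n))    # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(m))  # V orthogonal
```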

Computational Complexity

SVD is computable in $O(\min\{nm^2, mn^2\})$

SVD and Frobenius norm

Let the SVD of $A \in \mathbb{R}^{n \times m}$ be given by $A = U \Sigma V^T$. Then:
$$\|A\|_F^2 = \sum_{i=1}^{\min\{n,m\}} \sigma_i^2$$
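
A minimal numerical check of this identity (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# Squared Frobenius norm equals the sum of squared singular values
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
```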

SVD and Spectral norm

Let the SVD of $A \in \mathbb{R}^{n \times m}$ be given by $A = U \Sigma V^T$. Then:
$$\|A\|_2 := \sup\{\|Ax\| : \|x\| = 1\} = \sigma_1$$
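
Analogously, a minimal check that the spectral norm is the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)

# The matrix 2-norm (operator norm) equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), s[0])
```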

Reduced SVD

One can prune the columns of $U$ or $V$ that correspond to zero entries (zero-padded rows/columns) of $\Sigma$ without changing the product $U \Sigma V^T$.
(Figure: reduced SVD, see 4-archive/cil/theory/old/assets/reduced-svd.png)
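
In numpy this pruning corresponds to the thin SVD obtained with `full_matrices=False`; a small sketch of the resulting shapes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3
A = rng.standard_normal((n, m))

# Reduced ("thin") SVD: only the first min(n, m) columns of U are kept,
# i.e. the zero rows of Σ and the corresponding columns of U are pruned.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)       # (6, 3) (3,) (3, 3)
assert np.allclose(U @ np.diag(s) @ Vt, A)
```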

Eckart-Young Theorem

By truncating the SVD after the $k$ largest singular values (i.e. setting $\sigma_{k+1}, \ldots$ to zero), we obtain an optimal rank-$k$ approximation of the matrix

  • Is fundamental to many low-rank approximation problems
  • This means that approximations for any $k$ can be read off directly from the SVD

Formal definition:

Given $A \in \mathbb{R}^{n \times m}$ with SVD $A = U \Sigma V^T$. Then for all $1 \le k \le \min\{n,m\}$:

$$A_k := U \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0) V^T \in \operatorname{argmin}\{\|A - B\|_F : \operatorname{rank}(B) \le k\}$$
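
A hedged numpy sketch of the rank-$k$ truncation $A_k$ (the random rank-$k$ competitor $B$ is only meant to illustrate that it cannot beat the truncation in Frobenius error, not to prove optimality):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation: keep the k largest singular values, drop the rest
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k

# An arbitrary rank-k matrix cannot have smaller Frobenius error
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))
assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')
```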

Corollary

The squared error of the rank-$k$ approximation can be expressed as:
$$\|A - A_k\|_F^2 = \sum_{i=k+1}^{\operatorname{rank}(A)} \sigma_i^2$$
Proof:
$$A - A_k = U \operatorname{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_{\min\{n,m\}}) V^T$$
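
The corollary can be checked numerically with the same truncation (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Squared error equals the sum of the squared discarded singular values
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
```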

SVD and PCA

SVD is intimately related to eigendecomposition

  • If $A$ is square and symmetric: $U$ and $V$ have equal columns up to possible sign differences
  • If $A$ is positive semi-definite, then the SVD is equal to the eigendecomposition
  • Squares of $A$: $AA^T \in \mathbb{R}^{n \times n}$ as well as $A^TA \in \mathbb{R}^{m \times m}$:
    $$AA^T = U \Sigma \Sigma^T U^T = U \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) U^T, \qquad A^TA = V \Sigma^T \Sigma V^T = V \operatorname{diag}(\sigma_1^2, \ldots, \sigma_m^2) V^T$$
    Convention: $\sigma_r = 0$ for $\min\{n,m\} < r \le \max\{n,m\}$
SVD can be applied to the (centered) data matrix to identify the principal eigenvectors of the covariance matrix (PCA)
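
A small sketch of this connection, assuming an $n \times m$ data matrix with samples in rows (the centering and the $1/(n-1)$ normalization are the usual PCA conventions, not something specific to the SVD):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 4))       # n samples, m features
Xc = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the sample covariance matrix ...
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues

# ... versus the SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(eigvals[::-1], s ** 2 / (Xc.shape[0] - 1))

# Principal directions agree up to sign (assuming distinct eigenvalues)
for i in range(Xc.shape[1]):
    v_eig = eigvecs[:, -(i + 1)]        # eigenvector of the i-th largest eigenvalue
    v_svd = Vt[i]                       # i-th right singular vector
    assert np.isclose(abs(v_eig @ v_svd), 1.0)
```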

SVD and Matrix Completion

We generalize the simple rank-1 model to a rank-$k$ approximation.
We do this by additive superposition of $k$ rank-1 matrices.
If all matrix entries are completely observed, the solution is constructively given by the SVD.

The $k$ principal left singular vectors, paired with their respective right singular vectors, form a weighted sum of outer products:
$$A \approx \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
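
The additive superposition can be made explicit in code; the sum of the $k$ leading outer products is exactly the truncated-SVD approximation (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Additive superposition of k rank-1 outer products σ_i u_i v_i^T ...
A_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
# ... equals the rank-k truncation of the SVD
assert np.allclose(A_k_sum, U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :])
```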

The low-rank matrix approximation problem is non-convex, even in the completely observed case, where the SVD nonetheless provides the global optimum.

SVD with Imputation

Naive strategy:
1. estimate values for missing matrix elements (e.g. row or column mean)
2. impute them to create a complete matrix
3. run SVD on the completed matrix to compute a low-rank approximation

This is not a good idea

Given an incomplete observation, the SVD is in general not directly applicable for computing low-rank approximations.
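
For concreteness, a minimal sketch of the naive impute-then-SVD pipeline described above (the function name and the column-mean imputation are illustrative choices); as stated, this heuristic comes with no reconstruction guarantees:

```python
import numpy as np

def naive_impute_then_svd(A_obs, mask, k):
    """Naive strategy (sketch, not recommended): fill missing entries with the
    column mean of the observed values, then truncate the SVD at rank k."""
    A_filled = np.array(A_obs, dtype=float)
    col_means = np.nanmean(np.where(mask, A_obs, np.nan), axis=0)
    A_filled[~mask] = np.take(col_means, np.where(~mask)[1])
    U, s, Vt = np.linalg.svd(A_filled, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy usage: observe ~70% of the entries of a random matrix
rng = np.random.default_rng(8)
A = rng.standard_normal((6, 4))
mask = rng.random((6, 4)) > 0.3          # True where an entry is observed
A_hat = naive_impute_then_svd(np.where(mask, A, 0.0), mask, k=2)
```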

NP-Hardness

Weighted Frobenius norm problem:
$$\hat{A}_k = \operatorname{argmin}_{B} \Big\{ \sum_{i,j} w_{ij} \, (a_{ij} - b_{ij})^2 \Big\}, \quad \operatorname{rank}(B) = k$$

  • special case: binary weights $w_{ij} \in \{0, 1\}$ (an observation mask)
    This is NP-hard even for $k = 1$
Low-rank matrix reconstruction is NP-hard, and one has, in general, to resort to approximation algorithms. The completely observed case is special.
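
To make the objective concrete, a small sketch of the weighted squared error (the function name is an illustrative assumption); evaluating the objective is easy, while finding its rank-$k$ minimizer is the NP-hard part:

```python
import numpy as np

def weighted_frobenius_sq(A, B, W):
    """Objective of the weighted problem: sum_ij w_ij * (a_ij - b_ij)^2."""
    return np.sum(W * (A - B) ** 2)

# Special case: binary weights acting as an observation mask
rng = np.random.default_rng(9)
A = rng.standard_normal((5, 5))
W = (rng.random((5, 5)) > 0.4).astype(float)
B = rng.standard_normal((5, 1)) @ rng.standard_normal((1, 5))   # a rank-1 candidate
print(weighted_frobenius_sq(A, B, W))
```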