Singular Value Decomposition

SVD Theorem

For each matrix $A \in \mathbb{R}^{n \times m}$, there exist orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ such that $A$ can be expressed as:
$$A = U \Sigma V^T, \quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_{\min\{n,m\}}), \quad \sigma_i \ge \sigma_{i+1} \ge 0 \;\; \forall i$$

  • The set of singular values is unique
  • Left/right singular vectors (columns of U and V) can have arbitrary sign
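
As a quick numerical sanity check, the following numpy sketch (matrix sizes and variable names are just illustrative) computes a full SVD and verifies the factorization together with the orthogonality of $U$ and $V$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, m))

# Full SVD: U is n x n, Vt is m x m, s holds the min(n, m) singular values
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((n, m))
Sigma[:len(s), :len(s)] = np.diag(s)      # singular values sorted in descending order

assert np.allclose(U @ Sigma @ Vt, A)     # A = U Σ V^T
assert np.allclose(U.T @ U, np.eye(n))    # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(m))  # V orthogonal
```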

Computational Complexity

SVD is computable in $O(\min\{nm^2, mn^2\})$

SVD and Frobenius norm

Let the SVD of $A \in \mathbb{R}^{n \times m}$ be given by $A = U \Sigma V^T$. Then:
$$\|A\|_F^2 = \sum_{i=1}^{\min\{n,m\}} \sigma_i^2$$
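
A minimal numerical check of this identity (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# Squared Frobenius norm equals the sum of squared singular values
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
```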

SVD and Spectral norm

Let the SVD of $A \in \mathbb{R}^{n \times m}$ be given by $A = U \Sigma V^T$. Then:
$$\|A\|_2 := \sup\{\|Ax\| : \|x\| = 1\} = \sigma_1$$
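
Analogously, a minimal check that the spectral norm is the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)

# The matrix 2-norm (operator norm) equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), s[0])
```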

Reduced SVD

One can prune the columns of $U$ or $V$ that correspond to zero entries (zero-padded rows/columns) of $\Sigma$ without changing the product $U \Sigma V^T$.
(Figure: reduced SVD, see 4-archive/cil/theory/old/assets/reduced-svd.png)
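
In numpy this pruning corresponds to the thin SVD obtained with `full_matrices=False`; a small sketch of the resulting shapes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3
A = rng.standard_normal((n, m))

# Reduced ("thin") SVD: only the first min(n, m) columns of U are kept,
# i.e. the zero rows of Σ and the corresponding columns of U are pruned.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)       # (6, 3) (3,) (3, 3)
assert np.allclose(U @ np.diag(s) @ Vt, A)
```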

Eckart-Young Theorem

By truncating the SVD after the $k$ largest singular values (i.e. setting $\sigma_{k+1}, \ldots$ to zero), we obtain an optimal rank-$k$ approximation of the matrix

  • Is fundamental to many low-rank approximation problems
  • This means that approximations for any $k$ can be read off directly from the SVD

Formal definition:

Given $A \in \mathbb{R}^{n \times m}$ with SVD $A = U \Sigma V^T$. Then for all $1 \le k \le \min\{n,m\}$:

$$A_k := U \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0) V^T \in \operatorname{argmin}\{\|A - B\|_F : \operatorname{rank}(B) \le k\}$$
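
A hedged numpy sketch of the rank-$k$ truncation $A_k$ (the random rank-$k$ competitor $B$ is only meant to illustrate that it cannot beat the truncation in Frobenius error, not to prove optimality):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation: keep the k largest singular values, drop the rest
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k

# An arbitrary rank-k matrix cannot have smaller Frobenius error
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))
assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')
```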

Corollary

The squared error of the rank-$k$ approximation can be expressed as:
$$\|A - A_k\|_F^2 = \sum_{i=k+1}^{\operatorname{rank}(A)} \sigma_i^2$$
Proof:
$$A - A_k = U \operatorname{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_{\min\{n,m\}}) V^T$$
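
The corollary can be checked numerically with the same truncation (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Squared error equals the sum of the squared discarded singular values
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
```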

SVD and PCA

SVD is intimately related to eigendecomposition

  • If $A$ is square and symmetric: $U$ and $V$ have equal columns up to possible sign differences
  • If $A$ is positive semi-definite, then the SVD is equal to the eigendecomposition
  • Squares of $A$: $AA^T \in \mathbb{R}^{n \times n}$ as well as $A^TA \in \mathbb{R}^{m \times m}$:
    $$AA^T = U \Sigma \Sigma^T U^T = U \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) U^T, \qquad A^TA = V \Sigma^T \Sigma V^T = V \operatorname{diag}(\sigma_1^2, \ldots, \sigma_m^2) V^T$$
    Convention: $\sigma_r = 0$ for $\min\{n,m\} < r \le \max\{n,m\}$
SVD can be applied to the (centered) data matrix to identify the principal eigenvectors of the covariance matrix (PCA)
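
A small sketch of this connection, assuming an $n \times m$ data matrix with samples in rows (the centering and the $1/(n-1)$ normalization are the usual PCA conventions, not something specific to the SVD):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 4))       # n samples, m features
Xc = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the sample covariance matrix ...
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues

# ... versus the SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(eigvals[::-1], s ** 2 / (Xc.shape[0] - 1))

# Principal directions agree up to sign (assuming distinct eigenvalues)
for i in range(Xc.shape[1]):
    v_eig = eigvecs[:, -(i + 1)]        # eigenvector of the i-th largest eigenvalue
    v_svd = Vt[i]                       # i-th right singular vector
    assert np.isclose(abs(v_eig @ v_svd), 1.0)
```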

SVD and Matrix Completion

We generalize the simple rank-1 model to a rank-$k$ approximation.
We do this by additive superposition of $k$ rank-1 matrices.
If all matrix entries are completely observed, the solution is constructively given by the SVD.

The $k$ principal left singular vectors, paired with their respective right singular vectors, form a weighted sum of outer products:
$$A \approx \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
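
The additive superposition can be made explicit in code; the sum of the $k$ leading outer products is exactly the truncated-SVD approximation (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Additive superposition of k rank-1 outer products σ_i u_i v_i^T ...
A_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
# ... equals the rank-k truncation of the SVD
assert np.allclose(A_k_sum, U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :])
```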

The low-rank matrix approximation problem is non-convex, even in the completely observed case, where the SVD nonetheless provides the global optimum.

SVD with Imputation

Naive strategy:
1. estimate values for missing matrix elements (e.g. row or column mean)
2. impute them to create a complete matrix
3. run SVD on the completed matrix to compute a low-rank approximation

This is not a good idea

Given an incomplete observation, the SVD is in general not directly applicable for computing low-rank approximations.
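
For concreteness, a minimal sketch of the naive impute-then-SVD pipeline described above (the function name and the column-mean imputation are illustrative choices); as stated, this heuristic comes with no reconstruction guarantees:

```python
import numpy as np

def naive_impute_then_svd(A_obs, mask, k):
    """Naive strategy (sketch, not recommended): fill missing entries with the
    column mean of the observed values, then truncate the SVD at rank k."""
    A_filled = np.array(A_obs, dtype=float)
    col_means = np.nanmean(np.where(mask, A_obs, np.nan), axis=0)
    A_filled[~mask] = np.take(col_means, np.where(~mask)[1])
    U, s, Vt = np.linalg.svd(A_filled, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy usage: observe ~70% of the entries of a random matrix
rng = np.random.default_rng(8)
A = rng.standard_normal((6, 4))
mask = rng.random((6, 4)) > 0.3          # True where an entry is observed
A_hat = naive_impute_then_svd(np.where(mask, A, 0.0), mask, k=2)
```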

NP-Hardness

Weighted Frobenius norm problem:
$$\hat{A}_k = \operatorname{argmin}_{B} \Big\{ \sum_{i,j} w_{ij} \, (a_{ij} - b_{ij})^2 \Big\}, \quad \operatorname{rank}(B) = k$$

  • special case: binary weights $w_{ij} \in \{0, 1\}$ (an observation mask)
    This is NP-hard even for $k = 1$
Low-rank matrix reconstruction is NP-hard, and one has, in general, to resort to approximation algorithms. The completely observed case is special.
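
To make the objective concrete, a small sketch of the weighted squared error (the function name is an illustrative assumption); evaluating the objective is easy, while finding its rank-$k$ minimizer is the NP-hard part:

```python
import numpy as np

def weighted_frobenius_sq(A, B, W):
    """Objective of the weighted problem: sum_ij w_ij * (a_ij - b_ij)^2."""
    return np.sum(W * (A - B) ** 2)

# Special case: binary weights acting as an observation mask
rng = np.random.default_rng(9)
A = rng.standard_normal((5, 5))
W = (rng.random((5, 5)) > 0.4).astype(float)
B = rng.standard_normal((5, 1)) @ rng.standard_normal((1, 5))   # a rank-1 candidate
print(weighted_frobenius_sq(A, B, W))
```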