Matrix Approximation
Goal: Matrix Completion
Assume we have a sparse matrix: only a small fraction of its entries are observed, and the goal is to predict the missing ones.
If all entries were independent, the observed entries would tell us nothing about the missing ones. So our minimal assumption must be that entries within the same row or the same column are not independent.
We model this via conditional independence:
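One way to make this precise, assuming the latent-factor reading used later in the rank-1 model (the per-user and per-item factors $u_i, v_j$ and the Gaussian noise are my assumptions, not from the source): entries within a row or column are marginally dependent, but conditionally independent given the latent factors,

$ p(A \mid u, v) = \prod_{(i,j)\ \text{observed}} p(a_{ij} \mid u_i, v_j), \qquad \text{e.g. } a_{ij} = u_i v_j + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) $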

Recommender Systems
We analyze patterns of interest in items (products, movies) and provide personalized recommendations for users.
Collaborative Filtering
We want to exploit collective data from many users and generalize across users, and possibly across items.
Example applications are Amazon, Netflix, online advertising, etc.
Netflix Data
Input: user-item preferences stored in a matrix

The sparseness level can be very low: often only 1% of the entries, or fewer, are observed.
Formal definition:
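One standard formalization of the completion problem, consistent with the low-rank model used later (the observation set $\Omega$ and the rank bound $k$ are my notation, not from the source):

$ A \in \mathbb{R}^{n \times m}, \quad \Omega \subseteq \{1,\dots,n\} \times \{1,\dots,m\} \text{ observed}, \qquad \min_{B:\, \operatorname{rank}(B) \le k} \sum_{(i,j) \in \Omega} (b_{ij} - a_{ij})^2 $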
Preprocessing / Normalization
Centering
Mean rating per user/item: with $\Omega_i$ denoting the set of items rated by user $i$,
$ \mu_i = \frac{1}{|\Omega_i|} \sum_{j \in \Omega_i} a_{ij} $, and analogously per item.

This lets us center the data: $ \tilde{a}_{ij} = a_{ij} - \mu_i $
Variance Normalization
Then the normalized scores or z-scores are: $ z_{ij} = \frac{a_{ij} - \mu_i}{\sigma_i} $, where $\sigma_i$ is the standard deviation of user $i$'s observed ratings.
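A small sketch of both preprocessing steps (the NaN encoding for missing entries and the per-user axis are my choices, not fixed by the source):

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, NaN = unobserved.
A = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 4.0]])

# Per-user mean/std over the observed entries only.
mu = np.nanmean(A, axis=1, keepdims=True)
sigma = np.nanstd(A, axis=1, keepdims=True)

A_centered = A - mu       # centering
Z = A_centered / sigma    # z-scores; unobserved entries remain NaN
```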
Convexity Definition
A function $f$ is convex if $ f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) $ for all $x, y$ and $\lambda \in [0,1]$;
it is strictly convex if the inequality is strict for $x \ne y$ and $\lambda \in (0,1)$.
Our scalar problem $ l(u,v) = \frac{1}{2}(uv - a)^2 $ is non-convex in $(u,v)$ jointly (exception: for fixed $v$ it is convex in $u$, and vice versa).
Gradients
Vector / Matrix notation
For $ l(u,v) = \frac{1}{2}||uv^T - A||_F^2 $:
$\nabla_u l = Rv, \quad \nabla_v l = R^T u, \quad R := (uv^T - A) $
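A quick numerical sanity check of these gradient formulas (a minimal sketch; the example data and finite-difference test are mine):

```python
import numpy as np

def loss(u, v, A):
    # l(u, v) = 0.5 * ||u v^T - A||_F^2
    return 0.5 * np.sum((np.outer(u, v) - A) ** 2)

def grads(u, v, A):
    # Gradients from above: grad_u = R v, grad_v = R^T u, R = u v^T - A
    R = np.outer(u, v) - A
    return R @ v, R.T @ u

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
u, v = rng.standard_normal(5), rng.standard_normal(4)

# Finite-difference check of grad_u along a random direction d.
d = rng.standard_normal(5)
eps = 1e-6
fd = (loss(u + eps * d, v, A) - loss(u - eps * d, v, A)) / (2 * eps)
gu, _ = grads(u, v, A)
print(abs(fd - gu @ d))  # should be near zero
```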
Hessian Map
$ \nabla^2 l(u,v) = \left( \begin{matrix}
||v||^2 1_{n \times n} & 2uv^T - A \\
(2uv^T - A)^T & ||u||^2 1_{m \times m}
\end{matrix} \right), \quad
\nabla^2 l(0,0) = \left( \begin{matrix}
0 & -A \\
-A^T & 0
\end{matrix} \right)$
This mirrors the scalar case in the multidimensional setting. The Hessian at zero is not positive semi-definite unless $A = 0$.
So this problem is non-convex in all dimensions.
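A short check of the indefiniteness claim (a standard argument; the singular pairs $(\sigma_i, p_i, q_i)$ of $A$ are my notation): plugging them into the Hessian at zero gives

$ \left( \begin{matrix} 0 & -A \\ -A^T & 0 \end{matrix} \right) \left( \begin{matrix} p_i \\ \mp q_i \end{matrix} \right) = \pm \sigma_i \left( \begin{matrix} p_i \\ \mp q_i \end{matrix} \right) $

so its eigenvalues come in pairs $\pm \sigma_i(A)$, and it is indefinite whenever $A \ne 0$.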
Gradient Flow
Assume the scalar problem $ l(u,v) = \frac{1}{2}(uv - a)^2 $.
We look at the gradient flow $ \dot{u}(t) = -\nabla_u l, \quad \dot{v}(t) = -\nabla_v l $.
This ODE can be solved in closed form.
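A numerical sketch of this flow (simple Euler integration; step size, initial point, and the invariant check are my choices). One property that follows directly from the ODE is that $u^2 - v^2$ is conserved, since $\frac{d}{dt}(u^2 - v^2) = -2(uv - a)uv + 2(uv - a)uv = 0$:

```python
import numpy as np

# Gradient flow for l(u, v) = 0.5 * (u*v - a)^2:
#   du/dt = -(uv - a) v,   dv/dt = -(uv - a) u
a = 2.0
u, v = 0.8, 0.3
dt = 1e-3

for _ in range(200_000):
    r = u * v - a
    u, v = u - dt * r * v, v - dt * r * u

print(u * v)        # approx. a = 2.0 at convergence
print(u**2 - v**2)  # invariant: stays (approximately) at 0.8^2 - 0.3^2 = 0.55
```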

Fully Observed Rank-1 Model
Useful identity: $ \langle uv^T, A \rangle = u^T A v $, and hence $ ||uv^T||_F^2 = ||u||^2 ||v||^2 $.
We can then rewrite the objective:
$ l(u,v) = \frac{1}{2}||uv^T - A||_F^2 = \frac{1}{2}\left( ||u||^2 ||v||^2 - 2\, u^T A v + ||A||_F^2 \right) $
Optimality Conditions
Directionality of $u$ and $v$: only the term $u^T A v$ couples the two factors, so for fixed norms the optimal directions maximize $u^T A v$ subject to $||u|| = ||v|| = 1$.
This can be solved with Lagrange multipliers:
$ \mathcal{L}(u, v, \lambda, \mu) = u^T A v - \frac{\lambda}{2}(||u||^2 - 1) - \frac{\mu}{2}(||v||^2 - 1) $
First order optimality condition: $ Av = \lambda u, \quad A^T u = \mu v $
Eigenvector Equations
By putting both optimality conditions together we get the eigenvector equations:
$ A A^T u = \lambda \mu \, u, \quad A^T A v = \lambda \mu \, v $
Power iterations can be used for computation.
The generalization of this results in the Singular Value Decomposition (SVD).
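A minimal power-iteration sketch for the leading singular pair, checked against NumPy's SVD (the example matrix and iteration count are mine, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Alternating power iteration: u <- Av/||Av||, v <- A^T u/||A^T u||,
# which runs power iteration on A A^T and A^T A simultaneously.
v = rng.standard_normal(4)
v /= np.linalg.norm(v)
for _ in range(200):
    u = A @ v
    u /= np.linalg.norm(u)
    v = A.T @ u
    v /= np.linalg.norm(v)

sigma = u @ A @ v  # leading singular value

U, S, Vt = np.linalg.svd(A)
print(np.isclose(sigma, S[0]))            # True
print(np.isclose(abs(u @ U[:, 0]), 1.0))  # True (up to sign)
```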