1. Dimension Reduction

1.1 Motivation

1.2 Linear Autoencoder

The linear autoencoder is obtained by identifying F, G with linear maps.

Linear encoder:

F: x ↦ z = Wx

Linear decoder:

G: z ↦ x̂ = Vz

This leads to the following risk:

R(W, V) = E[½ ‖x − Px‖²], where P := VW.
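The risk above can be estimated empirically by averaging the squared reconstruction error over a sample. A minimal NumPy sketch (data and dimensions are hypothetical toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # toy data: 100 samples in d = 5 dimensions

d, m = 5, 2                      # input dimension d, bottleneck dimension m
W = rng.normal(size=(m, d))      # encoder F: x -> z = W x
V = rng.normal(size=(d, m))      # decoder G: z -> x_hat = V z

P = V @ W                        # reconstruction map P := V W
# Empirical risk: average of (1/2) ||x - P x||^2 over the sample
risk = 0.5 * np.mean(np.sum((X - X @ P.T) ** 2, axis=1))
```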

Def 1.2.1 Centering

The data is centered if E[x] = 0. Data can be centered by the transformation x ↦ x − E[x], i.e. by subtracting the mean.
For centered data and the squared loss, the optimal affine reconstruction map is linear.
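Centering amounts to subtracting the empirical mean from each sample; a short sketch with toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, size=(200, 4))  # toy data with nonzero mean

Xc = X - X.mean(axis=0)                 # center: x -> x - E[x]
```

After this transformation the empirical mean of Xc is (numerically) zero.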

The weight matrices found for the linear maps are non-identifiable, so one must be careful not to over-interpret the learned representation.

The reconstruction map of a linear autoencoder has rank at most m. The bottleneck layer thus constitutes a rank constraint.
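The rank bound follows from P = VW factoring through the m-dimensional bottleneck; a quick numerical check (dimensions are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 6, 2                      # input dimension d, bottleneck dimension m
W = rng.normal(size=(m, d))      # encoder weights
V = rng.normal(size=(d, m))      # decoder weights
P = V @ W                        # d x d reconstruction map
r = np.linalg.matrix_rank(P)     # bounded by the bottleneck width m
```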

1.3 Projection

Orthogonal Projection

For a given subspace U, the optimal reconstruction map P is the matrix representing the orthogonal projection Π_U. The optimal linear autoencoder represents a projection.

The optimal weight matrix of the autoencoder with tied parameter matrices V=WT is orthogonal.
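With tied parameters V = Wᵀ and an encoder W with orthonormal rows, the reconstruction map P = WᵀW is an orthogonal projection (symmetric and idempotent). A sketch, constructing such a W via QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 5, 2
Q, _ = np.linalg.qr(rng.normal(size=(d, m)))  # Q: d x m with orthonormal columns
W = Q.T                                       # encoder with orthonormal rows
P = W.T @ W                                   # tied decoder V = W^T, so P = V W
# P is an orthogonal projection: P = P^T and P P = P
```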

1.4 PCA

For centered data, the optimal autoencoder represents the projection P which maximizes the variance.

Sufficient Statistics

The optimal projection is fully determined by the covariance matrix of the data, i.e. E[xxᵀ] is a sufficient statistic (together with E[x], which is used for centering).
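Since the covariance matrix determines the optimal projection, PCA can be computed from its eigendecomposition: the top-m eigenvectors span the variance-maximizing subspace. A minimal sketch on hypothetical correlated toy data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(300, 4)) @ A       # correlated toy data
Xc = X - X.mean(axis=0)                 # center first (uses E[x])

C = Xc.T @ Xc / len(Xc)                 # covariance estimate of E[x x^T]
evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
m = 2
U = evecs[:, -m:]                       # top-m principal directions
P = U @ U.T                             # optimal rank-m orthogonal projection
```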