Evidence Lower Bound

Definition

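$$
l(\theta) = \sum_{t=1}^{s} \ln \sum_{z=1}^{k} \pi_z \, p(x_t; \theta_z) \;\ge\; \sum_{t=1}^{s} \sum_{z=1}^{k} q_{tz} \left[ \ln \pi_z + \ln p(x_t; \theta_z) - \ln q_{tz} \right]
$$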

Here $q_t = (q_{t1}, \dots, q_{tk})$ is a vector of convex weights for each sample $t$, that is, $q_{tz} \ge 0$ and $\sum_{z=1}^{k} q_{tz} = 1$. Intuitively, $q_{tz}$ denotes the probability that sample $t$ belongs to the $z$-th cluster.
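As a rough numerical illustration of the definition, the sketch below evaluates both sides of the bound on a toy mixture. Everything specific here is an assumption made for the example and not part of the notes: the components are taken to be 1-D Gaussians, the data and parameters are made up, and the helper names (`gaussian_pdf`, `log_likelihood`, `elbo`, `posterior`, `random_weights`) are hypothetical.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution (assumed component model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, pis, params):
    """l(theta) = sum_t ln sum_z pi_z p(x_t; theta_z)."""
    return sum(
        math.log(sum(pi * gaussian_pdf(x, *th) for pi, th in zip(pis, params)))
        for x in xs
    )

def elbo(xs, pis, params, q):
    """sum_t sum_z q_tz [ln pi_z + ln p(x_t; theta_z) - ln q_tz]."""
    total = 0.0
    for x, q_t in zip(xs, q):
        for pi, th, q_tz in zip(pis, params, q_t):
            if q_tz > 0.0:  # convention: 0 * ln 0 = 0
                log_p = math.log(gaussian_pdf(x, *th))
                total += q_tz * (math.log(pi) + log_p - math.log(q_tz))
    return total

def posterior(xs, pis, params):
    """q_tz proportional to pi_z p(x_t; theta_z), normalised over z."""
    q = []
    for x in xs:
        joint = [pi * gaussian_pdf(x, *th) for pi, th in zip(pis, params)]
        norm = sum(joint)
        q.append([j / norm for j in joint])
    return q

def random_weights(rng, k):
    """An arbitrary vector of k convex weights (positive, summing to 1)."""
    raw = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
xs = [-2.1, -1.7, 0.3, 2.4, 2.9]      # toy samples (made up)
pis = [0.4, 0.6]                      # mixing weights pi_z
params = [(-2.0, 1.0), (2.5, 0.8)]    # (mean, std) per component

q_random = [random_weights(rng, len(pis)) for _ in xs]
ll = log_likelihood(xs, pis, params)
bound_random = elbo(xs, pis, params, q_random)
bound_posterior = elbo(xs, pis, params, posterior(xs, pis, params))
print(ll, bound_random, bound_posterior)
```

For any valid choice of convex weights the second number should not exceed the first. The third uses the standard fact that the bound becomes tight when $q_{tz}$ is proportional to $\pi_z \, p(x_t; \theta_z)$, so it should match the log-likelihood up to rounding.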

Derive ELBO (CIL-style)

Starting from the objective $l(\theta)$, which has the form of a mixture-model log-likelihood:

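$$
\begin{aligned}
l(\theta) &= \sum_{t=1}^{s} \ln \sum_{z=1}^{k} \pi_z \, p(x_t; \theta_z) \\
&= \sum_{t=1}^{s} \ln \sum_{z=1}^{k} q_{tz} \, \frac{\pi_z \, p(x_t; \theta_z)}{q_{tz}} \\
&\ge \sum_{t=1}^{s} \sum_{z=1}^{k} q_{tz} \ln \frac{\pi_z \, p(x_t; \theta_z)}{q_{tz}} \\
&= \sum_{t=1}^{s} \sum_{z=1}^{k} q_{tz} \left( \ln \pi_z + \ln p(x_t; \theta_z) - \ln q_{tz} \right)
\end{aligned}
$$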

Here we used Jensen's inequality with $q_{tz}$ as the positive weights (applied to the concave function $\ln$, so the direction of the inequality is reversed):
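Spelled out for a single sample $t$, the step looks roughly as follows. The shorthand $y_{tz}$ is introduced only for this illustration. Since $\sum_{z} q_{tz} = 1$, the normalising denominator of the finite form drops out, and the concave version of the inequality gives:

$$
\ln \sum_{z=1}^{k} q_{tz} \, y_{tz} \;\ge\; \sum_{z=1}^{k} q_{tz} \ln y_{tz}, \qquad y_{tz} = \frac{\pi_z \, p(x_t; \theta_z)}{q_{tz}}
$$

Summing this over $t = 1, \dots, s$ yields the inequality in the derivation.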


Jensen Inequality

Definition Finite Form

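For a real convex function $\varphi$, points $x_i$ in its domain, and positive weights $a_i$:

$$
\varphi\left( \frac{\sum_i a_i x_i}{\sum_i a_i} \right) \le \frac{\sum_i a_i \, \varphi(x_i)}{\sum_i a_i}
$$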
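The finite form can be checked numerically with a small sketch. The helper names `weighted_mean` and `jensen_gap` are hypothetical, and the points and weights are random values chosen only for illustration. The convex case uses `math.exp`; the concave case uses `math.log`, which is the case needed for the ELBO derivation and flips the sign of the gap.

```python
import math
import random

def weighted_mean(values, weights):
    """sum_i a_i v_i / sum_i a_i for positive weights a_i."""
    return sum(a * v for a, v in zip(weights, values)) / sum(weights)

def jensen_gap(phi, xs, weights):
    """Right side minus left side of the finite form; >= 0 for convex phi."""
    lhs = phi(weighted_mean(xs, weights))
    rhs = weighted_mean([phi(x) for x in xs], weights)
    return rhs - lhs

rng = random.Random(1)
xs = [rng.uniform(0.1, 5.0) for _ in range(6)]        # points in the domain of ln
weights = [rng.uniform(0.1, 3.0) for _ in range(6)]   # positive, not normalised

gap_convex = jensen_gap(math.exp, xs, weights)    # exp is convex
gap_concave = jensen_gap(math.log, xs, weights)   # ln is concave
print(gap_convex >= 0.0, gap_concave <= 0.0)
```

The weights are deliberately left unnormalised to exercise the division by $\sum_i a_i$ in the definition.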