Evidence Lower Bound
- yields approximate, often tractable objective functions
Definition
Each sample is assigned a vector of convex weights (nonnegative and summing to one); intuitively, the $k$-th weight denotes the probability that the sample is part of the $k$-th cluster.
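One common way to write the bound is sketched below. The notation is assumed for illustration: samples $x_i$ with $i = 1, \dots, N$, cluster index $k = 1, \dots, K$, model parameters $\theta$, and convex weights $q_{ik} > 0$ with $\sum_k q_{ik} = 1$.

$$
\mathrm{ELBO}(q, \theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik} \log \frac{p(x_i, k \mid \theta)}{q_{ik}} \;\le\; \sum_{i=1}^{N} \log p(x_i \mid \theta)
$$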
Derive ELBO (CIL-style)
Starting with some loss function:
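A minimal sketch of the steps, assuming the loss is the negative log-likelihood of a mixture model. The symbols $x_i$, $k$, $\theta$, and $q_{ik}$ are assumed notation for the samples, the cluster index, the model parameters, and the convex weights (with $q_{ik} > 0$); the last line is the negative of the ELBO.

$$
\begin{aligned}
\mathcal{L}(\theta) &= -\sum_{i} \log p(x_i \mid \theta) \\
&= -\sum_{i} \log \sum_{k} q_{ik} \, \frac{p(x_i, k \mid \theta)}{q_{ik}} \\
&\le -\sum_{i} \sum_{k} q_{ik} \log \frac{p(x_i, k \mid \theta)}{q_{ik}}
\end{aligned}
$$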
Where we used Jensen's inequality with the convex weights as the positive parameters:
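Jensen's Inequality
For a real convex function $f$, numbers $x_1, \dots, x_n$ in its domain, and positive weights $a_1, \dots, a_n$:
$$
f\left(\frac{\sum_i a_i x_i}{\sum_i a_i}\right) \le \frac{\sum_i a_i f(x_i)}{\sum_i a_i}
$$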
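The inequality and the resulting bound can be checked numerically. The sketch below is only an illustration under assumptions: the joint probabilities in `joint` are made-up values for a single sample with three clusters, and the helper names `jensen_gap` and `elbo_and_log_likelihood` are hypothetical. It applies Jensen's inequality to the convex function $-\log$ and compares the ELBO with the log-likelihood.

```python
import math


def jensen_gap(f, xs, weights):
    """Weighted mean of f(x) minus f of the weighted mean.

    For a convex f and positive weights this is non-negative.
    """
    total = sum(weights)
    mean_x = sum(a * x for a, x in zip(weights, xs)) / total
    mean_f = sum(a * f(x) for a, x in zip(weights, xs)) / total
    return mean_f - f(mean_x)


def neg_log(y):
    """The convex function used in the ELBO derivation."""
    return -math.log(y)


def elbo_and_log_likelihood(joint, q):
    """Return (ELBO, log-likelihood) for one sample.

    joint[k] plays the role of p(x, k | theta); q[k] are convex weights.
    """
    log_lik = math.log(sum(joint))
    elbo = sum(qk * math.log(pk / qk) for pk, qk in zip(joint, q))
    return elbo, log_lik


# Made-up joint probabilities for one sample and three clusters (assumption).
joint = [0.10, 0.30, 0.05]

# Two choices of convex weights: uniform, and the exact posterior over clusters.
q_uniform = [1 / 3, 1 / 3, 1 / 3]
posterior = [p / sum(joint) for p in joint]

# Jensen gap for -log evaluated at the ratios p / q, weighted by q.
ratios = [p / qk for p, qk in zip(joint, q_uniform)]
gap = jensen_gap(neg_log, ratios, q_uniform)

elbo_u, log_lik = elbo_and_log_likelihood(joint, q_uniform)
elbo_p, _ = elbo_and_log_likelihood(joint, posterior)

print("Jensen gap:", gap)
print("log-likelihood - ELBO (uniform q):", log_lik - elbo_u)
print("log-likelihood - ELBO (posterior q):", log_lik - elbo_p)
```

The Jensen gap equals the difference between the log-likelihood and the ELBO, and that difference closes (up to rounding) when the weights are the exact posterior over clusters.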