Probabilistic Latent Semantic Analysis (pLSA)
- derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables
- assign each word occurence a topic
- document , word , topic
Definition
- Define a distribution over the latent (i.e. topic) variables
- document is characterized by a distribution over topics:
- Define a conditional distribution of a word given a topic
- topics are represented as distributions over words:
Use two stage sampling process:
- Sample a topic from
- Sample a token given the topic from
Log-Likelihood