Probabilistic Latent Semantic Analysis (pLSA)

Definition

  1. Define a distribution over the latent (i.e. topic) variables
    • each document is characterized by a distribution over topics: p(z|d)
  2. Define a conditional distribution of a word given a topic
    • each topic is represented as a distribution over words: p(w|z)

Use a two-stage sampling process to generate each token of a document d:
  1. Sample a topic z from p(z|d)
  2. Sample a token w given the topic from p(w|z)
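
The two-stage sampling process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the two topics, and all probability values are made up, and the function name sample_document is hypothetical rather than part of any library.

```python
import random

def sample_document(p_z_given_d, p_w_given_z, vocab, n_tokens, rng):
    """Generate n_tokens words for one document with the two-stage process.

    p_z_given_d[t]    : p(z_t | d) for this document
    p_w_given_z[t][j] : p(w_j | z_t)
    """
    topics = range(len(p_z_given_d))
    tokens = []
    for _ in range(n_tokens):
        # Stage 1: sample a topic index from p(z|d)
        z = rng.choices(topics, weights=p_z_given_d, k=1)[0]
        # Stage 2: sample a word from p(w|z) for the chosen topic
        w = rng.choices(vocab, weights=p_w_given_z[z], k=1)[0]
        tokens.append(w)
    return tokens

# Toy parameters (assumed for illustration only)
vocab = ["goal", "match", "vote", "party"]
p_w_given_z = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 puts all its mass on "goal"/"match"
    [0.0, 0.0, 0.5, 0.5],  # topic 1 puts all its mass on "vote"/"party"
]
p_z_given_d = [0.9, 0.1]   # this document is mostly about topic 0

rng = random.Random(0)     # seeded so the run is repeatable
doc = sample_document(p_z_given_d, p_w_given_z, vocab, 10, rng)
print(doc)
```

Each token gets its own topic draw, so a single document can mix words from several topics in proportion to p(z|d).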

Log-Likelihood

\[
\ell(\theta; N) = \log p(N; \theta) = \sum_{i,j} N_{ij} \log p(w_j \mid d_i),
\qquad
p(w_j \mid d_i) = \sum_{t=1}^{k} p(w_j \mid z_t)\, p(z_t \mid d_i)
\]
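
A minimal sketch of evaluating this log-likelihood in Python follows. It assumes N_ij is the number of times word w_j occurs in document d_i (the usual reading of a count matrix, though the notes do not spell it out) and that θ consists of the two tables p(z|d) and p(w|z). The function name plsa_log_likelihood and the toy numbers are hypothetical.

```python
import math

def plsa_log_likelihood(counts, p_z_given_d, p_w_given_z):
    """Sum over i, j of N_ij * log p(w_j | d_i).

    counts[i][j]      : N_ij, assumed to be the count of word j in document i
    p_z_given_d[i][t] : p(z_t | d_i)
    p_w_given_z[t][j] : p(w_j | z_t)
    """
    k = len(p_w_given_z)  # number of topics
    total = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # zero counts contribute nothing to the sum
            # Mixture: p(w_j | d_i) = sum_t p(w_j | z_t) * p(z_t | d_i)
            p_w_d = sum(p_w_given_z[t][j] * p_z_given_d[i][t] for t in range(k))
            total += n_ij * math.log(p_w_d)
    return total

# Toy example (made-up values): 2 documents, 2 words, 2 topics
counts = [[2, 0],
          [1, 1]]
p_z_given_d = [[1.0, 0.0],
               [0.5, 0.5]]
p_w_given_z = [[1.0, 0.0],
               [0.0, 1.0]]

ll = plsa_log_likelihood(counts, p_z_given_d, p_w_given_z)
print(ll)
```

In this toy case the first document contributes 2 * log(1) = 0 and the second contributes log(0.5) + log(0.5), so the total equals 2 * log(0.5). Skipping zero counts also avoids evaluating log(0) for words a document never uses.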