KL divergence

Interpretation of MLE in terms of KL divergence

In parametric density approximation, MLE minimizes the KL divergence between the true distribution and the model

Kang Gyeonghun
Suppose that the true density of a random variable $x$ is $p(x)$. Since this is unknown, we can try to come up with an approximation $q(x)$. Then KL divergences is a good measure of mismatch between $p$ and $q$ distribution. $$ \begin{align*} \text{KL divergence:}\quad KL(p||q) = \int p(x)\log \dfrac{p(x)}{q(x)}dx \end{align*} $$ From the formula we can see that KL divergence is a weighted average, with wighted $p(x)$, of an error induced by approximation ($\log p(x) - \log q(x)$).
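To make the link to MLE in the title explicit, here is a brief sketch; the notation $q_\theta$ for the parametric approximation and $x_1,\dots,x_N$ for i.i.d. samples from $p$ is introduced here and is not part of the original definition. $$ \begin{align*} KL(p||q_\theta) &= \int p(x)\log p(x)dx - \int p(x)\log q_\theta(x)dx \\ &= \text{const.} - \mathbb{E}_{p}\left[\log q_\theta(x)\right] \approx \text{const.} - \dfrac{1}{N}\sum_{n=1}^{N}\log q_\theta(x_n) \end{align*} $$ The first term does not depend on $\theta$, so minimizing $KL(p||q_\theta)$ over $\theta$ amounts to maximizing the empirical log-likelihood, which is exactly MLE.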

Note on Kullback-Leibler Divergence

How much of a loss (i.e. how much additional coding) you bear when you approximate

Kang Gyeonghun
How do we quantify an amount of information that some data $x$ contains? If the data is pretty much expected than it tells nothing new to us. But if it is so rare then it has some value. In this sense, we can think of an amount of information as a “degree of surprise”, and define $$ \text{information content of data $x$:}\quad h(x) = -\log p(x) $$ where the logarithm ensures $h(x,y)=h(x)+h(y) \Leftrightarrow p(x,y)=p(x)p(y)$, and the negative sign makes $h(x)\geq 0$.
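As a quick numerical check of the definition (the choice of base-2 logarithm here is mine, so information is measured in bits; any base works up to a constant factor): $$ \begin{align*} p(x)=\tfrac{1}{2} &\;\Rightarrow\; h(x) = -\log_2\tfrac{1}{2} = 1 \text{ bit}, \\ p(x)=\tfrac{1}{8} &\;\Rightarrow\; h(x) = -\log_2\tfrac{1}{8} = 3 \text{ bits} \end{align*} $$ The rarer outcome is the more surprising one and carries more information.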