I recently read this nice integral representation of the logarithm in arXiv:1912.05812v1 by Neri Merhav and Igal Sason. Most ideas in this post are from there. The transform, valid for $z > 0$, is:

$$\ln z = \int_0^\infty \frac{e^{-u} - e^{-uz}}{u}\,du. \tag{1}$$

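This can be shown by using $\frac{1}{x} = \int_0^\infty e^{-ux}\,du$ (for $x > 0$) along with $\ln z = \int_1^z \frac{dx}{x}$, and then swapping the order of integration. It all becomes more interesting when we take an expectation though. For a positive random variable $X$:

$$\mathbb{E}[\ln X] = \int_0^\infty \frac{e^{-u} - \mathbb{E}\left[e^{-uX}\right]}{u}\,du. \tag{2}$$
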
This allows us to express the expectation of the logarithm in terms of the moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ of $X$, evaluated at $t = -u$. For instance, if $X$ is normal, then such a representation will probably be simpler. It is not obvious that it will help with computation, but it does suggest that we can use tools like concentration bounds for expectations of logarithms.
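
As a sanity check on the expectation formula, here is a minimal Python sketch. It assumes $X$ is exponential with rate $\lambda$, for which $\mathbb{E}[e^{-uX}] = \lambda/(\lambda+u)$ and $\mathbb{E}[\ln X] = -\gamma - \ln\lambda$ in closed form ($\gamma$ is the Euler-Mascheroni constant). The helper name `expected_log_from_mgf`, the substitution $u = t/(1-t)$ and the midpoint rule are choices made for this illustration only; they are not from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_log_from_mgf(mgf_at_minus_u, n=20_000):
    """Approximate E[ln X] = int_0^inf (exp(-u) - E[exp(-u X)]) / u du.

    mgf_at_minus_u(u) should return E[exp(-u X)] for u > 0.
    The substitution u = t / (1 - t) maps (0, inf) onto (0, 1); the midpoint
    rule never evaluates the integrand at t = 0 or t = 1.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        u = t / (1.0 - t)
        integrand = (math.exp(-u) - mgf_at_minus_u(u)) / u
        total += integrand / (1.0 - t) ** 2  # Jacobian: du = dt / (1 - t)^2
    return total * h


lam = 2.0  # rate of the exponential distribution (assumed example)
estimate = expected_log_from_mgf(lambda u: lam / (lam + u))
exact = -EULER_GAMMA - math.log(lam)
print(f"integral: {estimate:.8f}  closed form: {exact:.8f}")
```

The two printed numbers should agree to several decimal places. Any other positive random variable can be plugged in the same way, as long as $\mathbb{E}[e^{-uX}]$ is available as a function of $u$.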

It’s fun to apply the same idea to the KL divergence:

$$\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{X \sim p}\left[\ln \frac{p(X)}{q(X)}\right].$$

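For simplicity, let $W = p(X)/q(X)$ with $X \sim p$. Then, from (2):

$$\mathrm{KL}(p \,\|\, q) = \mathbb{E}[\ln W] = \int_0^\infty \frac{e^{-u} - \mathbb{E}\left[e^{-uW}\right]}{u}\,du.$$
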
The last equation is a different way to express the KL divergence. It’s not particularly useful as is (to my eyes), as the MGF of the likelihood ratio $p(X)/q(X)$ is a tough cookie to compute. However, because the MGF enters the integral with a negative sign, a lower bound on the MGF would give an upper bound on the KL, and possibly a non-trivial one.
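
To see the KL version at work, here is a small Python sketch for two discrete distributions, where the MGF of the ratio is just a finite sum. The distributions `p` and `q` and the function names `kl_direct` and `kl_via_mgf` are made up for the example, and the numerical integration (substitution $u = t/(1-t)$ plus a midpoint rule) is one simple choice among many.

```python
import math


def kl_direct(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i) for strictly positive p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_via_mgf(p, q, n=20_000):
    """KL(p || q) = int_0^inf (exp(-u) - E_p[exp(-u W)]) / u du, W = p(X)/q(X)."""
    ratios = [pi / qi for pi, qi in zip(p, q)]
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h          # midpoint in (0, 1)
        u = t / (1.0 - t)          # maps (0, 1) onto (0, inf)
        # E_p[exp(-u W)]: expectation under p of exp(-u * p(X)/q(X))
        mgf = sum(pi * math.exp(-u * w) for pi, w in zip(p, ratios))
        total += (math.exp(-u) - mgf) / u / (1.0 - t) ** 2
    return total * h


# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
kl_integral = kl_via_mgf(p, q)
kl_sum = kl_direct(p, q)
print(f"integral form: {kl_integral:.8f}  direct sum: {kl_sum:.8f}")
```

Replacing `mgf` inside the loop with any function that lower-bounds it pointwise in `u` would make the returned value an upper bound on the KL, which is the idea in the last paragraph.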