Kostis Gourgoulias
My website, served with Jekyll.
http://kgourgou.me
Rewriting the Kullback-Leibler with an integral transform
<p>I recently read this nice integral representation of the logarithm in arXiv:1912.05812v1 by Neri Merhav and Igal Sason. Most ideas in this post are from there. The transform is:
<script type="math/tex">\log(x)=\int_{0}^{\infty}\frac{e^{-t}-e^{-tx}}{t}dt.</script></p>
<p>This can be shown by using <script type="math/tex">\log(x)=\int_{1}^{x}1/t\cdot dt</script> along with <script type="math/tex">\frac{1}{x}=\int_{0}^{\infty}e^{-xu}du</script>. It all becomes more interesting when we take an expectation though:</p>
<script type="math/tex; mode=display">\mathbb{E}[\log(X)]=\int_{0}^{\infty}\frac{e^{-t}-\mathbb{E}[e^{-tX}]}{t}dt</script>
<p>This allows us to express the expectation of the logarithm in terms of the moment generating function of <script type="math/tex">X</script>, evaluated at <script type="math/tex">-t</script>. For instance, if <script type="math/tex">X</script> is normal, then such a representation will probably be simpler. It is not obvious that it will help with computation, but it does suggest that we can use tools like concentration bounds, which control moment generating functions, for expectations of logarithms.</p>
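<p>As a quick sanity check, here is a minimal numerical sketch of the two identities above. The function name <code class="language-plaintext highlighter-rouge">expected_log</code>, the substitution <script type="math/tex">t=e^{s}</script>, the trapezoidal rule, and the grid parameters are my own choices for illustration; the test distribution, Exponential(1), is also an assumption, picked because <script type="math/tex">\mathbb{E}[e^{-tX}]=1/(1+t)</script> and <script type="math/tex">\mathbb{E}[\log X]=-\gamma</script> are both known in closed form.</p>

```python
import math


def expected_log(laplace, s_min=-40.0, s_max=40.0, h=0.05):
    """Approximate E[log X] = int_0^inf (e^{-t} - E[e^{-tX}]) / t dt.

    `laplace(t)` must return E[e^{-tX}] for t > 0. Substituting t = e^s
    turns the integral into int (e^{-e^s} - laplace(e^s)) ds over the real
    line, which is then approximated with the trapezoidal rule.
    """
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for k in range(n + 1):
        t = math.exp(s_min + k * h)
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * (math.exp(-t) - laplace(t))
    return total * h


# Point mass at x = 2: E[e^{-tX}] = e^{-2t}, so we should recover log(2).
log2_estimate = expected_log(lambda t: math.exp(-2.0 * t))

# X ~ Exponential(1): E[e^{-tX}] = 1 / (1 + t) and E[log X] = -gamma.
EULER_GAMMA = 0.5772156649015329
exp_estimate = expected_log(lambda t: 1.0 / (1.0 + t))

print(log2_estimate, math.log(2.0))
print(exp_estimate, -EULER_GAMMA)
```

<p>The change of variables is only there to make the integrand well behaved near <script type="math/tex">t=0</script> and in the tail; any reasonable quadrature routine should do the same job.</p>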
<p>It’s fun to apply the same idea to the KL.</p>
<script type="math/tex; mode=display">R(Q|P)=\int Q(x) \log \frac{Q(x)}{P(x)}dx.</script>
<p>For simplicity, let <script type="math/tex">w(x):=\frac{Q(x)}{P(x)}</script>, so that <script type="math/tex">R(Q|P)=\mathbb{E}_Q[\log w]</script>. Then, from the expectation formula above:</p>
<script type="math/tex; mode=display">\mathbb{E}_Q[\log w]=\int_{0}^{\infty}\frac{e^{-t}-\mathbb{E}_Q[e^{-tw}]}{t}dt.</script>
<p>The last equation is a different way to express the KL divergence. It’s not particularly useful as is (to my eyes), as the MGF of <script type="math/tex">w</script> is a tough cookie to compute. However, with a lower bound to the MGF we could get an upper bound to the KL that is not trivial.</p>
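<p>For discrete distributions the MGF is just a finite sum, so the representation can be checked directly. The sketch below takes <script type="math/tex">w=Q/P</script> to be the density ratio, so that <script type="math/tex">\mathbb{E}_Q[\log w]</script> is the KL divergence. The two distributions, the function names, and the quadrature settings are all made up for illustration.</p>

```python
import math


def kl_direct(q, p):
    """R(Q|P) = sum_i Q_i log(Q_i / P_i) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def kl_via_integral(q, p, s_min=-40.0, s_max=40.0, h=0.05):
    """R(Q|P) = int_0^inf (e^{-t} - E_Q[e^{-t w}]) / t dt with w = Q/P.

    Uses the substitution t = e^s and the trapezoidal rule in s.
    """
    terms = [(qi, qi / pi) for qi, pi in zip(q, p) if qi > 0]
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for k in range(n + 1):
        t = math.exp(s_min + k * h)
        mgf = sum(qi * math.exp(-t * w) for qi, w in terms)  # E_Q[e^{-t w}]
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (math.exp(-t) - mgf)
    return total * h


# Hypothetical example distributions on three points.
q = [0.5, 0.3, 0.2]
p = [0.2, 0.5, 0.3]

print(kl_direct(q, p), kl_via_integral(q, p))
```

<p>Replacing <code class="language-plaintext highlighter-rouge">mgf</code> by any pointwise lower bound makes the integrand larger, which is where the upper bound on the KL would come from.</p>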
Mon, 20 Jan 2020 00:00:00 +0000
http://kgourgou.me//Rewriting-the-Kullback-Leibler-with-an-integral-transform/
http://kgourgou.me//Rewriting-the-Kullback-Leibler-with-an-integral-transform/
The simplest Bayesian optimization example
<p>After a really interesting paper discussion session, <a href="https://github.com/kgourgou/baeysian-opt-for-fun">I decided to implement</a> Bayesian optimization with the <code class="language-plaintext highlighter-rouge">expected-improvement</code> acquisition function. I’ve already fixed a few bugs, but can’t promise it is bug-free. It will probably work as long as you stick to my examples.</p>
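<p>For readers who haven’t met it, here is a minimal sketch of the standard closed-form expected improvement under a Gaussian posterior, written for minimisation. It is not taken from the linked repository; the function name, the argument names, and the optional exploration offset <code class="language-plaintext highlighter-rouge">xi</code> are assumptions made for this example.</p>

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement for minimisation.

    Assumes the surrogate's prediction at a candidate point is N(mu, sigma^2)
    and that `best` is the lowest objective value observed so far.
    EI = (best - mu - xi) * Phi(z) + sigma * phi(z), z = (best - mu - xi) / sigma.
    """
    gap = best - mu - xi
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(gap, 0.0)
    z = gap / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gap * cdf + sigma * pdf


# A candidate predicted slightly worse than the incumbent can still have
# positive EI if the surrogate is uncertain about it.
print(expected_improvement(mu=0.1, sigma=0.5, best=0.0))
```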
Sun, 10 Nov 2019 00:00:00 +0000
http://kgourgou.me//The-simplest-Bayesian-optimization-example/
http://kgourgou.me//The-simplest-Bayesian-optimization-example/
Research
<p>Just a short post on past and current research. You can find a complete list of my publications below.</p>
<h1 id="publications-and-preprints">Publications and Preprints</h1>
<p><code class="language-plaintext highlighter-rouge">2019</code>
<em>Tuning the semantic consistency of active medical diagnosis: a walk on the semantic simplex</em>, with A. Buchard, A. Navarro, et al. To be presented at the Stanford Symposium “Frontiers of AI-assisted care”.</p>
<p><code class="language-plaintext highlighter-rouge">2018</code>
<em>Universal Marginalizer for Amortised Inference and Embedding of Generative Models</em>, with R. Walecki, A. Buchard, et al. Submitted to AISTATS. arXiv: 1811.04727</p>
<p><code class="language-plaintext highlighter-rouge">2017</code>
<em>A Universal Marginalizer for Amortized Inference in Generative Models</em>, NeurIPS workshop on Advances in Approximate Bayesian Inference, 2017, with L. Douglas, I. Zarov, et al. arXiv: 1711.00695.</p>
<p><code class="language-plaintext highlighter-rouge">2017</code>
<em>Information criteria for quantifying loss of reversibility in parallelized KMC</em>, with M. Katsoulakis, L. Rey-Bellet. Accepted at the Journal of Computational Physics 328, 438-454.</p>
<p><code class="language-plaintext highlighter-rouge">2017</code>
<em>How biased is your model? Concentration Inequalities, Information and Model Bias</em>, with M. Katsoulakis, L. Rey-Bellet and J. Wang. Submitted for review at the IEEE Transactions on Information Theory.</p>
<p><code class="language-plaintext highlighter-rouge">2016</code>
<em>Information metrics for long-time errors in splitting schemes for stochastic dynamics and parallel Kinetic Monte Carlo</em>, with M. Katsoulakis and L. Rey-Bellet. Accepted at the SIAM Journal on Scientific Computing 38 (6), A3808-A3832.</p>
Wed, 04 Sep 2019 00:00:00 +0000
http://kgourgou.me//Research/
http://kgourgou.me//Research/
Bounds on joint probabilities - Part I
<p>Here are some notes on bounding joint probability distributions. Enjoy! This was
converted from <script type="math/tex">\LaTeX</script> with pandoc, so typos, missing figures, etc., are to be expected.</p>
<p>Consider the binary random variables <script type="math/tex">X_1, \ldots, X_n</script> following the
distribution <script type="math/tex">P</script>. For some collection of values, say,
<script type="math/tex">x_1, \ldots, x_n</script>, we are interested in computing
<script type="math/tex">P(X_1=x_1,\ldots, X_n=x_n)</script>.</p>
<p>There is a rich literature on bounding joint probabilities, say, <script type="math/tex">P(X_1,X_2,X_3)</script>, if one has
knowledge of the marginals, <script type="math/tex">P(X_i),</script> <script type="math/tex">i=1,2,3</script>, <script type="math/tex">P(X_{i},X_j)</script>,
<script type="math/tex">i\neq j</script>, or of the moments of the marginal distributions. Some
examples of such inequalities follow below.</p>
<p>When the bounds only use <script type="math/tex">P(X_i)</script>, we will say that they utilize
<em>first-order</em> information. Similarly, if the <script type="math/tex">P(X_i, X_j)</script> are used in the
bounds, they are <em>second-order</em>, and so on for third-order and higher.</p>
<h2 id="bonferroni-inequalities">Bonferroni inequalities</h2>
<p>We start with a classical result, inspired by the inclusion-exclusion
formula, known as the <em>Bonferroni</em>
inequalities [@galambos1977bonferroni]. The notation <script type="math/tex">X^c</script> corresponds
to the negation of the <script type="math/tex">X</script> variable, i.e., if <script type="math/tex">X=x</script>, <script type="math/tex">X^c=1-x</script> for
<script type="math/tex">x\in \{0,1\}</script>. First, we define:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
S_1&:=\sum_{i}P(X_i^c),\\
S_k&:=\sum_{1\leq i_1< \ldots < i_k\leq n} P(X_{i_1}^c,\ldots, X_{i_k}^c),\\\end{aligned} %]]></script>
<p>Then, for every odd <script type="math/tex">k</script> in <script type="math/tex">\{1,\ldots, n\}</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1,\ldots, X_n)&\geq 1 -\sum_{j=1}^{k} (-1)^{j-1}S_j.
\end{aligned} %]]></script>
<p>We can also get an upper bound for every even <script type="math/tex">k</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1,\ldots, X_n)&\leq 1 -\sum_{j=1}^{k} (-1)^{j-1}S_j.\end{aligned} %]]></script>
<p>By the inclusion-exclusion formula, the inequalities become equalities
when <script type="math/tex">k=n</script>. Thus, the inequalities can be made sharper by including more
marginals. However, the upper (and lower) bounds don’t necessarily
become sharper monotonically as <script type="math/tex">k</script> increases; see work
by [@schwager1984bonferroni]. Also, although the inequalities are valid
for all <script type="math/tex">k</script>, they can be uninformative, that is, smaller than zero or
greater than one.</p>
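<p>To make the bounds concrete, the sketch below evaluates them on a small, randomly generated joint distribution over binary variables. I take the event <script type="math/tex">X_i</script> to mean <script type="math/tex">X_i=1</script>, so <script type="math/tex">X_i^c</script> means <script type="math/tex">X_i=0</script>; the joint, the seed, and the helper names are all made up for illustration.</p>

```python
import itertools
import random


def random_joint(n, seed=0):
    """A hypothetical joint distribution over {0,1}^n with random weights."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {x: w / total for x, w in zip(outcomes, weights)}


def prob_all_zero(joint, idx):
    """P(X_i = 0 for every i in idx), i.e. the joint of the complements."""
    return sum(p for x, p in joint.items() if all(x[i] == 0 for i in idx))


def bonferroni_bounds(joint, n):
    """Return the list [1 - sum_{j<=k} (-1)^(j-1) S_j for k = 1..n]."""
    bounds, partial = [], 0.0
    for k in range(1, n + 1):
        s_k = sum(prob_all_zero(joint, idx)
                  for idx in itertools.combinations(range(n), k))
        partial += (-1) ** (k - 1) * s_k
        bounds.append(1.0 - partial)
    return bounds


n = 4
joint = random_joint(n)
target = joint[(1,) * n]  # P(X_1 = 1, ..., X_n = 1)
bounds = bonferroni_bounds(joint, n)
for k, b in enumerate(bounds, start=1):
    kind = "lower" if k % 2 == 1 else "upper"
    print(f"k={k}: {kind} bound {b:.4f} (target {target:.4f})")
```

<p>Computing <script type="math/tex">S_k</script> needs all <script type="math/tex">k</script>-th order marginals, so the cost grows combinatorially with <script type="math/tex">k</script>; in practice only the first few orders are used.</p>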
<h2 id="frechet-bounds">Fréchet bounds</h2>
<p>An alternative upper bound for the joint is the Fréchet-type bound:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:frechet}
P(X_1,\ldots, X_n)\leq \min_{i}P(X_i).
\end{aligned}</script>
<p>This can be
simply derived by observing that, for any <script type="math/tex">i</script>,</p>
<script type="math/tex; mode=display">P(X_1,\ldots, X_n)=P(X_1,\ldots,X_{i-1},X_{i+1},\ldots, X_n|X_i)P(X_i)\leq P(X_i)</script>
<p>and then picking the tightest bound. We can also include terms like
<script type="math/tex">P(X_i,X_j)</script> in the upper bound, if known, to get an even tighter bound.
As an upper bound, this may be more suitable than the Bonferroni bound;
it is always a valid probability and can be tight when dealing with rare
events. Like the Bonferroni bound, this is distribution-independent.</p>
<p>Now, if all we know about the <script type="math/tex">X_i</script> are the <script type="math/tex">P(X_i)</script>, then the tightest
bounds[^1] we can get are:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:frechet-first}
\max\{0,1-\sum_i(1-P(X_i))\} \leq P(X_1,\ldots, X_n)\leq \min_{i} P(X_i).\end{aligned}</script>
<p>The lower bound comes from the first Bonferroni lower bound. However, it
can be further sharpened by adding second-order information, that is,
some of the <script type="math/tex">P(X_i^c,
X_j^c)</script>, as discussed by [@hochbergsome]. One example of such a
sharpening is known as the <em>Kounias</em> inequality:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:kounias}
1-\sum_{i}(1-P(X_i))+\max_j \sum_{i\neq j}P(X_i^c,X_j^c)\leq P(X_1,\ldots, X_n).\end{aligned}</script>
<p>This can be further sharpened by replacing the max term by</p>
<script type="math/tex; mode=display">\sum_{i,j:(i,j)\in T} P(X_i^c, X_j^c),</script>
<p>where <script type="math/tex">T</script> is the maximal
spanning tree, i.e., the tree that maximizes the sum of the
probabilities[^2]. The new bound then is:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:wolfe}
1-\sum_{i}(1-P(X_i))+\sum_{i,j:(i,j)\in T} P(X_i^c, X_j^c)\leq P(X_1,\ldots, X_n).\end{aligned}</script>
<p>This bound was first derived in work by [@hunter1976upper] and has been
subsequently generalized to work with more events via the construction
of multi-trees; see work by [@bukszar2001upper].</p>
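<p>The first-order bounds, the Kounias bound, and the spanning-tree bound can be compared on a toy example. The sketch below is only an illustration under my own assumptions: the event <script type="math/tex">X_i</script> means <script type="math/tex">X_i=1</script>, the joint is randomly generated and tilted towards ones so that the bounds are informative, and the maximal spanning tree is found with a plain Prim-style search. None of the names come from the cited papers.</p>

```python
import itertools
import random


def random_joint(n, seed=1, tilt=4.0):
    """Hypothetical joint over {0,1}^n; `tilt` > 1 favours outcomes with more ones."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() * tilt ** sum(x) for x in outcomes]
    total = sum(weights)
    return {x: w / total for x, w in zip(outcomes, weights)}


def p_zero(joint, idx):
    """P(X_i = 0 for every i in idx)."""
    return sum(p for x, p in joint.items() if all(x[i] == 0 for i in idx))


def max_spanning_tree_weight(pair, n):
    """Total edge weight of a maximum spanning tree on nodes 0..n-1 (Prim)."""
    in_tree = {0}
    total = 0.0
    while len(in_tree) < n:
        w, j = max((pair[min(i, j), max(i, j)], j)
                   for i in in_tree for j in range(n) if j not in in_tree)
        total += w
        in_tree.add(j)
    return total


n = 4
joint = random_joint(n)
marg = [sum(p for x, p in joint.items() if x[i] == 1) for i in range(n)]
pair = {(i, j): p_zero(joint, (i, j))
        for i, j in itertools.combinations(range(n), 2)}

first_order = 1.0 - sum(1.0 - m for m in marg)
frechet_lower = max(0.0, first_order)
frechet_upper = min(marg)
kounias = first_order + max(
    sum(pair[min(i, j), max(i, j)] for i in range(n) if i != j)
    for j in range(n))
tree_bound = first_order + max_spanning_tree_weight(pair, n)
target = joint[(1,) * n]  # P(X_1 = 1, ..., X_n = 1)

print(f"first-order lower: {frechet_lower:.4f}")
print(f"Kounias lower:     {kounias:.4f}")
print(f"tree lower:        {tree_bound:.4f}")
print(f"target:            {target:.4f}")
print(f"first-order upper: {frechet_upper:.4f}")
```

<p>The Kounias term corresponds to the best star-shaped tree, which is itself a spanning tree, so the spanning-tree bound can never be worse than the Kounias bound.</p>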
<h2 id="multiplicative-bounds">Multiplicative bounds</h2>
<p>In some cases, multiplicative bounds, that is,</p>
<script type="math/tex; mode=display">P(X_1,\ldots X_n)\geq P(X_1)\ldots P(X_n),</script>
<p>may also be applicable when the random variables are positively associated; see work
by [@esary1967association] for details. These bounds are easier
to apply and often tighter, but they are distribution-dependent and so may not
always hold. In particular, for Bernoulli variables, Theorem 4
in [@esary1967association] shows that association of the
<script type="math/tex">X_1,\ldots, X_n</script> implies only that</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1=1,\ldots, X_n=1)&\geq P(X_1=1)\ldots P(X_n=1),\\
P(X_1=0,\ldots, X_n=0)&\geq P(X_1=0)\ldots P(X_n=0).\end{aligned} %]]></script>
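<p>Here is a small sketch of both sides of that caveat. The first model is a hypothetical common-cause mixture: the bits are i.i.d. Bernoulli(<script type="math/tex">\theta</script>) given a shared random <script type="math/tex">\theta</script>, which makes them positively dependent, and the two inequalities hold. The second is a two-variable counterexample with <script type="math/tex">X_2=1-X_1</script>, where the product bound fails. All names and parameter values are assumptions chosen for illustration.</p>

```python
import itertools
import math


def mixture_joint(n, thetas=(0.9, 0.2), mix=(0.5, 0.5)):
    """Joint of n bits that are i.i.d. Bernoulli(theta) given a shared theta."""
    joint = {}
    for x in itertools.product((0, 1), repeat=n):
        ones = sum(x)
        joint[x] = sum(m * th ** ones * (1 - th) ** (n - ones)
                       for th, m in zip(thetas, mix))
    return joint


def marginal(joint, i, value):
    """P(X_i = value)."""
    return sum(p for x, p in joint.items() if x[i] == value)


n = 3
joint = mixture_joint(n)
all_ones = joint[(1,) * n]
all_zeros = joint[(0,) * n]
prod_ones = math.prod(marginal(joint, i, 1) for i in range(n))
prod_zeros = math.prod(marginal(joint, i, 0) for i in range(n))
print(f"P(all 1) = {all_ones:.4f} vs product {prod_ones:.4f}")
print(f"P(all 0) = {all_zeros:.4f} vs product {prod_zeros:.4f}")

# Counterexample: X_2 = 1 - X_1 with X_1 ~ Bernoulli(1/2).
anti = {(0, 1): 0.5, (1, 0): 0.5, (0, 0): 0.0, (1, 1): 0.0}
anti_joint = anti[(1, 1)]
anti_prod = marginal(anti, 0, 1) * marginal(anti, 1, 1)
print(f"P(X_1=1, X_2=1) = {anti_joint:.4f} vs product {anti_prod:.4f}")
```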
Fri, 29 Jun 2018 00:00:00 +0000
http://kgourgou.me//Bounds-on-joint-probabilities/
http://kgourgou.me//Bounds-on-joint-probabilities/
New manuscript: how biased is your model?
<p>A few days ago, my co-authors Prof. Katsoulakis, Prof. Rey-Bellet, and PhD candidate Jie Wang and I posted our latest manuscript on arXiv, titled <em>How biased is your model? Concentration Inequalities, Information and Model Bias</em>.</p>
<p><strong>Abstract</strong>:
We derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K. Chowdhary, the authors and their collaborators together with classical concentration inequalities (such as Bennett’s and Hoeffding-Azuma inequalities). Our bounds are expressed in terms of the Kullback-Leibler divergence R(Q||P) of model Q with respect to P and the moment generating function for the statistical estimator under P. Furthermore, concentration inequalities, i.e. bounds on moment generating functions, provide tight and computationally inexpensive model bias bounds for quantities of interest. Finally, they allow us to derive rigorous confidence bands for statistical estimators that account for model bias and are valid for an arbitrary amount of data.</p>
<p>You can find the full manuscript <a href="https://arxiv.org/abs/1706.10260">here</a>.</p>
Sat, 08 Jul 2017 00:00:00 +0000
http://kgourgou.me//New-paper-on-arxiv/
http://kgourgou.me//New-paper-on-arxiv/
Distinguished Thesis Award
<p>The Department of Mathematics and Statistics at UMass Amherst
honored my research accomplishments in predictive modeling, data science
and ML with a distinguished thesis award!</p>
Thu, 13 Apr 2017 00:00:00 +0000
http://kgourgou.me//Distinguished-Thesis-Award/
http://kgourgou.me//Distinguished-Thesis-Award/
PhD defense is scheduled
<p>My PhD defense is scheduled!</p>
<p>Date: 24 March 2017.
Time: 10:00 AM.
Place: LGRT 1634.</p>
<p>The title of the thesis is “Information Metrics for Predictive Modeling and
Machine Learning”.</p>
<p>Feel free to join if you are curious!</p>
Thu, 09 Feb 2017 00:00:00 +0000
http://kgourgou.me//PhD-defense-scheduled/
http://kgourgou.me//PhD-defense-scheduled/
Graduate Student Leadership Award
<p>For my contributions to the data science community at UMass Amherst
and the Five Colleges via founding <a href="http://gridclub.io">GRiD</a>, the Department of
Mathematics and Statistics honored me with the Graduate Student Leadership Award!</p>
<p>You can read more about it in the <a href="http://www.math.umass.edu/sites/www.math.umass.edu/files/newsletters/umass_math_stat_newsletter_2016.pdf">Departmental Newsletter</a>.</p>
Sun, 13 Dec 2015 00:00:00 +0000
http://kgourgou.me//award-leadership/
http://kgourgou.me//award-leadership/