My website, served with jekyll.
kgourgou.me
Bounds on joint probabilities - Part I<p>Here are some notes on bounding joint probability distributions. Enjoy! This was
converted from <script type="math/tex">\LaTeX</script> with pandoc, so typos, missing figures, etc., are to be expected.</p>
<p>Consider the binary random variables <script type="math/tex">X_1, \ldots, X_n</script> following the
distribution <script type="math/tex">P</script>. For some collection of values, say,
<script type="math/tex">x_1, \ldots, x_n</script>, we are interested in computing
<script type="math/tex">P(X_1=x_1,\ldots, X_n=x_n)</script>.</p>
<p>There is a rich literature on bounding joint probabilities, say, <script type="math/tex">P(X_1,X_2,X_3)</script>, given
knowledge of the marginals, <script type="math/tex">P(X_i),</script> <script type="math/tex">i=1,2,3</script>, <script type="math/tex">P(X_{i},X_j)</script>,
<script type="math/tex">i\neq j</script>, or of the moments of the marginal distributions. Some
examples of such inequalities follow below.</p>
<p>When the bounds only use <script type="math/tex">P(X_i)</script>, we will say that they utilize
<em>first-order</em> information. Similarly, bounds that use the <script type="math/tex">P(X_i, X_j)</script> are
<em>second-order</em>, those that use three-variable marginals are third-order, etc.</p>
<h2 id="bonferroni-inequalities">Bonferroni inequalities</h2>
<p>We start with a classical result, inspired by the inclusion-exclusion
formula, known as the <em>Bonferroni</em>
inequalities [@galambos1977bonferroni]. The notation <script type="math/tex">X^c</script> corresponds
to the negation of the <script type="math/tex">X</script> variable, i.e., if <script type="math/tex">X=x</script>, <script type="math/tex">X^c=1-x</script> for
<script type="math/tex">x\in \{0,1\}</script>. First, we define:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
S_1&:=\sum_{i}P(X_i^c),\\
S_k&:=\sum_{1\leq i_1< \ldots < i_k\leq n} P(X_{i_1}^c,\ldots, X_{i_k}^c),\\\end{aligned} %]]></script>
<p>Then, for every odd <script type="math/tex">k</script> in <script type="math/tex">\{1,\ldots, n\}</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1,\ldots, X_n)&\geq 1 -\sum_{j=1}^{k} (-1)^{j-1}S_j.
\end{aligned} %]]></script>
<p>We can also get an upper bound for every even <script type="math/tex">k</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1,\ldots, X_n)&\leq 1 -\sum_{j=1}^{k} (-1)^{j-1}S_j.\end{aligned} %]]></script>
<p>By the inclusion-exclusion formula, the inequalities become equalities
when <script type="math/tex">k=n</script>. Thus, the inequalities can be made sharper by including more
marginals. However, the upper (and lower) bounds don’t necessarily
become sharper monotonically as <script type="math/tex">k</script> increases; see work
by [@schwager1984bonferroni]. Also, although the inequalities are valid
for all <script type="math/tex">k</script>, they can be uninformative: a lower bound may fall below zero
and an upper bound may exceed one.</p>
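<p>To make the definitions above concrete, here is a minimal Python sketch that computes the sums S_k and the resulting bounds by brute-force enumeration. It assumes the target event is that every X_i equals 1 (so X_i^c is the event X_i = 0) and that the joint distribution is available as a dictionary from outcome tuples to probabilities. The function names and the independent example distribution are illustrative choices, not part of the original notes.</p>

```python
from itertools import combinations, product
from math import prod


def independent_joint(p_one):
    """Joint pmf (dict: outcome tuple -> probability) of independent
    Bernoulli variables with P(X_i = 1) = p_one[i]. Illustrative only."""
    return {
        x: prod(p if xi == 1 else 1.0 - p for p, xi in zip(p_one, x))
        for x in product((0, 1), repeat=len(p_one))
    }


def prob_all_zero(joint, idx):
    """P(X_i = 0 for all i in idx), i.e. the probability of the
    complements X_i^c when the target event is X_i = 1."""
    return sum(pr for x, pr in joint.items() if all(x[i] == 0 for i in idx))


def bonferroni_sums(joint, n):
    """Return the list [S_1, ..., S_n] by brute-force enumeration."""
    return [
        sum(prob_all_zero(joint, idx) for idx in combinations(range(n), k))
        for k in range(1, n + 1)
    ]


def bonferroni_bound(sums, k):
    """1 - sum_{j=1}^k (-1)^(j-1) S_j: a lower bound on
    P(X_1 = 1, ..., X_n = 1) for odd k, an upper bound for even k."""
    return 1.0 - sum((-1) ** (j - 1) * sums[j - 1] for j in range(1, k + 1))


# Hypothetical example: four independent variables.
p_one = [0.9, 0.8, 0.7, 0.6]
joint = independent_joint(p_one)
sums = bonferroni_sums(joint, len(p_one))
print("true value:", joint[(1, 1, 1, 1)])
for k in range(1, len(p_one) + 1):
    kind = "lower" if k % 2 == 1 else "upper"
    print(f"k={k} ({kind}):", bonferroni_bound(sums, k))
```

<p>For this example the true value is 0.3024, and the bounds for k = 1, 2, 3, 4 are approximately 0, 0.35, 0.30, and 0.3024, so the last one recovers the joint probability exactly. With six independent variables that each have P(X_i = 1) = 0.5, the k = 2 upper bound evaluates to 1.75, which illustrates how a valid bound can still be uninformative.</p>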
<h2 id="frechet-bounds">Frechet bounds</h2>
<p>An alternative upper bound for the joint is the Frechet-type bound:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:frechet}
P(X_1,\ldots, X_n)\leq \min_{i}P(X_i).
\end{aligned}</script>
<p>This can be
simply derived by observing that, for any <script type="math/tex">i</script>,</p>
<script type="math/tex; mode=display">P(X_1,\ldots, X_n)=P(X_1,\ldots,X_{i-1},X_{i+1},\ldots, X_n|X_i)P(X_i)\leq P(X_i)</script>
<p>and then picking the tightest bound. If terms like
<script type="math/tex">P(X_i,X_j)</script> are known, we can include them in the minimum to get an even tighter bound,
since the same argument gives <script type="math/tex">P(X_1,\ldots, X_n)\leq P(X_i,X_j)</script>.
As an upper bound, this may be more suitable than the Bonferroni bound;
it is always a valid probability and can be tight when dealing with rare
events. Like the Bonferroni bound, this is distribution-independent.</p>
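<p>As a rough illustration of the Frechet-type upper bound, the sketch below takes the minimum over all marginals of a given order. The three-variable joint distribution is a made-up example, the target event is again assumed to be that every X_i equals 1, and the function names are hypothetical.</p>

```python
from itertools import combinations


def marginal_all_one(joint, idx):
    """P(X_i = 1 for all i in idx), read off a joint pmf given as a
    dict from outcome tuples to probabilities."""
    return sum(pr for x, pr in joint.items() if all(x[i] == 1 for i in idx))


def frechet_upper(joint, n, order=1):
    """Smallest marginal of the given order; an upper bound on
    P(X_1 = 1, ..., X_n = 1) because the joint event is contained
    in every lower-order event."""
    return min(
        marginal_all_one(joint, idx) for idx in combinations(range(n), order)
    )


# Hypothetical dependent joint distribution on three binary variables.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.10, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.15, (1, 0, 1): 0.10, (1, 1, 0): 0.15, (1, 1, 1): 0.05,
}

print("true value:        ", joint[(1, 1, 1)])
print("first-order bound: ", frechet_upper(joint, 3, order=1))
print("second-order bound:", frechet_upper(joint, 3, order=2))
```

<p>Here the true value is 0.05, the first-order bound is about 0.30, and using the pairwise marginals tightens it to about 0.10.</p>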
<p>Now, if all we know about the <script type="math/tex">X_i</script> are the <script type="math/tex">P(X_i)</script>, then the tightest
bounds[^1] we can get are:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:frechet-first}
\max\{0,1-\sum_i(1-P(X_i))\} \leq P(X_1,\ldots, X_n)\leq \min_{i} P(X_i).\end{aligned}</script>
<p>The lower bound comes from the first Bonferroni lower bound. However, it
can be further sharpened by adding second-order information, that is,
some of the <script type="math/tex">P(X_i^c,
X_j^c)</script>, as discussed by [@hochbergsome]. One example of such a
sharpening is known as the <em>Kounias</em> inequality:</p>
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:kounias}
1-\sum_{i}(1-P(X_i))+\max_j \sum_{i\neq j}P(X_i^c,X_j^c)\leq P(X_1,\ldots, X_n).\end{aligned}</script>
<p>This can be further sharpened by replacing the max term by</p>
<script type="math/tex; mode=display">\sum_{i,j:(i,j)\in T} P(X_i^c, X_j^c),</script>
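<p>where <script type="math/tex">T</script> is a maximum-weight
spanning tree on the indices <script type="math/tex">1,\ldots,n</script>, i.e., the tree that maximizes the sum of the
probabilities <script type="math/tex">P(X_i^c, X_j^c)</script> over its edges[^2]. The new bound then is:</p>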
<script type="math/tex; mode=display">\begin{aligned}
\label{eq:wolfe}
1-\sum_{i}(1-P(X_i))+\sum_{i,j:(i,j)\in T} P(X_i^c, X_j^c)\leq P(X_1,\ldots, X_n).\end{aligned}</script>
<p>This bound was first derived in work by [@hunter1976upper] and has been
subsequently generalized to work with more events via the construction
of multi-trees; see work by [@bukszar2001upper].</p>
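<p>The three lower bounds of this section can be compared numerically. The sketch below is only an illustration under stated assumptions: the target event is that every X_i equals 1, the four-variable joint distribution is invented so that the complements overlap along a chain, and the maximum-weight spanning tree is found with a plain Prim-style search (any maximum spanning tree routine would do). All function names are hypothetical.</p>

```python
def prob_all_zero(joint, idx):
    """P(X_i = 0 for all i in idx) from a dict-based joint pmf."""
    return sum(pr for x, pr in joint.items() if all(x[i] == 0 for i in idx))


def max_spanning_tree_weight(w):
    """Total weight of a maximum-weight spanning tree of the complete
    graph with symmetric weight matrix w (Prim-style greedy search)."""
    n = len(w)
    in_tree = {0}
    best = list(w[0])  # best[v]: heaviest edge joining v to the tree
    total = 0.0
    while len(in_tree) < n:
        v = max((u for u in range(n) if u not in in_tree), key=lambda u: best[u])
        total += best[v]
        in_tree.add(v)
        for u in range(n):
            if u not in in_tree:
                best[u] = max(best[u], w[v][u])
    return total


def lower_bounds(joint, n):
    """Return (first-order, Kounias, spanning-tree) lower bounds on
    P(X_1 = 1, ..., X_n = 1)."""
    q = [prob_all_zero(joint, (i,)) for i in range(n)]
    w = [
        [prob_all_zero(joint, (i, j)) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    base = 1.0 - sum(q)
    first_order = max(0.0, base)
    kounias = base + max(
        sum(w[i][j] for i in range(n) if i != j) for j in range(n)
    )
    tree = base + max_spanning_tree_weight(w)
    return first_order, kounias, tree


# Hypothetical joint: failures (zeros) tend to occur in neighbouring pairs.
joint = {
    (1, 1, 1, 1): 0.70,
    (0, 0, 1, 1): 0.08, (1, 0, 0, 1): 0.08, (1, 1, 0, 0): 0.08,
    (0, 1, 1, 1): 0.02, (1, 1, 1, 0): 0.02,
    (0, 0, 0, 0): 0.02,
}

first_order, kounias, tree = lower_bounds(joint, 4)
print("true value: ", joint[(1, 1, 1, 1)])
print("first-order:", first_order)
print("Kounias:    ", kounias)
print("tree bound: ", tree)
```

<p>For this example the bounds are roughly 0.40, 0.62, and 0.70, while the true value is 0.70. The Kounias bound corresponds to restricting the tree to a star centred at the best index, which is why the spanning-tree bound can never be worse.</p>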
<h2 id="multiplicative-bounds">Multiplicative bounds</h2>
<p>In some cases, multiplicative bounds, that is,</p>
<script type="math/tex; mode=display">P(X_1,\ldots, X_n)\geq P(X_1)\cdots P(X_n),</script>
<p>may also be applicable when the random variables show positive association; see work
by [@esary1967association] for details. Such bounds are easier
to apply and often tighter, but they are distribution-dependent and so do not
hold in general. In particular, for Bernoulli variables, Theorem 4.
in [@esary1967association] shows that association of the
<script type="math/tex">X_1,\ldots, X_n</script> implies only that</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
P(X_1=1,\ldots, X_n=1)&\geq P(X_1=1)\ldots P(X_n=1),\\
P(X_1=0,\ldots, X_n=0)&\geq P(X_1=0)\ldots P(X_n=0).\end{aligned} %]]></script>
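<p>A small numerical check of the two inequalities above, offered as a sketch rather than a proof. It relies on the standard fact that nondecreasing functions of independent random variables are associated, and builds the X_i as the maximum of a shared Bernoulli variable and a private one. This construction, the parameter values, and the function names are all assumptions made for illustration. The last lines show a negatively dependent pair for which the product bound fails.</p>

```python
from itertools import product
from math import prod


def joint_from_common_cause(p_shared, p_own):
    """Joint pmf of X_i = max(U_0, U_i), where U_0, U_1, ..., U_n are
    independent Bernoulli variables with P(U_0 = 1) = p_shared and
    P(U_i = 1) = p_own[i - 1]."""
    probs = [p_shared] + list(p_own)
    joint = {}
    for u in product((0, 1), repeat=len(probs)):
        pr = prod(p if ui == 1 else 1.0 - p for p, ui in zip(probs, u))
        x = tuple(max(u[0], ui) for ui in u[1:])
        joint[x] = joint.get(x, 0.0) + pr
    return joint


def product_bound(joint, n, value):
    """Return (P(all X_i = value), product of the P(X_i = value))."""
    lhs = joint.get((value,) * n, 0.0)
    rhs = prod(
        sum(pr for x, pr in joint.items() if x[i] == value) for i in range(n)
    )
    return lhs, rhs


# Associated example: the joint probability dominates the product.
joint = joint_from_common_cause(0.3, [0.5, 0.5, 0.5])
for value in (1, 0):
    lhs, rhs = product_bound(joint, 3, value)
    print(f"value={value}: joint={lhs:.6f}, product={rhs:.6f}, holds={lhs >= rhs}")

# Negatively dependent example, X_2 = 1 - X_1: the bound fails.
anti = {(0, 1): 0.5, (1, 0): 0.5}
lhs, rhs = product_bound(anti, 2, 1)
print(f"anti-correlated: joint={lhs:.6f}, product={rhs:.6f}, holds={lhs >= rhs}")
```

<p>In the associated example the joint probabilities are about 0.3875 and 0.0875 against products of about 0.2746 and 0.0429, while in the second example the joint probability is 0 against a product of 0.25.</p>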
Fri, 29 Jun 2018 00:00:00 +0000
kgourgou.me//Bounds-on-joint-probabilities/
Website updates and RSS feed<p>Hello all!</p>
<p>I updated my website to put the blog front and center. Will soon be adding
some more posts regarding what I have been up to (and some cool papers I’m currently reading).</p>
<p>Also, I will move the RSS feed to a better place, but for now you can access it <a href="http://kgourgou.me/feed.xml">here</a>.</p>
Sat, 16 Jun 2018 00:00:00 +0000
kgourgou.me//Website-updates/
New manuscript: how biased is your model?<p>A few days ago, my co-authors (Prof. Katsoulakis, Prof. Rey-Bellet, and PhD candidate Jie Wang) and I posted our latest manuscript on arXiv, titled: How biased is your model? Concentration Inequalities, Information and Model Bias.</p>
<p><strong>Abstract</strong>:
We derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K.Chowdhary, the authors and their collaborators together with classical concentration inequalities (such as Bennett’s and Hoeffding-Azuma inequalities). Our bounds are expressed in terms of the Kullback-Leibler divergence R(Q||P) of model Q with respect to P and the moment generating function for the statistical estimator under P. Furthermore, concentration inequalities, i.e. bounds on moment generating functions, provide tight and computationally inexpensive model bias bounds for quantities of interest. Finally, they allow us to derive rigorous confidence bands for statistical estimators that account for model bias and are valid for an arbitrary amount of data.</p>
<p>You can find the full manuscript <a href="https://arxiv.org/abs/1706.10260">here</a>.</p>
Sat, 08 Jul 2017 00:00:00 +0000
kgourgou.me//New-paper-on-arxiv/
Distinguished Thesis Award<p>The Department of Mathematics and Statistics at UMass Amherst
honored my research accomplishments in predictive modeling, data science
and ML with a distinguished thesis award!</p>
Thu, 13 Apr 2017 00:00:00 +0000
kgourgou.me//Distinguished-Thesis-Award/
PhD defense is scheduled<p>My PhD defense is scheduled!</p>
<p>Date: 24 of March, 2017.
Time: 10:00 AM.
Place: LGRT 1634.</p>
<p>The title of the thesis is “Information Metrics for Predictive Modeling and
Machine Learning”.</p>
<p>Feel free to join if you are curious!</p>
Thu, 09 Feb 2017 00:00:00 +0000
kgourgou.me//PhD-defense-scheduled/
Graduate Student Leadership Award<p>For my contributions to the data science community at UMass Amherst
and the Five Colleges via founding <a href="http://gridclub.io">GRiD</a>, the Department of
Mathematics and Statistics honored me with the Graduate Student Leadership Award!</p>
<p>You can read more about it in the <a href="http://www.math.umass.edu/sites/www.math.umass.edu/files/newsletters/umass_math_stat_newsletter_2016.pdf">Departmental Newsletter</a>.</p>
Sun, 13 Dec 2015 00:00:00 +0000
kgourgou.me//award-leadership/