Friday, September 22, 2017

Mutual information and information equilibrium


Natalie Wolchover has a nice article on the information bottleneck and how it might relate to why deep learning works. Originally described in this 2000 paper by Tishby et al, the information bottleneck was a new variational principle that optimized a functional of mutual information as a Lagrange multiplier problem (I'll use their notation):

$$
\mathcal{L}[p(\tilde{x} | x)] = I(\tilde{X} ; X) - \beta I(\tilde{X} ; Y)
$$

The interpretation is that we pass the information the random variable $X$ has about $Y$ (i.e. the mutual information $I(X ; Y)$) through the "bottleneck" $\tilde{X}$.

Now I've been interested in the connection between information equilibrium, economics, and machine learning (e.g. GAN's real data, generated data, and discriminator have a formal similarity to information equilibrium's information source, information destination, and detector — the latter I use as a possible way of understanding of demand, supply, and price on this blog). I'm always on the lookout for connections to information equilibrium. This is a work in progress, but I first thought it might be valuable to understand information equilibrium in terms of mutual information.

The best way to illustrate this is with a Venn diagram:


If we have two random variables $X$ and $Y$, then information equilibrium is the condition that:

$$
H(X) = H(Y)
$$

Without loss of generality, we can identify $X$ as the information source (effectively a sign convention) and say in general:

$$
H(X) \geq H(Y)
$$

We can say mutual information is maximized when $Y = f(X)$. The diagram above represents a "noisy" case where either noise (or another random variable) contributes to $H(Y)$ (i.e. $Y = f(X) + n$). Mutual information cannot be greater than the information in $X$ or $Y$. And if we assert a simple case of information equilibrium (with information transfer index $k = 1$), e.g.:

$$
p_{xy} = p_{x}\delta_{xy} = p_{y}\delta_{xy}
$$

then

$$
\begin{align}
I(X ; Y) & = \sum_{x} \sum_{y} p_{xy} \log \frac{p_{xy}}{p_{x}p_{y}}\\
& = \sum_{x} \sum_{y} p_{x} \delta_{xy} \log \frac{p_{x} \delta_{xy} }{p_{x}p_{y}}\\
& = \sum_{x} \sum_{y} p_{x} \delta_{xy} \log \frac{\delta_{xy} }{p_{y}}\\
& = \sum_{x} p_{x} \log \frac{1}{p_{x}}\\
& = -\sum_{x} p_{x} \log p_{x}\\
& = H(X)
\end{align}
$$

Note that in the above, the information transfer index accounts for the "mismatch" in dimensionality in the Kronecker delta (i.e. a die roll that determines the outcome of a coin flip such that a roll of 1, 2, or 3 yields heads and 4, 5, or 6 yields tails).

Basically, information equilibrium is the case where $H(X)$ and $H(Y)$ overlap, $Y = f(X)$, and mutual information is maximized.

No comments:

Post a Comment

Comments are welcome. Please see the Moderation and comment policy.

Also, try to avoid the use of dollar signs as they interfere with my setup of mathjax. I left it set up that way because I think this is funny for an economics blog. You can use € or £ instead.