## Tuesday, April 24, 2018

### The KL divergence as a price

As part of the five year anniversary of this blog, I went back and re-read this blog post that re-derives the information equilibrium condition using some general scaling arguments (much in the manner that things are presented in e.g. The Feynman Lectures on Physics). Let me reproduce one of those arguments here (with a bit of mathjax) about the price as a "detector" of information flow:

...

If prices change, the two distributions [of supply and demand] must have been unequal. If they come back to the original stable price — or another stable price — the two distributions must have become equal again. That is to say, prices represent information about the differences (or changes) in the distributions. Coming back to a stable price means information about the differences in one distribution must have flowed (through a communication channel) to the other distribution. We can call one distribution $D$ and the other $S$ for demand and supply. The price is then a function of changes in $D$ and changes in $S$, or

$$p = f(\Delta D, \Delta S)$$

Note that an increase in $S$ that is bigger than an increase in $D$ generally leads to a falling price, while an increase in $D$ that is bigger than the increase in $S$ generally leads to a rising price. That means we can try

$$p = \frac{\Delta D}{\Delta S}$$

for our initial guess. Instead of a price aggregating information, we have a price detecting the flow of information. Constant prices tell us nothing. Price changes tell us information has flowed (or been lost) between one distribution and the other.
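The sign logic of that initial guess can be sketched numerically. This is just a toy illustration of the ratio detector — the function name and the shift values are mine, not from the original post:

```python
# Toy sketch of the price "detector" guess p = ΔD / ΔS.

def price(delta_D, delta_S):
    """Ratio of the change in demand to the change in supply."""
    return delta_D / delta_S

# A demand shift that outpaces the supply shift signals a rising price (p > 1) ...
p_up = price(2.0, 1.0)
# ... while a supply shift that outpaces the demand shift signals a falling one (p < 1).
p_down = price(1.0, 2.0)
```

Equal shifts give $p = 1$: the detector reads "no net information flow", consistent with a constant price telling us nothing.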

...

I did want to note that we could have made a different choice — specifically the Kullback-Leibler (KL) divergence $D_{KL}(D, S)$ from information theory. For small perturbations $\Delta D$ and $\Delta S$ from an equilibrium $D_{0} = S_{0}$, we find

\begin{align} D_{KL}(D_{0} + \Delta D, S_{0} = D_{0}) & \sim \Delta D\\ D_{KL}(D_{0} = S_{0}, S_{0} + \Delta S) & \sim - \Delta S \end{align}

which reproduces the same properties above (prices go up for increasing demand, ceteris paribus, and down for increasing supply, ceteris paribus). The interesting piece is that the KL divergence is what is used as the "detector" in Generative Adversarial Networks, a class of machine learning algorithms that is formally similar to the information transfer framework — as discussed in that same blog post.
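The first-order behavior above can be checked numerically. A minimal sketch — assuming (as the framework allows) unnormalized "distributions" with the divergence taken as $\sum_{i} p_{i} \log (p_{i}/q_{i})$, and a small uniform scaling as the perturbation; the specific vectors are mine:

```python
import math

def kl(p, q):
    # Sum form of the KL divergence for (possibly unnormalized)
    # nonnegative vectors: sum_i p_i * log(p_i / q_i).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Equilibrium: demand and supply distributions are equal (D0 = S0).
D0 = [1.0, 2.0, 3.0, 4.0]
eps = 0.01                       # small uniform perturbation
D = [d * (1 + eps) for d in D0]  # demand bumped up by ΔD
delta = eps * sum(D0)            # ΔD (= ΔS for the supply case) = 0.1 here

# To first order: D_KL(D0 + ΔD, S0 = D0) ≈ +ΔD
kl_demand_up = kl(D, D0)
# ... and D_KL(D0 = S0, S0 + ΔS) ≈ -ΔS
kl_supply_up = kl(D0, D)
```

With these numbers `kl_demand_up` comes out just above $+0.1$ and `kl_supply_up` just below $-0.1$, matching $+\Delta D$ and $-\Delta S$ up to second-order corrections — so the divergence does flip sign the way a price detector should.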