Tuesday, December 15, 2015

Information theory 101 (information equilibrium edition)

With this blog ostensibly dedicated to the purpose of using information theory to understand economics, it seems only natural to have a short introduction to information theory itself. Or at least as much as is necessary to understand this blog (which isn't much). Nearly everything you'd need is contained in Claude Shannon's 1948 paper:

A mathematical theory of communication [pdf]

In that paper Shannon defined what "communication" was, mathematically speaking. It comes down to taking a string of symbols (a message) selected from a distribution of messages at one point (the transmitter, Tx) and reproducing it at another point (the receiver, Rx). Connecting them is what is called a "channel". The famous picture Shannon drew in that paper is here:

[Figure: A communication channel]

Information theory is sometimes made to seem to spring fully formed from Shannon's head like Athena, but it has some precursors in Hartley and Nyquist (both even worked at Bell Labs, like Shannon). Hartley's definition of information, which coincides with Shannon's when all symbols are considered equally probable, is actually the one we resort to most of the time on this blog.

One improvement Shannon made was to come up with a definition of information that could handle symbols with different probabilities. In their book, Shannon and Weaver are careful to note that we are only talking about the symbols when we talk about information, not their meaning. For example, I could successfully transmit the series of symbols making up the word

billion

but the meaning could be different to a British English speaker (10⁹ or 10¹²) and an American English speaker (10⁹). It would be different again to a French speaker (10¹²). The information would also be slightly different, since letter frequencies (i.e. their probabilities of occurring in a message) differ slightly among the languages/dialects.

Shannon came up with the definition of information by looking at its properties:

  • Something that always happens carries no information (a light that is always on isn't communicating anything -- it has to have at least a tiny chance of going out)
  • The information received from transmitting two independent symbols is the sum of the information from each symbol
  • There is no such thing as negative information

You can see these are intimately connected to probability. Our information function I(p) -- with p a probability -- therefore has to have the mathematical properties

  • I(p = 1) = 0
  • I(p₁ p₂) = I(p₁) + I(p₂)
  • I(p) ≥ 0

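The second one follows from the probability of two independent events being the product of the two probabilities. It's also the one that dictates that I(p) must be related to the logarithm, since the logarithm turns products into sums. Since all probabilities have to obey 1 ≥ p ≥ 0, log(p) is never positive, so satisfying the third property means taking the logarithm of 1/p instead, and we have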

I(p) = log(1/p)
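As a quick sanity check, here is a minimal Python sketch that evaluates I(p) = log(1/p) against the three properties listed above. The function name `self_information` and the choice of base 2 (so the answer comes out in bits) are assumptions made for illustration; the properties hold for any base.

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Information I(p) = log(1/p) of an outcome with probability p.

    The base is an assumption (2 gives bits); p = 0 is excluded because
    I(p) grows without bound as p goes to 0.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return math.log(1.0 / p, base)

# Property 1: something that always happens carries no information.
certain = self_information(1.0)

# Property 2: information from two independent symbols adds.
p1, p2 = 0.5, 0.25
joint = self_information(p1 * p2)
separate = self_information(p1) + self_information(p2)

# Property 3: information is never negative for 0 < p <= 1.
samples = [self_information(k / 100) for k in range(1, 101)]

print(certain, math.isclose(joint, separate), min(samples) >= 0.0)
```

In code as on paper, the whole definition is the one line I(p) = log(1/p).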

I(p) is the information entropy of an instance of a random variable with probability p. The Shannon (information) entropy of a random event is the expected value of its information entropy

H(p) = E[I(p)] = Σₓ pₓ I(pₓ) = - Σₓ pₓ log(pₓ)

where the sum is taken over all the states x, with probabilities pₓ (where Σₓ pₓ = 1). Also note that p log(p) is taken to be 0 for p = 0 (its limit as p → 0). There's a bit of abuse of notation in writing H(p). More accurately you could write this in terms of a random variable X with probability function P(X):

H(X) = E[I(X)] = E[- log(P(X))]

This form makes it clearer that X is just a dummy variable. The information entropy is actually a property of the distribution P that the symbols are drawn from:

H(•) = E[I(•)] = E[- log(P(•))]
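The entropy definition above is easy to try out numerically. The following is a minimal sketch, assuming the distribution is given as a plain list of probabilities; the name `shannon_entropy`, the base-2 default, and the example distributions are illustrative choices, not anything from Shannon's paper.

```python
import math

def shannon_entropy(probs, base: float = 2.0) -> float:
    """Expected information H = sum over x of p_x * log(1/p_x).

    Terms with p_x = 0 are skipped, following the convention that
    p log(p) = 0 at p = 0.
    """
    if any(p < 0.0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0.0)

fair_coin = shannon_entropy([0.5, 0.5])     # both outcomes equally likely
biased_coin = shannon_entropy([0.9, 0.1])   # one outcome dominates
always_on = shannon_entropy([1.0, 0.0])     # the light that never goes out

print(fair_coin, biased_coin, always_on)
```

The function takes only the list of probabilities, not the symbols themselves, which is the point of writing H(•): the entropy belongs to the distribution.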

In economics, this becomes the critical point; we say that the information entropy of the distribution P₁ of demand (d) is equal to the information entropy of the distribution P₂ of supply (s):

E[I(d)] = E[I(s)]

E[- log(P₁(d))] = E[- log(P₂(s))]

E[- log(P₁(•))] = E[- log(P₂(•))]

and call it information equilibrium (for a single transaction here). The market can be seen as a system for equalizing the distributions of supply and demand (so that everywhere there is some demand, there is some supply ... at least in an ideal market).

Also in economics (at least on this blog), we frequently take P to be a uniform distribution (over x = 1..σ symbols) so that:

E[I(p)] = - Σₓ pₓ log(pₓ) = - Σₓ (1/σ) log(1/σ) = - (σ/σ) log(1/σ) = log σ

The information in n such events (a string of n symbols from an alphabet of size σ with uniformly distributed symbols) is just

n E[I(p)] = n log σ

Or, put another way, using the random variable form for multiple transactions with uniform distributions:

E[- log(P₁(•)P₁(•)P₁(•)P₁(•) ... )] = E[- log(P₂(•)P₂(•)P₂(•)P₂(•) ...)]

n₁ E[- log(P₁(•))] = n₂ E[- log(P₂(•))]

n₁ E[- log(1/σ₁)] = n₂ E[- log(1/σ₂)]

n₁ log(σ₁) = n₂ log(σ₂)
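The uniform case can be checked the same way. This is a minimal sketch: it evaluates the general entropy sum for a uniform distribution to confirm it gives log σ, then balances n₁ log(σ₁) = n₂ log(σ₂) for made-up alphabet sizes. The function names and the values σ₁ = 4, σ₂ = 2, n₂ = 10 are hypothetical, chosen only to keep the arithmetic simple.

```python
import math

def uniform_entropy(sigma: int, base: float = 2.0) -> float:
    """Entropy of a uniform distribution over sigma symbols, via the full sum."""
    p = 1.0 / sigma
    return sum(p * math.log(1.0 / p, base) for _ in range(sigma))

def string_information(n: float, sigma: int, base: float = 2.0) -> float:
    """Information in n symbols drawn uniformly from an alphabet of size sigma."""
    return n * math.log(sigma, base)

# The sum collapses to log(sigma): 8 equally likely symbols carry 3 bits each.
h8 = uniform_entropy(8)

# Hypothetical alphabet sizes for the two sides.
sigma1, sigma2 = 4, 2
n2 = 10
# Solve n1 * log(sigma1) = n2 * log(sigma2) for n1.
n1 = n2 * math.log(sigma2) / math.log(sigma1)

print(h8, n1, string_information(n1, sigma1), string_information(n2, sigma2))
```

With these numbers, 5 symbols from the 4-letter alphabet carry the same information as 10 symbols from the 2-letter alphabet.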

Taking n₁, n₂ >> 1 while defining n₁ ≡ D/dD (in another abuse of notation where dD is an infinitesimal unit of demand) and n₂ ≡ S/dS, we can write

(D/dD) log(σ₁) = (S/dS) log(σ₂)

or, rearranging,

dD/dS = k D/S

where k ≡ log(σ₁)/log(σ₂) is the information transfer index. That's the information equilibrium condition.
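To see what the condition does in practice, here is a minimal numerical sketch. It assumes k is constant, in which case a power law D = D_ref (S/S_ref)^k satisfies dD/dS = k D/S, and it checks that with a finite difference. The function names, the reference values D_ref = S_ref = 1, and the alphabet sizes are all hypothetical choices for illustration.

```python
import math

def information_transfer_index(sigma1: int, sigma2: int) -> float:
    """k = log(sigma1) / log(sigma2); the base of the logarithm cancels."""
    return math.log(sigma1) / math.log(sigma2)

def demand(S: float, k: float, D_ref: float = 1.0, S_ref: float = 1.0) -> float:
    """Power-law candidate D = D_ref * (S / S_ref)**k, assuming constant k."""
    return D_ref * (S / S_ref) ** k

k = information_transfer_index(4, 2)   # hypothetical alphabet sizes

# Compare a central-difference estimate of dD/dS with k * D / S at one point.
S = 3.0
h = 1e-6
lhs = (demand(S + h, k) - demand(S - h, k)) / (2 * h)
rhs = k * demand(S, k) / S

print(k, lhs, rhs)
```

The two sides agree to within the finite-difference error at any S > 0 you try, which is all this sketch is meant to show.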


PS  In another abuse of notation, on this blog I frequently write:

I(D) = I(S)

where I should more technically write (in the notation above)

E[n₁ I(P₁(d))] = E[n₂ I(P₂(s))]

where d and s are random variables with distributions P₁ and P₂. Also note that these E's aren't economists' E operators, but rather ordinary expected values.


  1. Awesome Jason, thanks for doing this! It is hard to follow you when you go fast and technical, so it is nice to have a slower explanation at times.

  2. Jason, in your final equation, should it be P1 on the left? You have P2.

    1. Hmmm, and yet it didn't change. That plus the smiley face and the word "probably" leaves me feeling like Bart Simpson.

    2. And I fixed it.

      I wasn't at a computer I could fix it with at the time ...

    3. Seeing the update on your latest post (proposed summer paper presentation) I just came back to check this again... Lol.