Thursday, May 1, 2014

Expectations destroy information

I have a beef with the use of expectations in economics. They seem so powerful that they become useless as an explanation; they also seem to lack empirical impact in exactly the places where they were originally designed to have one.

Here, I am going to describe the role expectations play in the economy using information theory -- it's not diving too deep, mostly just an argument using Shannon information and measuring information loss with the KL divergence.

The KL divergence measures the extra message length for a given amount of data that must be sent if you have a code optimal for the wrong distribution relative to the true distribution. In general, it represents the information loss (measured in nats or bits depending on which logarithm you use) by being wrong about a distribution relative to being right.
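For reference, here is the standard discrete definition written out. The symbols A (the true distribution) and E (the assumed one) are chosen here for illustration, and the index i runs over the possible states:

```latex
D_{\mathrm{KL}}(A \,\|\, E)
  = \sum_i A_i \log \frac{A_i}{E_i}
  = \underbrace{-\sum_i A_i \log E_i}_{\text{cross entropy } H(A, E)}
  \; - \; \underbrace{\left(-\sum_i A_i \log A_i\right)}_{\text{entropy } H(A)}
```

The second form is the "extra message length" reading: the average code length when coding for E, minus the best achievable average code length when coding for A.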

Relating this to information transfer economics, the KL divergence represents the extra information that must be sent in the market (lower information efficiency) given the wrong expected economic state distribution relative to the true future economic state distribution. I previously hypothesized that lower information efficiency was related to recessions.

In this post, I want to show that economic "expectations" generally destroy information unless they are right. I put together some simulations illustrating this point using a relatively simple model. Imagine there are 10 states the economy can be in (think a hidden semi-Markov model). There is a probability distribution that gives the chance it is in one of the states in the next time period. For example, if all states are equally likely, this distribution would be

0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

i.e. a 10% chance it is in any of the ten states.

If the economy was going to be in state #2 with 100% probability, then it looks like this:

0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

We will compare two of these distributions, call them the actual (A) and the expected (E) distributions with the KL divergence. The KL divergence of the two distributions above is

0.0 log(0.0/0.1) + 1.0 log(1.0/0.1) + 0.0 log(0.0/0.1) + ... + 0.0 log(0.0/0.1)

= 0 + 2.303 + 0 + 0 + ... + 0

= 2.303

That is, there is about 2.3 nats (3.3 bits) more information in the second sequence relative to the first. (Which makes sense: specifying a number from 1 to 10 -- in our case state #2 --  requires 3.3 bits.)
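The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `kl_divergence` and the variable names are made up for illustration, and the convention 0 log 0 = 0 is assumed:

```python
import math

def kl_divergence(p, q, base=math.e):
    """D(p || q) = sum_i p_i * log(p_i / q_i), taking 0 * log(0 / q_i) as 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # x * log(x) -> 0 as x -> 0, so these terms contribute nothing
        if q_i == 0.0:
            return math.inf  # an outcome judged impossible actually happens
        total += p_i * math.log(p_i / q_i, base)
    return total

uniform = [0.1] * 10                 # all ten states equally likely
certain = [0.0, 1.0] + [0.0] * 8     # state #2 with 100% probability

nats = kl_divergence(certain, uniform)
bits = kl_divergence(certain, uniform, base=2)
print(f"{nats:.3f} nats, {bits:.3f} bits")  # 2.303 nats, 3.322 bits
```

Swapping the arguments, `kl_divergence(uniform, certain)`, returns infinity, which is one way to see that the divergence is not symmetric.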

In the following, I actually show the negative of the KL divergence (partially because I accidentally did it that way and am too lazy to change it, and partially because it emphasizes that we are looking at information loss).

In the simulations below, I randomly generated 10,000 probability distributions on the 10 states and looked at the KL divergence. I did three cases: (1) the expected distribution was a small perturbation from the actual distribution (E = A + δA); (2) the expected distribution was the first distribution given above, with all states equally likely, aka the least informative prior (E = 0.1); and (3) the expected distribution was another randomly generated distribution (E and A are uncorrelated). Here are the results:
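A rough sketch of how such a simulation could be set up follows. The post does not say how the random distributions or the perturbation were generated, so two assumptions are made here: distributions are drawn uniformly from the simplex (a flat Dirichlet), and a perturbation of size s is a mixture E = (1 - s) A + s R with R an independent random distribution. All function names are hypothetical.

```python
import math
import random
import statistics

N_STATES = 10
N_TRIALS = 10_000

def random_distribution(rng, n=N_STATES):
    """Assumption: flat Dirichlet draw, via normalized exponential variates."""
    weights = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def perturbed(rng, actual, size=0.1):
    """Assumption: mix the actual distribution with an independent random one."""
    other = random_distribution(rng, len(actual))
    return [(1.0 - size) * a + size * o for a, o in zip(actual, other)]

def kl_divergence(actual, expected):
    """D(actual || expected) in nats, skipping zero-probability terms."""
    return sum(a * math.log(a / e) for a, e in zip(actual, expected) if a > 0.0)

def simulate(seed=0):
    rng = random.Random(seed)
    uniform = [1.0 / N_STATES] * N_STATES
    losses = {"perturbed": [], "uniform": [], "uncorrelated": []}
    for _ in range(N_TRIALS):
        actual = random_distribution(rng)
        # Store the negative of the KL divergence, as in the post.
        losses["perturbed"].append(-kl_divergence(actual, perturbed(rng, actual)))
        losses["uniform"].append(-kl_divergence(actual, uniform))
        losses["uncorrelated"].append(
            -kl_divergence(actual, random_distribution(rng)))
    return losses

losses = simulate()
for name, values in losses.items():
    print(f"{name:>12}: mean = {statistics.mean(values):.3f}, "
          f"worst = {min(values):.3f}")
```

One thing that does not depend on these assumptions: with a uniform expected distribution the loss can never exceed log(10) ≈ 2.303 nats, which gives a hard cut-off for that case.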


We can see that if there aren't any big changes in the distribution (the small perturbation, blue), the information loss is minimal. If there are big changes (the new distribution is uncorrelated, red) then information loss is not only large, but has a long tail. Interestingly, the least informative prior (gray) is not only in the middle of these, but doesn't have much of a tail (there is a sharp cut-off). You'd imagine the blue histogram showing the typical case where past performance gives an indication of future performance, with only small deviations. The red histogram shows what it's like if you're wrong about the future.

What does this mean? Well, if expectations are accurate (people can accurately predict the future), then there is a limited amount of information loss. It may be reductio ad absurdum, but I'd say this pretty much proves that there is information loss in the market mechanism. Who can accurately predict the future? I'll take it a step further and say that expectations are the cause of that loss of information, and information loss appears to be a primary driver of recessions.

When people are especially bad at predicting the future, you not only get massive information loss on average, but there is a long tail of even greater loss (red histogram).

The average case for information loss, falling between the two extremes, is pretty well described by the least informative prior. This is potentially a reason why the information transfer model, which assumes this as a starting point, can do a good job with trends. This least informative prior is also effectively the efficient markets hypothesis in the real-world sense. There are trends and momentum, but price movements are unpredictable. They are not completely unpredictable, though; if they were, the information loss (red histogram) would be large. This least informative prior is also the assumption that expectations do not affect the long run trends.

Ah, but, you say, if you kind of know what you are talking about (you're able to predict the future), then you can do better than the least informative prior. That's why I have this next graph: it shows that the ability to do better than the least informative prior is very sensitive to how wrong you are -- you only need to be a little bit wrong for the tail risk to be a serious negative impact on your average performance. I plot the histograms for a 10% perturbation (used in the graph above) up to a 40% perturbation (in the last one, I show the 100% perturbation from the graph above in red):
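The sweep over perturbation sizes could be sketched as below. As before, the perturbation scheme is an assumption, not the one used for the graph: a perturbation of size s is taken to be the mixture E = (1 - s) A + s R, chosen because s = 100% then coincides with the uncorrelated case. The numbers depend strongly on that choice, so treat them as illustrative rather than as a reproduction of the figure.

```python
import math
import random
import statistics

def random_distribution(rng, n=10):
    """Assumption: flat Dirichlet draw, via normalized exponential variates."""
    weights = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(actual, expected):
    """D(actual || expected) in nats, skipping zero-probability terms."""
    return sum(a * math.log(a / e) for a, e in zip(actual, expected) if a > 0.0)

def loss_summary(size, trials=10_000, seed=0):
    """Mean and worst-case negative KL divergence for one perturbation size."""
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        actual = random_distribution(rng)
        other = random_distribution(rng)
        # Hypothetical perturbation: mix the truth with an unrelated guess.
        expected = [(1.0 - size) * a + size * o for a, o in zip(actual, other)]
        losses.append(-kl_divergence(actual, expected))
    return statistics.mean(losses), min(losses)

def uniform_summary(trials=10_000, seed=0, n=10):
    """Same summary for the least informative prior, for comparison."""
    rng = random.Random(seed)
    uniform = [1.0 / n] * n
    losses = [-kl_divergence(random_distribution(rng, n), uniform)
              for _ in range(trials)]
    return statistics.mean(losses), min(losses)

sizes = (0.1, 0.2, 0.3, 0.4, 1.0)
summaries = {size: loss_summary(size) for size in sizes}
baseline = uniform_summary()

for size, (mean_loss, worst_loss) in summaries.items():
    print(f"perturbation {size:>4.0%}: mean = {mean_loss:.3f}, worst = {worst_loss:.3f}")
print(f"uniform prior    : mean = {baseline[0]:.3f}, worst = {baseline[1]:.3f}")
```

Comparing each row against the uniform-prior row, in both the mean and the worst case, is the comparison the paragraph above describes.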


The takeaway is that you basically always have to be right; otherwise, the potential losses from being wrong will add up and cause you to lose more information than the least informative prior over time.

Ah, but, you say, wouldn't an expectation based on the information transfer model give you at least some capability to predict the future -- contradicting your assertion that people are bad at predicting the future? Nope. The information transfer model is built on the least informative prior -- you are assuming maximum ignorance about the future, which is exactly as much ignorance as I am claiming people already possess!

14 comments:

  1. Jason, you're back!... I will dig in later. In the meantime I had some of my own interchanges about expectations recently, which I've summarized here:

    http://pragcap.com/forums/topic/expectations#post-64397

    Replies
    1. In particular, I'd like to point out Marcus Nunes' article (and my question):
      http://thefaintofheart.wordpress.com/2014/04/26/how-to-make-a-great-stagnation-come-true/#comment-13798
      His answer is right underneath. Plus I got David Beckworth to provide an answer too:
      http://macromarketmusings.blogspot.com/2014/04/the-cure-for-neo-fisherism-history.html?showComment=1398733775700#c4946143524646226304

      Do you have any response to either Marcus' or David's answers?

      (BTW, I love the fact that you're examining the concept of expectations on your blog ... it's one of the most "mysterious" things that the MMists talk about on a regular basis, for me anyway. I don't know if you followed the "Neo-Fisherite" debate, but basically Sumner (and Ryan Avent) admitted that expectations could make the Neo-Fisherites correct, if they were "strong" enough). Ryan mentioned it in his review of Noah Smith's Neo-Fisherite article, and Sumner admitted this when I specifically asked him, although he qualified his response saying that people would not have those expectations now (I asked about the expectations of a world of Neo-Fisherites). I also got Rowe to admit that his concept of the aggregate commercial banks being capable of creating an excess supply of money could be undone if people did not expect that excess supply to be permanent. My use of the word "admit" here does not mean "admit to the reality of" but rather "admit that the way you view expectations logically leads to." What they're "admitting to" may not have anything to do with reality! :D

    2. I think Beckworth is right in his direct response to your comment, but the other quote of Beckworth used by Nunes ...

      "One of the defining features of U.S. monetary policy over the past five years has been its incredibly ad hoc nature. Over this time, the FOMC has conducted monetary policy with a spate of make-it-up-as-we-go-along programs (QE1, QE2, Operation Twist, QE3, and the Evans Rule) that it hoped would spur a robust recovery. These programs did get progressively better as they became more state dependent, but they were often implemented and ended in a haphazard fashion. This stop-go approach to monetary policy was politically costly and prevented the Fed from fully utilizing its ability to manage expectations of future nominal growth."

      ... it seems to be a no true Scotsman argument. Sure the Fed did QE and it didn't work, but that's not really using monetary policy.

      That's why I remain skeptical of the monetarist take.

      I do think stronger expectations could have coincided with the rounds of QE and caused a blip in economic growth (actually, that seems to be what happens), but overall the reserves in the rounds of QE go towards pushing down short term interest rates -- which don't seem to have a strong effect on the economy (in the information transfer model, short term rates actually have no effect on the economy; they're more of an indicator than a causal factor).

  2. Jason, when you calculate the KL divergence of two distributions above (call them D1 and D2) and the corresponding sets of 10 probabilities for each are say

    D1: {P1(1), P1(2), ... P1(10)} = {0.1, 0.1, ... 0.1}

    and

    D2: {P2(1), P2(2), ... P2(10)} = {0, 1, 0, 0, ... 0}

    Then the KL divergence you calculate follows the following formula?

    KL divergence = P2(1)*log(P2(1)/P1(1)) + P2(2)*log(P2(2)/P1(2)) + ... + P2(10)*log(P2(10)/P1(10))

    I'm just trying to match up D1 and D2 with the numbers you plugged in there.

    Do you say that is the KL divergence of D1 wrt D2?

    Also, I guess I haven't thought about this before, but epsilon*log(epsilon) goes to 0 as epsilon goes to 0? L'Hopital's rule I guess?

    Replies
    1. Yes, that's the KL divergence for a discrete distribution. It has terms that look like P log P/Q which is the divergence of Q from P (it's not symmetric -- the divergence of P from Q is different -- so it's not a true distance metric). With your symbols that would be the divergence of D1 from D2.
      http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition

      And yep, that's the limit:
      http://en.wikipedia.org/wiki/Entropy_(information_theory)#Definition

  3. I realized I wanted to link the phrase:

    "information loss appears to be a primary driver of recessions"

    to this post:

    http://informationtransfereconomics.blogspot.com/2014/03/modeling-macroeconomic-fluctuations.html

    But I already linked to it twice in the post, so that is probably good enough.

  4. "It may be reductio ad absurdum, but I'd say this pretty much proves that there is information loss in the market mechanism."

    Can you expand on this a bit? ... in terms of your example here, this even applies with the "maximally ignorant" expectation of your distribution A above (all 10 states assumed to have equal probability), correct? Hmmm... I don't know, where does the "market mechanism" come in here exactly? Thanks.

    Replies
    1. By "market mechanism", I'm referring to the functioning of supply and demand. The information transfer model is at its root just a model of supply and demand:

      http://informationtransfereconomics.blogspot.com/2013/04/supply-and-demand-from-information.html

      And yes, there would be information loss even in the case of maximum ignorance. What I'm trying to say is that unless people have excellent foresight (which I don't believe), then the information loss is even greater if you take anything other than the maximally ignorant position.

      Additionally, the long tails on the "expectations" lead to greater losses over time even if the mean loss is equal to the maximally ignorant position. In some sense the maximally ignorant position has a maximum information loss (no "black swans").

      Also, I'm not saying this maximally ignorant position is "best" -- it's just a good average to form a theory around. Humans will have expectations; there's no stopping them. These expectations will result in fluctuations around the theory -- and from what I'm trying to say here, these are negative fluctuations, like in Friedman's plucking model.

  5. Jason, do you think it's possible to translate the concepts you introduce here into a Nick Rowe (or David Beckworth) style allegorical story?

    I'm having a hard time imagining what the 10 market states might correspond to. I realize this is a super simplified example, and maybe there is not good correlation. A discrete distribution is probably not the best either for that purpose.

    But say we had a very simple story... say interest rates could only take on discrete values, and only a finite number were possible. Then say... maybe some other variable... I don't know ... the "discount rate" (which I only learned about yesterday)... so say 5 interest rates and two possible discount rates. There's 10 possible states right there. Does that capture (conceptually) how one might go about relating your discrete states to states of the economy?

    Replies
    1. The example I was using, though I realize not very explicitly, was the Markov model mentioned e.g. here. The economy has a few "equilibrium" states: high growth, low growth and recession. That's 3 states. I used 10 states.

      However, the more realistic model for interest rates would be to have continuous interest rates that have a continuous probability distribution around some expected rate. The results are basically the same, it just involves more math. Two Gaussian distributions with different means/variances can have a large KL divergence as well. It's just harder to do the Monte Carlo over all possible Gaussian distributions vs distributions over 10 discrete states.

      I started working on a whack-a-mole analogy (10 states would be the 10 moles), where you'd be "betting" on whacking a mole, with lower probability moles having a proportionally higher return. But it seemed to distract from the main thesis of the post which is not the model itself, but the fact that getting an expected probability distribution wrong leads to information loss while an "assume ignorance" approach works better on average unless the economy is really predictable.

      I will try to do a more allegorical post in the future.

  6. Jason, you feel like a challenge? I pointed your blog out to Mark A. Sadowski once. He seemed to already be aware of it and furthermore to have a generally favorable view of it (he seemed to regard you as being unlike some other physicists that dabbled in econ... he meant in a good way). However, when I asked you about your view of expectations I thought he'd have a problem with that, which was true:

    http://www.themoneyillusion.com/?p=26552&cpage=2#comment-328760

    I'm not at the level where I can understand what either you or Mark are talking about most of the time, but he seems to be pretty highly regarded as an empiricist. He regards himself as an empiricist, and although he won't label himself a Market Monetarist, he does spend a lot of time defending their views and I think he generally agrees with them. Did you happen to see the time that in a debate with Steve Randy Waldman (at interfluidity.com) Mark essentially "won" the debate, and Steve ended up crossing out at least one whole post (and maybe one other) in its entirety and putting red "BULLSHIIT" watermarks across all his own plots, with a brief explanation that he did so because Mark had convinced him he was wrong? That was amusing. Anyway, Mark's defense of MMism apparently includes their take on expectations (which is central to the MM story from what I can tell). I'd LOVE to see you guys debate the expectations issue at some point. I'd sit back with my bowl of popcorn and watch... trying to learn as much as possible (though I'm sure a lot would go over my head). Do you feel you have the data to back up your claims in such a match up with Mark? :D

    Replies
    1. For that matter, I'd love to see you debate any of the anti-Neo-Fisherites regarding your post on that subject. Not that I love conflict, but I'm sure they'll have an intelligent carefully considered response, and I'd love to know what that is, and what your counter-response is, etc. My level of understanding is such that whoever I read last convinces me of their viewpoint... which is frustrating for me. Sometimes seeing the proponents of each position directly interact gives me a little bit more information. Do you suppose that any of the anti-Neo-Fisherites (Krugman, DeLong, ... any of the MMists... and probably some of the post-Keynesians, like Nick Edmonds (who said the Neo-Fisherite story sounded "crazy") or JKH) understand the fundamentals of information transfer economics? Or could at least interpret what you are saying in a framework more familiar to them? (Rowe often talks of "reverse engineering" an alien econ concept into terms he can understand).

    2. Per the Sadowski comment: I didn't intend to say expectations didn't affect the economy with this post -- I actually proposed the idea that expectations were a primary cause of recessions, and in the short run expectations can dominate the path of the economy. The point I was trying to make is that expectations (even positive ones of e.g. high growth) tend to have a negative effect on the economy (they represent a loss of information). That would put me at odds with e.g. Sumner, Beckworth, Nunes, etc., who believe that the Fed could get away with doing less QE/monetary expansion if they communicated better. However, I still think if the Fed communicates poorly, the initial reaction could be quite different from the final result (otherwise, why would some Fed announcements be greeted with initial enthusiasm that is eventually reversed?).

      Basically, I'm one of the people of the concrete steppes, per Nick Rowe's coined phrase.

      I would love to debate ideas, and thank you for your enthusiasm.

    3. Also, I try to put these ideas in terms mainstream economists would understand -- and actually, my models don't seem to differ too strongly from Sumner's arguments. Here's a good example:

      http://informationtransfereconomics.blogspot.com/2014/01/strange-new-monetary-worlds.html

      In particular, it seems that the information transfer model has a monetarist limit and a Keynesian limit ...

      http://informationtransfereconomics.blogspot.com/2013/09/an-information-transfer-history-of.html

