Sunday, August 23, 2015

Rational expectations and information theory

KL Divergence using Gaussian distributions from Wikipedia.

Noah Smith once again deleted my comment on his blog, so I'll just have to preserve it (well, the gist of it) here.

He discussed an argument against rational expectations he'd never considered before. Since counterfactual universes are never realized, one can never explore the entire state space to learn the fundamental probability distribution from which macro observables are drawn. Let's call this probability distribution A. The best we can get is some approximation B.

Rational expectations is the assumption A ≈ B.

If this sounds familiar, it's exactly the way one would approach this with the information equilibrium model as I discussed several months ago.

In that post, I showed that the KL divergence measures information loss in the macroeconomy based on the difference between the distributions A and B.

D(A||B) = ΔI

That was the content of my comment on Noah's post. I go a bit further at the link and say that this information loss is measured by the difference between the price level and how much NGDP changes when the monetary base changes:

ΔI ~ P - dN/dM = dN*/dM - dN/dM 

Which to me seems intuitive: it compares how much the economy should grow from an expansion of the money supply (ideally) to how much it actually does grow.

Just the aggregate ΔI is measured, however. Two different distributions B' and B'' can have the same KL divergence, so this doesn't give us a way to estimate A better.
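To make that last point concrete, here is a minimal sketch using the closed-form KL divergence between two one-dimensional Gaussians. The numbers are purely illustrative assumptions (a "true" A with mean 0 and unit variance, and two approximations with the mean shifted by 0.5 in opposite directions); nothing here is fit to macro data.

```python
import math

def kl_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """KL divergence D(A||B) in nats between 1-D Gaussians A and B."""
    return (math.log(sigma_b / sigma_a)
            + (sigma_a**2 + (mu_a - mu_b)**2) / (2 * sigma_b**2)
            - 0.5)

# Hypothetical distributions, chosen only for illustration.
A = (0.0, 1.0)           # the "true" distribution
B_prime = (0.5, 1.0)     # approximation with the mean shifted right
B_dprime = (-0.5, 1.0)   # approximation with the mean shifted left

d_prime = kl_gaussian(*A, *B_prime)
d_dprime = kl_gaussian(*A, *B_dprime)
print(d_prime, d_dprime)
```

Both approximations miss A in different directions but give the same divergence, so knowing ΔI alone can't tell you which way to correct B.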

Now rational expectations are clearly wrong at some given level of accuracy, but then so are Newton's laws. The question of whether you can apply rational expectations depends on the size of ΔI. Since ΔI is roughly proportional to nominal shocks (the difference between where the economy is and where it should be based on growth of M alone [1]) and these nominal shocks are basically the size of the business cycle, it means rational expectations are not a useful approximation to make when analyzing the business cycle.

As far as I know, this is the first macroeconomic estimate of the limits of the rational expectations assumption that doesn't compare it to a different model of expectations (e.g. bounded rationality, adaptive expectations). (There are lots of estimates for micro.)

[1] In case anyone was curious, this also illustrates the apparent inconsistency between e.g. this post where nominal shocks are negative and e.g. this post where they are positive. It depends on whether you apply them before including the effect of inflation or after. Specifically

0 = dN*/dM - (dN/dM + σ) = (dN*/dM - σ) - dN/dM


  1. "Two different distributions B, B' and B'' can have the same KL divergence so this doesn't give us a way to estimate A better."

    Looks like three.


    1. And here I was worried that using B' and B'' would have been confusing (what about B?)

      "1. Typos in posts don't reveal themselves until you've published. If you schedule a post to publish in the future, the typos will be revealed then. This is an absolute, inviolable rule of blogging. This may be some sort of subtle lesson from the universe about our hubris in the face of fundamental impermanence."

    2. Thanks for the link. I have to say though I'm more confused now because you didn't change "two" to "three" or otherwise structure the sentence differently. Am I parsing it wrong?

  2. What's the story with Noah? Why do you think he erases your comments?

    1. He doesn't block my comments, even when I occasionally provide a link to one of your posts. Perhaps because it's clear I have no idea what I'm talking about (and thus provide some amusement... much as does Ray Lopez for Scott)? Lol

    2. I think I have enough data to say that he doesn't erase the ones where I just agree with him.

      I think he deletes "self promotion": e.g. 'I agree and said something similar at my blog.'

      which is his prerogative.

  3. O/T: one question of mine that I don't recall the status of (did I ask it? did someone answer it? did I just forget?) during the great Smith Sadowski row of 2015 was, putting aside the non-linearities of MB after 2008, why couldn't Mark have done a Granger causality test for MB causing inflation (and all the rest of what he looked at) for the years from say 2001 to 2008? Mark said (in a response to my question on Nunes' blog) that was possible but difficult, but then both you and he offered the explanation that MB was not the Fed's target as well. This confuses me because I'd think that regardless of whether MB was the ultimate target, that it was used as a lower level target to (help?) achieve the upper level targets of either the FFR or the inflation target.

    What I'd expect to see from such an analysis is that MB Granger causes inflation (and other things) but the effect is perhaps larger (perhaps closer to 1:1) rather than 10:1 on a percentage wise basis.

    I ran that idea piecemeal past Nick Rowe: first asking him if MB was a more "fundamental" level, even when targeting short term interest rates or inflation. He essentially said "yes, that's one way to look at it" (in the back of my mind I was hearing him muttering under his breath "if you're a person of the concrete steppes anyway, because we all know that it's Chuck Norris imparted expectations that really matter and are most fundamental").

    But when I went the extra step of asking "So what do you think a Granger causality test would indicate for the effectiveness of MB causing inflation prior to 2008 vs after 2008 on a percentage wise basis?" he didn't go there, explaining that he didn't really trust VAR studies. What's amusing though is that in the comments to Mark on Nunes' blog he asked Mark "So I didn't read your series, but what's the upshot? What percent change in MB causes a 1% change in inflation?" and Mark responded "about 10%" which seemed to satisfy Nick.

    So do you think it's possible to perform a straightforward Sadowski style Granger causation VAR analysis for MB (as the causing variable) for the years prior to 2008? What do you think would be the result? Would it be stopped in its tracks early on by insufficient sign of causation?

    1. ... please excuse the typos above: hopefully it's still coherent.

    2. Hi Tom,

      I will take this on piecemeal.

      " ... but then both you and he offered the explanation that MB was not the Fed's target as well."

      To be precise, Mark gave that as an explanation and I think I just accepted it as a valid (model-dependent) assumption. It is true that the Fed targeted the base with QE (either by level for QE1 & 2 or rate for QE3 ... X billion dollars of QE, or Y billion dollars/month). The short term interest rate target is actually a range 0.0 to 0.25%.

      In the ITM, a short term interest rate target of 0-0.25% actually decouples the MB from the interest rate target if the MB moves inside a range of values. The short term rates are 0.001% or so because of QE, but that isn't the target rate. You can move the MB around quite a bit though and be consistent with that interest rate range. MB can be roughly about 50% less or + infinity more. If it was 100% less (i.e. the Fed unwound QE completely), short term interest rates would be about 2% as I showed here:

    3. "So do you think it's possible to perform as straightforward Sadowski style Granger causation VAR analysis for MB (as the causing variable) for the years prior to 2008? What do you think would be the result?"

      I would say that the response would be about 1.4:1 for the period before 2008 and 11.1:1 after. Actually, that is exactly what I calculated here:

      "Before 2008, a 100% increase (a doubling) of the monetary base would have led to a 70% increase in the price level. After 2008, it leads to a 9% increase in the price level."

      So there's your 1:1 and 10:1.

      That's why I said Mark was my first monetarist convert -- he built an information transfer model with IT index kappa of 0.57 before 2008 and 0.89 after.

      Note that if kappa = 0.89, then 1/kappa ~ 9/8 = 1.125 so the price level is

      log P ~ (1.125 - 1) log MB
      log P ~ 0.1 log MB

      I rounded in the above equation. Now if MB ~ exp(r t), then

      P ~ exp(0.1 r t)

      ... or about 1/10th the growth rate r of MB.
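For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the scaling log P ~ (1/kappa - 1) log MB used above and the two kappa values quoted in this comment; the function name is just something made up for the illustration.

```python
def price_level_factor(kappa, mb_factor=2.0):
    """Factor by which P changes when MB is multiplied by mb_factor,
    assuming log P ~ (1/kappa - 1) log MB."""
    return mb_factor ** (1.0 / kappa - 1.0)

# kappa values quoted above for the periods before and after 2008
for label, kappa in [("before 2008", 0.57), ("after 2008", 0.89)]:
    pct = 100.0 * (price_level_factor(kappa) - 1.0)
    print(f"{label}: kappa = {kappa}, doubling MB raises P by about {pct:.0f}%")
```

With the unrounded exponent this gives a bit under 70% before 2008 and about 9% after, in line with the numbers quoted above.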

      Now that specific model (where you use MB instead of M0) had some issues treating the entire period of available data as a single model for the UK, US and Japan in this post:

      There is a case to be made for monetary regime change and thus two different values of kappa in the US before and after 2008, though. However as I am the only one working on the IT model so far there is no one to disagree with me in the context of the IT model ...

    4. "-- he built an information transfer model with IT index kappa of 0.57 before 2008 and 0.89 after."

      If Mark rode the trolley all the way to the end and analyzed the data before 2009.

    5. "I would say that the response would be about 1.4:1 for the period before 2008 and 11.1:1 after. Actually, that is exactly what I calculated here:"

      Right, I remember. That's why I estimated (crudely) 1:1 and 10:1. That's what I was thinking of: your post on that.

      The remaining question for me though is if Mark were to do his usual Granger analysis, where he checks to see if he can reject one thing NOT causing another with some significance, and then turns it around to see if he can reject the opposite (the 2nd thing causing the 1st), what he'd find. You're effectively saying he'd probably find MB Granger causes P, and furthermore he'd probably find something like the 1:1 factor as well, which seems reasonable.

      As I recall in his 1st "Age of ZIRP" post he found that PCIE Granger causes MB to 0.05 significance and MB Granger causes PCIE to 0.01 significance. I don't think he pursued the 1st one (PCIE Granger causes MB) any further.

      I think he uses the Toda & Yamamoto method (Dave Giles does a T&Y example, and based on the relatively high number of comments (325), it was one of Dave's more popular posts).

      So when Mark said it would be more difficult to do the pre-2008 analysis, I wonder why? Separately he brought up the FFR targeting issue. Here's what occurs to me:

      1. Perhaps his Granger non-causality test would indicate that he can't reject the null hypothesis of MB NOT Granger causing PCIE... which would lead him to some more difficult path, or perhaps that's where he'd feel justified to stop.

      2. Perhaps the pre-2008 changes in MB are not large enough or there's some other problem with the data which would force him to get better data (like with a higher sampling rate).

      3. Maybe he'd feel compelled to factor in the FFR targeting or inflation rate targeting somehow and perhaps that complicates the model or makes analysis more difficult or less reliable for some reason.

      I shouldn't have emphasized (to you and every one else) that MB changes are used as a tool to target other things pre-2008 (such as the FFR or inflation rate). I mostly meant to point out that pre-2008 MB *was* varied (because that's the direct result of OMOs, which were undertaken by the Fed during that time), regardless of how or if those changes in MB factored into any other targets. In other words, regardless of what else the Fed was doing, it was doing some OMOs which were directly changing MB: so it seems like we ought to be able to see if that Granger caused anything else (like PCIE) via whatever methods Mark and/or Dave Giles would bless. I assume he'd want to do the pre-2008 and post-2008 separately because of the break in trend.

    6. Jason, forget about the above questions... I think I've beaten this to death. I'd erase it, but I wasn't logged into my Google account when I wrote it, so I can't (feel free though!).

    7. Ok, I will. But in thinking about it, the two big spikes in inflation that seem associated with the two periods of the oil crisis in 1973-4 and 1979 might mess up the correlation. Those are supply shocks that need a model to understand -- which I discuss more in this post:

      But you could probably use 1985-2008 just fine.

    8. "But you could probably use 1985-2008 just fine."

      I wasn't even considering (Mark) going that far back: I thought just to 2001 (for a nice symmetric 7 years pre-2008 and 7 years post-2008). But if you think you (or Mark) could go all the way back to 1985 w/o a "structural break" (which Mark indicated you have to avoid), then that would be better still.

  4. "Useful" is a weasel word, useful for analyzing which aspects of the business cycle, and by which criteria...?

    1. Hi LAL,

      I was softening the blow a bit by saying "not useful" instead of "invalid". It is like assuming there is no Moon and attempting to calculate the tides. That is a very precise analogy ...

      rational expectations:
      assuming there is no Moon → assuming ΔI ~ σ ≈ 0

      business cycle:
      tides → σ ≠ 0

      Effectively, there shouldn't be a business cycle with rational expectations (in the information transfer framework).

    2. So you are saying that under rational expectations the information transfer model means dN*/dM = dN/dM. Further that the difference between the two accounts for the empirically observed business cycles?

    3. I think the most intriguing interpretation of rbc models is to assume over the relevant time series that monetary policy was adequate to control the economy and was correctly used, then observe the remainder. There is still something there that seems to cycle, although the interpretations are usually secular shifts in leisure preferences or TFP. Could I have a partial rational expectations framework that somehow uses the KL divergence to know when to be in rbc mode and when not to?

    4. Hi LAL,

      RE: both your comments

      In the model as written. There could be improvements. There are some fluctuations due to changes in M (it's not a perfectly smooth function) so there are some fluctuations in N* that could be considered a separate part of the cycle. Essentially there are two kinds of fluctuations:

      fluctuations in N* (and hence N)
      fluctuations in σ

      Now it appears that σ accounts for what we think of as the business cycle -- it is given by changes in employment as I look at here:

      But it's not perfect. It could be measurement error or it could be fluctuations in N* due to fluctuations in M. I don't have all the answers there, but σ seems to account for most of the business cycle.

      However! There is also non-ideal information transfer -- that appears to exacerbate recessions. So that would be an extra bit of cycle on top.

      I don't think I have all the answers on this yet. I made a stab at some of the effects a while ago here:

      I'm still trying to fully understand it myself. Hence I left this stuff out of the first paper ...

