Friday, March 1, 2019

DIEM versus VAR or: Too Many Variables Spoil the Broth

Sorry I haven't been updating my blog as frequently lately. With only 2 posts in February and 3 in January, I'm well below my previous low of 7 posts in a month, set in April of 2014 (vacation + business trip) and again in September of 2018 (prepping for the "Outside the Box" workshop). This time, it's due to being massively busy at my real job and devoting most of my spare time to my forthcoming book. There's a bit of being bored with comparing the latest data with my forecasts in there as well, a topic that accounts for a sizable fraction of posts each month.

This is one of those posts, but it's a more interesting one because we get to close out a head-to-head comparison between a dynamic information equilibrium model (DIEM, see my paper here) and a vector autoregression (VAR) from the Minneapolis Fed (MF), originally requested by @unlearningecon on Twitter. The data being forecast is real GDP (RGDP) growth for the US (e.g. here). While the data is consistent with both forecasts, the DIEM had considerably tighter confidence as well as a bias consistent with zero, while the MF VAR was biased low.


Note that the DIEM unemployment forecast already significantly outperformed the MF VAR under the same metrics (tighter confidence, much lower bias).

The thing is that a VAR model is pretty agnostic; it's basically a regression. Really, the only thing that could make its error larger than the DIEM's is the choice of variables to include in the VAR. In particular, including too many irrelevant variables effectively adds noise and reduces your confidence in the forecast. Consider a hypothetical scenario: if you think interest rates affect RGDP growth, but in real life they don't, your VAR is going to have a larger error due to adding in irrelevant interest rates.
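
To make the "irrelevant variables add noise" point concrete, here is a minimal sketch in plain Python. It is a toy simulation and not the Minneapolis Fed VAR or the DIEM: the data-generating process (one relevant regressor with coefficient 0.5, unit-variance noise), the sample sizes, and all function names are assumptions made purely for illustration. It fits ordinary least squares with and without 15 pure-noise regressors and compares the out-of-sample forecast error.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients (no intercept) from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def draw(rng, n, n_irrelevant):
    """Assumed toy process: y depends only on column 0; other columns are pure noise."""
    X = [[rng.gauss(0, 1) for _ in range(1 + n_irrelevant)] for _ in range(n)]
    y = [0.5 * row[0] + rng.gauss(0, 1) for row in X]
    return X, y

def forecast_rmse(n_irrelevant, n_train=40, n_test=200, trials=100, seed=0):
    """Average out-of-sample RMSE over many small training samples."""
    rng = random.Random(seed)
    sq_err, count = 0.0, 0
    for _ in range(trials):
        X, y = draw(rng, n_train, n_irrelevant)
        b = ols(X, y)
        X_new, y_new = draw(rng, n_test, n_irrelevant)
        for row, yi in zip(X_new, y_new):
            pred = sum(bj * xj for bj, xj in zip(b, row))
            sq_err += (yi - pred) ** 2
            count += 1
    return (sq_err / count) ** 0.5

rmse_lean = forecast_rmse(n_irrelevant=0)
rmse_bloated = forecast_rmse(n_irrelevant=15)
print(f"1 relevant variable only:  RMSE = {rmse_lean:.3f}")
print(f"plus 15 irrelevant ones:   RMSE = {rmse_bloated:.3f}")
```

In this toy setup the lean model's error should sit near the irreducible noise level, while the bloated model forecasts visibly worse even though the extra variables have no true effect. Each spurious coefficient is estimated with error, and those errors feed straight into the forecast.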

To bring this back out to a more general discussion, I think this is a big problem with a lot of macro models. There are a lot of assumptions about how an economy works that go into the choice of the variables used in the model. You might think that an irrelevant variable would usually get a zero coefficient in a regression if there's no dependence on that variable, but this isn't true for an over-determined system with correlated data. In that situation, weird things like the order of variables in the regression algorithm can become important. There are of course ways to test for these things (such as regressing on subsets of your variables), but I'd be surprised to hear that mainstream macro modelers drop e.g. interest rates from their models. A lot of non-mainstream economists seem to be just as wedded to their ideas of which variables are important, making this a problem across the board.
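
One mechanism behind the non-zero coefficients can be sketched with a toy two-variable regression. This illustrates ordinary multicollinearity under assumed settings (20 observations, an irrelevant variable that is either independent of the relevant one or nearly a copy of it); it is not a reconstruction of any particular macro model, and the names below are hypothetical. The true coefficient on the second variable is exactly zero in both cases.

```python
import random
import statistics

def fit_two(x1, x2, y):
    """OLS without intercept for y ~ b1*x1 + b2*x2, solving the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def irrelevant_coefficients(correlated, n=20, trials=500, seed=1):
    """Fitted coefficient on x2 across many samples; its true value is zero."""
    rng = random.Random(seed)
    fitted = []
    for _ in range(trials):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        if correlated:
            # x2 is nearly a copy of x1 (assumed noise scale 0.1)
            x2 = [a + rng.gauss(0, 0.1) for a in x1]
        else:
            x2 = [rng.gauss(0, 1) for _ in range(n)]
        # y depends on x1 only
        y = [0.5 * a + rng.gauss(0, 1) for a in x1]
        fitted.append(fit_two(x1, x2, y)[1])
    return fitted

spread_indep = statistics.pstdev(irrelevant_coefficients(correlated=False))
spread_corr = statistics.pstdev(irrelevant_coefficients(correlated=True))
print(f"spread of irrelevant coefficient, independent x2: {spread_indep:.2f}")
print(f"spread of irrelevant coefficient, correlated x2:  {spread_corr:.2f}")
```

On average the irrelevant coefficient is still zero, but any single fit on a short, correlated data set can hand it a large value, which is the situation a modeler with one historical sample is actually in.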

I've mentioned before that I think this might be one useful byproduct of using machine learning in macro modeling (e.g. here) since one thing that machine learning tends to do is destroy irrelevant information. However, curation of your data sets is an issue for machine learning as well, so it's not a panacea. It's still a judgment call, and really the only thing we have to go on is the data.

5 comments:

  1. Congrats on another successful prediction. Thanks for the update and additional insights.

  2. This is really interesting. Thanks. I've seen you write about over-determined models before. I was thinking "machine learning" too, so I'm glad you brought that up.

    1. BTW, By The Way = Tom ;^) (I just can't be bothered to change logins sometimes to leave a comment)


