Wednesday, March 16, 2016

Overfitting and empirical data: quantitative and qualitative models

John Handley has a counterpoint to my draconian empiricism. It raises an interesting question: what do you do in a world where error is a given and you have to make a model more complex than the data can reject just to be a qualitative success?

I tried my hand at a diagram:

Let me unpack it.
  • (Nate) Silver criterion: An arbitrary complexity criterion for good models, about 1 parameter per 15-20 data points. Models above this line cannot be rejected given the current empirical data; they tend to overfit, and therefore drift to the right over time. The line itself also rises over time as more data becomes available. I based it on the number of quarterly points in a post-war time series.
  • Data horizon: The maximum amount of data that will be available in the near future. Models above this line will likely never be rejected by empirical data on a reasonable time scale.
  • Qualitative models: Models that get the directions of effects right, but are off by factors of order 2.
  • The modeler's compass: These arrows indicate the available directions a model can move. Improvements reduce error but increase complexity. Overfitting causes a model to drift to the right as more data becomes available. Errors can also improve over time as more data becomes available.
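To make the Silver criterion concrete, here is a small Python sketch of the arithmetic. The post does not say exactly which post-war series it used, so the 1947Q1 through 2015Q4 span below is an assumption, as are the helper names; only the 15-20 points-per-parameter ratios come from the text above.

```python
# Sketch of the "Silver criterion": roughly one free parameter per
# 15-20 data points. The 1947Q1-2015Q4 span is an assumption.


def quarterly_points(start_year: int, end_year: int) -> int:
    """Quarterly observations from Q1 of start_year through Q4 of end_year."""
    return 4 * (end_year - start_year + 1)


def max_parameters(n_points: int, points_per_param: int) -> int:
    """Largest whole number of parameters the data supports at this ratio."""
    return n_points // points_per_param


def is_above_line(n_params: int, n_points: int, points_per_param: int) -> bool:
    """True if a model has more parameters than the criterion allows."""
    return n_params > max_parameters(n_points, points_per_param)


n = quarterly_points(1947, 2015)  # 69 years * 4 = 276 quarters
for ratio in (15, 20):
    print(f"{ratio} points per parameter: at most {max_parameters(n, ratio)} parameters")
# 15 points per parameter: at most 18 parameters
# 20 points per parameter: at most 13 parameters

# The line rises as data accumulates: ten more years adds 40 quarters.
n_later = quarterly_points(1947, 2025)  # 316 quarters
print(max_parameters(n_later, 20))      # 15
```

Under these assumptions a 276-quarter series supports roughly 13 to 18 parameters, and at 20 points per parameter the ceiling creeps up by about two parameters per decade of new data.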

I put the NY Fed DSGE model and the information transfer (IT) model on the diagram as examples.

Let's consider a few cases ...
  • X1: Making a terrible model a terrible qualitative model. Adding complexity to something complex that doesn't work doesn't make things better. This model should be abandoned.
  • X2: Making a terrible model an acceptable qualitative model. Adding complexity to something simple that makes it work qualitatively is acceptable modeling.
  • X3: A model so complex that data will never be able to reject it. This is garbage.

Now John gives us a scenario:
This same problem is captured by the failure of NK DSGE models to explain the Great Recession; the only conclusion that can be drawn from the empirical evidence is that the model is incomplete, which everyone already knew anyway, and therefore failed to capture all the possible causes of the Great Recession -- this has no bearing on the correctness of the rest of the model.
How do we represent this on our diagram? Let's assume the NK DSGE model starts at the location of the NY Fed DSGE model. The Great Recession was essentially a large jog to the right, into the red area, like this:

Whether this is acceptable really depends on whether the model was more or less complex than the Silver criterion. If it was more complex, then we should think of the previous success as a case of overfitting: i.e. the model is garbage. If it is less complex, then it could be overfitting or it could just have been a qualitative model to begin with.

John makes the case that it is the latter: DSGE models are qualitative models. I don't buy this. For one, they are way too complex to be a qualitative model. Qualitative models above the Silver criterion basically overfit the data and are useless. They're not really telling you anything besides parroting back your priors.

The current state of affairs is that most DSGE models are already way too complex to be rejected (according to Noah Smith); therefore, adding complexity to deal with an increase in error is a sign of a degenerative research program.


PS: John also talks about never having a closed system in economics. However, any system can become an effective closed system if your instrumental variables move faster (change by a greater magnitude in a shorter period of time) than your unobserved variables. That's why QE was such a good experiment!


  1. "John makes the case that it is the latter: DSGE models are qualitative models. I don't buy this. For one, they are way too complex to be a qualitative model. Qualitative models above the Silver criterion basically overfit the data and are useless. They're not really telling you anything besides parroting back your priors."

    So, RBC + monopolistic competition (which we basically know exists, even if the Dixit-Stiglitz model is a generalization) + sticky prices is too complex? I'm not sure I agree, especially since the model described above (minus capital) can be written in reduced form as three equations, at which point it basically becomes dynamic IS-LM + rational expectations. I'd definitely call that qualitative.

    Does anyone actually think that DSGE is structural? I hope not. IMHO, the whole point of having utility maximization and budget constraints is so that the model is internally consistent and nothing else. This is why I prefer DSGE to something like IS-LM or AD-AS (both of which I think you would agree are qualitative); I couldn't care less about Smets-Wouters (e.g.), since I find the idea that it is somehow structural laughable. Same goes for the NY Fed's DSGE.

    Does adding financial frictions make (super complicated and useless) DSGE models empirically successful? Maybe, but I think this falls under the point I make at the end of my post:

    "That said, the appropriate conclusion is probably that all economic models are doomed to be quantitatively unsuccessful (and even if they are, that is not necessarily an indication that the model in question is correct in the sense that it correctly matches causes with effects), even if they do capture some of the correct cause and effect relationships"

    That is, there is basically no way to tell if a model is structural given the lack of a closed system, even if it is empirically accurate.

    Regarding QE, I completely agree: QE was a great experiment that proved that Market Monetarism relies entirely on unobservables and is thus unfalsifiable. Or, more seriously, that the qualitative Keynesian prediction that liquidity traps exist is correct. (Still, I much prefer any internally consistent NK model that has money demand to IS-LM, not only because of the internal consistency, but also because of the explicit modelling of AS, as long as you don't cheat with the reduced form.)
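
    For reference, a common textbook form of the three-equation New Keynesian model alluded to above is sketched below. The notation is assumed here, not taken from the comment: output gap $x_t$, inflation $\pi_t$, nominal interest rate $i_t$, natural rate $r^n_t$, policy shock $v_t$, and parameters $\sigma, \beta, \kappa, \phi_\pi, \phi_x$.

    ```latex
    \begin{aligned}
    x_t   &= \mathbb{E}_t x_{t+1} - \frac{1}{\sigma}\left(i_t - \mathbb{E}_t \pi_{t+1} - r^n_t\right) && \text{(dynamic IS curve)} \\
    \pi_t &= \beta\, \mathbb{E}_t \pi_{t+1} + \kappa\, x_t && \text{(New Keynesian Phillips curve)} \\
    i_t   &= \phi_\pi \pi_t + \phi_x x_t + v_t && \text{(interest-rate rule)}
    \end{aligned}
    ```

    Written this way, the deterministic core has five parameters ($\sigma, \beta, \kappa, \phi_\pi, \phi_x$) before any shock processes are specified, a count worth comparing with the Silver criterion in the post.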

  2. Nice diagrams!

    Re: modeler's compass: overfitting (prior to more data being collected) would initially move to the left I'd guess... and also with an upward component too it seems, no? But those "gains" would be reversed with more data... thus the seduction of being lured into overfitting in the 1st place.

    Also, isn't it possible to sometimes move DOWN and to the right or left? Wasn't Copernicus's model of concentric circular planetary orbits a case of moving down and to the right (more error than Ptolemaic epicycles, but also less complicated)? By the time Kepler added ellipses, it moved back up and to the left, but not as much up as to the left. That might be an interesting trajectory on the diagram.

  3. Is overfitting standard in economics? I have heard more than one economist complain that competing models can be tweaked (my word) to fit currently available data, so that new data are insufficient to eliminate them. Pardon me, but that's not the way to do it, is it?

    Let's assume that WWII was a watershed economically, after which things have been different from before. Suppose that we wish to compare two different economic models. Let's take 1950 as our first year, to allow for things to settle down post war. Then shouldn't we do something like this? Fit the models to the data for the 44 years up to 1994, and then compare them using the data for the 22 years since. Isn't that a simple way to address overfitting?

    1. Bill,

      You literally just described overfitting...

    2. Hi, John.

      What do you mean by overfitting? The tweaking or the testing? Or both?
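
Bill's procedure in comment 3 is a chronological holdout test: fit on 1950 through 1993, then score on 1994 onward. Below is a minimal Python sketch of it on made-up data. The series (a linear trend plus bounded noise), the seed, and both models are hypothetical: a two-parameter linear trend, and a deliberately overfit lookup table with one "parameter" per training year, standing in for a model far above the Silver criterion.

```python
import math
import random


def make_series(start=1950, end=2015, slope=0.5, seed=0):
    """Hypothetical annual series: a linear trend plus bounded uniform noise."""
    rng = random.Random(seed)
    return [(year, slope * (year - start) + rng.uniform(-1.0, 1.0))
            for year in range(start, end + 1)]


def split_by_year(series, first_test_year):
    """Chronological holdout: everything before first_test_year is training data."""
    train = [(t, y) for t, y in series if t < first_test_year]
    test = [(t, y) for t, y in series if t >= first_test_year]
    return train, test


def fit_linear_trend(train):
    """Two-parameter model: ordinary least squares fit of y = a + b*t."""
    n = len(train)
    t_mean = sum(t for t, _ in train) / n
    y_mean = sum(y for _, y in train) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in train)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in train)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return lambda t: a + b * t


def fit_lookup_table(train):
    """Deliberately overfit model: memorizes every training value exactly and,
    for unseen years, repeats the last value it memorized."""
    table = dict(train)
    last = train[-1][1]
    return lambda t: table.get(t, last)


def rmse(model, data):
    """Root-mean-square error of a model's predictions on (year, value) pairs."""
    return math.sqrt(sum((model(t) - y) ** 2 for t, y in data) / len(data))


series = make_series()
train, test = split_by_year(series, 1994)  # 44 training years, 22 test years
for name, fit in (("linear trend", fit_linear_trend),
                  ("lookup table", fit_lookup_table)):
    model = fit(train)
    print(f"{name}: in-sample RMSE {rmse(model, train):.2f}, "
          f"out-of-sample RMSE {rmse(model, test):.2f}")
```

The lookup table wins in-sample (zero error) and loses badly out of sample. That is the rightward drift on the diagram, and an in-sample comparison alone would hide it.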