Saturday, April 11, 2015

All models are wrong, but some are tedious


I've recently come across the "all models are wrong" trope a couple of times; it seems to turn up in economics more than in other model-building fields.

Stephen Williamson

Saying a macroeconomic model is wrong misses the point. These models are all wrong, in the sense that, with sufficiently good data in sufficiently large quantities, which has in some sense performed the right natural experiments for us, we can reject any model.

Lars Syll
Even though all theories are false, since they simplify, they may still possibly serve our pursuit of truth. But then they cannot be unrealistic or false in any way. The falsehood or unrealisticness has to be qualified.

I believe the original version of the quote is here:

George E. P. Box
Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity. (1976)

Box was a statistician (he passed away a couple of years ago), and his concern was overfitting, not defending models where they fail or worrying about the content of the theory. The original quote is more a statement about Occam's razor than about a model failing an empirical test or not describing reality. All models are wrong, so don't go gold-plating a Rube Goldberg device. It is not the more nihilistic "all models are wrong, so don't worry about empirical data" or "all models are wrong, so worry about failures of realism".

Realism is in the eye of the beholder -- the original formulation of quantum mechanics (Heisenberg's matrix mechanics) was considered not only unrealistic but also rather abstract, with its infinite-dimensional matrices. It wasn't widely accepted until Schrödinger came up with his famous wave equation -- a differential equation following the typical methodology of the time. It was a more realistic approach, with things that looked like waves. You could take that as an analogy for economic models that have agents that look like humans. Additionally, I am sure the formulation of economics in terms of information equilibrium would be considered silly as well.

A real model contains its own limits, so worrying about empirical data that goes against the model's predictions misses the point. In the specification of the model, you should have some indication of the kinds of empirical data on which it will fail. As a scientist, you hope a model fails an empirical test. One of the most disappointing things about the Standard Model in physics is that it doesn't fail empirical tests [1]. When you have a model, the failures point out where there are things you don't understand.

In the light of the original meaning of "all models are wrong", we see in Williamson's quote that he points to places where more research ought to be done; it is not, as he suggests, a reason not to worry about where the model is wrong.

In the light of the original meaning of "all models are wrong", we see in Syll's quote that realism is an additional ad hoc constraint based on current (likely flawed) understanding; it is not, as he suggests, a path to success.

"All models are wrong" is properly taken as a rallying cry against tedium. Macroeconomists should not be adding variables and complications to their models because there simply isn't enough data to warrant doing so. Read Nate Silver on overfitting -- there are only about 200 quarterly observations of economic data in the post-war US economy where data is relatively good, which implies that a model should have at most about 10 parameters (some DSGE models have 40 parameters or more!). Noah Smith likes to say that macro data is uninformative. What that really means is that economists have ignored Box: they shouldn't be overparameterizing so much. With fewer parameters, the data isn't uninformative ... if you have just two parameters, the data is actually completely informative.
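
As a rough sketch of the arithmetic in the paragraph above: the figures of 200 observations and 10 parameters imply a ratio of about 20 observations per parameter. That ratio is an assumption read off those two numbers, not a rule stated by Box or Silver, and the function names below are made up for illustration.

```python
# Rule-of-thumb arithmetic only. The default of 20 observations per parameter
# is an assumption inferred from the figures quoted above (200 observations,
# about 10 parameters); it is not a statistical law.

def max_parameters(n_observations: int, obs_per_parameter: int = 20) -> int:
    """Largest parameter count the assumed rule of thumb would allow."""
    return n_observations // obs_per_parameter


def observations_per_parameter(n_observations: int, n_parameters: int) -> float:
    """Average number of data points available to pin down each parameter."""
    return n_observations / n_parameters


if __name__ == "__main__":
    quarters = 200  # approximate count of post-war US quarterly observations
    print("parameter budget:", max_parameters(quarters))
    cases = [
        ("two-parameter model", 2),
        ("rule-of-thumb limit", 10),
        ("large DSGE model", 40),
    ]
    for label, k in cases:
        ratio = observations_per_parameter(quarters, k)
        print(f"{label}: {k} parameters, {ratio:.0f} quarters per parameter")
```

On these assumed numbers, a 40-parameter model gets only about 5 quarters of data per parameter, while a two-parameter model gets about 100.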

This tedium also manifests as overly complicated agent-based models. Sure, they get more realistic in Syll's sense, but they too overfit in Box's sense.

All models are wrong, but some are tedious.

Footnotes:

[1] There are some things the model doesn't include (neutrino oscillations, for example).
