## Monday, August 3, 2015

### Statistical significance is not model correctness

I have been having a discussion with Mark Sadowski in comments here about what his series of posts at Marcus Nunes's blog actually show. My more general argument (that Mark's model is actually an information transfer model that demonstrates a liquidity trap) notwithstanding, here is a piece of the conversation:
Me: The data are exponential and any two exponentials will be related by the above relationship [log P(t) = a log MB(t-q) + b]. You're not learning that your model or theory is correct. You're learning that economic systems tend to be exponential. Interpreting statistical significance as model correctness is an inference error.
Mark: Just because two variables both exhibit a similar trend does not mean there is a statistically significant relationship between them. Moreover there are often statistically significant relationships between variables which do not exhibit similar trends. So frankly this statement is weirdly nonsensical.
I agree that there can be statistically significant relationships between variables with different trends, but that is beside the point; my point was that if your data are all roughly samples of exponentially growing functions, you can almost always find a statistically significant relationship between them even if there isn't any relationship at all.

I decided to demonstrate my assertion with a concrete example (the complete details are at the bottom of this post). Let's generate two randomly fluctuating exponentially growing data series (using normally distributed shocks to the growth rate):

Now let's say the red one is the data V1 and the blue one is the explanatory variable V2, and fit the data to the model:

log V1(t) = a0 log V2(t - t0) + b0

Here is the model result (solid blue) -- the dashed blue line is a0 log V2(t) + b0, i.e. showing the data for V2 without the lag:

The fit has chosen a lag that lines up some of the random fluctuations in V1 with the random fluctuations in V2 a bit better (actually, statistically significantly better).

And the result is that the parameter p-values of the model fit are all p < 0.01 ... a statistically significant relationship, worthy of publication in an economics journal for example.

Except it was random data.
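A stripped-down version of the experiment (the full Mathematica notebook is at the bottom of this post; this Python sketch skips the lag search and uses made-up growth parameters purely for illustration):

```python
import math
import random

random.seed(4)  # arbitrary seed, for reproducibility only

def exp_series(n=200, g=0.02, sigma=0.02):
    """Exponentially growing series with normally distributed shocks to the growth rate."""
    level, out = 1.0, []
    for _ in range(n):
        level *= math.exp(g + random.gauss(0.0, sigma))
        out.append(level)
    return out

def slope_tstat(x, y):
    """t-statistic of the slope in the OLS fit y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# two independent random exponentials, then fit log V1 = a0 log V2 + b0
log_v1 = [math.log(v) for v in exp_series()]
log_v2 = [math.log(v) for v in exp_series()]
t = slope_tstat(log_v2, log_v1)
print(t)  # well beyond the |t| ~ 2.6 needed for p < 0.01 at this sample size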

OK ... maybe that's a fluke?

So I did it 100 times -- and only a few percent of the results failed to achieve p < 0.01 (plotting, for each fit, the worst of its parameter p-values):

And that is for random data with no relationship between the two except that they are both exponential. That is what I mean by "[i]nterpreting statistical significance [p < 0.01] as model correctness is an inference error."
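The repetition is easy to reproduce in miniature: generate many independent pairs of shocked exponentials and count how often the OLS slope of log V1 on log V2 clears |t| ~ 2.6 (roughly p < 0.01 at this sample size). A self-contained sketch with made-up growth parameters:

```python
import math
import random

random.seed(0)

def exp_series(n=200, g=0.02, sigma=0.02):
    # exponential growth with normal shocks to the growth rate
    level, out = 1.0, []
    for _ in range(n):
        level *= math.exp(g + random.gauss(0.0, sigma))
        out.append(level)
    return out

def slope_tstat(x, y):
    # t-statistic of the OLS slope of y on x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

trials, significant = 200, 0
for _ in range(trials):
    lv1 = [math.log(v) for v in exp_series()]
    lv2 = [math.log(v) for v in exp_series()]
    if abs(slope_tstat(lv2, lv1)) > 2.6:
        significant += 1
print(significant / trials)  # nearly every trial is "significant" at p < 0.01
```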

Additionally, adding more data causes the previously identified relationship (the model with parameters a0, t0 and b0) to break down.

The big takeaway from this, and what brings it back to my more general argument, is that in order to make causal inferences you need to see strong changes in the trend of your data ... like the one that happens in 2008 here:

I once called this the cleanest economic experiment ever. Mark's analysis ignores this amazing source of potentially informative data by concentrating only on the post-2008 data (and thus contains only a single log-linear trend).

...

Here is the full Mathematica notebook:

#### 32 comments:

1. OK, great, I'm glad you continued this. I hope Mark responds, and that we ... er... you two eventually come to some sort of resolution so that I can potentially learn something. (c:

If you go back to where we left off with the comments, I asked him about coming up with a lag of 1 for M0. Any thoughts on that?

Also, it sounds like when he compared p-values for the three data variables he looked into, there was a big difference between the M0 and the MB explanatory variables, with the former around 0.5 and the latter giving much smaller p-values for the three curves. I realize that he's skipping the big change that you've pointed out here, but is there any reason for that you can think of off hand? Do you consider it to be of any importance? Why?

OK, thanks Jason.

1. Regarding p-values, they can be very sensitive to small changes.

But in general, I don't have any specific qualms or beefs with the numbers that came out of Mark's analysis. It's the interpretation that I have a problem with.

A good analogy is this analysis:

http://www.bradford-delong.com/2015/03/lots-to-worry-about-in-climate-change.html

There do not seem to be any errors in the math they apply. It's the interpretation that follows the letter of the mathematical procedure but nonetheless comes up with the silly result that the IPCC overstates future global warming.

2. Thanks Jason. Does your analysis here say anything about likely lags to be selected? In the previous thread, in response to one of my questions, Mark describes various tests used to determine what lag to use.

2. I have a busy day today and may come back late this afternoon.

However, on first pass it looks to me like you've simply fitted one nonstationary process on another nonstationary process without first detrending the data. This is a classic example of what is known as a spurious regression.

http://davegiles.blogspot.com/2012/05/more-about-spurious-regressions.html

Of course the p-values will be low.

1. Mark, thanks for the link. It doesn't mention p-values by name as far as I can tell though. Their behavior tends to track some of the other statistics they mention, or are you speaking from experience, or both? Did you detrend your data with differences?

2. Tom,
It was the first relevant link that I could find on the subject. Part one is a PDF that can be found at the end of the post. But the situation is even simpler than the post I linked to indicates.

Spurious regressions result in invalid estimates with high R-squared values, high t-statistics and low p-values.

Do you have access to any statistical software? Even Excel will do. Try regressing any time series with a trend on any other time series with a trend and you'll see what I mean. For example you could regress the US consumer price index on the population of Bolivia. Since both increase over time such a regression almost certainly will result in high R-squared values, high t-values, and low p-values. There's no reason for these two series to be correlated. The results are merely capturing the fact that they share an upward trend.

In my posts you'll note that I mention testing each of the time-series to determine their order of integration. I use the ADF test, for which the null hypothesis is non-stationarity, as well as the KPSS test, for which the null is stationarity. (It's good to have a cross-check.)

The issue of nonstationary data is usually dealt with by differencing. But when doing Granger causality tests, in the words of the master econometrician Dave Giles, “this is wrong”. Granger causality tests should be done on level data, otherwise the results will be inconsistent. This is in fact addressed by James Hamilton in “Time Series Analysis” on pages 651-3. The Toda-Yamamoto method deals with the issue of nonstationarity by declaring extra lags of each variable, up to the maximum order of integration of the two variables, to be exogenous variables.

Before I went back to school to earn a PhD in economics I was a mathematician. I still earn most of my income teaching calculus and statistical methodology for the mathematics department at the University of Delaware. In Math 202 we teach ordinary least squares (OLS). One of the first things I show students is not to do what Jason Smith is doing here.
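The levels-versus-differences contrast Mark describes can be simulated directly (an illustrative sketch, not his actual procedure: just OLS on pairs of independent drifting random walks, first in levels and then in first differences; all parameters are made up):

```python
import math
import random

random.seed(1)

def log_walk(n=200, g=0.02, sigma=0.02):
    # log of an exponentially growing series: drift plus accumulated shocks
    out, x = [], 0.0
    for _ in range(n):
        x += g + random.gauss(0.0, sigma)
        out.append(x)
    return out

def slope_tstat(x, y):
    # t-statistic of the OLS slope of y on x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def diff(x):
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

trials, sig_levels, sig_diffs = 200, 0, 0
for _ in range(trials):
    a, b = log_walk(), log_walk()
    if abs(slope_tstat(a, b)) > 2.6:
        sig_levels += 1   # spurious regression in levels
    if abs(slope_tstat(diff(a), diff(b))) > 2.6:
        sig_diffs += 1    # same test after differencing
print(sig_levels / trials, sig_diffs / trials)
```

In levels nearly every pair looks "significant"; after differencing the rate collapses to roughly the nominal false-positive rate.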

3. Hi Mark,

There is no trend to post-2009 MB data (it's a series of plateaus), so what trend did you "de-"?

I took for granted that it is fairly obvious that any de-trending of MB was entirely arbitrary, hence my objections in the post above. I should point to Noah Smith here:

http://noahpinionblog.blogspot.com/2012/07/steve-williamson-explains-modern-macro.html

Model-free de-trending of data is essentially mathematical opinion ... for example, here are four versions of the de-trended MB:

https://twitter.com/infotranecon/status/628678134934564864

Which is right? I'd say none. The monetary base is a series of steps: QE1, QE2 and QE3.

This provides a simple answer to Tom's earlier question about M0 vs MB: the arbitrary de-trending is why there are differences in the p-values. If you subtract a linear trend from MB, the result is quite different from M0 (as shown in the graph at the twitter link). But subtracting the linear trend from MB is not in any way meaningful.

4. Jason,
The goal of detrending a series is to render it "stationary". A stationary series is a stochastic process whose distribution does not change when shifted in time. Thus the mean and variance of the process do not change over time, and they also do not follow any trends.

Extracting a cyclical component from a linear trend is extremely unlikely to result in a stationary process (check the ADF and KPSS test results if you do not believe me). Extracting a cyclical component via a Hodrick-Prescott filter may result in a stationary process, but only if the smoothing parameter is of a low enough value (again, check and you will see).

This is not a matter of mathematical *opinion*, this is a matter of mathematical *definition*. If processes are not stationary, regressing one on another will result in biased estimates. Period.

The fact that you think that NoahPinion post is remotely relevant indicates you do not even begin to understand what I am talking about.
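For reference, the Hodrick-Prescott filter under discussion can be written down in a few lines. Below is a minimal pure-Python version (a dense solve of the filter's first-order conditions; production implementations exploit the pentadiagonal structure, and the input here is a made-up random walk):

```python
import random

random.seed(7)

def hp_filter(y, lam):
    # HP trend: minimize sum (y - t)^2 + lam * sum (second difference of t)^2,
    # i.e. solve (I + lam * K'K) t = y, with K the second-difference matrix.
    n = len(y)
    A = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        w = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # row k of K
        for i, wi in w.items():
            for j, wj in w.items():
                A[i][j] += lam * wi * wj
    b = list(y)
    # Gaussian elimination with partial pivoting (fine for toy sizes)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        t[r] = (b[r] - sum(A[r][c] * t[c] for c in range(r + 1, n))) / A[r][r]
    return t

def roughness(x):
    # sum of squared second differences
    return sum((x[i + 2] - 2 * x[i + 1] + x[i]) ** 2 for i in range(len(x) - 2))

# made-up nonstationary "data": a drifting random walk
y = [0.0]
for _ in range(39):
    y.append(y[-1] + random.gauss(0.02, 0.1))

trend = hp_filter(y, 1600.0)
print(roughness(trend) < roughness(y))  # True: the trend absorbs the curvature
```

The smoothing parameter lam controls the trade-off: larger values force a smoother trend, which is exactly why the stationarity of the leftover "cycle" depends on how lam is chosen.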

5. The reference to Noah Smith was about the arbitrariness of the procedure of de-trending data and claiming it is not strongly model dependent, not the specifics of the application.

The distribution of changes in the monetary base is not stationary -- there are three large changes due to QE1, QE2 and QE3 that do not come from a distribution! Which is my entire point. There are signals in the data that you are sweeping under the rug.

6. Edit:

Should say: distribution of the changes in the de-trended monetary base is not stationary ...

7. "The reference to Noah Smith was about the arbitrariness of the procedure of de-trending data and claiming it is not strongly model dependent, not the specifics of the application."

The Noahpinion piece is about the arbitrariness of the process of de-trending to identify *business cycles*. There is simply no credible way it can be interpreted as being critical of the necessity of rendering a nonstationary process stationary for time series analysis.

"The distribution of changes in the monetary base is not stationary -- there are three large changes due to QE1, QE2 and QE3 that do not come from a distribution! Which is my entire point. There are signals in the data that you are sweeping under the rug."

The natural log of the monetary base from December 2008 through May 2015 is *integrated of order 1*. *First differencing* it renders it *stationary*. Check the ADF and KPSS test results.

8. Um, sorry that is just wrong. The H-P filters are precisely "rendering a nonstationary process stationary for time series analysis". The core of real business cycle theory is that the business cycle is a stationary process.

Rendering a non-stationary process stationary cannot be done in a model independent way, which is entirely my point.

When you first-difference the MB data, you are making a model-dependent assumption on the impact of changes in MB on other variables.

And when you first-difference only 2008-2015 MB data, you are making another model-dependent assumption.

Your analysis is strongly model dependent with the model assumptions tucked away in the selection of the data and the procedure for de-trending it.

I'm not the only person who sees the idea that QE has a strong effect on the economy as having been disproved -- or who sees the rise in the monetary base in 2008 as a clean economic experiment. For example, here is John Cochrane:

http://johnhcochrane.blogspot.com/2015/02/doctrines-overturned.html

And in the end, your result is actually consistent with a version of my model. You have a ~ 2% increase in the base causing a ~ 0.2% increase in various measures. Since you leave off the pre-2008 data, you miss the finding that a ~ 2% increase in MB increases various measures by ~ 2%. That change from 2% to 0.2% -- a fall by a factor of 10 in the efficacy of monetary policy -- is the "sudden onset" version of the liquidity trap. It's not the best model I have (M0 with a gradual onset is), but it's not entirely inconsistent with what I've been saying. It becomes a matter of degree.

But you ignore the pre-2008 data and de-trend the rest, so your finding of efficacy of monetary policy is model dependent.

9. "Um, sorry that is just wrong. The H-P filters are precisely "rendering a nonstationary process stationary for time series analysis"."

Um, no. HP filters can render nonstationary processes stationary if you set the smoothing parameter low enough in value, but that is definitely not what they are used for in practice.

"The core of real business cycle theory is that the business cycle is a stationary process."

The core of standard RBC models is the idea that business cycles are driven mainly by *large* and *cyclically volatile* shocks to productivity. Large and cyclically volatile shocks to productivity are not in principle compatible with stationary processes. The smoothing parameter is almost never set at values low enough in RBC models to render the processes stationary.

"Rendering a non-stationary process stationary cannot be done in a model independent way, which is entirely my point."

I repeat, a stationary series is a stochastic process whose distribution does not change when shifted in time. Thus the mean and variance of the process do not change over time, and they also do not follow any trends. This is a mathematical *definition*.

"When you first-difference the MB data, you are making a model-dependent assumption on the impact of changes in MB on other variables."

First differencing a process that is integrated of order one is the standard procedure in time series analysis for rendering it stationary.

10. The original HP filter was designed for RBC models such as Kydland-Prescott (same Prescott), which are essentially DSGE models. The HP filter is designed to estimate the changing expected value in the time series and subtract it, leaving something that tends to look like a stationary AR process.

Prescott likely didn't design a filter that would invalidate his own approach to macroeconomics by leaving a non-stationary time series. I'm not saying that is how they are always used, but making a time series stationary is the original purpose.

I understand that first differencing is a standard method. But it represents a linear model -- the FD of MB is equivalent to the derivative of MB with its linear trend subtracted (shifted by a constant):

https://twitter.com/infotranecon/status/628978844528197632

so it represents a linear model of the trend of the data. The monetary base reserves do not have a linear trend, so removing a linear trend

a) represents a model dependent result
b) will induce artifacts in the data since it does not have a linear trend
c) is not exactly valid given the non-linear behavior of MB
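Point (b) can be illustrated with a toy series shaped like the base -- flat plateaus separated by QE-style jumps (the step sizes and lengths below are made up). First differencing it doesn't yield draws from a stable distribution; it yields zeros punctuated by a few isolated spikes:

```python
# a toy "monetary base": three QE-style steps between flat plateaus
mb = [1.0] * 20 + [1.8] * 20 + [2.4] * 20 + [3.2] * 20

d = [mb[i + 1] - mb[i] for i in range(len(mb) - 1)]
spikes = [x for x in d if x != 0.0]
print(len(spikes))  # 3 -- every other first difference is exactly zero
```

Treating those three spikes as samples from a stationary shock distribution is exactly the model-dependent assumption at issue.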

3. Looking forward to responses here:

From Jason: The detrending issue. Also, you mention the sensitivity of p-values above. To summarize what I think I hear Mark saying regarding the data since 2008 (only), MB probably is a good explanation while M0 is probably not. And you're saying that if we restrict ourselves to 2008 and later data only, that's a reasonable conclusion (putting aside the "sensitivities" you mention in p-values), but why ignore the elephant in the room (i.e. why not include pre-2008 data as well)?

From Mark: Jason's point about excluding pre-2008 data (I've asked before, although I'm not sure you saw my 2nd question on the subject).

1. Tom,
I left a response there.

2. Thanks Mark. I have left you one further question there.

4. "Interpreting statistical significance as model correctness is an inference error."

Indeed. That is a general truth. Statistical significance means that there is good evidence against the null hypothesis. That means that there is confirmatory evidence for every other hypothesis, not just for the model in question. And, as we know, confirmatory evidence is very weak.

5. "I once called this the cleanest economic experiment ever."

So I click on the link and I find the following.

http://informationtransfereconomics.blogspot.com/2014/11/quantitative-easing-cleanest-experiment.html

"Let's plot the Pearson's correlation coefficient of MB (blue) and M0 (red) with P (as well as the correlation of MB and M0, green):

[Graph]

Before QE, all of these are fairly highly correlated -- actually MB and M0 are almost perfectly correlated. This really doesn't tell us very much. NGDP is also highly correlated with the price level. So is population, and in fact any exponentially growing variable.

With the onset of QE, the correlation between MB and P drops precipitously (as well as the correlation between M0 and MB). We see that the counterfactual path MB without QE would have been more correlated with P (effectively given by the red line)....This means central bank reserves have nothing to do with the price level or inflation."

What Jason did in that post is essentially what he is doing in this post.

You cannot regress nonstationary time series on each other. These are spurious regressions. Spurious regressions result in invalid estimates with high R-squared values, high t-statistics and low p-values.

Thus the Pearson's r values (the square root of the R-squared values) in this earlier post were subject to extreme statistical bias and so could not have been interpreted as meaning that these series are correlated. Nor could the fact that the Pearson's r values dropped precipitously be interpreted as meaning anything.

One must correct nonstationarity by differencing, or by some other method before checking to see if two time series have a statistically significant relationship.

Either Jason is guilty of outrageous obfuscation or he simply doesn't know what any statistics student should have learned by the completion of his freshman year in college.

1. Mark,

I think you've been dealing with econometric data so long, you've forgotten what an obvious signal looks like :)

That's probably the reason you've also forgotten your freshman statistics: spurious correlation mainly applies when two variables can be related by a (log-)linear transformation. There is no linear transformation that maps MB to M0 = (MB - reserves) so the change in correlation is a sign of a real change in the data.

In layman's terms, there is an obvious signal in the monetary base in 2008. Which should be even more obvious because there were several Federal Reserve announcements about it.

If you had read more carefully, you'd have noticed that I say the correlated bit tells us nothing ("This really doesn't tell us very much."). It's only the sudden change in 2008 that I point to in order to emphasize the fact that there is a clear signal.

2. Additionally, the US isn't the only observation. We see the same thing in the EU, Japan and the UK where the signals have different shapes and come at different times:

https://twitter.com/infotranecon/status/628693809925222400

https://twitter.com/infotranecon/status/628693763603365888

https://twitter.com/infotranecon/status/628693898580262912

We don't have to muck about with teasing real fluctuations from noise anymore! At least with regard to the monetary base.

3. (Maybe you should try your analysis on other countries?)

4. "That's probably the reason you've also forgotten your freshman statistics: spurious correlation mainly applies when two variables can be related by a (log-)linear transformation. There is no linear transformation that maps MB to M0 = (MB - reserves) so the change in correlation is a sign of a real change in the data."

I repeat, you cannot regress nonstationary time series on each other. These are spurious regressions. Spurious regressions result in invalid estimates with high R-squared values, high t-statistics and low p-values.

Only a person interested in spreading bald-faced demagoguery would base an argument on a change in an estimate that is totally invalid to begin with.

5. Your objection is only valid in your narrow world of exponentially growing functions that lack strong signals.

I will say that there is quite literally physical evidence that I am right ... that you can look at the correlation of non-stationary time series and extract a signal:

http://www.sandia.gov/RADAR/_assets/images/gallery/modes-10.png

You take two highly non-stationary time series of synthetic aperture radar pulses and look at the drop in the correlation. Tire tracks that occur because someone drove through the area between the two images show up.

That's because it's a real signal.

Are you saying we're not seeing a real signal in that image?

I think it seems like I'm being a "demagogue" to you because you don't quite understand my objection to your analysis.

6. Jason, I don't know much about SAR. Let me try to understand: so an aircraft-mounted SAR flies over that scene twice: once before the vehicle leaves those tracks and once after. Processing the time series of pulse returns the 1st time results in the image on the upper left, and doing it the 2nd time results in the image on the lower left. I call them "images" but are they actually 3D maps?

At this point, you have two "images" and can use any number of techniques (I'd think) to extract those tire tracks. But you're saying we can use the original time series data sets (used to construct the images on the left) directly... so then what? I'm struggling with "You ... look at the drop in the correlation." Can you explain? Is it true that each returned radar pulse received by the SAR essentially comes from that whole scene? Are you saying that there are periods of time when the two sequences of received pulses don't correlate well with one another? Of course I'm assuming that the time axis on each series of pulses is reset between flyovers.

Putting aside that these images came from SAR for a moment, I can imagine a simple procedure of using a least squares procedure to skew, shift and scale (both size and brightness) one image wrt the other to maximize their "alignment," and then essentially subtracting one from the other. Perhaps conceptually this can be described as scanning through the pixels and looking for large local residuals or errors in the fit (drops in correlation?) and then "detecting" those pixels? Can we still speak of a "non stationary" signal, but in space (x and y coordinates) rather than time in that case?

OK, super interesting, thanks...

7. Hi Tom,

I'll try to answer your questions.

1. Actually the two images on the side both represent CCDs themselves. In one case, there was no change between the two images and in the second there was. The big image is the difference between the two CCDs.

2. You can make 3D maps from them, but generally they are treated as 2D images.

3. There are small phase and amplitude changes due to slight disturbances to the, say, hundreds of scatterers that contribute to the energy returning from a resolution cell. Each pixel in the CCD actually represents the Pearson correlation of several pixels in the original image.

There is more here:

http://www.dtic.mil/dtic/tr/fulltext/u2/a458753.pdf

4. Each returned pulse does come from the whole scene, but the pulses are usually chirped (so different frequencies represent different ranges) and since you are moving, different positions on the ground represent different Doppler shifts. This creates a grid coordinate system so you can put energy into different range and doppler bins and form an image.

5. SAR is collected as a time series of pulses, but by the process in 4 above, they can be transformed into a 2D image. The information is the same, and so the data is non-stationary in x-y as well as in time-frequency.

6. The process you describe is more like what is called 2-color multi-view. CCD is a coherent process that requires (roughly) identical imaging geometry.

Here is someone demoing how to get a 2CMV from some radarsat imagery:

https://www.youtube.com/watch?v=da9XLByKeWI

...

Did I miss anything?
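The windowed-correlation idea in point 3 above can be mocked up without any radar (a toy sketch only: real CCD works on complex imagery with matched geometry; here two made-up "speckle" images differ by small noise everywhere, plus rearranged scatterers in one patch):

```python
import random

random.seed(3)

def speckle(n):
    # toy stand-in for a SAR image: i.i.d. speckle
    return [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

def window_corr(a, b, r0, c0, w):
    # Pearson correlation of two images over a w x w window at (r0, c0)
    xs = [a[r][c] for r in range(r0, r0 + w) for c in range(c0, c0 + w)]
    ys = [b[r][c] for r in range(r0, r0 + w) for c in range(c0, c0 + w)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

n, w = 24, 7
img1 = speckle(n)
# second pass: same scene plus a little noise ...
img2 = [[img1[r][c] + random.gauss(0.0, 0.2) for c in range(n)] for r in range(n)]
# ... except one patch where the scatterers were rearranged (the "tire track")
for r in range(12, 19):
    for c in range(12, 19):
        img2[r][c] = random.gauss(0.0, 1.0)

unchanged = window_corr(img1, img2, 0, 0, w)
changed = window_corr(img1, img2, 12, 12, w)
print(unchanged, changed)  # correlation stays high away from the patch, drops inside it
```

Both "time series" here are highly non-stationary in x-y, yet the drop in windowed correlation is a real, localized signal.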

8. Jason, that's fantastic. Thanks for taking the time to answer that. It'll take me a while to digest it. If you missed anything, so be it. I've got enough there to keep me busy for a while.

9. So a "drop in the correlation" refers to the Pearson correlation you mentioned?

10. In your opinion, if Mark were given the required SAR time series (and perhaps some information about the model), where would he likely go wrong in analyzing it (if, indeed, you think it plausible that he'd go wrong)? By the time he's detrended the series in a manner he'd approve of, would the chance for successful correlation/estimation/etc. be diminished? If he were successful, what would a successful result look like? The big picture, or is something less than that indicative of success?

