Sunday, August 9, 2015

Comparing methodologies (monetary base and short term interest rates)

I'm in the process of setting up the Granger-causality test of the relationship between the monetary base (St. Louis Adjusted monetary base) and short term interest rates (3 month secondary market rate) mentioned here, but I thought there should be a quick aside on something Mark Sadowski said:
... if a time series model’s results are reported without routine references their statistical significance, then they should always be viewed with deep skepticism. ...
... an argument consisting of little more than a line graph depicting the time series of values of a quantity should never be accepted as a proof of anything, except that the person who is arguing that it proves something is not familiar with what constitutes acceptable empirical evidence, is incapable of understanding what constitutes acceptable empirical evidence, or is simply willfully ignoring what constitutes acceptable empirical evidence because it contradicts their preferred model.
Almost zero theoretical physics papers pay attention to statistical significance in this way. Obviously physics is "doing it wrong" and that's why the field is in a state of existential methodological crisis. Oh, wait. That's economics.

Personally, I'd say any statistical results presented without reference to an explicit model mean that an implicit model has been used -- and you should treat any implicit results with deep skepticism.

As Paul Krugman says:
Any time you make any kind of causal statement about economics, you are at least implicitly using a model of how the economy works. And when you refuse to be explicit about that model, you almost always end up – whether you know it or not – de facto using models that are much more simplistic than the crossing curves or whatever your intellectual opponents are using.
Here's an example where you can see this at work ... first setting up the data


The next bit looks for the best single lag to compare the first differences of the two time series (in both directions) ...
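
For readers without the original notebook, here is a minimal sketch of what a single-lag search on first differences can look like. It is in Python with NumPy rather than Mathematica, and it runs on synthetic stand-in series rather than the FRED data. The helper names (`lagged_corr`, `best_single_lag`), the built-in 3-period lag, and the use of correlation as the fit criterion are all assumptions made for illustration; this is not the code behind the graphs.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlation between x[t - lag] and y[t] for lag >= 0."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

def best_single_lag(x, y, max_lag=24):
    """Return (lag, corr) with the largest |corr| between x[t - lag] and y[t]."""
    corrs = [lagged_corr(x, y, k) for k in range(max_lag + 1)]
    k = int(np.argmax(np.abs(corrs)))
    return k, corrs[k]

# Synthetic first differences (hypothetical stand-ins, NOT the real series).
rng = np.random.default_rng(0)
n = 300
d_rate = rng.normal(size=n)                  # changes in "rates"
d_base = np.empty(n)                         # changes in "base"
d_base[:3] = rng.normal(size=3)
# The base responds to rate changes 3 periods later, with opposite sign.
d_base[3:] = -0.8 * d_rate[:-3] + 0.3 * rng.normal(size=n - 3)

lag_rb, corr_rb = best_single_lag(d_rate, d_base)   # rates -> base
lag_br, corr_br = best_single_lag(d_base, d_rate)   # base -> rates
print(lag_rb, round(corr_rb, 2), lag_br, round(corr_br, 2))
```

With real data you would start from the levels and take `np.diff` of each series first. Note that the search runs in both directions, so one direction can come out looking much stronger than the other.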


This is a basic fit with a single lag ...


It works pretty well for an economic model (changes in short term interest rates cause changes in the monetary base: rates go down, base goes up ...)


How about the same thing but looking at levels? Well the traditional stats test is going to be garbage because of spurious correlation ... but the model looks much better than the statistically "proper" version ...
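
The spurious correlation problem is easy to reproduce on made-up data. The sketch below, in Python with NumPy, builds two independent random walks with drift (so there is no relationship between them at all) and computes the naive OLS t-statistic of the slope in levels and in first differences. The drift, sample size, seed, and the helper `slope_t_stat` are assumptions chosen for illustration; none of this uses the actual monetary base or interest rate series.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS fit y = a + b x (naive i.i.d. errors)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()
    b = (xc @ (y - y.mean())) / (xc @ xc)
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    se_b = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return b / se_b

rng = np.random.default_rng(1)
n = 500
# Two independent random walks with drift: no causal link between them.
x = np.cumsum(0.5 + rng.normal(size=n))
y = np.cumsum(0.5 + rng.normal(size=n))

t_levels = slope_t_stat(x, y)                   # inflated by the shared trend
t_diffs = slope_t_stat(np.diff(x), np.diff(y))  # unremarkable once detrended
print(round(t_levels, 1), round(t_diffs, 1))
```

The levels t-statistic is enormous even though the series are unrelated, which is why a tiny p-value from a fit in levels says very little on its own.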



Does a slavish devotion to statistical purity lead us away from real understanding?

45 comments:

  1. By default, what are the 1st and 2nd colors plotted? What's the name of the math tool you use again?

    I'm building an internal model (a very rudimentary one, based on very superficial understanding) of the Smith-Sadowski disagreement of 2015, but however poor my model is, it can make predictions: I predict that Mark will hate this. Lol. Furthermore I predict that there will be more than one thing he hates about it. Offhand, I'd say he'll hate about 5 main things about it.

    Replies
    1. Ha!

      The first color is blue and the second one is reddish-purple by default in Mathematica 8.

      Actually, the disagreement can be summed up in a single sentence by Dave Giles as I also link below:

      "Standard tests for Granger causality (or, more correctly, Granger non-causality) are conducted under the assumption that we live in a linear world."

      I can't help but kind of sing the last part to this.

      In a linear
      In a linear
      In a linear ... world ...

    2. I kind of wish I had that quote at hand several days ago. It could have spared us all of this nonsense (if Mark took it seriously).

  2. So the extremely tiny p-values in the fit to levels are because there's a trend to both time series? That's why you suspect ahead of time it's going to be garbage and also very small? You state there's a spurious correlation happening, and that's because one trend is being correlated with the other? OK, thanks.

    Replies
    1. Not only that, but the procedure searches for the best fit: "The next bit looks for the best single lag." That introduces non-randomness. I do not know how the calculated p-values take that into account.

    2. This comment has been removed by the author.

    3. @Bill, I don't understand your concern. How does that introduce "non-randomness"?

    4. Hi Tom,

      Yes, the extremely small (10^-161) p-values are indicative of something wrong like spurious correlation.

      And regarding Bill's comment, I think he is saying selecting via lowest p-value uses an implicit model -- and I completely agree.

    5. Hi, Jason.

      Implicit model is one way of looking at it, I suppose. Violating the logic of Fisherian statistics is another way. ;)

      There are ways of dealing with that kind of thing, but I doubt if they would produce such tiny p-values.

    6. BTW, it does not violate Bayesian statistics, however. My inclination would be to search for the best fit in the first 3/5 of the data, and test it with the last 2/5. :)
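
A minimal sketch of that split, assuming Python with NumPy and pure-noise stand-in data (the helper `lagged_corr`, the 24-lag search window, and the seed are illustrative assumptions): search for the best lag on the first 3/5 of the sample, then evaluate only that one pre-chosen lag on the last 2/5.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlation between x[t - lag] and y[t] for lag >= 0."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = rng.normal(size=n)          # pure noise: no true relationship at any lag

split = 3 * n // 5              # first 3/5 for searching, last 2/5 for testing
x_fit, y_fit = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]

# Search: pick the lag with the strongest in-sample correlation.
in_sample = [abs(lagged_corr(x_fit, y_fit, k)) for k in range(25)]
best = int(np.argmax(in_sample))

# Test: evaluate that single pre-chosen lag on data the search never saw.
out_of_sample = abs(lagged_corr(x_test, y_test, best))
print(best, round(in_sample[best], 3), round(out_of_sample, 3))
```

Because the in-sample figure is the maximum over 25 candidates, it is biased upward even on noise. The held-out figure is a single test decided beforehand, so it will usually (not always) come out smaller.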

    7. @Tom

      The search is non-random. If it were random, you could end up with any lag. ;)

      According to Fisher's logic, you have to decide what to test beforehand. Looking for the best fit is what I used to call squirrel hunting as a kid. It is akin to shooting at the side of a barn and then painting a target later.

  3. Jason, is there an equivalent to econometrics in physics? Do you call it physonometrics? Lol

    Replies
    1. Ha!

      No, we generally take results within 2-sigma to be interesting theoretical agreement (passing through the error bars on data), 6-sigma to be an experimental discovery (in an experiment like the Higgs), and leave it at that.

  4. This comment has been removed by the author.

  5. Jason, can you find an example from physics following the usual vector differential equation:

    x' = f(x,u) + noise
    y = g(x,u) + more noise (a separate "measurement" equation)

    (with u an input vector, y an output vector, and x a state vector) such that given a time series from such a model (but no or limited information about the model), a Granger causality analysis like Mark does might lead you astray, whereas some other procedure wouldn't? It'd be interesting to give the time series to statisticians / model identifiers from different disciplines and give them all the same instructions and see what they come up with. Or maybe it wouldn't be interesting because they'd all say the same thing. ("Go away!?" Lol).

    I don't think this is proof of anything because perhaps Mark can demonstrate that his procedure works best on economic differential equations. Still it would be an indication (if you could indeed demonstrate something by this), that a wrong explicit or implicit model assumption might lead you astray?

    Replies
    1. In physics there is definitely a system that a linear model would fail to capture ... the fractional quantum hall effect:

      http://www.nature.com/nphys/journal/v7/n8/images_article/nphys2008-f4.jpg

      This is the example in my mind. The effect is nonlinear -- it has terms at all orders of its independent variable.

      But step functions in general are problematic for linear analysis if they represent a real effect. Actually a good economic example is the "phase transition" in government spending in the US that happens in the 30s and 40s:

      http://www.ritholtz.com/blog/wp-content/uploads/2011/07/outlays-GDP.png

      Taking the first differences of that data to de-trend it will induce modulations that aren't real (because they happen at all orders of derivatives).

      Granger causality works fine in a world of first order linear models like log-linearized DSGE models.

      Dave Giles is the ultimate arbiter of these things :) ...

      http://davegiles.blogspot.com/2012/07/beware-of-tests-for-nonlinear-granger.html

      "Standard tests for Granger causality (or, more correctly, Granger non-causality) are conducted under the assumption that we live in a linear world."

      This is my objection to Mark's analysis. Non-linear signals of unusual size. You can't ignore QE.

    2. BTW, I agree with your point. Not that I know much about Granger causality, but if the data violate the assumptions of your analytical methods, best not to use those methods. :)

    3. Jason,
      The title of Dave Giles' post is "BEWARE OF Tests for NONLINEAR Granger causality."

      Dave Giles:
      "Accordingly, my advice is to be very sceptical of studies based on the Hiemstra and Jones test for nonlinear Granger causality."

      You get an F for reading comprehension.

    4. The first line of the post is that:

      "Standard tests for Granger causality (or, more correctly, Granger non-causality) are conducted under the assumption that we live in a linear world."

      Which is the line I quoted. Giles is writing an introduction. He is saying that Granger causality is limited to the "linear world". That's why nonlinear tests of Granger causality (what the rest of the article is about) were invented. He then goes on to say that the nonlinear tests are suspect.

      It's as if I quoted "Call me Ishmael" and you said I get an F for reading comprehension because the book is called "Moby Dick" so obviously there's no "Ishmael".

      See e.g. here.

    5. No, you get an F for thinking that a post that is skeptical of tests of nonlinear Granger causality is skeptical of the fact that we live in a linear world.

      https://orderstatistic.wordpress.com/2015/05/10/in-praise-of-linear-models/

    6. So you're saying the jump in the monetary base by 300% in three steps is a linear change? It's just a small perturbation from equilibrium? Are you really saying that?

      House's article is talking about models where changes are on the order of a few percent. Part of the log-linearization process of a DSGE model is keeping only the terms that are small, like ΔY/Y ~ a few percent.

      That usually works for small changes. It's called a perturbative theory in physics. But I don't think in any sense we can say ΔMB/MB is small so that terms of order (ΔMB/MB)² are negligible. We have ΔMB/MB ~ 3 so (ΔMB/MB)² ~ 9. Not exactly a convergent series.
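
To put rough numbers on this, here is a small Python sketch that uses ln(1 + x) ≈ x as a stand-in for a first-order (log-linear) approximation. The choice of ln(1 + x) and the helper name `log_linear_error` are assumptions for illustration, not anything taken from a particular DSGE model.

```python
import math

def log_linear_error(x):
    """Relative error of the first-order approximation ln(1 + x) ≈ x."""
    exact = math.log1p(x)
    return abs(x - exact) / exact

small = log_linear_error(0.03)   # a "few percent" change, like ΔY/Y
large = log_linear_error(3.0)    # ΔMB/MB ~ 3
print(f"{small:.3f} {large:.3f}")
```

For a change of a few percent the first-order term is off by under 2%; for x ~ 3 it is off by more than 100%. The Taylor series of ln(1 + x) about zero only converges for -1 < x ≤ 1, so adding higher-order terms does not rescue the expansion at x ~ 3.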

      But DSGE models don't really care about the monetary base ...

      I'll just leave you with the end of Chris House's blog post which you didn't read to the bottom:

      "There are cases like the liquidity trap that clearly entail important aggregate non-linearities and in those instances you are forced to adopt a non-linear approach."

      As the kids say, PWN.

    7. I was just going to ask Mark if he agreed with the 2nd to last paragraph of the House article, and why or why not. You beat me to it.

    8. Jason,
      "Are you really saying that?"

      I'm saying there's nothing really to be gained by nonlinear modeling. As Chris House points out, the differences are typically embarrassingly small, and even when they are noticeable, they usually go away with refinements in the nonlinear model.

      "I'll just leave you with the end of Chris House's blog post which you didn't read to the bottom:"

      Seeing as I just completed 15 posts on a VAR model that generates statistically significant results with no LM curve to speak of, color me totally unconvinced.

      PWN

    9. Mark, so regarding my question then, that's a "no?"

    10. You can't inappropriately use a linear model on nonlinear data with large changes, say that the linear model has produced significant results, and then use that significance as justification of the inappropriate use of a linear model on nonlinear data. That is totally illogical.

      If (ΔMB/MB)² can be neglected, it'd have to be because its coefficient is negligible (i.e. there is something like a liquidity trap), not because (ΔMB/MB)² is small itself (because it's not).

      Chris House says linear models are usually cool and nonlinearities are not important. But then ends saying it's not always the case and the specific example he points out where nonlinearities are important is the liquidity trap. It's a willful misreading of his post to suggest otherwise.

    11. Jason,
      You are getting this precisely backwards. The motivation for using nonlinear Granger causality tests is that linear Granger causality tests do not perform well in detecting nonlinear causal relationships, not that linear Granger causality tests will spuriously detect linear relationships when they are in fact nonlinear.

      The monetary base Granger causes a long list of variables in the age of ZIRP using the Toda-Yamamoto Granger causality test, which is a linear Granger causality test. Thus there is already strong evidence of a linear causal relationship, so nothing further can really be gained by testing for a nonlinear causal relationship, nor is it likely that anything further could really be gained by making the model nonlinear.

      If a quadratic term can be neglected that certainly does not imply that the linear coefficient is zero.

      The modern model of the liquidity trap relies on expected short term real interest rates which, apart from those versions used in modeling real exchange rates, do not even appear in my model.

    12. Mark,

      "... not that linear Granger causality tests will spuriously detect linear relationships when they are in fact nonlinear"

      Granger causality makes the assumption of linearity; applying it to data with nonlinear relationships is 'garbage in garbage out'. If what you are saying was true, the assumption of linearity could be dropped.

      "If a quadratic term can be neglected that certainly does not imply that the linear coefficient is zero."

      Not sure where you got that. I am saying coefficients to all orders in ΔMB/MB are necessary because ΔMB/MB > 1.

      Attempting to extract that linear coefficient in data that has a linear and a quadratic coefficient with a linear fit is garbage. Take a quadratic function a t^2 + b t + c with {a, b, c} of order 1. Add some noise. Fitting a line to that gives you, say b1 and c1. Fitting a quadratic to that gives you a2, b2 and c2. The coefficients b2 and b1 have little to do with each other unless t << 1.
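
That experiment takes a few lines to run. A minimal sketch in Python with NumPy, assuming a = b = c = 1, a small amount of Gaussian noise, and an evenly spaced grid on [0, t_max] (the helper name `linear_terms`, the noise level, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, c = 1.0, 1.0, 1.0            # true coefficients, all of order 1

def linear_terms(t_max, n=200, noise=1e-3):
    """Fit a line and a quadratic to noisy a t^2 + b t + c on [0, t_max].

    Returns (b1, b2): the linear coefficient from each fit.
    """
    t = np.linspace(0.0, t_max, n)
    y = a * t**2 + b * t + c + noise * rng.normal(size=n)
    b1, c1 = np.polyfit(t, y, 1)       # coefficients come highest power first
    a2, b2, c2 = np.polyfit(t, y, 2)
    return b1, b2

b1_small, b2_small = linear_terms(0.1)   # t << 1: b1 stays near b
b1_large, b2_large = linear_terms(3.0)   # t ~ 3: b1 absorbs the t^2 term
print(b1_small, b2_small, b1_large, b2_large)
```

On an evenly spaced grid from 0 to T, the least-squares line through t² has slope T, so the linear fit reports roughly b + aT: close to b when T is small, and nowhere near it when T is about 3.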

      You are trying to say that QE is a small linear perturbation to MB and I'm trying to point out that it isn't.

    13. Jason,
      Nonlinear predictive power is treated as the residual of the linear predictive power when nonlinear (nonparametric) Granger causality tests and linear Granger causality tests are both performed. (Read the literature.) The existence of a nonlinear causal relationship does not imply that the results of linear Granger causality tests are invalid.

    14. All right then. An order of magnitude change in the base causes a few percent increase in output. Makes sense to me!

      Let's double the base again and get the Fed back to its inflation target.

  6. I was hoping this Jin:

    Jin, Sain, Pham, Spencer, Ramallo, (2001) "Modeling MR-Dampers: A Nonlinear Blackbox Approach", Proceedings of the American Control Conference Arlington, VA June 25–27

    Found here was the same as this Jin,

    But sadly, I don't think so (as far as I can tell). It would be great to see some published model identification expertise crossing the economic / physical system divide.

  7. OK, another stupid question (surprised?):

    Why are econometricians so focused on unit roots? Do they ever encounter unit magnitude roots (of whatever multiplicity) at locations on the unit circle other than 1? If so, are they a concern, or not?

    Replies
    1. Here is Dave Giles:

      http://davegiles.blogspot.com/2013/06/when-is-autoregressive-model.html

      Unit roots are a reference to the characteristic equation of a stochastic process like an AR process.

    2. Thanks: Giles answers my question: complex roots on the unit circle are also a problem (which makes sense to me). I guess I'm used to dealing with what he calls the inverse characteristic function, so in that case, roots outside the unit circle are also a problem. That gives me a warm fuzzy. Here's Giles:

      This model will be stationary (i.e., dynamically stable) if the roots of the characteristic equation,

      1 - γ₁z - γ₂z² = 0 ,

      all lie strictly outside the unit circle. Equivalently, the roots of the equation,

      z² - γ₁z - γ₂ = 0,

      must lie strictly inside the unit circle.
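
Both conditions can be checked numerically. A minimal sketch in Python with NumPy, assuming the model in question is the AR(2) process y[t] = γ₁ y[t-1] + γ₂ y[t-2] + ε[t] (the function name `ar2_is_stationary` and the example coefficients are hypothetical):

```python
import numpy as np

def ar2_is_stationary(g1, g2):
    """Check stationarity of y[t] = g1*y[t-1] + g2*y[t-2] + e[t] two ways.

    Returns (outside, inside), the two equivalent conditions.
    """
    # Roots of 1 - g1 z - g2 z^2 = 0 (np.roots wants the highest power first).
    char_roots = np.roots([-g2, -g1, 1.0])
    # Roots of z^2 - g1 z - g2 = 0, the reciprocals of the roots above.
    inv_roots = np.roots([1.0, -g1, -g2])
    outside = bool(np.all(np.abs(char_roots) > 1.0))
    inside = bool(np.all(np.abs(inv_roots) < 1.0))
    return outside, inside

print(ar2_is_stationary(0.5, 0.3))   # g1 + g2 < 1
print(ar2_is_stationary(0.7, 0.5))   # g1 + g2 > 1

# Complex roots on the unit circle: g1 = 1, g2 = -1 gives z^2 - z + 1 = 0,
# whose roots are exp(+/- i*pi/3), so both have modulus 1.
unit_moduli = np.abs(np.roots([1.0, -1.0, 1.0]))
print(unit_moduli)
```

The last case is the one asked about above: no root equals 1, but both sit on the unit circle. For roots exactly on the circle, the strict inequalities in `ar2_is_stationary` are sensitive to floating-point rounding, so compare the moduli to 1 with a tolerance instead.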

    3. For those of us who embrace chaos in human affairs, roots on the unit circle are part of the solution, not part of the problem. ;)

  8. FWIW, my impression of the Smith-Sadowski disagreement was that there was a good bit of mutual misunderstanding. Not that I can clear much up, but I think that you are in closer agreement than is apparent.

    Since Mark has not replied, let me chime in a bit on spurious correlations. As I think both of you agree, data with the same kind of trends can produce them.

    One way of detrending time series data, as Mark pointed out, is to take differences. For instance, if you take the first differences of 1, 2, 3, 4, 5, . . . you get 1, 1, 1, 1. You can detrend any polynomial trend by taking the right order of differences. (Thanks to Isaac Newton. :) )

    But detrending by taking differences does not work for exponential trends. To do that you want to take the logs of the data first.
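
A quick numerical check of both points, sketched in Python with NumPy (the series and the 5% growth rate are made up for illustration):

```python
import numpy as np

t = np.arange(1, 11, dtype=float)

linear = t                         # 1, 2, 3, ...
quadratic = t**2
exponential = 100.0 * 1.05**t      # 5% growth per period

d1_linear = np.diff(linear)                        # constant: trend removed
d2_quadratic = np.diff(quadratic, n=2)             # constant after two differences
d1_exponential = np.diff(exponential)              # still grows 5% per step
d1_log_exponential = np.diff(np.log(exponential))  # constant: log(1.05)

print(d1_linear)
print(d2_quadratic)
print(d1_exponential)
print(d1_log_exponential)
```

Differencing the exponential series just yields another exponential series, which is why the logs have to come first.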

    Now, when I was a kid I learned to take the logs of financial data as a matter of course. I have been surprised to see that economists in general do not do that. For instance, they take arithmetic averages instead of geometric averages (or instead of taking the logs of the data first). But surely econometricians take the logs first.

    Replies
    1. Bill, yes that makes sense. I never took a statistics course (you could say I'm doing that now), but I can grasp parts of this whole multi-post discussion (if I do a little remedial reading).

      Are you a statistician or econometrician? Are you familiar with the package EView (I think it's called) that Mark uses?

      I'm still missing part of the concept on exactly what sets econometrics apart from other disciplines concerned with stochastic processes. Would an HP filter (for example) ever be useful in an engineering or physics problem? Climate science? Geology?

    2. No, I am neither a statistician nor an econometrician. I soaked up a good bit of statistics as a kid, enough to devise some statistical tests on my own and to tutor it. But I would not do so today, as I have not kept up with the latest developments or the software.

  9. Jason:
    "[S]hould we abandon the level approach because it doesn't conform to a narrow view of what is "empirically acceptable"?"

    We should avoid estimating spurious regressions because...

    "(i) Estimates of the regression coefficients are inefficient.
    (ii) Forecasts based on the regression equations are sub-optimal.
    (iii) The usual significance tests on the coefficients are invalid."
    Spurious Regressions in Econometrics
    Granger and Newbold (1974)

    http://wolfweb.unr.edu/~zal/STAT758/Granger_Newbold_1974.pdf

    Of course, if your goal is to be recognized as the world's leading advocate of time series derpometrics, be my guest.

    http://edmsauce.wpengine.netdna-cdn.com/wp-content/uploads/2014/05/derp.jpg

    Replies
    1. Your reading comprehension could use a little work here ... I mention that the p-values are a useless measure for the cointegrated series.

      Are you trying to say that all scientific results must be estimated with linear regressions?

    2. The two series are NOT cointegrated. Do a standard cointegration test and you'll see.

    3. Mark, where did you find this?:

      "[S]hould we abandon the level approach because it doesn't conform to a narrow view of what is "empirically acceptable"?"

      I can't find where that came from.

      Thanks.

    4. Tom,
      Look immediately under the last graph.

  10. O/T, Jason, you're probably already aware of this but I thought it was interesting. I didn't realize some of the history there.
