Tuesday, July 31, 2018

Wage growth versus employment-population ratio


I saw a graph from Adam Ozimek via Brad DeLong today. I effectively reproduced it above using the Atlanta Fed's wage growth tracker data from 1994 to the present instead of ECI [1] because I have a dynamic information equilibrium model of that measure that I've been tracking since the beginning of the year. It actually ends up being a better comparison because the ECI data is noisier and less frequently updated.

It looks remarkably like a stable relationship, right? However, I also have a dynamic information equilibrium model of the "prime age" employment-population ratio (EPOP). If you combine them, you get a Beveridge-curve-like relationship as discussed in my paper (and here). First, here are the models of wages and EPOP separately (click to enlarge):


And here's the graph that combines the models — and adds in data from 1988 to 1994 from the Atlanta Fed not in the first graph (yellow):


The gray grid lines show the behavior we'd expect in the absence of shocks. The DIEM essentially predicts a much lower slope without shocks (the shocks are mostly recessions, though wage growth saw a positive shock beginning in 2014) and tells us the linear relationship observed at the top of this post is largely spurious. Regardless of which model is correct, this is a great example of how the conclusions depend on the way you frame the data.
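For readers who haven't seen the dynamic information equilibrium model before, here's a minimal sketch of the functional form being fit in the graphs above: a constant logarithmic rate punctuated by logistic shocks. The parameter values below are made up purely to illustrate the shape, not the actual fit values.

```python
import numpy as np

def diem(t, alpha, c, shocks):
    """Dynamic information equilibrium sketch: log X(t) = alpha*t + c plus a sum
    of logistic 'shock' steps, each with an amplitude, center, and width."""
    log_x = alpha * t + c
    for amplitude, center, width in shocks:
        log_x += amplitude / (1.0 + np.exp(-(t - center) / width))
    return np.exp(log_x)

t = np.linspace(1995, 2020, 500)
# Made-up parameters: one negative (recession) shock and a smaller positive shock c. 2014.
wage_growth = diem(t, alpha=0.0, c=np.log(4.0),
                   shocks=[(-0.4, 2009.0, 0.5), (0.15, 2014.5, 1.0)])
```

The combined graph above is just the two fitted curves (wage growth and EPOP) plotted against each other, with time running along the curve.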

...

Update 6 August 2018

I thought I'd show what Ozimek's linear model would imply for wage growth given the prime age employment-population ratio data. Here it is on its own (red, versus data in blue):


As you can see, it's reasonable but falls apart as you go further back. The dynamic information equilibrium model (green) is an improvement:


...

Footnotes:

[1] I have looked at both measures.

NGDP data and validating forecasts

The GDP numbers for Q2 came out last week and everyone seemed to be making a big deal about 4% real GDP growth — but it was basically within the error. The same could be said of the 5% real GDP growth back in 2014 Q2 and Q3. I guess the HP filter in my head smooths out most of the quarter-to-quarter fluctuations as noise.

Anyway, I have several forecasts that made claims about future GDP measurements and it's time once again to mark to market. First is NGDP level and growth rate (click to enlarge):


Next are a couple of head-to-head comparisons of the dynamic information equilibrium model (DIE model, or DIEM) with the NY Fed DSGE model and a VAR from the Minneapolis Fed. There's nothing conclusive yet in the DSGE comparison, which has just gotten underway:


However, while the data is consistent with both the VAR forecast and the DIEM, due to the lower error the DIEM is technically winning this contest (see the footnote here for a note about short term forecasting based on a recent data point):


And finally there is "Okun's law" (aka the "quantity theory of labor", see also here):


...

Update

Monthly (core) PCE inflation data came out today and here are the forecasts (red) compared to the post-forecast data (black) for continuously compounded annual rate of change (log derivative) and year-over-year change:




Monday, July 30, 2018

Does accelerating debt growth cause recessions?

As I mentioned in my views on the DSGE model debate (as well as my more general macro critique), I have an issue with what I consider to be a logical leap in the discussion of recessions. Noah Smith writes about this "emerging post-crisis wisdom" in his latest Bloomberg View article:
All of these papers have one thing in common — they use debt to predict recessions years in advance. That fits with the emerging post-crisis wisdom that problems in credit markets are the source of both financial crashes and the ensuing economic slowdowns.
While I have no problem studying these particular models of macroeconomies, the evidence provided largely seems to suffer from post hoc ergo propter hoc reasoning ("after this, therefore because of this"). The presentation from Gennaioli and Shleifer that Noah is discussing (which cites these papers) notes that "[r]apid credit growth is associated with higher risk of a financial crisis" based on Schularick and Taylor (2012). It shows a graph in which growth in credit normalized by GDP increases increasingly quickly (i.e. accelerates) in the run-up to a financial crisis. In fact, I see a similar effect in the debt data (below). However, consider wage growth — it also accelerates in advance of a recession:


You could possibly think of a model where wage growth causes recessions because e.g. the Fed reacts to wage growth with higher interest rates that eventually bring on a recession. However, this data also supports the hypothesis that wages just naturally accelerate, but that acceleration is cut off by a recession. A similar view works just as well for debt (FRED series TCMDO = All Sectors; Debt Securities and Loans; Liability, Level, previously called Total Credit Market Debt Owed, hence the label):


Until the 1970s, the acceleration in debt to GDP (and even its growth rate) was consistent with zero. Various de-regulatory policies led to our more modern system of credit as well as dynamics similar to wage growth: acceleration in debt-to-GDP growth cut off by financial crises. The model shows two major events: the S&L crisis of the late 80s and the Global Financial Crisis (GFC). The former was a slow-rolling crisis that began in the mid-80s but wasn't associated with a recession until years later. The latter debt growth rate collapse actually follows the Global Financial Crisis (lending more credence to the hypothesis that the recent growth in credit was cut off by the recession).

Regardless of the specific model, the idea that increasing credit growth causes recessions is not robustly supported by the data. In the case of the GFC, it seems possible that debt growth was cut off by the crisis, much like how accelerating wage growth is cut off by recessions. If that hypothesis is true, then per the graph in the presentation we'd also see an increase in the rate of credit growth — and concluding that the rise in credit growth caused the crisis would be an example of post hoc ergo propter hoc reasoning gone awry. Just because B comes after A does not mean A caused B [1].

I don't have any particular issue with the Minsky-like view of asset bubbles leading to recessions — I have speculated myself that we might be in an "asset bubble era" where recessions are caused by deflating bubbles (dot-com, housing). But that picture also sees the recessions before the 2000s as "Phillips curve" recessions more closely related to the labor market. What might be confounding for any theory is the possible recession in the 2019-2020 time frame: there appears to be no asset bubble in GDP growth (latest bump in Q2 notwithstanding), and labor markets appear healthy. This may even explain a bit of why everyone seems to think we're in an economic boom with no signs of recession despite a flattening yield curve. In the end, I am open to any possible explanation here as I think this area (recessions, financial crises) is not well understood by anyone. I just feel there is a lot of jumping to insufficiently supported (not necessarily unsupported) conclusions that is rooted in our human bias toward our own agency as well as moral biases about "debt".

...

Update

Here is the dynamic information equilibrium model of the data from Credit-Market Sentiment and the Business Cycle by David Lopez-Salido, Jeremy C. Stein, and Egon Zakrajsek (2015), which claims to get a two-year drop on downturns:


It is plausible this gives us advance warning about the 1974, 1991, 2000 and 2008 recessions (it misses the 1980s recessions and every recession before '74). Given the uncertainty, we can't be conclusive about the 2000 or 2008 recessions (the data doesn't drop by more than the uncertainty band until roughly simultaneously with the recession indicator). The data is noisy (and annual), making any precise determination of timing uncertain. But again, it's plausible. It's also consistent with the hypothesis that increasing debt growth is sometimes cut off by recessions.

...

Footnotes:

[1] Although you can make a good case that if A comes before B, then B is unlikely to have caused A.

What do we mean by ignorance?

Because I was in a sense challenged to read Quantum Economics: The New Science of Money by David Orrell (see below), I bought the book and read it. A review of sorts appears below. However, a reference in it led me to a paper by Linden et al (2008) that I had not seen about quantum entanglement being a source of entropy increase. This made me realize that we need to think about what we mean by ignorance of the micro state (i.e. the agents) in information equilibrium. Let me begin by setting up the issue in physics.

There is a camp of physicists, as well as a sect of popular understanding [0], that equates entropy with a subjective ignorance of the micro state. Because we observers do not know the location and momentum of every molecule in a cup of coffee, we posit the principle of indifference as a kind of Bayesian prior. As they put it in Linden et al:
Leave your hot cup of coffee or cold beer alone for a while and they soon lose their appeal - the coffee cools down and the beer warms up and they both reach room temperature. And it is not only coffee and beer - reaching thermal equilibrium is a ubiquitous phenomenon: everything does it. Thermalization is one of the most fundamental facts of nature.
But how exactly does thermalization occur? How can one derive the existence of this phenomenon from the basic dynamical laws of nature (such as Newton’s or Schrodinger’s equations)? These have been open questions since the very beginning of statistical mechanics more than a century and a half ago.
One - but by no means the only - stumbling block has been the fact that the basic postulates of statistical mechanics rely on subjective lack of knowledge and ensemble averages, which is very controversial as a physical principle. ...
They then proceed to show that a subsystem eventually becomes entangled with the rest of the system, resulting in fundamental (i.e. quantum) ignorance (i.e. probabilistic knowledge) of the subsystem rather than subjective (i.e. our) ignorance. Here's another paper, by Peter Reimann, that shows effectively the same thing; he is somewhat more agnostic about the underlying probabilistic physical principle (ignorance, indifference, ergodicity, whatever).

However, Cosma Shalizi has a paper from several years ago arguing that subjective ignorance (i.e. maximum entropy as a logical inference method) is a fundamentally incoherent approach that actually leads to a reversal of the arrow of time because of some basic properties of information theory. Briefly, your Bayesian updating can only increase your subjective knowledge of the microstate, so you either have to a) adopt a weird form of Bayesian updating that forgets its past or b) give up the assumption that entropy represents subjective ignorance. But there may be a hole in this argument that I'll discuss below (because I found it entertaining).

My personal approach is to see maximum entropy and statistical mechanics as effective theories. Whereas many effective theories are appropriate at one physical scale (say, at short distances, or high energy), we can see statistical mechanics as an effective theory when N >> 1, where N is the number of subunits (agents, microstates, etc). That said, whatever assumptions the effective theory makes about the microstates (e.g. subjective ignorance, ergodicity), those assumptions do not necessarily hold for the actual microstates. In economics language, statistical mechanics is an "as if" theory: ensembles of molecules behave as if our subjective ignorance of the microstate is physically relevant (or as if the system is ergodic, etc).

That's why I think of information equilibrium as an effective description of economic systems. However, there are also some really important distinctions regarding the thermodynamic issues above. For one, there is no economic "arrow of time". We do not require ΔS > 0, and in fact many economically relevant phenomena may well be due to violations of "economic entropy" increasing (see also the last sections of my older paper). While the molecules in a cup of coffee can't decide to try and enter a correlated momentum state (i.e. your coffee suddenly moves up and out of the cup), humans can and will enter a correlated economic state (e.g. selling off shares in a panic). In the information equilibrium approach, this is called non-ideal information transfer.

It's not our subjective ignorance of the employment states of people in the labor force that leads to the description of the unemployment rate (as described in my more recent paper), but rather our fundamental ignorance (i.e. probabilistic knowledge) of what human beings think, how they interact, and how they reach decisions. I frequently put the postulate as a translation of the ergodic hypothesis: agents fully explore the opportunity set (state space), but that's simply an equivalent postulate that leads to the same effective economic theory.

* * *

I was thinking about Cosma Shalizi's argument, and realized that it's a Newtonian argument because it does not account for causality. In fact, it gives an additional way out of Shalizi's trilemma because Bayesian updating must be causal (i.e. you cannot update faster than L/c if L is the size of your thermodynamic system). Large segments of time evolution of phase space are inaccessible to the Bayesian updater because they exist "elsewhere" in a space-time diagram [1]:


For a system a foot across (30 cm), this minimum update time is about a nanosecond. The average air molecule travels about 500 m/s at room temperature and atmospheric pressure, so it will travel about 0.5 μm in a nanosecond. However, the average spacing between air molecules at standard temperature and pressure is a few nanometers, meaning during that minimum update time you will have an uncertainty volume containing on the order of 10^7 other molecules (each with its own uncertainty volume). That's plenty of opportunity to increase any observer's subjective ignorance faster than the Bayesian updates can reduce it. Interestingly, the speed of the molecules only drops as √T, so going to one-quarter the temperature only reduces the speed by half (so the uncertainty volume goes as T^(3/2)) while the volume of a gas drops as T (ideal gas law). This means your uncertainty volume will actually increase relative to the volume per molecule, so the Bayesian updating gets worse at low temperature — and eventually the quantum effects described above kick in!
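Here's the back-of-the-envelope arithmetic as a quick script (the number density of air is the standard value for roughly 1 atm and room temperature; the rest follows the text above):

```python
from math import pi

c = 3.0e8          # speed of light (m/s)
L = 0.30           # system size: about a foot (m)
v = 500.0          # typical air molecule speed at room temperature (m/s)
n = 2.5e25         # number density of air at ~1 atm, ~300 K (molecules per m^3)

t_update = L / c                       # minimum causal update time ~ 1 ns
r_uncertain = v * t_update             # distance a molecule travels in that time ~ 0.5 um
V_uncertain = (4.0 / 3.0) * pi * r_uncertain**3
spacing = n ** (-1.0 / 3.0)            # mean intermolecular spacing ~ a few nm
N_in_volume = n * V_uncertain          # ~ 10^7 molecules per uncertainty volume

print(t_update, r_uncertain, spacing, N_in_volume)
```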

*  *  *
Jason Smith: The galling thing is that Orrell starts with a quote about making predictions and saying economics is bad science, but then closes promoting a book that (based on the source material in his papers) won't make any predictions or engage with empirical data either. 
David Orrell: That sounds like a prediction based on little data, given that you haven’t read the book. 
Jason Smith: My hunch is based on ... papers listed on a promo site for the book. If the book is wildly different from the way it is presented there, then maybe it will engage with data.

Well, I went and bought this book and read it — sure enough it doesn't really engage with data or make predictions (there is no economic data that "quantum economics" explains, much less explains better than other approaches). It's essentially what you'd expect from the material on the promotional site linked above.

I will be extremely charitable and say that if you think of "quantum economics" as an extended metaphor bringing together disparate but common misconceptions of the economic world around us into one "quantum" mnemonic, then this book might interest you (I like the metaphor of wave function collapse for pricing assets, i.e. the price of your house doesn't really exist until a transaction occurs, i.e. until you sell it). Generally, economists will sigh at the kind of macro criticism you can find in many places for free on the internet. Physicists will quickly become exasperated at the fragile quantum metaphors (e.g. the word "entanglement" can seem to mean quantum entanglement, quasi-particles like Cooper pairs, or simply strongly interacting, all in the space of a single paragraph). Scientists in general will balk at the elements that border on new age-y mysticism.

But generally, it's good to get across the idea that when an aggregation of individual people becomes "an economy" is a kind of Sorites paradox, and that a possible solution is that you kind of have to take everyone as part of a single emergent "wavefunction". I personally tend to think of it in the same way that a small number of molecules doesn't really have a temperature (i.e. macro observables are weakly emergent), but (since I am not omniscient) simply starting with a strongly interacting ("entangled") state could be the basis of a future theory.

The "weird" aspects of quantum theory (entanglement, fundamental uncertainty and measurement, non-commutative operators, discrete jumps) are tied to economic concepts (debts and credits/spending and income, measuring an asset's price requires you to interact with it, non-commutative decision-making, purchases come in discrete lumps). Now, you don't need to use quantum mechanics to use these metaphors (all either can be made without references to quantum systems or are non-specific [2]), but the extended quantum metaphor does lump them all together. This may or may not be useful depending on whether you're convinced these specific economic elements with quantum metaphors are in fact the key economic elements that will lead to eventual understanding (I think there are more basic issues with what passes for empirical approaches to macro coupled with a relative over-emphasis on money and under-emphasis on labor). There is some mention of using quantum mechanics-style computations in quantitative finance (I personally studied some path integral computations in this vein when I considered becoming a 'quant' on Wall Street).

In the end, the book (as is obvious from its subtitle "The New Science of Money") is a small-m monetarist view that drops in some MMT shibboleths. I've expressed my general negative view of this kind of "money is all-important to understanding economics" view before (maybe "quantum economics" could become a new "halfway house" for recovering Austrians, per Noah Smith). I have no significant disagreements with any of the political sentiments expressed (I mean, the reason I was reminded to read this book was a tweet from UnlearningEcon), and if you want to read some long-form bashing of some choice targets you'll probably get a kick out of this book.

*  *  *

Footnotes:

[0] I really don't know how widespread these ideas are, but I feel like I might have seen them on the internet and their existence is implied (I mean, for there to be papers on this someone has to be positing the subjective view even as a hypothesis to be rejected). As I don't really care to do a literature search to find defenses of the subjective view, I'll just leave it at that.

[1] I've often speculated that the lack of causal information about the region outside the observer's light cone is connected to the probabilistic nature of quantum mechanics. Most quantum systems (double slit interference, Hydrogen atom) involve accelerations that would put the system not only in the "elsewhere" section of a space time diagram but also crossing a "Rindler horizon" and exposure to a non-trivial Unruh effect. If we think of causal information being encoded on horizons (holographic principle), we might lose information not related to conservation laws. All of this smacks of black hole thermodynamics. What if the information loss ("thermalization" by Unruh temperature) crossing these horizons in quantum systems is effectively measured by the information loss between the input distribution and the output distribution? That is to say quantum wavefunctions represent the information loss in the distribution of the final states measured with something like the KL-divergence? I've been distracting myself with these questions lately, but I'm nowhere closer to an answer than I have been for years.

[2] As mentioned earlier, the entanglement metaphor is the least convincing in terms of what it means in physics. The metaphor really doesn't require the non-locality that is central to what entanglement is. Without non-locality, entanglement is just quasi-particle formation (e.g. Cooper pairs) or other strongly interacting and/or non-perturbative dynamics.

Discrete jumps happen in the allowed frequencies (harmonics) of classically vibrating strings (that's one reason Schrodinger's equation was more rapidly accepted than Heisenberg's earlier but more abstract matrix mechanics — they're actually equivalent).

The uncertainty principle is nothing more complicated than the fact that a short wave train (pulse) has a definite location but an uncertain wavelength (i.e. momentum), while a long continuous wave has a more definite wavelength but an uncertain position. It arises because the momentum and position operators are conjugate variables in the Hilbert space in the same way frequency and time are conjugate variables for Fourier transforms.
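In symbols, this is just the standard Fourier bandwidth theorem: a wave packet satisfies

Δx Δk ≥ 1/2

and since p = ħk in quantum mechanics, that is the same statement as

Δx Δp ≥ ħ/2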

Non-commutative operators can be demonstrated with ordinary objects. If Z is up-down and Y is left-right, hold your phone out with the screen facing you. Rotate 90° around the Z-axis and then 90° around the Y-axis (you'd be looking at the bottom of your phone). Start over. Now rotate 90° around the Y-axis and then the Z-axis (you'd be looking at the side of your phone). These are two different states (and represent the non-commutativity of the group SO(3)).
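If you'd rather see it in matrices than with a phone, here's a quick numerical check using 3×3 rotation matrices (elements of SO(3)):

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

ninety = np.pi / 2
print(rot_y(ninety) @ rot_z(ninety))   # rotate about Z first, then Y
print(rot_z(ninety) @ rot_y(ninety))   # rotate about Y first, then Z: a different matrix
print(np.allclose(rot_y(ninety) @ rot_z(ninety), rot_z(ninety) @ rot_y(ninety)))  # False
```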

Wavefunction collapse is really more of a model for undergraduates; the practitioners who actually study this area are more likely to think in terms of decoherence — but the basis of the Born rule and its connection of the wavefunction to probability remain an unresolved question in fundamental physics. This is to say the metaphor imports a mystery to try and explain economics. Usually, the point of a metaphor with physics is that you are importing a resolved and well-defined (and empirically validated) concept.

Feynman said that everything about quantum mechanics is in the double slit experiment (there are entanglement experiments using interference), and yet there is no real use of quantum interference in Orrell's book. You can easily adapt the machinery of quantum physics to economics (I applied some quantum field theory to this discussion with Jo Michell). When I was a grad student, the rule of thumb was that quantum effects came with loops in your Feynman diagrams: the double slit experiment is the simplest coordinate-space Feynman diagram with a loop (the famous Feynman diagrams you're likely familiar with are usually thought of in momentum space).

Now, Orrell goes to great lengths to pre-but (as opposed to rebut) physicists "policing" their "turf", but if you're going to use the terms and cite physicists about their meaning and still get it wrong, I feel it is part of my public duty (having received my PhD on public funds) to point out or correct inaccuracies. I get the impression he gets a lot of physicists complaining about these metaphors, but maybe he should consider that the reason is that the metaphors are inaccurate or misleading.

Thursday, July 26, 2018

Rental vacancy data versus forecast

I put together a dynamic information equilibrium model of rental vacancy rates over a year ago (you can think of it as an "unemployment rate" for rental housing). I haven't updated it in a while, so here's the latest post-forecast data (black):


Although it is within error, this does show a bit of the "overshooting" that happens when you estimate the shock parameters for an incomplete shock (I talk about it here, another example is here). Here's a zoomed-in version with an extended forecast horizon (in the absence of a shock):


Thursday, July 19, 2018

DSGE Battle Royale: Christiano v. Stiglitz

Lawrence Christiano, Martin Eichenbaum, and Mathias Trabandt have written (or I guess re-written) a review article of sorts on DSGE models titled On DSGE Models. Gone is the "dilettantes" language of the previous incarnation with the same title (see e.g. Jo Michell here, or Noah Smith here; the link to the original [pdf] now seems to point to a different version). While there was lots of criticism of the original, I actually came to the defense of the idea of numerical experiments in that previous paper (which was also removed). In the new version, it still seems they were "triggered" by Stiglitz's paper Where Macroeconomics Went Wrong (and the 2017 NBER working paper version Where Modern Macroeconomics Went Wrong) — treating Stiglitz as representative of the criticism and then dismissing his criticism as "not informed".

I don't really have any skin in this particular game: I haven't invested a career in DSGE models, nor is the existence of DSGE models a barrier or threat to acceptance of my own weird ideas (e.g.) — the far larger barrier is not being an economist. I've used DSGE models as benchmarks for the performance of my own models. In fact, a while ago I took the time to build the basic three-equation "New Keynesian" DSGE model in terms of the information-theoretic approach I've been working on. I've both criticized and defended DSGE — likely because much of the commentary seems either hyperbolic or irrelevant.

While that hopefully covers my biases, there's the additional issue of a physicist making declarations about social sciences that often irritates their practitioners. However, in this particular case I am really only leveraging my long experience with mathematical modeling of systems where the data about the macro observables as well as the underlying degrees of freedom is unclear and there is no established tractable theory to answer all the questions. I've also been studying economics for about 6 years now [1] — not just reading blogs, but e.g. keeping up with (some of!) the literature, and working the exercises in textbooks [2]. But I do think an outsider perspective here is useful, not as some impartial judge but rather to break down the us vs. them/DSGE-as-a-four-letter-word status quo that's developed post-Great Recession.

*  *  *

With all of that out of the way, the "debate" over DSGE models is, in a word, maddening. A concise encapsulation is readily available in how the two papers, Christiano et al (2018) and Stiglitz (2017), discuss financial frictions. From Stiglitz's abstract we find he believes:
Inadequate modelling of the financial sector meant [DSGE models] were ill-suited for predicting or responding to a financial crisis ...
Emphasis mine. Christiano et al characterizes this as:
Stiglitz (2017) asserts that pre-crisis DSGE models did not allow for financial frictions or liquidity-constrained consumers.
Again, emphasis mine; "did not allow" and "inadequate" do not mean the same thing, so this isn't a charitable reading from Christiano et al. The argument is then made that DSGE models did allow for financial frictions:
Carlstrom and Fuerst (1997) and Bernanke et al. (1999) develop DSGE models that incorporate credit market frictions which give rise a “financial accelerator” in which credit markets work to amplify and propagate shocks to the macroeconomy.
But the maddening part is that earlier in their paper Christiano et al make a similar point to Stiglitz:
At the same time, the financial frictions that were included in DSGE models did not seem to have very big effects. Consider, for example, Bernanke et al. (1999)’s influential model of the financial accelerator. That model is arguably the most influential pre-crisis DSGE model with financial frictions. It turns out that the financial accelerator has only a modest quantitative effect on the way the model economy responds to shocks, see e.g. Lindé et al. (2016). ... Their key finding is that neither [financial friction] model substantially improves on the performance of the benchmark model, either in terms of marginal likelihoods or impulse response functions. So, guided by the post-war data from the U.S. and Western Europe, and experience with existing models of financial frictions, DSGE modelers emphasized other frictions.
Financial frictions didn't seem to have much of an effect, so other frictions were "emphasized". This isn't so much a challenge to Stiglitz's criticism as an explanation of it. The "influential" way to incorporate financial frictions in DSGE models didn't produce large effects, so they were de-emphasized in the years before the biggest financial crisis and recession in recent memory struck. This is effectively what Stiglitz says:
Assumptions matter. All models make simplifications. The question is, as we have said, what simplifications are appropriate for asking what questions. The danger is that the simplifications bias the answers, sometimes in ways that we are not aware of. The DSGE model ignored issues that turned out to be key in the 2008 crisis ...
Emphasis in the original.

However.

I am not sure we have sufficiently convincing evidence that financial frictions are the key to the 2008 financial crisis and recession. This is one of those things that I get lots of grief on the internet for saying because everyone seems to think it's obvious that the financial crisis (housing bubble, shadow banks, over-leverage, the Fed) led to the Great Recession — that the problem of 2008 is trivially solved by inspection.

This is where I am with Christiano et al: more research needs to be done here and the mechanisms are not cut and dried. But while DSGE models might be one avenue for that research, the financial friction models (e.g. Del Negro et al [pdf]) don't seem to be empirically very accurate either, given how complex they are:


That's the model performance with more than 20 parameters?

Going beyond financial frictions to the general performance of DSGE macro modeling, the model results [3] of Christiano et al (2016) [pdf] shown in the paper aren't very good either for how many parameters the model has (the blue curve in Figure 2 reproduced below has 26). Since Christiano et al are reporting this as a decent enough explanation of the (VAR) data to support the conclusion that sticky wages are necessary, we can surmise they think their model (CET 2016) is qualitatively reasonable. In the original paper, the model is presented as a "good description of the data" [4] in its own right.


Aside from the issue in footnote [3] with "VAR data", the blue curves are not even a qualitatively accurate model of the black curves even accounting for the gray bands. On a scale from 1 (Steve Keen) to 10 (Max Planck), I give this qualitative agreement a 2 or a 3. Let's focus on the real GDP impulse response:


I show the original alongside my function fits (using a χ² distribution function and a sum of logistic functions) to the curves. These fits help estimate the number of scales involved in describing the data. The DSGE output basically has two: the slope at zero and the length of the approach to zero at infinity. The VAR has at least two more: the length of the approach to zero, and the frequency of the oscillation as the function (eventually) returns to zero. I couldn't figure out a description that took fewer than six parameters (a sum of logistic functions, like I use in the dynamic information equilibrium models). The DSGE output is qualitatively less complex — it could be captured with two degrees of freedom, which makes sense as the model has 26 parameters for 12 observables, implying about 2 degrees of freedom per observable. The boundary condition at zero does not seem to be the same either: the DSGE output obeys f(0) = 0, while the VAR seems consistent with f(0) = 0 plus f′(0) = 0. This may not seem like much, but it is a major indicator. While the model may look qualitatively like the VAR using the naive approach of "hey, it goes up then comes down", there's a lot more to it than that [5]. Except for the inflation curve (which just seems to get the scale wrong), none of the DSGE model observables look even qualitatively like the data.
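To make the scale-counting point concrete, here's a minimal sketch of the kind of fit I mean, with synthetic stand-in data rather than the actual curves from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit a sum of logistic steps to an impulse-response-shaped curve; the number of
# terms (and parameters) you need is a rough proxy for the number of "scales".
def sum_of_logistics(t, *p):
    y = np.zeros_like(t)
    for a, t0, w in zip(p[0::3], p[1::3], p[2::3]):
        y += a / (1.0 + np.exp(-(t - t0) / w))
    return y

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 200)
true_params = [1.0, 2.0, 0.8,    # fast rise
               -1.0, 7.0, 2.5]   # slower return toward zero
y = sum_of_logistics(t, *true_params) + rng.normal(0.0, 0.01, t.size)

fit, _ = curve_fit(sum_of_logistics, t, y, p0=[1.0, 2.0, 1.0, -1.0, 7.0, 2.0])
print(np.round(fit, 2))  # six parameters for a rise and a slower return; two numbers can't do this
```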

And this is the main issue with DSGE models for me: they are incredibly complex for how poorly they even qualitatively describe the data. Unfortunately, this is an issue that not only Stiglitz but the entire edifice of mainstream and heterodox economics taken together seems ill-suited to address. Stiglitz (2017) and, via citation, Korinek (2017) seem to cede that these DSGE models match moments, just not the right ones and not in the right way (sorry for the long quotes, but I wanted the full context because I am drawing a conclusion that isn't explicitly stated).

Stiglitz (2017)
In the end, all models, no matter how theoretical, are tested in one way or the other, against observations. Their components—like the consumption behavior—are tested with a variety of micro- and macro-data. But deep downturns, like the 2008 crisis, occur sufficiently rarely that we cannot use the usual econometric techniques for assessing how well our model does in explaining/predicting these events—the things we really care about. That’s why, as I have suggested, simply using a least-squares fit won’t do. One needs a Bayesian approach—with heavier weight associated with predictions when we care about the answer. Comparing certain co-variances in calibrated models is even less helpful. There are so many assumptions and so many parameters you can choose for your model, many more than the number of moments you can get from the data; so being able to match all moments in the data does not tell you that your assumptions were correct, and thus does not provide much confidence that forecasts or policies based on that model will be accurate.
Korinek (2017)
Second, for given detrended time series, the set of moments chosen to evaluate the model and compare it to the data is largely arbitrary—there is no strong scientific basis for one particular set of moments over another. The macro profession has developed certain conventions, focusing largely on second moments, i.e. variances and covariances. However, this is problematic for some of the most important macroeconomic events, such as financial crises, which are not well captured by second moments. Financial crises are rare tail events that introduce a lot of skewness and fat tails into time series. As a result, a good model of financial crises may well distinguish itself by not matching the traditional second moments used to evaluate regular business cycle models, which are driven by a different set of shocks. In such instances, the criterion of matching traditional moments may even be a dangerous guide for how useful a model is for the real world. For example, matching the variance of output during the 2000s does not generally imply that a model is a good description of output dynamics over the decade.
Both of these criticisms essentially say that, outside big shocks like the 2008 crisis, the DSGE models match the data. It's possible the reason Stiglitz and other economists can't set the bar higher for matching the data is that no one has models that match the data better. As far as I can tell, that seems to be true: no other modeling paradigm in macroeconomics produces anything as empirically accurate as DSGE models outside of major shocks. Well, except AR processes [pdf] (sVARs) — but that's basically giving up and saying it's random.

Now, I've often heard the counterargument that economic systems are social systems, so we can't expect strong agreement with the data (often characterized as "physics-level" or "hard science" agreement with the data). But I'm not arguing for precision economics here (qualitative agreement with the basic functional forms of the data would be great!), and the existence of VARs that do better is an immediate counterexample. A model that led to the VAR parameters would be a massive improvement in agreement with the data. Otherwise we should really abandon research for nihilism and just go with the VARs.

This creates an unfortunate state of affairs where the one true way to judge a model (asking how well it describes data) isn't being fully employed. It's like Christiano is trying to sell Stiglitz a 1972 Ford Pinto and Stiglitz is complaining that Ford Pintos don't come standard with electronic fuel injection. Christiano retorts that he changed out the entire engine after an accident in 2008, but no one is talking about the fact that the car doesn't even run despite Christiano's assurance that it does, based on some blurry photographs that may or may not be of this specific Ford Pinto (or even a Ford Pinto at all). But Stiglitz took the bus to get here, so he can't even ask whether it runs better than his current car [6].

The various other issues all seem to be outgrowths of this basic problem. Do you include X or not? Does including X improve agreement with the data? Do we even have agreement with the data without X?

The truth is that a large enough system of linear equations could easily describe macro data to a given level of accuracy. At heart, that's all DSGE models really are. It's a sufficiently general framework that it is not inconceivable that it could describe a series of macro observables. I don't think Stiglitz's "not even a starting point" (burn it all down) view is constructive; however, I do think there's a need to get much more creative with the elements. The lack of agreement with the empirical data is so persistent that it leads me to suspect there are one or more core assumptions that might be fruitfully replaced (the Euler equation seems to be a good candidate) [7]. However, if practitioners like Christiano et al fail to look beyond specific critics like Stiglitz (taking him as representative), and continue to believe the output of DSGE models is qualitatively reasonable, it's going to be a long time before those core assumptions get questioned.

*  *  *

PS

There are a few other things I wanted to mention that didn't fit in the main narrative above. First, Christiano et al (2018) seems to ignore the criticism that the ubiquitous modeling elements (Euler equation, Phillips curve) aren't supported empirically. To be fair, Stiglitz (2017) also seems to ignore this.

There was discussion of nonlinearities in both papers, but this one is called entirely for Christiano et al because there is zero evidence that log-linearizing DSGE models has a strong effect (in fact, there's evidence to the contrary) or that nonlinear dynamics have any observable impact on macro observables that differs from a linear system with stochastic shocks (see here for an extended discussion).

Regarding unemployment, Stiglitz repeats a refrain I frequently see in op-eds, blogs, and academic work referring to unemployment that stays high (or even at a specific level, as in Mortensen and Pissarides (1994), discussed here). I've never understood this; casual inspection of the unemployment rate data for the US shows no level at which the unemployment rate remains for very long, and the entire post-2008 period shows a nearly constant rate of decline. Yet Stiglitz calls, in the aftermath of 2008, for models that answer "why the effects of the shocks persist, with say high levels of unemployment long after the initial shock" and sees "simple models [that] have been constructed investigating how structural transformation can lead to a persistent high level of unemployment" as an improvement in the understanding of economic systems, despite "persistent high level[s] of unemployment" being empirically false. We definitely would want the rate to fall faster, but a model that says unemployment can stay high is clearly rejected by the data (or at best describes a hypothetical macroeconomic scenario we haven't encountered yet). It's another case where I get a lot of grief on the internet when I point this out, and it makes no sense to me why people strongly believe something that is clearly at odds with the data ... oh, right.

*  *  *

Update 20 July 2018

I made a few edits (added a link to unemployment rate data, added "in the aftermath of 2008").

Footnotes:

[1] My introduction to economic theory was through prediction markets. I used information theory to derive some potential metrics for their efficacy. However, the tools proved to be substantially richer than just applications to prediction markets. Since then, I've been exploring the usefulness of the approach to more general micro- and macro-economic questions on this blog.

[2] It feels a bit like being a grad student again, which I find fun because I am weird. I've actually been considering going back to grad school for a PhD in economics.

[3] They're compared to "VAR data" ("We re-estimated the model using a Bayesian procedure that treats the VAR-based impulse responses to a monetary policy shock as data.") that I want to discuss in a future post because one is not comparing to the actual data but rather a different model of the data that constrains the output to be similar to DSGE models. And they're still not getting very close.

[4] The model shown in the "general wage rule" variant in Christiano et al (2016). The "simple wage rule" is considered to capture "the key features of the general wage rule". In the next paragraph, the simple wage rule is considered "a good description of the data".

[5] I sometimes feel I might know how a birder feels when they hear someone say those two seagulls are the same. No, that's L. a. megalopterus and that's L. delawarensis. To me, these DSGE model outputs look so unlike the "VAR" data as to be separate species of curves.

[6] Stiglitz does point to a model in his paper, but there does not appear to be any empirical work associated with it at all. Happy to be corrected, but it just looks like a statement of a bunch of assumptions.

[7] Has anyone done a term-by-term analysis of e.g. VARs and DSGE models? A common technique in physics is to write down a general model with unknown coefficients (analogous to the VAR) and then compare that term by term to some theory-based model (analogous to the DSGE) to see how the coefficients and terms match up (and their values, if available). For example, a general linear equation (no lags for simplicity) with three variables would look like:


(1) a x + b y + c z + d = 0

Let's say some theory says


(2) (1/α) x + σ² = k

This tells us that a = 1/α, b = 0, c = 0, and d = σ² − k. If e.g. the fit of (1) to the data says c ≠ 0, or that a ≈ 2/α, you have some specific directions your research can go. I've never seen this done for DSGE models, but that could be my own finite exploration of the literature. Hypothetically, if your VAR says current inflation has zero dependence on the lagged interest rate (i.e. the coefficient of r(t-1) is unnaturally small) while it is theoretically supposed to be large (i.e. order 1) in your DSGE model, that would point to the core assumption of monetary policy controlling inflation through interest rates being questionable.

I actually think this might be a major benefit of machine learning in econ. While it is true that some modelling priors enter into how you set up your machine learning problem (i.e. implicit theory), machine learning works reasonably well to drop the irrelevant degrees of freedom you chose to add based on your biases and implicit theorizing. Machine learning tends to find a low dimensional representation of your data, which sometimes means elimination of degrees of freedom. If short term interest rates are irrelevant to understanding the macroeconomy (again, hypothetically, as an example of a core assumption in DSGE modeling), a machine learning approach will more readily throw it out than simple regression (at least in my experience — this isn't a robust theory result).
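Here's a toy version of both points in this footnote, with L1-regularized regression standing in for "machine learning". All the variable names and coefficients are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
x_lag = rng.normal(size=n)   # e.g. a lagged output gap (relevant by construction)
r_lag = rng.normal(size=n)   # e.g. a lagged interest rate (irrelevant by construction)
y = 0.8 * x_lag + rng.normal(scale=0.5, size=n)   # "inflation" depends only on x_lag

X = np.column_stack([x_lag, r_lag])

# Plain least squares keeps a small spurious weight on the irrelevant regressor:
ols = np.linalg.lstsq(np.column_stack([X, np.ones(n)]), y, rcond=None)[0]

# L1 regularization tends to drive that weight to exactly zero:
lasso = Lasso(alpha=0.1).fit(X, y)

print("least squares:", np.round(ols[:2], 3))
print("lasso:        ", np.round(lasso.coef_, 3))
```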

Thursday, July 12, 2018

One purpose of information theory


Information theory turns 70 this year (this month!); Claude Shannon's famous paper A Mathematical Theory of Communication [pdf] was published in 1948 and has been lauded as one of the foundations of the "digital age". One of the first things it did was allow engineers to design communication networks that worked in the presence of noise. As a subject, it's far more general, though.

Unfortunately, Shannon's entropy, often referred to as information entropy and then shortened to just information, is often confused with the colloquial term "information". This brings connotations of data, of knowledge, of specific sets of symbols with specific meaning (the letters C A T representing the label of an animal in English). But as Shannon and Weaver said in their book from a year later, we must not confuse information in the information-theory sense with meaning. This collision of terminology is amplified when it encounters economics, where information economics deals specifically with the economic value of meaningful information.

I believe the best way to understand this difference is to understand what information theory illuminates. Information theory gives us a way to quantify concepts when we have limited knowledge about what underlies those concepts. For example, information theory is essentially a more general framework that encompasses thermodynamics in physics — thermodynamics is the science of how collections of atoms behave despite our not having remotely enough knowledge about the trillions upon trillions of atoms to make a model. We give up talking about what a single atom in a gas is doing for what an atom could be doing and with what probability. We cease talking about what atoms are doing and instead talk about the realm of possibilities (the state space) and the most likely states.

Now thermodynamics is a much more specific discipline than information theory, not least because it specifies a particular relationship between energy and the (log of the) size of the state space through the Boltzmann constant k (where thermodynamic entropy S is related to the state space via S = k log W, with W counting the size of that state space). But the basis of thermodynamics is the ability to plead ignorance about the atoms, formalized and generalized by information theory.
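For reference, Shannon's entropy and the thermodynamic (Gibbs) entropy have the same functional form:

H = −Σᵢ pᵢ log pᵢ    and    S = −k Σᵢ pᵢ log pᵢ

and when the distribution is uniform over W states (pᵢ = 1/W), the second expression reduces to the S = k log W above.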

Information theory helps us build efficient communications systems because it allows us to plead ignorance about the messages that will be sent with them. I have no idea which sequence of 280 characters you are going to tweet, but information theory assures us it will be faithfully transmitted over fiber optic cables or radio waves. And if I am ignorant of the message you send, how can its meaning be important — at least in terms of information theory?

Maximum entropy methods in e.g. earth science let us plead ignorance about the exact set of chemical and physical processes involved in the carbon or water cycles to estimate the total flux of thermodynamic energy in the Earth-Sun system. Maximum entropy lets us program a neural network to identify pictures of cats without knowing (i.e. setting) the specific connections of the hundreds of individual nodes in the hidden layer — I mean, it's hidden! In a similar fashion, I've been trying to use information theory to allow me to plead ignorance about how humans behave but still come up with quantitative descriptions of macroeconomic systems [1].

But that's why the information in information theory isn't about meaning. One purpose of information theory is to give us a handle on things we don't have complete knowledge of (so a fortiori can't know the meaning of): the motions of individual atoms, the thousands of node connections in a neural network, the billions of messages sent over the internet, or (maybe) the decisions of millions of humans in a national economy. If we're pleading ignorance, we can't be talking about meaning.

...

Update 13 July 2018

First, let me say this blog post was inspired by this Twitter thread with Lionel Yelibi. And second, there's a related post about Cesar Hidalgo's book Why Information Grows and his "crystals of imagination". Hidalgo has a parable about a wrecked Bugatti Veyron that tries to get the point across that the value of objects is related to the arrangement of atoms (i.e. specific realizations of state space). However, in that particular case the information about value is not entirely encoded in the atoms but also (and maybe even primarily) in human heads: someone who didn't know what a Bugatti was would not value it in the millions. They might still value a car at closer to tens of thousands of dollars (although even that is based on my own experience and memory of prices).

...

Footnotes:

[1] In a sense, information equilibrium could be seen as the missing concept for economic applications because it gives a possible way to connect the information in two different state spaces which is critical for economics (connecting supply and demand, jobs with vacancies, or output with input).
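For concreteness, the information equilibrium condition relating two quantities A and B (with information transfer index k) can be written

dA/dB = k A/B

whose general solution is a power law, A/A₀ = (B/B₀)^k, connecting the two state space variables.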


Wage growth (and finishing out bitcoin)

The Atlanta Fed wage growth data has been updated for June 2018, and is pretty much in line with the dynamic information equilibrium model I've been tracking since February:


...

Post script/post mortem

Also, while not useful for forecasting, the bitcoin exchange rate model did provide a decent post hoc description of the data over the past several months, though it puts the average rate of decline a little high (about −2.6/y, almost exactly 100 times the dynamic equilibrium depreciation rate of gold of −0.027/y, when the actual empirical decline from the 18 December 2017 peak to the most recent 11 July 2018 measurement here was only −1.8/y):


It's possible there's another shock in the data [1] earlier this year, but as I've said on this blog, constantly adding shocks (even if they're really there) doesn't really validate the model. We'd need to validate the framework on other data and use that validity to motivate an unstable bitcoin exchange rate with tons of shocks.

Update

Here's what happens when you include that shock:


Note that in the "proper" frame (a log-linear transform that removes the dynamic equilibrium decline), the stair-step appearance (noted here and in my paper) is more obvious:



...

Footnotes:

[1] We could motivate this shock, centered in April 2018, further by noting that the rate of decline from the 5 May 2018 peak to 11 July 2018 was −2.8/y and the rate of decline from 18 December 2017 to 18 March 2018 was −3.2/y, meaning the lower overall rate of decline of −1.8/y from December to July was mostly due to the bump in April 2018.
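For reference, the "/y" rates quoted here are continuously compounded annual rates of change, i.e. the slope of log(exchange rate) per year between two dates. A minimal sketch of the computation, using placeholder prices rather than the actual data from the post:

```python
from datetime import date
from math import log

def annual_log_rate(p_start, p_end, d_start, d_end):
    """Continuously compounded rate of change per year between two observations."""
    years = (d_end - d_start).days / 365.25
    return (log(p_end) - log(p_start)) / years

# Placeholder prices purely for illustration (not the data used in the post):
print(annual_log_rate(19000.0, 6300.0, date(2017, 12, 18), date(2018, 7, 11)))
# roughly -2/y for these made-up inputs
```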

July update of CPI (with June data)

The latest CPI data is out today, and we see the continued end of the "lowflation" period in the US that trailed the 2008 recession and the global financial crisis. Overall, there's not a lot of news here, so I'll just post the graphs with the latest post-forecast data (black) compared to the forecast/model (red) for both the continuously compounded annual rate of change and the year-over-year change (as always, click to enlarge):


Here are some zoomed-in versions:


The error bands are the standard deviation (~70%) of the model errors on the fit data (blue). The dashed red line is the (minimally) revised estimate of the post-recession shock parameters.
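For reference, here's how those two rate-of-change measures are computed from a monthly index level (a minimal sketch with a synthetic placeholder series, not the actual CPI data):

```python
import numpy as np
import pandas as pd

# Synthetic monthly price index purely as a placeholder:
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
cpi = pd.Series(100.0 * np.exp(np.cumsum(np.full(120, 0.0015))), index=idx)

log_cpi = np.log(cpi)
cc_annual_rate = 12 * log_cpi.diff()        # continuously compounded annual rate (log derivative)
yoy_change = log_cpi - log_cpi.shift(12)    # year-over-year (log) change
```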

...

PS I forgot to include separations in the JOLTS data release earlier this week, so I'm posting it now. Also, I decided to use the interest rate spread estimate for the counterfactual recession timing (2019.7) in the static graphs instead of the previous arbitrary one (2019.5). The animations still show the effect of changing that timing on the counterfactual forecast. I'll also show the JOLTS openings rate with this updated timing guess:



Tuesday, July 10, 2018

Counterfactual 2019 recession update (JOLTS data)

Unfortunately, the latest data from JOLTS isn't that informative — we're effectively in the same place we were last month, with a continued correlated deviation from the dynamic information equilibrium model "no recession" counterfactual for JOLTS job openings. Here are the counterfactual forecasts updated with the latest data:


The quits and hires are showing trend behavior as before (click to expand):


A correspondent on Twitter did point me to NFIB data as an additional source — it tells a similar story to the JOLTS data with somewhat higher uncertainty:


Their measure is the fraction of firms reporting at least one unfilled job opening in their survey.

The median interest rate spread among several measures continues to decline. I added an AR process estimate of the future median monthly rate spread based on the linear model. It seems to show that yield curve inversion is unlikely before the recession hits in this pseudo-cycle:



And here's a somewhat more zoomed-in version:


PS Here's the updated JOLTS opening animation showing different counterfactual recession centers from 2018.5 to 2019.5:

As well as the Beveridge curve (latest point is the white dot with black outline):