Tuesday, June 20, 2017

Barriers to entry in the quantitative parable industry

John Cochrane writes about his view of Ricardo Reis' paper defending macroeconomics. Noah Smith previously wrote about it here, and I commented on that post here. I also based my short play/Socratic dialogue in part on Reis' paper.

Cochrane provides another data point lending credence to my thesis that economists don't quite know what mathematical theory is for (see e.g. here, here, here, here, or here). The specific version that Cochrane starts off with is similar to Dani Rodrik's view (that I discuss here):
Many [critics of macroeconomics] bemoan the simplifications of economic models, not recognizing that good economic models are quantitative parables, or abstract art. Models are best when they isolate a specific mechanism in a transparent way.
Nope. 

I put the proper scientific process in picture form in this post:


Cochrane's quantitative parables go on the right hand side of this. The basic idea is that the "simple clean theoretical model" (quantitative parable, toy model) is something that comes after you've had some empirical success. In words, I'd describe the process like this:
  1. Observe one or more empirical regularities [1]
  2. Describe one or more empirical regularities with models (built using realistic or unrealistic assumptions ‒ your choice!)
  3. Observe any errors in theoretical descriptions
  4. Revise theoretical descriptions
  5. Repeat/continue process
  6. Collect empirical regularities and their successful models into theoretical framework
  7. Test scope of theoretical framework with data, derive novel results from framework, teach the framework using toy models demonstrating framework principles, isolate mechanisms theoretically using framework for empirical study, support unrealistic assumptions using scope of the framework, and/or revise framework
  8. Collect issues with theoretical framework and new empirical discoveries into new framework
  9. Repeat/continue process
And this process is followed by all sciences. At any time, people can be working on steps 1 through 4. Steps 6 and 7 come along in a more mature science like physics, chemistry, and biology. As far as I know, only physics has gone through step 8 multiple times (with relativity, quantum mechanics, and quantum field theory), but I'm happy to be proven wrong about that. Economics has gotten to step 5 (Noah Smith has a list of some of the successes, and makes an excellent case for only getting to step 5 in this post [2]).

The key point is that Cochrane's quantitative parables and models that theoretically isolate mechanisms come much later in the process than economics has progressed. It usually takes a genius or otherwise seminal figure to do step 6. Newton, Einstein, Noether, and Heisenberg in physics. Darwin in evolutionary biology. Hutton and Wegener in geology. Snow in epidemiology. Mendeleev in chemistry. However, we cannot use the converse: the existence of famous/seminal figures does not imply they developed a theoretical framework. And it is also important to stress the empirical piece. Wegener wasn't the first person to posit continental drift, but he was the first to include e.g. fossil evidence. Economics, in contrast, is rife with theoretical frameworks posited by famous economists without comparison to empirical data. Even Keynes and Adam Smith appeal to philosophy and argument rather than data ‒ Keynes famously saying "But it is of the essence of a model that one does not fill in real values for the variable functions."

As an aside, I think I might have finally arrived at a really good way to describe what critics are saying when they say economists have "physics envy": economists think they have a theoretical framework. It explains why economists write papers dense with mathematical symbols despite how poorly the theory describes the data. It explains why they feel they can make unrealistic assumptions even when the result doesn't describe any data. It explains why they think the words "toy model" should even be in their vocabulary. Economists think they are in step 7, but really they are still cycling through step 5.

That seems like a bad start for Cochrane's piece, but the next thing he says is right on:
Critics [of macroeconomics] usually conclude that we need to add the author's favorite ingredients ‒ psychology, sociology, autonomous agent models, heterogeneity, learning behavior, irrational expectations, and on and on ‒ stir the big pot, and somehow great insights will surely come.
There is far too much "we should include X" in economics (including heterodox and non-economists). The only scientific way to say "we should include X" is to say:
APPROPRIATE: We included X and it improved the theoretical description of empirical data, therefore we should include X.
The unfortunate thing that some scientists do however is this:
INAPPROPRIATE: We included X and it improved the theoretical description of empirical data in our field of science, therefore we should include X in economics.
It would be fine if it improved the theoretical description of empirical data in economics, but theory by analogy only goes so far. I try to call this out whenever I can (biology, evolutionary biology, complexity theory).

I think this is the unfortunate consequence of the history of theoretical frameworks without reference to empirical data in economics. If you don't discipline theory in your field with data, anyone thinks they can come up with a theory because, being a human who has thought about money, they reckon they know a bit about how humans think about money.

And that's really a bigger takeaway. Many macroeconomists are frustrated with the criticism and the economic ideas coming from people without PhDs in economics. But they set up their field to essentially be armchair mathematical philosophy, and the barriers to entry for armchair mathematical philosophy are extremely low. A high school education is probably more than sufficient. I think that explains the existence of the econoblogosphere. It's really not the same in physics, mathematics, or signal processing from my experience (using examples of my favorites) ‒ those fields all tend to have experts and PhD students sharing ideas.

I think I'll leave it there because this forms a nice single thesis. There is a process to science, and mathematical theory has a specific place in it. However, theory without comparison to data or without an empirically successful framework is just armchair mathematical philosophy. There are few barriers to entry in armchair philosophy. For toy models isolating mechanisms, however, empirical success is the barrier to entry.

...

PS I did get a chuckle out of this:
Others bemoan "too much math" in economics, a feeling that seldom comes from people who understand the math.
I think that is sometimes true. 

However, I personally do understand the math. My opinion is that 1) the level of mathematical complexity of macroeconomic models far outstrips the limited amount of empirical data, and 2) the level of mathematical rigor far outstrips the accuracy of those models.

PPS I imagine some people will call me out for hypocrisy regarding the "INAPPROPRIATE" statement above. Aren't you, a physicist, saying things from physics should be included in macroeconomic theories?

I would remind those people that I am not forgetting the clause about comparing to empirical data. I am not just saying "economists should use information theory"; I am saying that information theory nicely encapsulates several empirical successes. I also test the theory with forecasts.

...

Footnotes:

[1] Yes, you always need to have some sort of underlying model in order to collect and understand data. However, this is not that onerous a requirement, as even philosophy can frequently fill this role. An example is Hume's uniformity of nature. The implicit model behind the unemployment rate is that there are some people who do things in exchange for money and some people looking for opportunities to do things in exchange for money.

Sometimes people make this observation out to be much more consequential than it actually is. Sure, it's important in the case of understanding what fluctuations of NGDP are important (i.e. what is a recession). However a lot of empirical data in economics is "counting things" where the implicit theory is not much more complex than the one that governs "Which recycling bin does this go in?"

[2] Added 4pm. In contrast to Noah's opinion, though, I wouldn't say that makes economics "not quite" science. It's true that most of the "hard" sciences made it to step 7, but they once were in steps 1 through 4. Physics didn't suddenly become "science" after Newton. It just became a science with a theoretical framework. Biology didn't become a science after Darwin, it was just more about empirical data before.

I think macroeconomics and microeconomics prematurely thought they had a framework with rational utility maximization and some ideas from Keynes (the neoclassical synthesis). Economists started to evade the discipline of data. In a sense, what Noah is calling the "credibility revolution" is the slow realization that the purported frameworks like rational utility maximization and DSGE failed some of the tests in step 7, and so macro is falling back to steps 1-4. This is a good development.

Monday, June 19, 2017

Forecast stability (unemployment rate edition)

I have an old post where I proposed a thought experiment: What would the charts of economic forecasts look like over time if they were made with the incorrect model?

Well, the FRBSF has revised their unemployment rate forecast once again in the latest issue of FedViews so I went back and compiled their forecasts from the beginning of 2015, 2016, and 2017 (shown in different colors below) into a single chart reminiscent of the thought experiment charts. Basically, the FRBSF has continued to revise their forecast for the unemployment rate downward over time.

In addition to compiling those forecasts, I made forecasts using the dynamic equilibrium model (gray) from the same starting date as the FRBSF forecasts (shown with a vertical gray line).

Here are the results:





While, yes, the dynamic equilibrium model does pretty well in terms of residuals, I wanted to bring attention to another desirable quality of models: the dynamic equilibrium forecasts are relatively stable with respect to incoming data (and in fact converge over time).

Now it is true that the unemployment rate will almost inevitably go up again at some point in a future recession. And recessions do introduce some instability into the dynamic equilibrium forecasts of the unemployment rate (as you can see in the algorithmic recession detector), but that instability occurs exactly where we should tolerate it more: a dynamic non-equilibrium situation like a recession. The FRBSF instability occurs during what we'd otherwise consider a "normal" economy.


Saturday, June 17, 2017

Information transfer economics FAQ (reference post)


FAQ (Frequently Asked Question) pages used to be a more prevalent aspect of the internet, but seem to be fading ‒ probably due to being able to ask those questions directly in a Google search or on Quora or similar sites. Anyway, this will be a reference post that will be occasionally updated with some of the questions and misunderstandings I frequently run across from readers. Here's a link to some basic definitions that I'll assume readers have read.

Don't economists already study information in economics?

The "information" in information transfer economics is from information theory and the study of communication channels. It has a technical definition more directly related to probability than to e.g. knowledge of how the monetary system works or whether a company is about to announce disappointing earnings. The key insight from Shannon in his seminal paper on the subject was that the meaning of messages sent through a communication channel is not as important as the number of possible messages (which determines the information content of each message).
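Shannon's insight can be made concrete in a few lines (a sketch of my own, not from any linked paper): the information content depends only on the probability distribution over possible messages, never on what the messages mean.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), with 0 log 0 := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Only the distribution over possible messages matters, not their meaning:
# four equally likely messages carry log2(4) = 2 bits each on average...
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
# ...while a heavily skewed distribution over the same four messages
# carries far less information per message.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # about 0.24
```

Swap in any four messages you like ‒ stock tips, weather reports ‒ and the numbers don't change; that is the sense in which "information" here is about probability rather than knowledge.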

That being said, the work of Christopher Sims comes closest to treating information in a similar way.

Why do you treat economic agents as mindless atoms?

This is a bit of a mischaracterization of how the information equilibrium/dynamic equilibrium framework treats economic agents. The best way to put it is that the framework treats humans as so complex that their decisions appear to be algorithmically random (i.e. indistinguishable from a computer program with random outputs for a given set of inputs). The main assumption, however, is that an ensemble of economic agents will fully explore the possible options available to them in the presence of constraints. The net result is a model that looks like "mindless atoms", but it isn't the whole story.

This approach is somewhat different from traditional economics, where agents are usually assumed to choose the best option available to them. However, economist Gary Becker had a paper from 1962 that also looked at this "random" approach (thanks to economist David Glasner for pointing it out to me). He [Becker] referred to it as "irrational", but I am agnostic as to the motivations of the agents. Even a rational decision can look like an irrational one if you are missing some knowledge of the agent (e.g. selling a stock at a loss in a recession looks irrational, but it is possible the person had medical bills to pay).

But mindless atoms don't panic ...

While information equilibrium treats agents effectively as random "mindless atoms" (but really treats them as so complex they look random), the information transfer framework is more general. If agents didn't spontaneously correlate in state space due to human behavior (e.g. panic, groupthink), then the information transfer framework reduces to something that looks like boring standard thermodynamics. However, they do in fact panic. In terms of thermodynamics, this means that the information transfer framework is like thermodynamics, but without a second law. The "mindless atoms" will occasionally panic and huddle in a corner of the room, and you have non-ideal information transfer as opposed to information equilibrium.

There is less the information transfer framework can say about scenarios where we have non-ideal information transfer, but it still could be used to put bounds on economic variables.

Wait. Isn't this just saying sometimes your theory applies and sometimes it doesn't?

Yes, but in a particular way. For example, the effect of correlations (panic, groupthink) is generally negative on prices.

Additionally, empirical data appears to show that information equilibrium is a decent description of macroeconomic variables except for a sparse subset (i.e. most of the time). That sparse subset seems to correspond to recessions. Since human behavior is one of the ways the system can fail to be in information equilibrium, this is good evidence that information equilibrium fails in exactly the way the more general information transfer framework says it should.

In a very deep way, one can think of information equilibrium being a good approximation in the same way the Efficient Market Hypothesis (EMH) is sometimes a good approximation. Failures of the EMH seem to be correlations due to human behavior.

Why don't you get rich using this theory then?

In a way, I am because my 401(k) is invested in index funds. The theory says that unless you can predict human behavior, you cannot predict episodes of non-ideal information transfer. Therefore the optimal portfolio choice is to diversify and hold for a long period of time. This is what most financial advisers say anyway (and the reason is effectively the same: because the EMH is a good starting approximation).

Some individual people might well be good at predicting human behavior and therefore could potentially outperform the diversified index fund investor, but I am not one of those people. Individual human behaviors frequently baffle me.

I have put together some speculative research about stock markets, but again the results are broadly similar to already known investment strategies.

So you have economics all figured out?

No, not in the slightest. This is all research. I am currently testing the models using empirical data and forecasts. Readers should not confuse my excitement and interest for certainty.

But you're definitely saying mainstream economics is wrong?

Not really. I do believe there are some assumptions made in many (mainstream and heterodox) approaches in economics about the impact of human decision-making, human behavior, and complexity of the macroeconomic system that are unfounded from an empirical perspective. A simpler approach like information equilibrium avoids making strong assumptions about human behavior (we only assume agents explore opportunities most of the time) and uses information theory as a shortcut to understanding complex systems (per the abstract of Fielitz and Borchardt's original paper on information equilibrium for natural systems).

That being said, information equilibrium can be used to formulate many mainstream economic models. Several of these information equilibrium versions of mainstream models tend to be less empirically accurate than information equilibrium models constructed from observed empirical regularities directly (a good example is the New Keynesian DSGE model versus the monetary information equilibrium model).

In the sense that modern economics grew out of the principles of supply and demand and marginalism, one can think of information equilibrium/information transfer as a generalization of those principles. It therefore has some overlap with mainstream economics.

But what about X?

Over the past four years (as of 2017), I have written around a million words on information equilibrium, dynamic equilibrium, information transfer, and the applications to economics. I probably have written about X, so have a look via the search bar (or better yet a Google site search like this for "scope" which will search comments as well). I try to use mainstream economics terminology where I can, so you can use phrases like nominal rigidity or tâtonnement. Comments are open on all of the older posts if you have questions.

If you're looking for a good place to start, I put together a "tour of information equilibrium" chart package.

Friday, June 16, 2017

Wasserstein GAN and information equilibrium

A figure from Arjovsky, Chintala, and Bottou showing the lack of a strong gradient in the Jensen-Shannon divergence, slightly edited for comedic effect.

This paper (Arjovsky, Chintala, and Bottou [ACB]) on Wasserstein Generative Adversarial Networks (WGANs) has been generating (no pun intended) a lot of buzz in the machine learning community. Earlier this year, I mentioned some intuition I had about a possible connection between GANs and information equilibrium as well as the potential for GANs to function as a model of markets.

ACB uses the Wasserstein distance (metric) (also called the Earth Mover's Distance) instead of the original Jensen-Shannon distance (divergence) (a symmetric version of the Kullback-Leibler divergence). One of the benefits of the W-metric for machine learning is that it tends to have non-zero gradients (so e.g. gradient descent solvers won't have as many issues as could happen with other metrics illustrated in the figure at the top of this post). In an odd coincidence, the W-metric has come up two other times recently in unrelated contexts in my real job.

I noticed that this approach also has some interesting connections to information equilibrium. For starters, the W-metric captures the intuitive picture I usually give for the distribution of supply coming into equilibrium with demand: a lump of the supply distribution is moved to some place where there is an excess in the demand distribution as part of our approach to equilibrium. Here's a figure illustrating the concept from a nice discussion of WGAN:


There is even a happy accident in terminology in that there is a "cost function" involved. There is an additional happy accident that the exact solution for histogram distributions is obtained via linear programming, discussed in the context of economics in this blog post. By the way, there is another nice overview of ACB here.
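For intuition (a sketch of my own, not from ACB), the one-dimensional histogram case has a simple closed form that sidesteps the linear program: the Earth Mover's Distance reduces to the L1 distance between the cumulative distributions.

```python
def wasserstein_1d(p, q):
    """First Wasserstein (Earth Mover's) distance between two normalized
    histograms p and q on the same unit-spaced grid. In one dimension this
    reduces to the L1 distance between the cumulative distributions."""
    assert abs(sum(p) - 1) < 1e-9 and abs(sum(q) - 1) < 1e-9
    cp, cq, total = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cp += pi
        cq += qi
        total += abs(cp - cq)
    return total

# Moving all the mass one bin over costs one unit of "earth"...
print(wasserstein_1d([1, 0, 0], [0, 1, 0]))  # 1.0
# ...and two bins costs two: the distance keeps growing with displacement,
# unlike the Jensen-Shannon divergence, which saturates at the same value
# for any pair of disjoint histograms (the "no gradient" problem in the
# figure at the top of the post).
print(wasserstein_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

That growth-with-displacement property is exactly why the W-metric gives gradient descent something to work with.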

One of the other interesting aspects of WGANs is that the GAN "discriminator" is replaced by a WGAN "critic" (per ACB via the previous link):


The critic makes much more sense in the context of economics: at constant demand, a low price is a "critic" of excess supply and a high price is a "critic" of scarce supply.

The GAN analogy is still a one-sided model of supply and demand (it is a model for constant demand and varying supply or vice versa), rather than a full "general equilibrium" analogy (where supply and demand react to changes in each other).

I'm going to close (for now) with another observation about some of the math involved, namely Lipschitz functions. One of the problems with the abstract WGANs is that the W-metric is generally intractable (like the linear programming economic allocation problem). However, one can rewrite the problem using Kantorovich-Rubinstein duality, which couches it in terms of Lipschitz functions, which are (put simply) functions with bounded slope. The K-Lipschitz condition (bounded by slope K) is:


dₙ(f(m₁), f(m₂)) ≤ K dₘ(m₁, m₂)


for all m₁ and m₂, where f : m → n and the d's are metrics on the manifolds m and n. Longtime readers of this blog may know where this is going already: this can be represented as an information equilibrium (transfer) condition. If we have information transfer from "demand" A to "supply" B, then:

dA/A ≤ k dB/B
d(log A) ≤ k d(log B)

which is the infinitesimal version of the K-Lipschitz condition (for log A and log B) with the IT index k playing the role of the slope K. That is to say information transfer relationships are "locally" k-Lipschitz. The definition of K-Lipschitz is actually global (i.e. it holds for all of m, i.e. all of log B), so the KR duality trick would only work for functions that are in information equilibrium (i.e. the equal sign, because then log A = k log B + c and the slope is actually equal to k for all log B measured from any two points log b₁ and log b₂).
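A quick numerical sanity check of that last claim (my own sketch, with an arbitrary IT index and constant): the information equilibrium solution A = c Bᵏ satisfies the Lipschitz condition in log-space with equality, globally, for any two points.

```python
import math
import random

K_INDEX = 1.7  # arbitrary IT index chosen for illustration

def A(B, k=K_INDEX, c=2.0):
    """Information equilibrium solution A = c * B**k,
    i.e. log A = k log B + log c."""
    return c * B ** k

# |log A(b1) - log A(b2)| = k * |log b1 - log b2| for ANY pair of points,
# so log A is globally k-Lipschitz in log B -- with equality, since the
# slope in log-log space is exactly k everywhere.
random.seed(0)
for _ in range(1000):
    b1 = random.uniform(0.1, 10.0)
    b2 = random.uniform(0.1, 10.0)
    lhs = abs(math.log(A(b1)) - math.log(A(b2)))
    rhs = K_INDEX * abs(math.log(b1) - math.log(b2))
    assert abs(lhs - rhs) < 1e-9
print("k-Lipschitz with equality at all 1000 sampled point pairs")
```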

[Update 19 June 2017: n.b. this also applies to the price p = dA/dB being locally K-Lipschitz, but with K = k − 1.]

The Lipschitz function representation of the WGAN also yields a solution up to an overall scale. I discuss the potential importance of scale invariance in economics in several posts (e.g. here or here).

I am not sure there is anything useful in the observation; it may be wildly off-base. I am still looking into the possible use of GANs as a model of the "market algorithm", possibly showing us how markets work as well as under what conditions they don't work (and ways to improve them).

Wednesday, June 14, 2017

Today's Fed decision and recession indicators

The Fed makes its interest rate announcement today at 2pm Eastern (11am Pacific). If they raise rates, then a couple of recession indicators will move towards the higher probability of recession end of the spectrum. For example, there's the inverted yield curve (discussed more extensively here):


There is also the "avalanche" indicator (the yellow area indicates above-trend interest rates and recessions never seem to happen without them):


Additionally, several of the JOLTS measures are showing a slowdown relative to the dynamic equilibrium (in this graph, the hires data is starting to fall below the green line):


Up until the last couple of data points, the unemployment rate seemed to be flattening out, signalling a potential recession in the dynamic equilibrium model:


All of this is of course speculative, with only the inverted yield curve being a mainstream indicator. However, if the information equilibrium/dynamic equilibrium picture is correct, we are starting to accumulate several indicators of a future recession. Each on its own is definitely not enough, and the Fed could hold off raising rates today due to a low CPI inflation number (making a yield curve inversion less likely):


However, I wanted to get this out there before the Fed announcement today, both to keep myself honest and to provide a test of the usefulness of the information equilibrium model.

...

Update 11:32am

They raised them.

Tuesday, June 13, 2017

Consumption, income, and wealth

John Handley has started up a new blog, and the most recent post asks an interesting question (in a way that should be lauded for both its use of data and making the code available): if disposable income predicts consumption so well (i.e. the traditional Keynesian consumption function), why did anyone start using the Euler equation with its "almost comical level of inaccuracy"? While we wait for the optimal answer to that question from Beatrice Cherrier, my intuition says the answer is "microfoundations". John asked me to verify his regressions, which I did; the model he considered has the added benefit of being an information equilibrium model, so I am writing it down here.

The basic Keynesian consumption function is essentially a linear model

PCE = a DPI + c

where PCE is personal consumption (FRED PCEC) and DPI is disposable income (FRED DPI). However John discovered an excellent relationship between PCE, DPI and total net worth (FRED TNWBSHNO) which I will call TNW for short:

log PCE = a log DPI + b log TNW + c

This has the form of an information equilibrium relationship (as well as a Cobb-Douglas production function where income and wealth are both "factors of production" for consumption):

PCE ⇄ DPI
PCE ⇄ TNW

with information transfer indices a and b. It also reduces to the basic Keynesian model in the limit where changes in TNW and DPI are small (in percentage terms). The model works pretty well [1]:



We find a = 0.82, b = 0.18, and c = -0.47. This result has several different implications. First, as John says via Twitter, it means "consumers are way more hand to mouth than typical model[s] suggest". It also suggests that changes to income have a bigger impact on consumption than equivalent relative changes in wealth. Note that a ~ 1 means that the permanent income hypothesis isn't a good approximation (which requires a << 1), in line with previous results using information equilibrium to describe lifetime income and shocks to income. Additionally, across the different estimation methods [1] as well as John's results, the result nearly always gave a + b ≈ 1, i.e. constant returns to scale.
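The regression itself is just ordinary least squares in logs. The code in the repository is Mathematica, but here is a minimal Python sketch of the same fit; since it can't fetch the FRED series (DPI, TNWBSHNO, PCEC), it generates synthetic stand-ins with "true" exponents set near the quoted values, and checks that OLS recovers them.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-ins for log DPI and log TNW; the 1.5 slope and noise
# levels are made up purely for illustration.
a_true, b_true, c_true = 0.82, 0.18, -0.47
log_dpi = np.linspace(7.0, 10.0, n)
log_tnw = 1.5 * log_dpi + rng.normal(0.0, 0.2, n)  # wealth roughly tracks income
log_pce = (a_true * log_dpi + b_true * log_tnw + c_true
           + rng.normal(0.0, 0.01, n))

# OLS fit of log PCE = a log DPI + b log TNW + c
X = np.column_stack([log_dpi, log_tnw, np.ones(n)])
a, b, c = np.linalg.lstsq(X, log_pce, rcond=None)[0]
print(round(a, 2), round(b, 2))  # close to 0.82 and 0.18
```

With the real FRED series in place of the synthetic columns, this is the whole estimation.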

Overall, this is a pretty good model of consumption in terms of income and wealth.

...

Update + 3 hours

One thing to note is that the empirical finding that a + b ≈ 1 implies that there is a constant wealth-to-income ratio (for the same reasons the labor share is constant in the Solow model). This lends credence to the frequent stock-flow consistent model assumption of a constant wealth-to-income ratio (possibly subjected to stochastic shocks per the dynamic equilibrium model).

...

Update + 4.5 hours

Despite lauding John Handley above for making his code available, I forgot to upload the Mathematica code for the model to the repository (information equilibrium). This has been rectified.

...

Footnotes:

[1] I tried multiple different ways of estimating the parameters and all give approximately comparable results (and comparable to John's results as well).



Saturday, June 10, 2017

Milton Friedman's question begging


I know I'm right in saying that Milton Friedman's thermostat is an important idea that all economists ought to be aware of. And I'm pretty sure I'm right in asserting that almost all economists are unaware of this important idea that all economists ought to be aware of.
If Nick is right about economists being unaware of it, then there's a good reason: it's not a logically valid argument.

Nick retells Friedman's parable using a gas pedal g(t) and a car's speed s(t) going up and down hills h(t). The basic premise is that if the driver perfectly achieves a target speed of 100 km/hr, then it will look like the gas pedal movements are unrelated to speed (if we don't know about the hills). The gas will be going up and down, but the speed will be constant.
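The parable is easy to make concrete in a quick simulation (my own sketch, with made-up linear "car physics"): if the driver perfectly offsets the hills, the speed data is a flat line and the sample covariance between gas pedal and speed is exactly zero, even though the gas pedal fully determines the speed.

```python
import math
import random

random.seed(1)
T = 500

# Hills h(t): the exogenous disturbance, invisible in the speed data.
h = [math.sin(0.1 * t) + random.gauss(0.0, 0.3) for t in range(T)]

# Toy car "physics": s = 100 + g - h, so a perfect driver sets g(t) = h(t).
g = list(h)                                     # perfect control
s = [100.0 + gi - hi for gi, hi in zip(g, h)]   # identically 100 km/hr

assert max(s) == min(s) == 100.0  # the speed data is a flat line

# The gas pedal moves constantly, yet its sample covariance with speed is
# exactly zero: the "constant speed" data alone cannot show g(t) matters.
g_mean = sum(g) / T
cov_gs = sum((gi - g_mean) * (si - 100.0) for gi, si in zip(g, s)) / T
print(cov_gs)  # 0.0
```

The point of the post stands: the simulation only "shows g matters" because we wrote s = 100 + g - h into it by hand.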

The argument purportedly can be deployed at anyone using data to claim the gas pedal doesn't matter.

But how did either the plaintiff or the defense know the gas pedal had anything to do with this scenario in the first place? Friedman's thermostat assumes some prior empirically validated model where the gas pedal was conclusively proven to determine the car's speed before the "constant speed" scenario came to be.

That is to say a scientist just given the "constant speed" scenario would say there's no evidence anything influences speed. The "best model" is actually a constant

s(t) = 100 km/hr

Not

s = s(h(t), g(t), t)

If you already know g(t) matters ‒ and adaptively makes s(t) constant ‒ then, yes, the constant s(t) data doesn't disprove that. But if it is a question whether g(t) matters, then assuming g(t) matters in order to disprove claims that g(t) doesn't matter is classic question begging.

Additionally, in order to disprove claims that g(t) doesn't matter one could just deploy the original data and findings that g(t) does matter. This of course makes Friedman's thermostat superfluous.

So hopefully that's why most economists don't know about Milton Friedman's question begging, I mean, thermostat.

Now it is true that when Friedman deployed the argument [pdf], he said there was a time when the thermostat was imperfect ‒ before the analog of the "constant speed" scenario ‒ which allows one to determine the effects of g(t). However, it is not an undisputed fact that g(t) was determined to affect speed (for example, post war inflation is arguably a demographic effect).

At the end of the day, it's usually best to argue about models in terms of empirical data rather than logic and philosophy.

Friday, June 9, 2017

JOLTS and narratives

The latest JOLTS data came out this past week, and I've previously looked at the data with the dynamic equilibrium model (in fact, that analysis led to my current understanding). However, I saw Nick Bunker's charts on the Equitablog, which illustrate a problem that frequently pops up in economics: trends in data are ascribed to particular narratives that aren't supported by the data. These narratives might be consistent with the data, but they are the scientific equivalent of saying unemployment is falling because the economy is becoming more awesome.

In the charts, Bunker claims:

  • Fewer unemployed workers per job opening shifts bargaining power to employees
  • When workers are more confident about the labor market, they quit more
  • As the US labor market tightens, job openings yield fewer hires
The last one is tautological ‒ what does it mean for a labor market to tighten besides fewer hires per job opening? The other two are behavioral relationships: humans are deciding to quit at a higher rate than they would otherwise or e.g. negotiating higher salaries than they would otherwise.


I went through and tried some dynamic equilibrium models for the relevant JOLTS data series (quits, hires, openings, total separations, unemployment) and uploaded the Mathematica code to the dynamic equilibrium GitHub repository. The story told by these models is that not much new is really happening. We're in the same post-shock dynamic equilibrium we've been in since 2010, and nothing has changed. People are still quitting at the same increasing rate they were between the 2000 and 2008 recessions (the quit rate increases by a few tenths of a percentage point per year):


Hires are also increasing, as are openings and separations:




In fact, the unemployment rate shows the same basic structure except inverted (and this model correctly forecasts the unemployment rate from the beginning of 2016 shown in black):


As a scientist, I look at this data and conclude there is probably a single underlying process governing it all. The dynamic equilibrium model is the incredibly simplistic view that the underlying process is a constant log-linear increase in population interrupted by occasional shocks (recessions). If population grows as ~ exp(r × t), then the hire rate, job opening rate, unemployment rate, and quit rate all grow (or shrink) at some (relative) rate ~ (k − 1) × r (interrupted by occasional shocks) with different (constant) values of k for each time series. That is to say, nothing really happens except recessions and population growth. Workers are quitting via the same process they quit by at any other time besides 2008-2009. They aren't more confident. You could actually say that as the economy increases in size, more jobs become available, meaning it just becomes more likely that someone changes jobs at random (to use the maximum entropy narrative underlying the information equilibrium/dynamic equilibrium model). You can make this connection more explicit with a matching model.
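To make the picture concrete, here is a minimal sketch of a dynamic equilibrium series in code. The parameter values are illustrative placeholders, not the actual fit to JOLTS data: the series grows log-linearly at rate (k − 1) × r, with a logistic shock added in log space to represent a recession.

```python
import numpy as np

def dynamic_equilibrium(t, k, r, shocks):
    """Rate series with constant slope (k - 1) * r in log space,
    plus logistic-shaped shocks representing recessions."""
    log_y = (k - 1) * r * t
    for amplitude, center, width in shocks:
        log_y += amplitude / (1 + np.exp(-(t - center) / width))
    return np.exp(log_y)

# Illustrative values (not a fit): a quit rate that falls sharply in the
# 2008-9 recession, then resumes the same log-linear growth afterwards.
t = np.arange(2001, 2018, 0.25)
quits = 2.0 * dynamic_equilibrium(t - 2001, k=1.05, r=0.5,
                                  shocks=[(-0.3, 7.5, 0.4)])

# Away from the shock, d/dt log(quits) is the same constant before and after
slope_pre = np.log(quits[4]) - np.log(quits[3])
slope_post = np.log(quits[-1]) - np.log(quits[-2])
```

The point of the sketch is that a single (k, r) pair plus one shock reproduces the qualitative shape of each JOLTS series: a dip during the recession and an unchanged log-linear trend on either side.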

This is not to say there aren't behavioral effects or other economic effects due to other mechanisms. Tightening of labor markets might well lead to a (second order) effect on PCE inflation because of increased bargaining power (or employers increasing wages offered simply in order to find/hire scarce employees). In the dynamic equilibrium picture, inflation in the post war period is explained mostly by women entering the workforce (first order), accompanied by a 'Phillips curve'-like effect between recessions (second order, and stronger at higher inflation, consistent with the flattening of the Phillips curve).

Aside from a bit of "isn't the information equilibrium model awesome", what I am saying is that we should be careful about the narratives created around economic data. Plausible stories are not evidence, and there could be equally good (or better, or more empirically accurate, or simpler) explanations that tell a different story. Or, as in the case of information equilibrium, no story at all. 

Friday, June 2, 2017

Unemployment forecasts: May data update


Given the latest report of 4.3%, it's looking more like the dynamic equilibrium path without the recession shock is the correct forecast from January of this year. I plotted out the dynamic equilibrium path without the recession (red) to the end of 2018 in order to better compare with the FRB SF forecast (also from January, which I finally added to the same graph):


There are no error bars on the FRB SF forecast; however, we should probably estimate them as roughly the same size as the error on the dynamic equilibrium model in red.

I will also start to keep track of the effect new data has on the algorithmic recession indicator forecast (which really only tells us that the unemployment rate has to start rising or falling before some date). The most recent data has pushed that date out from 2018.5 to 2019.1 (end of January 2019):


Emergence and over-selling information theory


“It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”
Scott Aaronson
Economist Diane Coyle and physicist Sean Carroll both tweeted about this Quanta magazine article about Erik Hoel's "new mathematical explanation of how consciousness and agency arise", so I was naturally intrigued. With the mathematics being information theory (which I apply to economics on this blog), I was doubly intrigued. However (as I tweeted myself), I think this may be a case where the description for a mass audience got away from the source material. That's why I led with the quote from Aaronson (which Carroll also emphasized in his tweet): the magazine article takes this much further as a counterargument to reductionism than the math Hoel employs warrants.

Let me see first if I can explain Hoel's essay [pdf] and the original paper [pdf] (this will be a bit of a simplification). Let's say you want to send a message from point C to point E faithfully in the presence of noise. As Shannon defined the problem when he invented information theory [pdf], you want to make sure that for any possible message you select at C to encode and transmit, you can receive and decode it at E. I chose the letters for the information source and destination to stand for Cause and Effect. When you decode E (your measured data) using your theory, you hope to get back the cause C.

If you are communicating in the presence of noise, some ways (i.e. theories) of decoding E (i.e. data) are better than others. One metric for this is called the Kullback-Leibler divergence (which I've talked about before on this blog). It measures the information lost if you try to decode E with the "wrong" code. In communication theory, it's about losing bits because of mismatched probability distributions. Hoel thinks of effective theories at different scales as the different codes, and instead of losing bits you are losing the fidelity of your causal description. Because the theories at different scales are in general different, one of them could easily be "the best" (i.e. you could minimize the KL divergence for one encoding ‒ one particular theory at a particular scale). This description maximizes what Hoel calls the "effective information" of the causal description.
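As a concrete illustration of the "wrong code" idea (the distributions here are made up for the example), the KL divergence is exactly the gap between the code length you'd pay with the optimal code for the true distribution and the length you pay using a code optimized for the wrong one:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in bits: expected extra bits when events drawn
    from p are encoded with a code optimized for q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log2(p / q))

# True source distribution ("cause") vs. a mismatched model of it
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

entropy = -np.sum(p * np.log2(p))        # optimal code length: 1.75 bits/symbol
cross_entropy = -np.sum(p * np.log2(q))  # length using q's code: 2.0 bits/symbol
extra_bits = kl_divergence(p, q)         # the gap: 0.25 bits/symbol
```

In Hoel's reinterpretation, those 0.25 lost bits per symbol become lost fidelity of the causal description when you pick the wrong scale.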

"Scales" here means descriptions in terms of different degrees of freedom: atoms or agents. Describing an economy in terms of atoms is probably impossible. Describing it in terms of agents is probably better. Describing it in terms of general relativity is also probably impossible. Hoel's contention is that, in the presence of noise, there could well be one or more optimal macro scale effective descriptions [1] in between atoms and galaxies. We would say such a "more faithful" description in the presence of noise is "emergent". Your description of an economy in terms of agents is lacking, but pretty good in terms of emergent macroeconomic forces.

Like the Arrow-Debreu general equilibrium/equilibria, Hoel's math isn't constructive (nor unique). There is no reason that the lossy agent theory couldn't be the best causal description in the presence of noise. There is also no reason there isn't a tower of causal descriptions at different scales. And just because macroeconomics or quantum chromodynamics is hard, it doesn't mean there must exist a better theory at a proper scale that's more faithful in the presence of noise.
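Here is a toy version of the effective information calculation (my reading of Hoel's paper; the transition matrices are made up for illustration). EI is the mutual information between a uniform "intervention" over the current state and the resulting next state, which works out to the entropy of the average row of the transition matrix minus the average row entropy. A coarse-graining of a noisy micro description can score higher ‒ Hoel's "causal emergence":

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def effective_information(tpm):
    """Hoel's EI for a transition probability matrix: entropy of the
    average effect distribution minus the average row entropy."""
    tpm = np.asarray(tpm, float)
    avg_effect = tpm.mean(axis=0)
    return entropy(avg_effect) - np.mean([entropy(row) for row in tpm])

# Micro scale: states 0-2 wander uniformly among themselves; state 3 is fixed
micro = np.array([
    [1/3, 1/3, 1/3, 0],
    [1/3, 1/3, 1/3, 0],
    [1/3, 1/3, 1/3, 0],
    [0,   0,   0,   1],
])

# Macro coarse-graining: A = {0, 1, 2}, B = {3}; deterministic at this scale
macro = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])
```

Here EI(macro) = 1 bit exceeds EI(micro) ≈ 0.81 bits: the coarse description is the "better code" for the system's causal structure. Nothing in the construction guarantees such a macro scale exists for an arbitrary system, which is the non-constructive point above.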

You may have noticed that I keep adding the phrase "in the presence of noise". That's because it's incredibly important to this concept. Without it, each description at each scale is just as good as any other if they are describing the same theory. It's because our measurements are fallible and our computations are limited that different effective theories work better at different scales. It may take thousands of processor-hours to decode a signal using the proper code, but there may be a heuristic solution with some error that takes a fraction of a second. We might only be able to measure the state of the system to a limited accuracy given the complexity or noise involved. The noise is our human limitations. As fallible humans, all we can really ever hope for are effective theories that are easier to understand and effective variables that are easier to measure.

It might be easier to think about it this way: What does it mean for a causal description in terms of agents to be different than a causal description in terms of macroeconomic forces? If they give different results, how are they theories of the same system? If your agent model is bad at describing a macroeconomy and my information equilibrium model is really good, in what way is the information equilibrium model a "coarse-graining" of the agent model? I think most of us would say that these are actually just different theories, and not theories of the same system at different scales. Unless they give the same results without noise, there's no real impetus to say they're both descriptions of the same system.

In a sense, physics as a field gave itself over to the mindset that all we have are effective theories starting in the 70s (here's Weinberg on phenomenological Lagrangians [pdf], and here's a modern lecture passing this mindset on to the next generation). I went through graduate school with the understanding that the standard model (core theory) was really just an effective theory at an energy scale of a few GeV. Newtonian physics is just an effective theory for speeds much slower than the speed of light.

However something that exists because of our limitations is not necessarily a law of the universe; it says more about us than about the systems we are trying to describe. Along with the non-constructive argument (constructive in his paper for only a limited system that doesn't necessarily generalize to e.g. macroeconomic theories or hadronic physics), we can boil down Hoel's essay to:
There sometimes can be simpler descriptions at different scales, and you can use information theory as a formal description of this.
This is quite different from the breathless accounts in the Quanta magazine article and the press blurbs accompanying it.
"his new mathematical explanation of how consciousness and agency arise"
No, it's just an argument that effective descriptions we could call consciousness and agency could arise because measuring and computing the behavior of neurons is hard and therefore probably lossy. It doesn't explain how they arise, or even tell us if they exist. It would be nice if this happened, but it doesn't have to.
"New math shows how, contrary to conventional scientific wisdom, conscious beings and other macroscopic entities might have greater influence over the future than does the sum of their microscopic components."
This blurb completely misrepresents Hoel's result to the point of journalistic malpractice. It's not contrary to "conventional scientific wisdom"; his result is basically what physicists already do with effective theories. Hoel put together a possible formal argument in terms of information theory potentially explaining why effective theories are a useful approach to science. Macroscopic entities might be easier to use to predict the future than their microscopic components because they're easier to measure and more computationally tractable. In fact, Hoel's result also says that conscious beings might themselves be useless in terms of describing economies and societies (something I posited some years ago).
"A new idea called causal emergence could explain the existence of conscious beings and other macroscopic entities."
Again, this doesn't explain the existence, it just formalizes the idea that some causal descriptions may be better than others at different scales. And like the previous quote, it has a predilection for conscious beings when in fact it applies to atoms emerging from quarks, chemistry from atoms, cells from chemistry, humans from cells, and economies from humans. The theory doesn't explain the existence of conscious beings in the same way it doesn't explain the existence of atoms. It describes the existence of atoms as an effective description in a particular framework.

It is somewhat unfortunate that Hoel's work was "oversold" with this article because I think it is a genuinely interesting argument. I will probably reference it myself in the future whenever I argue against strict microfoundations or agent-based fundamentalism. It's essentially an argument against the Lucas critique: there is no reason to assume some macro regularity must have an efficient description in terms of microeconomics (in general the micro description is probably worse!). Hopefully the exaggerated versions of the claims don't detract from Hoel's insight.

*  *  *

I wanted to add a couple of additional comments that didn't fit in the post above.

Dimensional reduction

Hoel's idea that he formalized in terms of information theory is something that I've used as an informal general principle in my approach to pretty much any theoretical attempt to understand empirical data. I gave it the name "dimensional reduction", and I've talked about it on multiple occasions over the years (here, here, or here, for example). As I put it in the first of those links:
In general, if you have a microstate model with millions of complex agents that have thousands of parameters, you have a billion dimensional microstate problem (m ~ 1,000 × 1,000,000 = 10⁹). There are three possible outcomes:

  • The macrostate is a billion dimensional problem (M ~ m)
  • The macrostate is a bit simpler (M < m)
  • The macrostate is a much smaller problem (M << m)
In the second and third cases, the dimension of the "phase space" of the problem is reduced; I called this "dimensional reduction" (there is a related concept in machine learning). The less complex description is hopefully both more tractable computationally and easier to measure, e.g. statistically.

But again, it doesn't have to exist. It's just nice if it does because that means it might not be hopelessly complex. Essentially, if macroeconomics is comprehensible to humans, then some kind of dimensional reduction (i.e. a scale with an effective coarse-graining) makes the system tractable (is a less lossy code to decode the cause from the measured effects).
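Here's a toy illustration of what dimensional reduction looks like in data (the numbers are made up, not an economic model): observations in 50 dimensions that are secretly generated by only 2 underlying factors, so M << m shows up directly in the singular value spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 "agents", each described by 50 observed variables, but secretly
# driven by only 2 underlying macro factors plus a little noise
n_agents, n_vars, n_factors = 1000, 50, 2
factors = rng.normal(size=(n_agents, n_factors))
loadings = rng.normal(size=(n_factors, n_vars))
data = factors @ loadings + 0.01 * rng.normal(size=(n_agents, n_vars))

# Singular values of the centered data reveal the effective dimension
s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
explained = np.cumsum(s**2) / np.sum(s**2)
# The first 2 components carry essentially all the variance: M << m
```

Whether a real macroeconomy admits this kind of collapse is exactly the open question: the spectrum could just as easily be flat, which is the M ~ m case.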

Causal entropy

I did take issue with the characterization of this:
“Romeo wants Juliet as the filings want the magnet; and if no obstacles intervene he moves towards her by as straight a line as they. But Romeo and Juliet, if a wall be built between them, do not remain idiotically pressing their faces against its opposite sides like the magnet and the filings... Romeo soon finds a circuitous way, by scaling the wall or otherwise, of touching Juliet’s lips directly. With the filings the path is fixed; whether it reaches the end depends on accidents. With the lover it is the end which is fixed, the path may be modified indefinitely.” — William James 
The purposeful actions of agents are one of their defining characteristics, but are these intentions and goals actually causally relevant or are they just carried along by the causal work of the microscale? As William James is hinting at, the relationships between intentions and goals seem to have a unique property: their path can be modified indefinitely. Following the logic above, they are causally relevant because as causal relationships they provide for error-correction above and beyond their underlying microscales.
There's another way to approach this, called causal entropy, that exists entirely at the microscale. In another exaggeration for a mass audience (this time a TED talk), it was called "a new equation for intelligence". However in that case, even "dumb" automatons can accomplish pretty astounding tasks (navigate a maze [YouTube], use tools [pdf]) if they're given a simple directive to maximize causal entropy.
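That simple directive can be sketched with a toy walker (my own construction for illustration, not the algorithm from the talk or the papers): at each step it moves to whichever neighboring cell keeps the largest number of future states reachable, and as a result it "seeks" open space without any explicit goal.

```python
N = 11        # corridor cells 0..10; walls outside
HORIZON = 4   # how far ahead the walker "cares" about its options

def reachable(pos, steps):
    """All cells reachable from pos within `steps` moves of +/-1."""
    cells = {pos}
    frontier = {pos}
    for _ in range(steps):
        frontier = {p + d for p in frontier for d in (-1, 1) if 0 <= p + d < N}
        cells |= frontier
    return cells

def causal_entropy_move(pos):
    """Move to the neighbor that keeps the most future states open."""
    candidates = [p for p in (pos - 1, pos + 1) if 0 <= p < N]
    return max(candidates, key=lambda p: len(reachable(p, HORIZON)))

# A walker starting against the left wall drifts toward the open middle
pos = 0
for _ in range(5):
    pos = causal_entropy_move(pos)
```

The walker ends up centered in the corridor ‒ the position with the most accessible futures ‒ purely from the microscale rule, with no macroscale intention anywhere in the description.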

Minima of effective information

Hoel's scales are ones where the effective description simplifies. In my research, I noted that a lot of descriptions simplify at different scales, but in between those scales you nearly always have a mess. This observation mostly derives from my thesis, where I tried to work with an intermediate scale between the high energy scale of perturbative quarks and the low energy scale of hadrons. However, it also seems to appear in the AdS/CFT correspondence in string theory (the supergravity theory turns into a non-perturbative mess well before the perturbative string theory starts to be a good effective description). As I put it in one of my first blog posts:
You go from hadrons to a mess and only then to quarks as you zoom in.
Where Hoel talks about effective information being higher at different scales, I made a conjecture that I wrote down on this blog a couple of years ago about there being what are essentially minima of Hoel's effective information between scales:
... the emergent macro-theory tends to have more to do with just the symmetries and bulk properties of the state space rather than the details of the micro-theory. 
In the quark case it's actually pretty interesting -- there is no scale at which both the quark and hadron theories are simple descriptions. At high energy, the quark theory simplifies. At low energy, the hadron theory simplifies [3]. In physics we call this duality -- sometimes the wave description of a quantum system simplifies and sometimes the particle description simplifies. Some phenomena are more easily seen as electric fields and moving charges, some phenomena are more easily seen in terms of magnetic fields. 
For economic phenomena, sometimes the representative agent simplifies and sometimes micro-theory agents simplify. ... 
... [3] I have a conjecture that this always happens.
Emphasis added.

If Hoel is right that useful effective theories are local maxima of effective information, then my conjecture is correct: there must exist local minima between them.

*  *  *

Footnotes:

[1] As a side note, I make the argument on this blog that information equilibrium is a pretty good macro scale effective description.