## Monday, August 24, 2015

### Entropy is working for the weekend

Nick Rowe applies his talent for distillation to explaining a monetary coordination failure that results in a recession. In his recent post, he mentions the coordination that occurs on weekends:

> Every Saturday Canadian output and employment drop. And they drop again every Sunday. Every weekend, output and employment drop for two successive days. Are weekends mini recessions? I would say "no".

If you've been following along on this blog -- in particular this post [1] -- you might ask: I thought you said coordination causes recessions?

I did say that. The important thing to understand is that it is coordination relative to the equilibrium distribution. Let's say here's what (the probability density of) output looks like during a typical week (I made this data up):

Call this distribution P. It's the equilibrium distribution. Some people (by no means everybody) have the opportunity to take weekends off. Now week to week the information loss measured by the KL divergence is zero:

D(P||P) = 0

But if suddenly this happens (call this distribution Q):

We get a KL-divergence that results in an information loss of:

D(P||Q) = 0.18 bit

It turns out that is a loss of about 6.6% relative to the information entropy of P ~ 2.78 bits. That would be a mini recession (if weeks were all the same except that one, it would be a recession of about 0.1% of output).

What would really be happening? What is the story behind this mini-recession? With more people off on days that they used to work, they might go shopping. But with fewer people to stock the shelves and no one expecting a Thursday rush of long-weekend consumers, less gets sold than in the status quo. ATMs might not have enough money in them for Wednesday being the new Friday, and cash-only establishments would miss out. Restaurants unexpectedly fill up for Friday brunch and people can't get a table.

We seasonally adjust data; that's an admissible procedure only because the seasonality represents the equilibrium distribution.

Well, the first graph was actually a simplification. The actual distribution would be more complex, taking into account your country's public holidays, major sporting events and vacations. That would be the real P.

If part way through the year, 5% more people became unemployed so that your distribution of output changed (i.e. more of the output would have been at the beginning of the year relative to an equilibrium year), then you would probably have a recession. That's a bad coordination.

Now I actually think the real coordination comes in the form of pessimism about asset prices and future sales, so the rise in unemployment is a symptom, not the cause. The entropy loss manifests as a fall in employment (and a rise in the number of people with zero wage change) as seen in the post above [1].

1. Jason, I tried repeating the calculation. First off, I think this Wikipedia article is slightly misleading because its first paragraph specifically says that D(.||.) measures "bits", but it then gives a formula with ln instead of log2. No biggie (but at first I dogmatically followed their ln formula and was not surprised it did not match yours).

Your P distribution is clear enough (0.1, 5x 0.16, 0.1), but the Q distribution I couldn't quite decipher. Sunday looks to be about 0.137 or so, but all the rest look like 3x 0.22 followed by 3x 0.07... which only leaves 0.13 for Sunday, so I went with:

0.13, 3x 0.22, 3x 0.07

Are you sure Q adds up to 1?

Now you wrote D(P||Q) which I then calculate to be 0.1747 bits. That's pretty close to your 0.18, so good enough (probably just my inaccuracy in reading your Q distribution).

However, prior to that calculation I calculated D(Q||P) (because I wasn't looking at your post when doing it), which seemed to make more sense to me because isn't the idea that everyone expects P but one strange week has an ACTUAL distribution of Q? So isn't bits lost when everyone's thinking P but it's actually Q written as D(Q||P)? I get 0.1494 bits lost with D(Q||P).

I do get your same entropy for P (2.78 bits).
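For anyone who wants to repeat the arithmetic in this comment, here is a minimal Python sketch. It assumes the values read off the graphs as described above (P = 0.1, 5x 0.16, 0.1 and the eyeballed Q = 0.13, 3x 0.22, 3x 0.07, with Sunday first) and uses log base 2 so the results are in bits. The function names `entropy_bits` and `kl_bits` are just illustrative.

```python
import math

def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_bits(p, q):
    """KL divergence D(p||q) in bits (log base 2 rather than ln)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumed values, Sunday through Saturday; Q is an eyeball estimate from the graph.
P = [0.10, 0.16, 0.16, 0.16, 0.16, 0.16, 0.10]
Q = [0.13, 0.22, 0.22, 0.22, 0.07, 0.07, 0.07]

d_pq = kl_bits(P, Q)   # the quantity used in the post
d_qp = kl_bits(Q, P)   # the reversed ordering
h_p = entropy_bits(P)

print(f"D(P||Q) = {d_pq:.4f} bits")
print(f"D(Q||P) = {d_qp:.4f} bits")
print(f"H(P)    = {h_p:.2f} bits")
```

The two orderings give different numbers because the KL divergence is not symmetric, which is why the choice between D(P||Q) and D(Q||P) matters here.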

1. Q does add up to 1 (I normalized it); I think your estimate is off a bit.

Your second question is a great question.

You have to imagine the communication system of the economy working properly when P is used to encode the information. Usually the market's estimate of the distribution is close to P. However, in the alternative week, it is estimating the distribution as Q, which isn't a good estimate of P.

So it is D(P||Q) and not D(Q||P).

2. With 3 sig figs Q is

.137
.219
.0685

All of those round up for 2 sig figs, so it would be off.
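Plugging these three-significant-figure values into the calculation gives a quick check on the figures in the post. The sketch below assumes P is (0.1, 5x 0.16, 0.1) as read off the first graph, with one Sunday value, three days at .219 and three days at .0685 for Q. It renormalizes Q because the rounded values sum to 0.9995 rather than exactly 1. The variable names are illustrative.

```python
import math

# Assumed equilibrium week P (Sunday through Saturday), read off the first graph.
P = [0.10] + [0.16] * 5 + [0.10]

# Q from the three-significant-figure values above, renormalized to sum to 1.
Q_rounded = [0.137] + [0.219] * 3 + [0.0685] * 3
total = sum(Q_rounded)
Q = [q / total for q in Q_rounded]

h_p = -sum(p * math.log2(p) for p in P)                   # entropy of P in bits
d_pq = sum(p * math.log2(p / q) for p, q in zip(P, Q))    # D(P||Q) in bits
loss_fraction = d_pq / h_p                                # loss relative to H(P)

print(f"D(P||Q) = {d_pq:.2f} bits")
print(f"relative loss = {100 * loss_fraction:.1f}% of H(P) = {h_p:.2f} bits")
```

With these inputs the result lands close to the 0.18 bit and roughly 6.6% quoted in the post.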

3. Thanks Jason. It sounds like what you said in your above comment is that I got it backwards. Instead of writing this:

"everyone expects P but one strange week has an ACTUAL distribution of Q"

I should have written:

"everyone expects there to be a strange week with distribution Q but that week has an ACTUAL distribution of P (just like all the other normal weeks)"

Is that it?

4. There is no 'expected' week. There is a mini recession because there was a change in the realized distribution from P to Q.

Distribution P is itself likely a bad approximation to the optimal distribution A (uniform is maximum entropy). But D(A||P) = some constant that doesn't matter because we usually realize P.

P is the equilibrium. Something happens that pushes the economy away from equilibrium ... to Q. Since P is the equilibrium, we measure an information loss D(P||Q).

5. Let me quote Wikipedia:

"Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P."

Maybe it's that sentence in the current context that's confusing me.

At first I was taking the "true" distribution to be the alternate week's (surprising and unique) output profile.

You write: "But if suddenly this happens (call this distribution Q):"

So naturally I wanted to switch P and Q to make it fit.

I have a feeling that when I say "everyone expects" I'm not saying the same thing you are when you say "the market estimates."

So, is this Q week a unique output surprise? Was that the story? Or was the surprise that it wasn't any different in output (although it had been expected to be)?

6. There was no change in output, just a change in its weekly distribution.

Usually the market when functioning properly produces P. Something happens and people spontaneously coordinate to produce Q. Maybe everyone thinks traffic is going to be bad and works extra on Monday. This is a coordination that produces Q.

The operative word in the Wikipedia article is "Typically" ... The KL divergence is used in some physical models and in communication theory where there is a "true" distribution. Here we just have a couple of observed distributions. Usually, it's P. So that becomes our true distribution.

If the distribution continues to be Q forever after, then we'd take Q to be the true distribution and a change back to P would cause a mini recession.

The KL divergence measures how much extra information beyond Q you need to get to P. It doesn't really matter if P is a theory, a model, truth data or whatever. It's just your reference.

7. Thanks Jason. I appreciate your efforts here... I'm not sure why this is such a troublesome concept for me. The communication theory example where there's a true distribution (P) and an estimated distribution (Q) is easy to digest.

Is there an analogy to be drawn with this sentence in Wikipedia:

"The Kullback–Leibler divergence measures the expected number of extra bits (so intuitively it is non negative; this can be verified by Jensen's inequality) required to code samples from P when using a code optimized for Q, rather than using the true code optimized for P"

Is there something in your example which is analogous to each of these concepts?:

1. coding samples from P (or Q)

2. "code optimized for P"

3. "code optimized for Q"

As an example, I would have thought that the usual P-optimized daily schedule of ATM cash balances would be analogous to part of the "code optimized for P." This ATM schedule is not optimized for the Q distribution, but is used anyway when Q happens (because Q was a surprise to the ATM owners). But that thinking must not be correct, because if D(P||Q) is what we're calculating, then it was a "code optimized for Q" that was actually (and sub-optimally) used rather than a "code optimized for P."

Are there useful analogies to be drawn with 1., 2., and 3. above, or is that just leading me astray?
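The Wikipedia sentence quoted in this comment can be made concrete with ideal code lengths. In the sketch below, a code optimized for a distribution R is assumed to give day i a codeword of -log2(R_i) bits (an idealization that ignores rounding to whole bits). P and Q are the approximate values read off the graphs earlier in the thread, and the function name `expected_code_length` is made up for illustration.

```python
import math

def expected_code_length(samples_from, code_for):
    """Average bits per sample when data drawn from `samples_from` is
    encoded with ideal code lengths -log2(r) built for `code_for`."""
    return sum(s * -math.log2(r) for s, r in zip(samples_from, code_for))

# Assumed values, Sunday through Saturday; Q is an eyeball estimate.
P = [0.10, 0.16, 0.16, 0.16, 0.16, 0.16, 0.10]
Q = [0.13, 0.22, 0.22, 0.22, 0.07, 0.07, 0.07]

bits_with_p_code = expected_code_length(P, P)      # entropy H(P)
bits_with_q_code = expected_code_length(P, Q)      # cross-entropy H(P, Q)
extra_bits = bits_with_q_code - bits_with_p_code   # equals D(P||Q)

print(f"code optimized for P: {bits_with_p_code:.3f} bits per sample")
print(f"code optimized for Q: {bits_with_q_code:.3f} bits per sample")
print(f"extra bits = D(P||Q): {extra_bits:.4f}")
```

Under this reading, D(P||Q) is the penalty for coding samples from P with the Q-optimized code, so the samples come from the first argument and the mismatched code from the second.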

8. Also, in your comment above you write:

"Distribution P is itself likely a bad approximation to the optimal distribution A (uniform is maximum entropy). But D(A||P) = some constant that doesn't matter because we usually realize P."

All that makes sense to me, but just to be sure:

A). So by "optimal distribution" you mean "maximum entropy" which in turn implies a uniform distribution?

B). By "realize P" you mean that P is the actual (non-optimal, non-maximum entropy) distribution, correct?

C). Given that P is the equilibrium and we usually "realize P," then if instead of an atypical week with distribution Q we instead had an atypical week with distribution A (i.e. A = {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7}), we'd still have a "mini-recession" even though the A distribution is higher entropy than P, true? In this case, to measure the size of the recession (bits lost) we'd calculate D(P||A) instead of D(A||P), correct?

9. Thinking of the economy in state P as operating with a code optimized for P (like the ATM balances example) is a good analogy. There is a change and we start using distribution Q as an approximation to P. I am not sure how far it can be pushed, though, since we are essentially pretending we don't know the microfoundations of the economic model when it is operating with distribution P.

Regarding A, B and C:

A. Yes, but there are different maximum entropy distributions given different conditions. A partition function is a maximum entropy distribution where you are constrained to an ensemble average of something like energy < x >. A Pareto distribution is maximum entropy given a constraint on < log x >. A uniform distribution is the maximum entropy distribution where x is between [x_min, x_max].

B. By realize P, I essentially mean observe P (observe a microstate of agents with work distribution P).

C. Yes, there would be a mini-recession going from P to any other distribution. D(P||X) is minimized for X = P.
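Point C can be checked numerically. The sketch below assumes the P read off the first graph (0.1, 5x 0.16, 0.1) and a uniform week A. The helper names are illustrative. It compares the entropies of P and A and evaluates D(P||X) at X = A and X = P.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """KL divergence D(p||q) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

# Assumed equilibrium week P and a uniform week A.
P = [0.10, 0.16, 0.16, 0.16, 0.16, 0.16, 0.10]
A = [1 / 7] * 7

h_p, h_a = entropy_bits(P), entropy_bits(A)
d_pa, d_pp = kl_bits(P, A), kl_bits(P, P)

print(f"H(P) = {h_p:.3f} bits, H(A) = {h_a:.3f} bits")
print(f"D(P||A) = {d_pa:.3f} bits, D(P||P) = {d_pp:.3f} bits")
```

For a uniform A over seven days, D(P||A) works out to log2(7) - H(P), so it is positive whenever P is not itself uniform, even though A is the higher entropy distribution.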

10. OK, thanks Jason.

11. In your example as it stands (with P normal and Q the oddball, not the other way around), is it possible to dream up any meaning for D(Q||P)? Or is that just not at all obvious?

12. D(Q||P) would measure a recession for transitioning back to P if Q became the new normal (i.e. the new equilibrium -- if for example, things stayed at Q for long enough).

2. I just stumbled upon this blog. Just reading this post, does output decline then because prices can't adjust fast enough? In other words, there is no way to quickly communicate the unexpected surge in demand and so it goes unsatisfied (for instance in your crowded restaurant example).

In a similar vein, does accepting the notion presented (coordination -> recession) imply then that the main channel for policy lies in the managing of expectations?

Very interesting nonetheless.

1. Welcome!

The rate at which prices adjust would definitely enter into the duration of the slump -- if prices adjusted instantly, there could be an instantaneous drop in entropy in the case above. However we'd instantly start treating Q as the new distribution and the time integrated effect could be close to zero.

However, it's more than just prices adjusting. There is a distribution of supply (based on Q) that is mismatched to demand (based on P). If there isn't any of something in place at the right time (e.g. cash in the ATMs) even an infinite price won't get it there faster.

The information transfer model so far only hints at the way to think about non-equilibrium situations; it is similar to how thermodynamics only hints at how to think about non-equilibrium systems. If a system develops entropy gradients, it will tend to even them out. In our atmosphere, you get convection with uneven heating. The specific process by which the entropy gradient above created by the different work-week output is evened out could definitely include price adjustments. And those price adjustments must work slowly enough to see the effect of the entropy gradient.

The answer is yes, but not in a way that we'd normally think about expectations. You'd want e.g. the central bank to create expectations of "normal" where everybody thinks differently about what normal is. Say half of people expect more than "normal" growth and half expect less. If they started to expect the same thing (everyone expects more than average growth, less than average growth, or exactly equal to average growth), you'd get a negative shock to the economy.

"Managing" expectations is making the world safe for a diversity of expectations.

ET Jaynes called it dither -- the ability for agents to randomly explore the state space ("dither") increases entropy.

3. Thank you for your very thorough response. You are correct, this isn't the way I would think of managing expectations. I ask that you bear with me here, as this is indeed a novel framework (for me at least) to work with.

Do the effects of coordination (negative shocks to the economy) hold regardless of the actual realizations? For instance, if every agent in the economy expects higher than average growth and we actually get this result, what happens to the level of real output? I'm thinking in terms of what the "story" is. I can imagine a situation in which the average expectation of the economy is for over-performance, so firms build up inventories and increase production. However, 6 months later it turns out the economy under-performed, so there is an inventory swing. This could lead to recession. Now what if instead 6 months later the economy actually performed as expected? There is no inventory swing, so what is the negative shock?

Also, when you say negative shocks are you referring to a decline in real output?

1. In general, yes -- coordination leads to lower entropy states than "uncoordinated" states. The idea behind this model is that the economy is in a maximum entropy state most of the time, so all other states are lower entropy.

If every agent built up excess inventories, that would represent a lower entropy state than a set of random inventories (some building up, others depleting). (Actually, build up at some rate close to the economic growth rate would probably be the equilibrium, not zero growth.)

These kinds of recessions are emergent properties of economies, so there is no true "story" of what is happening. You can invent them, but they are not true of agents. In diffusion, you could invent a "story" of a density-dependent force evening out the distribution of molecules. But there is no such force, only entropy evening out the distribution.

In economics I am under the impression that Calvo pricing used to justify price stickiness is one of these fictitious forces. There appears to be an emergent aspect of the economy that tries to maintain a maximum entropy distribution of price growth rates, and any attempt to change it (by having prices fall together) has an entropic force pushing against it.

2. Got it, but still how does upward coordination lead to a recession?

I'm under the impression that by recession you really mean adverse deviations from some trend growth state. Is this correct? In other words, the "equilibrium" corresponds to 2.5% but you get 1%. Though not technically a recession in the standard sense it is one in your framework.

With regards to price stickiness, there are plenty of "stories" to explain nominal rigidities. Calvo's is just one of many. Contracts, menu costs, or so-called efficiency wages are all very simple and yet intuitive ways of explaining this phenomenon. In your framework, the phenomenon is just a manifestation of countervailing entropic forces. But then how do we explain rapidly changing prices as in hyperinflation? Are they just downward rigid?

Again, this is a very novel approach and I'm just trying to wrap my head around everything. Thus far, your predictions have led me to investigate further. Regarding your predictions (from the paper and other posts), are they out-of-sample and what's the forecast horizon (one-step ahead)?

3. Your example above of overproduction is an example of a story of how upward coordination could lead to a recession. Producers are optimistic about sales, ramp up production and the sales aren't realized.

That is a negative shock, however I think true recessions are a bit more complicated than that. The negative shocks sometimes produce a much bigger effect than their size alone would dictate. I go into this a bit more here:

http://informationtransfereconomics.blogspot.com/2015/03/non-ideal-information-transfer-tail.html

Regarding hyperinflation, there are actually two solutions to the differential equation that governs the price level. Basically if money is printed faster than nominal output can adjust, you get accelerating inflation.

4. Forgot a link to hyperinflation

http://informationtransfereconomics.blogspot.com/2013/09/hyperinflation.html

The language of the post might be a bit confusing since it uses the physics model language but "floating" vs "constant" is just general vs partial equilibrium.

5. Thanks for the response. I'll check out the links. I should have been clearer in my comment. By "upward coordination" I was referring to situations in which the sales are actually realized. Since this is an instance of coordination and the result (Coordination -> Recession) is independent of the realization, how would this lead to recession? Would these types of coordination really just be what you described here: "(Actually, build up at some rate close to the economic growth rate would probably be the equilibrium, not zero growth.)"?

This may be a slight miscommunication.

Thanks again. Any information on the questions regarding the forecasts?

6. Sorry about leaving off responding to the question about predictions. I've used various methods with some true out of sample (forecasts) and pseudo out of sample (using part of the available data to predict another part):

Pseudo out of sample:

http://informationtransfereconomics.blogspot.com/2015/08/comparison-of-interest-rate-predictions.html
http://informationtransfereconomics.blogspot.com/2014/05/out-of-sample-predictions-with.html

Forecasts:

http://informationtransfereconomics.blogspot.com/2015/08/latest-pce-inflation-data.html
http://informationtransfereconomics.blogspot.com/2015/07/model-prediction-holding-up-for-japan.html

The horizon is usually about a few years (the number of steps depends on the frequency of the data series). I usually use the same horizon as the models I am comparing to. I go into the accuracy a bit more here:

http://informationtransfereconomics.blogspot.com/2014/07/inflation-prediction-errors.html

And I think I understand your question about upward coordination now -- I have to think a bit more about my answer and get back to you.

4. Jason, just to be clear, in your example, the Q week (or any X =/= P week) does suffer an overall decline in output as opposed to the P week, correct? Since it's a "mini-recession?"

5. On the assumption that weekends with less work are the equilibrium condition, what if (all of a sudden, as Nick Rowe likes to say) work were spread evenly over the week? Wouldn't that also indicate a recession?

1. Bill, I think I already asked Jason that in the numerous comments I wrote above (which I wouldn't blame anybody for skipping!). He said essentially switching to any other distribution X (with P =/= X) would result in a D(P||X) > 0, and this includes distribution X = A (Jason describes A as a uniform distribution for the week). So, in other words, yes that would cause a recession too, even though the overall entropy of A > the entropy of P.

2. Yep, that's it. Thanks Tom. It's true that a transition from P to A would in the short term create a recession, but as A is a higher entropy state, overall growth would increase in the long run ... if A becomes the new normal.

3. Thanks, guys. :)

A metaphor I think of is that of a balloon with a certain intrinsic shape, instead of a spherical chicken. ;)

To put it another way, culture matters.

4. An interesting implication seems to be that any sudden change in the distribution, regardless of whether that change will in the long term create greater, equal or less output than the current distribution does, will result in a recession. Now the question is, what if that sudden change is anticipated well in advance, so that virtually everybody has time to prepare (as if they're preparing for a change to daylight saving time)?

I would guess that if this is a recurring change (like a holiday or daylight savings), it won't be a big deal, but if it's a one time change, regardless of how prepared everybody *thinks* they are, there will be some disruption, depending on the magnitude of the change. Clearly the must ballyhooed Y2K paranoia was not a show stopper, but that might not always be the case. (Y2K did produce an uptick in dry goods and ammo sales probably... and perhaps some hours of lost output due to it being diverted to digging underground end-times bunkers... Lol).

5. ...if anybody here has a well stocked (but unused) Y2K end-times bunker, no offense. I'm sure it's very nice... and your grand kids might appreciate it someday. (c:

Also, that's "much ballyhooed" not "must ballyhooed."

6. Jason, O/T: the discussion arose at Sumner's about falsifiability and the EMH. If the EMH leads to falsifiable predictions, what do you think they are?

1. I think the EMH is a statement of maximum entropy:

http://informationtransfereconomics.blogspot.com/2014/02/ii-entropy-and-microfoundations.html

It is definitely falsifiable (depending on the form ... )

7. I think of the efficient market hypothesis as having a strong form and a weak form. The strong form is 'The price is right.' The weak form is 'Prices follow a random walk.'

It was Graham, IIRC, who observed that the price moves to the largest bid or ask. I think that contradicts the strong form but agrees with the weak form.