Thursday, October 19, 2017

Real growth

In writing my post that went up this afternoon, I became interested in looking closely at US real GDP (RGDP) in terms of a dynamic equilibrium model for nominal GDP (NGDP) and the GDP deflator (DEF), where RGDP = NGDP/DEF. So I went ahead and tried to put together a detailed description of the NGDP data and the DEF data. This required several shocks, but one of the interesting aspects was that there appear to be two regimes:
1. The demographic transition/Phillips curve regime (1960-1990)
2. The asset bubble regime (1990-2010)
The DEF data looks much like the PCE data that I referenced in talking about a fading Phillips curve. The NGDP data is essentially one major transition coupled with two big asset bubbles (dot-com and housing):

These are decent models of the two time series:

Taking the ratio gives us RGDP, and the (log) derivative gives us the RGDP growth rate:

It's a pretty good model. The main difference is that for the "Phillips curve" recessions, there are large, narrow shocks to RGDP near the bottom of those business cycles that are both narrower and larger in magnitude than we might expect (these are in fact the shocks associated with spiking unemployment rates). We can also separate out the contributions from NGDP and DEF:

Without the data it's easier to see (and I added some labels as well):

It does not currently look like there is another asset bubble forming. This is consistent with the dynamic equilibrium model for household assets, and you can also tell the dot-com bubble was a stock bubble as it shows up in assets and the S&P 500 model. In fact, today's equilibrium in both NGDP and DEF is actually somewhat unprecedented. We might even call it a third regime:
3. The equilibrium (2010-present)
In the past, the things that caused business cycles were war, demographic transitions, and asset bubbles. What if there aren't any more recessions? That would be a strange world for macroeconomics. Maybe macro is confused today about productivity slowdowns and secular stagnation because we've finally reached equilibrium, while everyone thought the economy had been in equilibrium at least at times in the past. In fact, the mid-50s and mid-90s were actually the only times we were close.

I am pretty sure there will be some asset bubble (or war) in the future because humans. I have no idea what that asset (or war) will be, but it's something we should keep our eyes on. At least, that's true if this model is accurate, which is why I will continue to test it.

But maybe we've finally reached Keynes' flat ocean?

In the right frame, economies radically simplify

I was reading Simon Wren-Lewis on productivity, this out of NECSI, as well as this from David Andolfatto on monetary policy. It sent me down memory lane with some of my posts (linked below) where I've talked about various ways to frame macro data.

The thing is that certain ways of looking at the data can cause you to make either more complicated or less complicated models. And more complicated models don't always seem to be better at forecasting.

Because we tend to think of the Earth as at rest, we have to add Coriolis and centrifugal "pseudo forces" to Newton's laws, because the Earth's frame is non-inertial. In an inertial frame, Newton's laws simplify.

Because ancient astronomers thought not only that they were seeing circles in the sky, but that the Earth was at rest (in the center), they had to add epicycle upon epicycle to the motions of planets. In Copernicus's frame (with a bit of help from Kepler and Newton), the solar system is much simpler (on the time scale of human civilization).

Now let me stress that this is just a possibility, but maybe macroeconomic models are complex because people are looking at the data using the wrong frame and seeing a complex data series?

As I mentioned above, I have written several posts on how different ways of framing the data — different models — can affect how you view incoming data. Here is a selection:

One thing that ties these posts together is that not only do I use the dynamic equilibrium model as an alternative viewpoint to the viewpoints of economists, but that the dynamic equilibrium model radically simplifies these descriptions of economies.

What some see as the output of complex models with puzzles becomes almost laughably simple: exponential growth plus shocks. In fact, not much seems to have happened in the US economy at all since WWII except women entering the workforce — the business cycle fluctuations are trivially small compared to this effect.

We might expect our description of economies to radically simplify when we have the right frame. In fact, Erik Hoel has formalized this in terms of effective information: delivering the most information about the state of the system using the right agents.

Whether or not you believe Hoel about causal emergence — that these simplifications must arise — we know we are encoding the most data with the least amount of information because the dynamic equilibrium models described above for multiple different time series can be represented as functions of each other.

If one time series is exp(g(t)), then another time series exp(f(t)) is given by

f(t) = c g(a t + b) + d t + e

And if Y = f(X), then H(Y) ≤ H(X).

[ed. H(X) is the information entropy of the random variable X]

Now this only works for a single shock in the dynamic equilibrium model (the coefficients a and b adjust the relative widths and centroids of the single shocks in the series defined by f and g). But as I mentioned above, most of the variation in the US time series is captured by a single large shock associated with women entering the workforce.
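This can be checked numerically for a single logistic shock. The sketch below (all parameters are made up for illustration) builds two single-shock dynamic equilibrium series, where the log of each series is a linear trend plus a logistic shock, and constructs the affine map between them:

```python
import math

def sigma(x):
    """Logistic function used for the shock shape."""
    return 1.0 / (1.0 + math.exp(-x))

# Log of each series = linear trend + one logistic shock (parameters made up)
ag, Ag, t0g, wg = 0.05, 1.0, 1975.0, 5.0   # parameters of g
af, Af, t0f, wf = 0.02, 0.6, 1990.0, 8.0   # parameters of f

def g(t):
    return ag * t + Ag * sigma((t - t0g) / wg)

def f(t):
    return af * t + Af * sigma((t - t0f) / wf)

# Coefficients of f(t) = c g(a t + b) + d t + e:
a = wg / wf            # rescale the shock width
b = t0g - a * t0f      # recenter the shock
c = Af / Ag            # rescale the shock magnitude
d = af - c * ag * a    # absorb the leftover linear trend...
e = -c * ag * b        # ...and its constant offset

max_err = max(abs(f(t) - (c * g(a * t + b) + d * t + e))
              for t in range(1950, 2021))  # effectively zero
```

The residual is zero to floating point precision, confirming that for single-shock series the map really is just the five-parameter affine relationship above.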

The dynamic equilibrium frame not only radically simplifies the description of the data, but radically reduces the information content of the data. But the kicker is that this would be true regardless of whether you believe the derivation of the dynamic equilibrium model or not.
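The entropy claim H(Y) ≤ H(X) for a deterministic function can be illustrated with a toy discrete distribution (the values and the function f below are invented for illustration):

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# X takes four values uniformly; Y = f(X) merges two of them. Since Y is a
# deterministic function of X, it can only lose information: H(Y) <= H(X).
X = [0, 1, 2, 3] * 25
f = {0: "a", 1: "a", 2: "b", 3: "c"}
Y = [f[x] for x in X]

H_X = entropy(X)  # 2.0 bits
H_Y = entropy(Y)  # 1.5 bits
```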

You don't have to believe there's a force called gravity that happens between any two things with mass to see how elliptical orbits with the sun at one focus radically simplifies the description of the solar system. Maybe there's another way to get those elliptical orbits. But you'd definitely avoid making a new model that requires you to look at the data as being more complex (i.e. a higher information content).

This is all to say the dynamic equilibrium model bounds the relevant complexity of macroeconomic models. I've discussed this before here, but that was in the context of a particular effect. The dynamic equilibrium frame bounds the relevant complexity of all possible macroeconomic models. If a model is more complex than the dynamic equilibrium model, then it has to perform better empirically (with a smaller error, or encompass more variables with roughly the same error). More complex models should also reduce to the dynamic equilibrium model in some limit if only because the dynamic equilibrium model describes the data [1].



[1] It is possible for effects to conspire to yield a model that looks superficially like the dynamic equilibrium model, but is in fact different. A prime example is a model that yields a dynamic equilibrium shock as the "normal" growth rate and the dynamic equilibrium "normal" as shocks. Think of a W-curve: are the two up strokes the normal, or the down? Further data should show that eventually you either have longer up strokes or down strokes, and it was possible you were just unlucky with the data you started with.

Wednesday, October 18, 2017


Scott Sumner has a review of the Rethinking Macroeconomics conference in which he says:
On the negative side, I was extremely disappointed by some of the comments on monetary policy. In response to calls for a higher inflation target to avoid the zero bound problem, Jeremy Stein of Harvard University asked something to the effect "What makes you think the Fed can achieve higher inflation?" (Recall that Stein was recently a member of the Federal Reserve Board.) I was pleased to see Olivier Blanchard respond that there is no doubt that we can achieve 4% inflation, or indeed any trend inflation rate we want. But then Larry Summers also suggested that he shared Stein's doubts (albeit to a lesser extent.) 
I kept thinking to myself: Why do you guys think the Fed is currently engaged in steadily raising the fed funds target? What do you think the Fed is trying to achieve? How can a top Fed official not think the Fed could raise its inflation target during a period when we aren't even at the zero bound? Why has the US averaged 2% inflation since 1990---is it just a miracle?
I've addressed almost this exact statement before (with less derision than I'll use here), but the emphasized sentence is either the most innumerate claim I've ever seen from a PhD economist or just an incredibly disingenuous one ... to the point of lying on purpose to deceive.

I tried to select the data series that makes Sumner's claim as close to true as possible. It requires headline PCE inflation, but regardless of the measure you use, you get the same result I will illustrate below.

Why does Sumner choose 1990? Well, it is in fact the only year that makes his claim true:

For later starting years, average inflation is lower; for earlier starting years, average inflation is higher. In fact, average inflation since year Y has been almost monotonically decreasing as a function of Y. Therefore, since it was higher than 2% at some time in the past, the statement "inflation has averaged 2% since Y = Y₀" is true for some Y₀ (and since it is almost monotonic, there is only one such Y₀). It just so happens Y₀ ≈ 1990. There's a miracle all right — but the miracle is that Sumner would pick 1990, not that the Fed would pick 2%. I'm more inclined to believe Sumner chose 1990 in order to keep his prior that the Fed can target whatever inflation rate it wants while the Fed says it's targeting 2% [1].
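The shape of this argument can be sketched in a few lines. The series below is synthetic (a simple declining trend), not actual PCE data; with the real headline PCE data the unique crossing year lands near 1990, while with these made-up numbers it lands at 2004:

```python
# Synthetic stand-in for "average inflation since year Y". NOT actual PCE
# data; the point is the shape of the argument, not the values.
years = list(range(1970, 2018))
inflation = [6.5 - 0.11 * (y - 1970) for y in years]

def average_since(y0):
    """Average of the series over all years >= y0."""
    vals = [x for y, x in zip(years, inflation) if y >= y0]
    return sum(vals) / len(vals)

# Because the series trends down, average_since(Y) decreases monotonically
# in Y, so it crosses 2% at exactly one starting year Y0.
crossings = [y for y in years[:-1]
             if average_since(y) >= 2.0 > average_since(y + 1)]
```

Any declining series produces exactly one such crossing year, so finding a Y₀ where "inflation has averaged 2% since Y₀" holds is guaranteed, not miraculous.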

The other possibility here (Occam's razor) is that inflation is just falling and the Fed has no control over it [2]. But regardless of what is actually happening, Sumner is either fooling himself or others with this "evidence". And as we add more data to this series, unless PCE inflation starts to come in above 2%, Sumner's claim is going to eventually become wrong [3]. Will he reconsider it then? 

This kind of numbers game is really upsetting to me. It is the inflation equivalent of the statements by global warming deniers that there's been "no statistically significant warming since 1997" (which uses the fact that a large volcanic eruption caused temperatures to not rise for a few years, and additionally is playing a rather loose game with the words 'statistically significant' — at the time they were making that claim there wasn't enough data to say any increase was statistically significant unless it was huge).

I know: Hanlon's razor. But in the case of global warming deniers it was a deliberate attempt to mislead.



[1] Someone, I don't remember who (possibly Tim Duy?), noticed that the Fed seems to actually be looking at average headline inflation of 2%, which would mean that Sumner should choose 2005 instead of 1990.

[2] In fact, I think it might be a demographic effect. There is a period of "normal" core PCE inflation of 1.7% in the 1990s:

[3] My estimate says it'll be some time after 2020 for most plausible paths of inflation.

Tuesday, October 17, 2017

10 year interest rate forecasts in the US and UK

A couple of continuing forecast validations — this time, it's the interest rate model (which has been used by a Korean blog called Run Money Run for Korea, Japan, and Taiwan). Specifically, we're looking at the 10-year interest rate model for both the US (which has been going for 26 months now) and the UK (only a few months):

The US graph contains forecasts from the CBO from December of 2016 as well as a concurrent forecast from the Blue Chip Economic Indicators (BCEI) — which, I love to point out, costs thousands of dollars for access to its insights in its journal.

Social constructs are social constructs

Noah Smith stepped into a bit of a minefield with his "scientific facts are social constructs" thread — making fun of the idea here [tweet seems to be deleted; it was referring to this tweet], attempting to get a handle on the utter philosophical mess that followed here. With the latter tweet, he illustrates that there are many different things "scientific facts are social constructs" could mean. We have no idea of the original context of the statement, except that it was in an anthropology class [0].

Clearly on some level, scientific facts are not social constructs in the sense that they fail to exist or function differently in a different society. My computer and the network it is attached to functions in exactly the way it is supposed to based on scientific facts in order for me to deliver this text to you via http. This is the universe of physics, computer science, and engineering. We are crossing model levels and scales here — from the human to the electron. As Erik Hoel shows, it is entirely possible that you cannot begin to even formulate what you mean by "social construct" and "electric current" at sufficient fidelity simultaneously (one is a description of macro states and the other is a description of micro states).

But this was an anthropology class. In anthropology, the process of science and the social constructs of society (including the process of science) are in a sense at the same level. It is entirely possible for the social process of science to interact with the anthropological states. Think of this as a "quantum uncertainty principle" for social theories. The process of measuring anthropological states depends on the social scientific process measuring it in the metaphorical sense that measuring the position of an electron depends on the momentum of the photon measuring it. It's a good thing to keep in mind.

However, in a sense, we have no possible logical understanding of what is a social construct and what isn't, because we have empirical evidence of exactly one human species on one planet. You need a second independent society to even have a chance at observing something that gives you insight as to how it could be different. Is an electron a social construct? Maybe an alien society kind of bypassed the whole "particle" stage and thinks of electrons instead as spin-1/2 representations of the Poincaré group with non-zero rest mass. The whole particle-wave duality and hydrogen atom orbitals would be seen as a weird socially constructed view of what this alien society views as simply a set of quantum numbers.

But that's the key: we don't have that alien society, so there's no way to know. Let's designate the scientific process by an operator P = Σ_p |p⟩ ⟨p|. We have one human society state |s⟩, so we can't really know anything about the decomposition of our operator in terms of all possible societies s':

P = Σ_p Σ_s' ⟨s'|p⟩ |s'⟩ ⟨p|

We have exactly one of those matrix elements ⟨s'|p⟩, i.e. s' = s for ⟨s|p⟩. Saying scientific facts are social constructs is basically an assumption about the entire space spanned by societies |s'⟩ based on its projection in a single dimension.

If you project a circle onto a single dimension, you get a line segment. You can easily say that the line segment could be the projection of some complex shape. It could also be a projection of a circle. Saying scientific facts are social constructs in general is saying that the shape is definitely very complex based on zero information at all, only the possibility that it could be. And yes, that is good to keep in mind. It should be part of Feynman's "leaning over backwards" advice, and has in fact been useful at certain points in history. One of my favorites is the aether. That was a "scientific fact" that was a "social construct": humans thought "waves" traveled in "a medium", and therefore needed a medium for light waves to travel in. This turned out to be unnecessary, and it is possible that someone reading a power point slide that said "scientific facts are social constructs" might have gotten from the aether to special relativity a bit faster [1].

However, the other thing that anthropology tries to do is tease out these social constructs by considering the various human societies on Earth as sufficiently different that they represent a decent sampling of those matrix elements ⟨s'|p⟩. And it is true that random projections can yield sufficient information to extract the underlying fundamental signal behind the observations (i.e. the different scientific facts in different sociological bases).

But! All of these societies evolved on Earth from a limited set of human ancestors [2]. Can we really say our measurements of possible human societies are sufficiently diverse to extract information [3] about the invariant scientific truths in all possible societies including alien societies? Do we really have "random projections"? Aren't they going to be correlated?

So effectively we have come to the point where "scientific facts are social constructs" is either vacuous (we can't be sure that alien societies wouldn't have completely different sets of scientific facts) or hubris (you know for certain alien societies that have never been observed have different scientific facts [4]). At best, we have a warning: be aware that you may exhibit biases due to the fact that you are a social being embedded in society. But as a scientist, you're supposed to be listing these anyway. Are anthropologists just now recognizing they are potentially biased humans and, in their surprise and horror (like fresh graduate students being told every theory in physics is an effective theory), over-compensating by fascistically dictating that other fields see their light?
Yes, anthropology: 
Anthropologists can affect, and in fact are a part of, the system they're studying. We've been here for a while.
xoxo, physics.
Now, can we get back to the search for some useful empirical regularities, and away from the philosophical argy-bargy?



[0] Everyone was listing unpopular opinions the other day and I thought about putting mine up: it is impossible to understand even non-mathematical things without understanding math, because you have no idea whether or not what you are trying to understand has a mathematical description of which you are unaware. This post represents a bit of that put into practice.

[1] Funny enough, per [0], Einstein's "power point slide" was instead math. His teacher Minkowski showed him how to put space and time into a single spacetime manifold mathematically.

[2] Whether or not evolution itself is a social construct, you still must consider the possibility that evolution could have in fact happened, in which case we just turn this definitive absolute statement into a Bayesian probability.

[3] At some point, someone might point out that the math behind these abstract state spaces is itself a social construct and therefore powerless to yield this socially invariant information. However, at that point we've now effectively questioned what knowledge is and whether it exists at all. Which is fine.

[4] I find the fact that you could list "scientific facts are social constructs" as a "scientific fact" (in anthropology) that is itself a social construct to be a bit of delicious irony if not an outright Epimenides paradox.

Thursday, October 12, 2017

Bitcoin model fails usefulness criterion

Well, this would probably count as a new shock to the bitcoin exchange rate:

In fact, you can model it as a new shock:

Since we're in the leading edge of it, it's pretty uncertain. However, I'd like to talk about something I've mentioned before: usefulness. While there is no particular reason to reject the bitcoin dynamic equilibrium model forecast, it does not appear to be useful. If shocks are this frequent, then the forecasting horizon is cut short by those shocks — and as such we might not ever get enough data without having to posit another shock, thereby constantly increasing the number of parameters (and making e.g. the AIC worse).
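To make the AIC point concrete, here is a toy comparison with invented numbers. Each additional logistic shock adds roughly three parameters, and for Gaussian errors AIC = 2k + n log(SSE/n) up to an additive constant, so a marginal improvement in fit doesn't pay for them:

```python
import math

# Toy AIC comparison (all numbers invented). k = number of fit parameters,
# n = number of data points, SSE = residual sum of squares. For Gaussian
# errors, AIC = 2k + n*log(SSE/n) up to an additive constant.
def aic(k, sse, n):
    return 2 * k + n * math.log(sse / n)

n = 100
base = aic(k=4, sse=10.0, n=n)   # dynamic equilibrium + one shock
extra = aic(k=7, sse=9.5, n=n)   # positing another ~3-parameter shock
                                 # for only a marginal fit improvement
# extra > base: the new shock makes the AIC worse despite fitting better
```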

Another way to put this is that unless the dynamic equilibrium model of exchange rates is confirmed by some other data, we won't be able to use the model to say anything about bitcoin exchange rates. Basically, the P(model|bitcoin data) will remain low, but it is possible that P(model|other data) could eventually lead us to a concurrence model.

As such, I'm going to slow down my update rate following this model [I still want to track it to see how the data evolves]. Consider this a failure of model usefulness.


Update 17 October 2017

Starting to get a handle on the magnitude of the shock — it's on the order of the same size as the bitcoin fork shock (note: log scale):

Update 18 October 2017

More data just reduced uncertainty without affecting the path — which is actually a really good indication of a really good model! Too bad these shocks come too frequently.

Wednesday, October 11, 2017

Scaling of urban phenomena

Via Jason Potts, I came across an interesting Nature article [1] on the scaling of urban phenomena. In particular, the authors propose to explain the relationships in the graphic above.

Now the paper goes much further (explaining variance and the scaling exponents themselves) than I will, but I immediately noticed these relationships are all information equilibrium relationships Y ⇄ N with information transfer indices β:

log Y/Y₀ = β log N/N₀

The reasoning behind this relationship is that the information entropy of the state space (opportunity set) of each phenomenon (Y) is in equilibrium with the information entropy of the population (N) state space. This falls under deriving the scaling from the relationship of surfaces to volumes mentioned in the paper (you can think of the information content of a state space as proportional to its volume if states are uniformly distributed, and the IT index measures the relative effective dimension of those two state spaces).
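As a sketch of how such a scaling exponent would be estimated in practice (synthetic data, made-up parameters), β is just the least-squares slope in log-log space:

```python
import math
import random

# Synthetic sketch of estimating an information transfer index beta from
# Y = Y0 * N**beta with multiplicative noise (all parameters made up).
random.seed(0)
beta_true, Y0 = 1.15, 0.01
N = [10 ** random.uniform(4, 7) for _ in range(200)]   # "city" populations
Y = [Y0 * n ** beta_true * math.exp(random.gauss(0, 0.1)) for n in N]

# log Y = beta log N + log Y0, so beta is the ordinary least-squares
# slope in log-log space.
x = [math.log(n) for n in N]
y = [math.log(v) for v in Y]
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
beta_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))
```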

I wonder if adding shocks to the dynamic equilibrium rate (d/dt log Y/N) handles some of the deviations from the linear fit. For example, the slope of the upper left graph should actually relate to the employment population ratio — but as we know there was a significant shock to that ratio in the 70s (due to women entering the workforce). I can't seem to find employment population ratio data at the city level. There is some coarse data where I can get the number employed in e.g. Seattle divided by King County population as a rough proxy. We can see at the link there's a significant effect due to shocks (e.g. the recessions and the tail end of women entering the workforce). The model the authors use would imply that this graph should have a constant slope. However, the dynamic equilibrium model says that it has constant slope interrupted by non-equilibrium shocks (which would result in data off of the linear fit).

But this paper is interesting, especially in its description of an underlying model — a place where the information equilibrium approach is agnostic.



[1] The article itself is oddly written. I imagine it is due to the house styles of Nature and Harvard, but being concise does not seem to be a primary concern. For example, this paragraph:
The central assumption of our framework is that any phenomenon depends on a number of complementary factors that must come together for it to occur. More complex phenomena are those that require, on average, more complementary factors to be simultaneously present. This assumption is the conceptual basis for the theory of economic complexity.
could easily be cut in half:
The central assumption of our framework is that phenomena depend on multiple simultaneous factors. This assumption is behind economic complexity theory.
Another example:
We observe scaling in the sense that the counts of people engaged in (or suffering from) each phenomenon scale as a power of population size. This relation takes the form E{Y|N} = Y₀ N^β, where E{⋅|N} is the expectation operator conditional on population size N, Y is the random variable representing the ‘output’ of a phenomenon in a city, Y₀ is a measure of general prevalence of the activity in the country and β is the scaling exponent, that is, the relative rate of change of Y with respect to N.
could also be cut in half:
The number of people experiencing each phenomenon is observed to scale as a function of population size E{Y|N} = Y₀ N^β, where E{⋅|N} is the expectation operator conditional on population size N, Y is the number of people experiencing a phenomenon in a city with scale parameter Y₀ and β, the scaling exponent.
I could even go a bit further:
The number of people experiencing each phenomenon is observed to scale as a function of population size Y ~ N^β, where N is the population size, Y is the number of people experiencing a phenomenon in a city, and β is the scaling exponent.

Dynamic equilibrium: US prime age population

There was a tweet saying that the US prime age population (25-54) hadn't increased in a decade. I decided to get a handle on the context in terms of the dynamic equilibrium model:

It's true this population measure hasn't increased in a decade, but that is more a measure of the size of the shock due to the recession (leading to e.g. reduced immigration) than anything special about today. In fact, the growth rate today is consistent with twenty-first century prime age population growth.

JOLTS leading indicators update

The August 2017 JOLTS numbers are out (July numbers comparison is here), and the hires series is continuing a correlated deviation from the dynamic equilibrium:

There's still insufficient data to declare a shock, and the best fit results in only a small shock [1]:



[1] The evolution of the shock counterfactual is relatively stable:

Saturday, October 7, 2017

Compressed sensing and the information bottleneck

For those that don't know, my day job is actually in signal processing research and development in the aerospace sector. As I document in my book, I came by economics research via a circuitous route. One subject I worked on for a while (and still do to some extent) is called compressed sensing (Igor Carron's blog is a great way to keep up with the state of the art in that field, and his Google site provides a nice introduction to the subject).

One of the best parts about Igor's blog is that he brings together several lines of research from machine learning, matrix factorization, compressed sensing, and other fields and frequently finds connections between them (they sometimes appear in his regular feature "Sunday Morning Insight").

In that spirit — although more of a Saturday Afternoon Insight — I thought I'd put a thought out there. I've been looking at how the price mechanism relates to the information bottleneck (here, here), but I've also mused about a possible connection between the price mechanism and compressed sensing. I think now there might be a connection between compressed sensing and the information bottleneck.

In compressed sensing, you are trying to measure a sparse signal (a signal that appears in only a sparse subset of your space x, like a point of light in a dark image or a single tone in a wide bandwidth). To do so, you set up your system to make measurements in what is called the dense domain — through some mechanism (Fourier transform, random linear combinations, labeled with Φ) you make the variable you wish to measure appear throughout the space y. Therefore a few random samples of the entire dense space give you information about your sparse signal, whereas a few random samples of an image with a single bright point would likely only return dark pixels with no information about the point.

Is this how the information bottleneck works? We have some domain X in which our signal is just a small part (the set of all images vs the set of images of cats), and we train a feedforward deep neural network (DNN, h1 → h2 → ... → hm) that creates a new domain Y where our signal information is dense (cat or no cat). Every sample of that domain tells us information about whether there is an image of a cat being fed into the DNN (i.e. if it identifies cats and dogs, a result of dog tells us it's not a cat).

In compressed sensing, we usually know some properties of the signal that allow us to construct the dense domain (sparse images of points can be made dense by taking a 2D Fourier transform). However, random linear combinations can frequently function as a way to make your signal dense in your domain. In training a DNN, are we effectively constructing a useful random projection of the data in the sparse domain? As we push through the information bottleneck, are we compressing the relevant information into a dense domain?
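Here is a toy, pure-Python illustration of the sparse-to-dense idea in the 1-sparse case: random linear combinations spread a single spike across every measurement, so correlating the measurements against the columns of the measurement matrix (one step of matching pursuit) recovers it. Sizes, seed, and amplitude are arbitrary:

```python
import random

# Toy 1-sparse compressed sensing: one spike in dimension n is recovered
# from m << n random linear measurements. The random combinations make the
# signal "dense": every measurement carries information about the spike.
random.seed(1)
n, m = 400, 40
support, amplitude = 137, 3.0              # the sparse signal: one spike

# Random Gaussian measurement matrix Phi (m x n); measurements y = Phi x,
# where x is zero except x[support] = amplitude.
Phi = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [amplitude * Phi[i][support] for i in range(m)]

def corr(j):
    """Correlation of the measurements with column j of Phi."""
    return sum(y[i] * Phi[i][j] for i in range(m))

# One step of matching pursuit: the best-correlated column locates the
# spike, and a least-squares fit on that column recovers its amplitude.
recovered = max(range(n), key=lambda j: abs(corr(j)))
est = corr(recovered) / sum(Phi[i][recovered] ** 2 for i in range(m))
```

A few dozen random samples of the dense domain pin down the spike, whereas a few dozen random samples of the sparse domain itself would almost certainly all miss it.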

The connection between compressed sensing and the structure of a neural net has been noted before (see e.g. here or here), the new part (for me at least) is the recognition of the information bottleneck as a useful tool to understand compressed sensing — "opening the black box" of compressed sensing.

Friday, October 6, 2017

(Prime age) civilian labor force participation data

In addition to the unemployment rate, there is also new data for the prime age civilian labor force participation rate which we can use to track the performance of our forecast (last updated here):

Latest unemployment data

New unemployment data is out, so it's time to check to see how the forecasts are doing compared to reality. First, I want to throw out two forecasts as rejected: one from me, and one from the FRBSF. I started putting them on the same graph with the dynamic equilibrium model here, but the original forecast of mine was made as part of my effort to come up with a way to forecast recessions. With what I know now, I wouldn't have made this forecast — the tolerance for positing a recession was too low and would have choked on earlier data if used in this model.

Here is the graph with the latest unemployment data on it:

The gray forecast assumed a recession was happening in the next few quarters, while the red dynamic equilibrium forecast assumes no shocks. The former is resoundingly rejected. Now how about a statement from the FRB SF rejecting their previous forecast?

Instead of rejecting their previous forecasts, the FRB SF has continually been updating their forecasts over time as the future they predict fails to materialize (which I noted in this post making the point that forecast instability is a sign you have the wrong model, and it's the point I am making with this gallery). I've also added the FOMC's forecast to the series of head to heads:

The FOMC does basically the same thing, which I've emphasized by adding in their December 2014 forecast in purple.


The FRB SF has yet another forecast update, which I have added to the graph above:

This kind of forecast updating would be fine if it were a) stable, and b) successful for a period long relative to the forecast length. If a forecast is made for a couple years in the future but only works for a couple of months, you should stop forecasting longer than a couple of months.

The thing is that if there is a recession that starts in the next couple years, the latest forecast will be seen as correct despite the fact that nearly every prior forecast was wrong over this length of time. That is unscientific. Much like perpetual pessimists who always forecast a recession and are seen by some as successful when a recession happens (Hello, Steve Keen!), it is a failure of Feynman's "leaning over backwards" advice to reject your own theories and models.

Thursday, October 5, 2017

The price mechanism as information bottleneck

I've been reading and writing about the "information bottleneck" lately (e.g. this paper, or e.g. this post) focusing on how it might relate to the price mechanism. In the post, I argued that the price mechanism works by destroying information instead of aggregating or communicating it.

I thought this might be a neat example to try out Mathematica's Classify machine learning function. So I set up some training data on a simple system with three agents (1, 2, 3) and a price that could take on three values (1, 2, 3) for an allocation of three units of one good. Of course, on one hand all the threes make this confusing — but on the other hand this website is free.

There are ten different possible allocations of three widgets across three agents which I designate by a list of three numbers: e.g. {1, 2, 0}, {0, 0, 3}, {1, 1, 1}, etc. Each allocation is then related to a price in the training data; here's a graphical representation of that (noisy) training data (that we'll later relate to the information bottleneck):

The prices are on the right, and the various possible allocations are on the left, with the arrows showing when a price was related to a particular allocation (sometimes multiple times, and sometimes an allocation was related to two different prices). Running c = Classify[trainingData], we get a function c[.] that maps an allocation to a price p:

c[{3, 0, 0}] = 1
c[{2, 1, 0}] = 2
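As an illustration only (this is not the author's Mathematica code, and the training pairs below are made up rather than taken from the post's actual noisy data), a Python sketch of the same setup: enumerate the ten allocations, then build a toy stand-in for Classify that returns the price most often paired with an allocation:

```python
from collections import Counter, defaultdict
from itertools import product

# All ways to allocate 3 indistinguishable widgets across 3 agents
# (stars and bars: C(5, 2) = 10 allocations).
allocations = [a for a in product(range(4), repeat=3) if sum(a) == 3]
assert len(allocations) == 10

# Hypothetical noisy training data pairing allocations with prices;
# the post's actual training data differs.
training_data = [
    ((3, 0, 0), 1), ((3, 0, 0), 1), ((2, 1, 0), 2),
    ((2, 1, 0), 2), ((1, 1, 1), 2), ((0, 1, 2), 3),
]

# Majority vote over the prices each allocation was paired with.
votes = defaultdict(Counter)
for allocation, price in training_data:
    votes[allocation][price] += 1

def c(allocation):
    """Return the most frequent price seen for this allocation."""
    return votes[allocation].most_common(1)[0][0]

print(c((3, 0, 0)))  # 1
print(c((2, 1, 0)))  # 2
```

Mathematica's Classify fits an actual model (here, a small neural network) rather than a lookup table, but the input-output behavior on the training allocations is the same.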

If we look at the various allocations related to each price (and weight them by their probabilities), we can get an idea of a "typical" allocation that yields each price:

Each price is represented by a different color. The horizontal line at 10% represents the probability of any particular allocation if we had a uniform distribution over the different allocations (since there are 10 of them). It's also the result when the machine learning algorithm fails, essentially choosing the least informative prior.

We can see when the price is p = 1, then agent 1 ends up with more of the stock of widgets. When p = 2, the distribution is more uniform (it was set up as the "equilibrium price" in the training data). Although each agent in this particular setup is a consumer, we can think of 1 as the "consumer" and 3 as the "producer". If the price is too high, agent 3 ends up with more of the goods on average (they don't sell); if the price is too low, agent 1 does (over-consumption).

We can look at the information entropy of these allocations, and it is indeed maximized for the equilibrium price p = 2 (by construction):

We have an information bottleneck where these three price values (1.6 bits) are destroying the irrelevant information and capturing relevant information about the opportunity set (3.3 bits, for a loss of 1.7 bits — more than half the information content) [1].
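A quick check of the bit accounting (a sketch: the 10%-per-allocation uniform distribution from the figure above is assumed, and the skewed distribution in the entropy comparison is illustrative):

```python
from math import log2

# 10 possible allocations vs. 3 possible price values.
state_bits = log2(10)                # ≈ 3.32 bits to specify an allocation
price_bits = log2(3)                 # ≈ 1.58 bits carried by the price
lost_bits = state_bits - price_bits  # ≈ 1.74 bits destroyed in the bottleneck
assert lost_bits > state_bits / 2    # more than half the information content

# Shannon entropy, used above to show that the equilibrium price p = 2
# (the most uniform allocation distribution) maximizes entropy.
def entropy(ps):
    return -sum(p * log2(p) for p in ps if p > 0)

assert entropy([1/3, 1/3, 1/3]) > entropy([0.7, 0.2, 0.1])
```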

I borrowed this information bottleneck diagram from this paper:

In our case, $X$ is the allocation (state space), and $Y$ is the price. Our classify function c[state] represents $\hat{X}$ and $\hat{Y}$ is the output of that function. It was trained on the data (the diagram at the top of this post). Of course, Classify isn't really doing this with a Deep Neural Network (there's actually just one hidden layer with 8 nodes), but what I'm trying to illustrate here is the formal similarities between destroying information in the price mechanism and the information bottleneck.

We can envision the price mechanism as setting up a primitive neural network machine learning algorithm: the price functions as an autoencoder of the state space information, destroying the irrelevant information in the information bottleneck, while the flow of money reinforces the connections between neurons (i.e. exchanges between agents).

We can add a second state space defining the demand for widgets (the state space above defines the supply). If these state spaces match up, then the supply and the demand will see the "equilibrium" price for the equilibrium allocation. Deviations on either side will mean the market price will differ from the price derived from either the supply distribution or the demand distribution. Information will flow from supply to demand (or vice versa) via exchanges, and the price will change to represent the new state. This process will continue until the relevant information content of the supply distribution (captured via the bottleneck, with irrelevant information being destroyed) is equivalent to the information content of the demand distribution — i.e. information equilibrium.

If we take demand as constant (i.e. the real data we are trying to learn), this is identical to training a neural network with a Generative Adversarial Network (GAN) algorithm. Different supply distributions are created via exchanges, and the price (the bottleneck) discriminates between them, leading to what should eventually be identical distributions on both sides when the price can no longer discriminate (i.e. is constant) between the supply distribution and the demand distribution.

Or at least that is how I am thinking about this at the moment. It is possible we need to look at the joint distribution of supply and demand as one big state space. More work to be done!



[1] Additionally, I went through and did random trades among the agents (select two agents at random; if one agent has more widgets than the other, and the other has money at the price dictated by the future allocation, i.e. the allocation that would result from a trade, then there's a trade). This eventually produces an equilibrium (an equilibrium price of 2 with a uniform allocation):

I want to eventually make the machine learning algorithm re-train on the new data that's produced from a transaction, which would likely reinforce some price probabilities and reduce others.
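The random-trade dynamic in [1] can be sketched as follows (a simplification, not the author's code: the money-and-price condition is dropped, and a widget simply moves from the richer agent to the poorer one, which is enough to show convergence to the uniform allocation):

```python
import random

random.seed(0)  # for reproducibility

# Start from a maximally unequal allocation of 3 widgets across 3 agents.
allocation = [3, 0, 0]

for _ in range(1000):
    # Pick two distinct agents at random.
    i, j = random.sample(range(3), 2)
    # If one holds more widgets than the other, trade one widget.
    if allocation[i] > allocation[j]:
        allocation[i] -= 1
        allocation[j] += 1

print(allocation)  # converges to the uniform allocation [1, 1, 1]
```

Once the allocation reaches {1, 1, 1}, no agent holds more than any other, so no further trades occur: the uniform allocation is an absorbing state of this dynamic.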

Sunday, October 1, 2017

The price mechanism and the information bottleneck

David Glasner has a nice post on "imperfect information" in economics. In it, he discusses how the idea of painting Hayek and Stiglitz as "polar opposites" generally gets it wrong, and that Hayek didn't think markets had "perfect information". What was interesting to me is that a significant number of the arguments with commenters and on Twitter that resulted from my Evonomics piece tried to make a similar point: that Hayek didn't say markets were always perfect. As I mention in my response, I never said that Hayek thought markets were perfect — quoting precisely a passage where Hayek says they're not perfect [1].

My contention is that not only aren't markets perfect, but even when they work they are not working in the way Hayek says they work when he looks at the case of functioning markets. I will also argue that neither a central planner nor a market can actually receive or transmit the information claimed to be flowing, making Hayek's argument against central planning simultaneously an argument against markets — if they function the way Hayek claims they function. However, I will conclude with a discussion of how the price mechanism may actually function by destroying information.

Let's start with Glasner quoting Timothy Taylor quoting Hayek:
[The market is] a system of the utilization of knowledge which nobody can possess as a whole, which ... leads people to aim at the needs of people whom they do not know, make use of facilities about which they have no direct [knowledge]; all this condensed in abstract signals ...
Glasner responds to this (and the rest of the quoted section of Taylor's post):
Taylor, channeling Bowles, Kirman and Sethi, is here quoting from a passage in Hayek’s classic paper, “The Use of Knowledge in Society” in which he explained how markets accomplish automatically the task of transmitting and processing dispersed knowledge held by disparate agents who otherwise would have no way to communicate with each other to coordinate and reconcile their distinct plans into a coherent set of mutually consistent and interdependent actions, thereby achieving coincidentally a coherence and consistency that all decision-makers take for granted, but which none deliberately sought. The key point that Hayek was making is not so much that this “market order” is optimal in any static sense, but that if a central planner tried to replicate it, he would have to collect, process, and constantly update an impossibly huge quantity of [knowledge].
There is an issue in that economics uses the words "information" and "knowledge" synonymously (just like the colloquial English definitions [2]), which gets in the way of talking about this in terms of information theory. Therefore I traded "information" for "knowledge" in the quotes above (emphasizing with brackets). Knowledge is meaningful, whereas information represents a measure of the size of an available state space (weighted by probability of occupation), regardless of whether a state selected from it is meaningful. The phrases "The speed of light is a constant" and "Groop, I implore thee, my foonting" are drawn from state spaces of approximately the same amount of information (the latter actually requires more), but the former is more meaningful and represents more knowledge.

This measure of information was designed to understand how to build systems that enable you to transmit either message. I'm not trying to say that Claude Shannon's definition is "better" than the economics definition or anything — there's simply a technical meaning given to it in information theory because of a distinction that hasn't been necessary in economics. In defining it, Shannon had to emphasize "information must not be confused with meaning".

However, this semantic issue allows us to get a handle on the mathematical issue with Hayek's mechanism. There is no way for this "impossibly huge quantity of knowledge" to be condensed into a price (a single number), because the amount of information (e.g. the thousands of production numbers, including "expected" ones, [x1, x2, x3, ... ], where the "knowledge" of them represents a specific set [42, 6, 9, ... ]) is too great to be conveyed via that single number without an encoding scheme that draws the message out over time. You could, e.g., encode the numbers as Morse code and fluctuate the price over a few seconds, but the idea that there are messages like that in market prices is so laughable that we don't even need to discuss it. I'll continue to use brackets to emphasize the technical distinction below.
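To put rough numbers on the mismatch (the counts here are assumed for illustration, not taken from Hayek or the post):

```python
from math import log2

# Suppose the dispersed knowledge is 1,000 production quantities, each one
# of 1,024 possible values (10 bits apiece), while a price quoted to the
# cent between $0.01 and $100.00 takes one of 10,000 values.
knowledge_bits = 1000 * log2(1024)  # 10,000 bits
price_bits = log2(10_000)           # ≈ 13.3 bits in a single price

# A single price observation falls short by roughly a factor of 750.
assert knowledge_bits / price_bits > 700
```

Even this toy accounting is generous to the price: it treats every one of the 10,000 price values as equally likely, which is the entropy-maximizing case.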

Therefore one thing that market prices are not doing is "condensing" or "transmitting and processing" dispersed knowledge. Prices are incapable of carrying such an information load. The information is largely being destroyed rather than processed or compressed.

When Stiglitz and others talk about imperfect [knowledge], they are actually talking about the fact that the information has been destroyed. The price of a used car isn't going to let me glean enough information about the state of that car, especially once you factor in the desire of the used car salesperson to get a good price for it. Where an "honest" salesperson might price the car below Blue Book value because it has been flood damaged, the buyer's imperfect [knowledge] of the flood damage means the salesperson will rationally try to get Blue Book value. However, even a sub-Blue Book price cannot communicate the information state of the accident history, transmission, engine, etc., in addition to that flood damage.

There's already an [information] asymmetry between the available states the car could be in and the available states the price could take. There is the additional [knowledge] asymmetry made famous by Akerlof's The Market for Lemons on top of that.

But, you say, the price mechanism seems to function "as if" it is communicating information. I guess you could devise an effective theory where the state space information is actually really small (undifferentiated widgets that have some uniform production input). But that's basically just another way to describe the argument above: in order for the price to transmit dispersed knowledge, there mustn't be much knowledge to be transmitted. In a sense, this makes Hayek's argument against central planning a kind of straw man argument. Sure, a central planner can't collect and process all of this information, but the price mechanism can't do this either.

One of the reasons I belabor this particular point is because in trying to understand how information equilibrium relates to economics, I had to understand this myself. As I said in my "about me" blog post:
... I stumbled upon this paper by Fielitz and Borchardt and tried to apply the information transfer framework to what is essentially Hayek's description of the price mechanism. That didn't exactly work, but it did work if you thought about the problem differently.
The part that "didn't exactly work" was precisely Hayek's description of information being compressed into the price. You had to think about the problem differently: the price was a detector of information flow, but unlike a thermometer or a pressure gauge (which have a tiny interface in order not to influence what they are measuring), the price is maximally connected to the system. The massive amount of information required to specify an economy was actually flowing between the agents in the economy itself (i.e. the economic state space information), with the price representing only a small amount of information.

But if this is true, then we might ask: Since it frequently appears to work in practice, how could the price mechanism work when it does?

I think the answer currently is that we don't know. However, I am under the impression that research into machine learning may yield some insights into this problem. What is interesting is that the price as detector rather than receiver is reminiscent of a particular kind of machine learning algorithm: Generative Adversarial Networks (GANs). GANs are used to train neural nets. They start with essentially randomly generated data (the generative bit), which is then compared to the real data you want the neural net to learn. A "discriminator" (or "critic" in some similar methods) checks how well the generator's guesses match the real data.

Imagine art students trying to copy the style of van Gogh, with the art teacher simply saying whether they're doing well or not. It is amazing that this can actually work to train a neural net to copy the style of van Gogh (pictured above). A simpler but similar situation is a game of "warmer/cooler", where someone is looking for an object and the person who knows where it is tells them if they are getting warmer (closer) or cooler (farther). In this case, it is not as counterintuitive that this should work. Much like how it is not as problematic for Hayek's price mechanism to operate with generic widgets, what we have in a game of "warmer/cooler" is a very low-dimensional state space, so the sequence of "warmer/cooler" measurements from the "discriminator" is much closer in information content to the actual state space. In the case of van Gogh style transfer, we have a massive state space. There is no way the sequence of art teacher comments could possibly come close to the amount of information required to specify a van Gogh-esque image in state space.
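A toy version of why "warmer/cooler" works in low dimensions (a sketch using the simpler higher/lower variant of the game): each answer carries at most one bit, and specifying one of 1,024 positions only takes about 10 bits, so a short sequence of answers suffices.

```python
# Binary-search "warmer/cooler": guess, get a one-bit hint, halve the range.
def find(target, lo=0, hi=1023):
    guess, queries = (lo + hi) // 2, 0
    while guess != target:
        queries += 1
        if guess < target:
            lo = guess + 1  # hint: move toward larger values
        else:
            hi = guess - 1  # hint: move toward smaller values
        guess = (lo + hi) // 2
    return queries

# At most log2(1024) = 10 one-bit hints locate any of the 1,024 positions.
assert all(find(t) <= 10 for t in range(1024))
```

A van Gogh-esque image, by contrast, lives in a state space of millions of dimensions, so this kind of hint-by-hint accounting cannot be the whole story there.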

However, information must be flowing from the actual van Gogh (real data) to the generator because otherwise we wouldn't be able to generate the van Gogh-esque image. The insight here is that information flows from the real data to the generator, and the quantity of information flowing will be indicated by the differences between the different discriminator scores. A constant score indicates no information flow. A really big improvement in the score indicates a lot of information has flowed.

Again, we don't know exactly how this works for high dimensional state spaces, but a recent article in Quanta magazine discusses a possible insight. It's called the "information bottleneck". In the information bottleneck, a bunch of information about the state space in the "real data" that doesn't generalize is destroyed (e.g. forgetting irrelevant correlations), leaving only "relevant" information about the state space.

To bring this back to economics, what might be happening is that the price mechanism is providing the bottleneck by destroying information. Once this information is destroyed, what is left is only relevant information about the economic state space. My private information about a stock isn't aggregated via the price mechanism, but rather is almost entirely obliterated [3] when the market is functioning.

With most of this private information being obliterated in the bottleneck, measurements of the information content of trades should actually be almost zero if this view is correct. It is interesting that Christopher Sims has found that only a few bits of information in interest rates seem to be used by economic agents, and other research shows that most traders seem to be "noise traders". Is the information bottleneck destroying the remaining information?

This is speculation at this stage; I'm just thinking out loud with this post. However the information bottleneck is an intriguing way to understand how the price mechanism can work despite a massive amount of information falling on the floor.



[1] Hayek from The Use of Knowledge in Society:
Of course, these [price] adjustments are probably never "perfect" in the sense in which the economist conceives of them in his equilibrium analysis. But I fear that our theoretical habits of approaching the problem with the assumption of more or less perfect knowledge on the part of almost everyone has made us somewhat blind to the true function of the price mechanism and led us to apply rather misleading standards in judging its efficiency. The marvel is that in a case like that of a scarcity of one raw material, without an order being issued, without more than perhaps a handful of people knowing the cause, tens of thousands of people whose identity could not be ascertained by months of investigation, are made to use the material or its products more sparingly; i.e., they move in the right direction. This is enough of a marvel even if, in a constantly changing world, not all will hit it off so perfectly that their profit rates will always be maintained at the same constant or "normal" level.
[2] The definitions that come up from Google searching "define knowledge" and "define information":
knowledge: facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject.
information: facts provided or learned about something or someone.
The difference between these definitions is basically the inclusion of "skills". What's also interesting is that the second definition for information gets better:
information: what is conveyed or represented by a particular arrangement or sequence of things.
Although the information theory definition of information entropy depends on the state space of possibilities that particular arrangement was selected from.

[3] In fact, the cases where my information isn't obliterated but rather amplified may well be the causes of market failures and recessions. Instead of my fear that a stock price is going to fall being averaged away among the optimistic and pessimistic traders, it becomes amplified in a stock market crash. The information transfer framework labels this as "non-ideal information transfer" (a visualization using a demand curve as an example is here).