Monday, April 30, 2018

The ability to predict

While I might agree with the subtitle of this piece "The Problem isn’t Bad Economics, It’s Bad Science" by David Orrell, there appears to be a deficit of Feynman's other test of science:
But there is one feature I notice that is generally missing in Cargo Cult Science. That is the idea that we all hope you have learned in studying science in school—we never explicitly say what this is, but just hope that you catch on by all the examples of scientific investigation. It is interesting, therefore, to bring it out now and speak of it explicitly. It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty—a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated.
The first thing to note is that the quote from Feynman at the top of the article is taken out of context, and does not mean what Orrell implies with regard to forecasting in economics. Leaning over backward requires a bit of context; here's the original quote (emphasis added):
There are those who are going to be disappointed when no life is found on other planets. Not I — I want to be reminded and delighted and surprised once again, through interplanetary exploration, with the infinite variety and novelty of phenomena that can be generated from such simple principles. The test of science is its ability to predict. Had you never visited the earth, could you predict the thunderstorms, the volcanoes, the ocean waves, the auroras, and the colorful sunset? A salutary lesson it will be when we learn of all that goes on on each of those dead planets — those eight or ten balls, each agglomerated from the same dust cloud and each obeying exactly the same laws of physics.
Feynman is saying that it is impossible for humans to predict the "infinite variety ... of phenomena" based only on knowledge of the "simple principles" (i.e. the underlying physics). He is motivating interplanetary exploration; he is talking about collecting data, not validating theory. He thinks physics is already validated and wants to see what other kinds of weird things it explains. Also, there is a difference here in the meaning of prediction. Feynman is not talking about predicting the future — when a thunderstorm might occur — but rather the existence of thunderstorms. He is also saying that physics is established, and that the novel phenomena on other planets (that we can't predict) will be explicable by science. No alarms, and no surprises.

Now Feynman was no fan of "social sciences" (from a 1981 BBC interview, though it is not clear that he is talking about economics but rather uses an example about organic food being better for you):
Because of the success of science, there is a kind of a pseudo-science. Social science is an example of a science which is not a science. They follow the forms. You gather data, you do so and so and so forth, but they don’t get any laws, they haven’t found out anything. They haven’t got anywhere – yet. Maybe someday they will, but it’s not very well developed.
Here, the contrast between Feynman's view of "science" and his view of "not science" is what I refer to as "established" and "nascent" science in this blog post. Social sciences like economics are nascent sciences: they don't have any "laws" (i.e. frameworks like Newton's laws that capture empirical regularities). Physics was a nascent science as recently as the 1600s. This is fine; we have to start somewhere.

I went to this length to discuss this quote about prediction because there are things that Feynman would concede are science, but where we are unable to predict in the forecasting sense: the weather more than a week or so in the future; the orbits of planets thousands of years in the future; the precise timing, locations, and magnitudes of earthquakes along faults and fatigue fractures. These examples all have at their heart nonlinear and network models that require knowledge of initial conditions at the right scale that is more precise than we might ever obtain, making them theoretically or at least practically unpredictable. You can't predict the path of a single photon through the double slit experiment, but you can predict its distribution. It is a question of scope. That is to say the real question is succinctly put by Chris Dillow in his excellent recent post: what can people reasonably be expected to foresee, and what not? And at the heart of this question is whether we have good enough models of the phenomena (or frameworks to understand them) to say whether or not some observable can be forecast, and under what scope.
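The double-slit point can be made concrete with a toy simulation (the geometry and scale here are arbitrary, not from any particular experiment): each photon's landing spot is unpredictable, but the ensemble follows the predicted interference distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Far-field two-slit interference pattern (single-slit envelope ignored):
# intensity ∝ cos²(k·x), with k set by slit separation and wavelength.
k = 5.0
x = np.linspace(-1, 1, 1001)
intensity = np.cos(k * x) ** 2
p = intensity / intensity.sum()  # normalized predicted distribution

# Each "photon" lands at an unpredictable position drawn from p ...
photons = rng.choice(x, size=100_000, p=p)

# ... but the histogram of many photons reproduces the predicted pattern.
hist, edges = np.histogram(photons, bins=50, range=(-1, 1), density=True)
```

The same scope distinction applies to macro observables: "predict" can mean the distribution of outcomes rather than the timing of any single event.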

I also wanted to add in Feynman's "leaning over backwards" because this paragraph in Orrell's article fails utterly:
The usual excuse offered for failing to predict the crisis, as Robert Lucas put it in 2009, is that ‘simulations were not presented as assurance that no crisis would occur, but as a forecast of what could be expected conditional on a crisis not occurring.’ That is like a weather forecaster saying their forecast was explicitly based on no storms. (The claim here reflects the efficient market-related idea that changes are caused by random external shocks.) Another is that no one else predicted it either – though a number of heterodox economists and others would beg to disagree.
First, the Bezemer article cited at the end appears to have fabricated its examples of predictions of the crisis (and Bezemer comes at the subject biased in favor of heterodox economics). A possible prediction of the housing crisis (also fabricated) from Wynne Godley appears to be both an accident of rhetoric and one walked back by Godley himself [1]. This fails Feynman's "leaning over backwards" test of science. Second, this is not the usual excuse, and it is not "leaning over backwards" to call it that. The usual excuse is given by Diane Coyle in an article Orrell actually cites:
But macroeconomics is inherently hard because there is very little data. There is only a handful of key variables, all linked to each other and changing only slowly, the outcome of multiple possible causes in a complex system, and with little opportunity for doing experiments. It will never be able to predict a crisis with a high degree of confidence.
This is the same "excuse" noted by Mark Thoma and Noah Smith years ago. There are all kinds of models (a couple helpfully pointed out by Chris Dillow in that same post) that would mean recessions are inherently unpredictable. But we don't even know enough to choose between those models.

This brings us back to the question of What can we reasonably be expected to foresee? Getting back to Feynman's thunderstorms, Orrell believes that the story behind weather forecasting is a useful analogy for transitioning from the view that we can't predict something to being able to make some reasonable forecasts. Admiral Robert FitzRoy, a pioneer in weather forecasting so seminal that he's responsible for the use of the word "forecast", was not "well received ... by ... the scientific establishment" [2]. Weather forecasts currently predict out about 10 to 16 days at varying degrees of resolution (longer at lower resolution). Macro models like DSGEs and VARs are actually pretty good a quarter or two into the future (see here, here, and here). The VAR from the Minneapolis Fed has done well with unemployment for over a year:

So is economics now good science given this prediction (and ones like it)? Note in particular that the Minneapolis Fed is a bastion of so-called freshwater macro and neoclassical economics decried in Orrell's article via quotes as being an "unquestioned belief system" that students want to be "liberated from". The obvious retort is that we're not talking about predicting the unemployment rate several months out, but rather financial crises and recessions. That means we're no longer saying "the test of science is the ability to predict", but rather "the test of science is the ability to predict things I selected". If we're doing that, let me just put my unemployment rate forecast (detailed in my paper) out there:



David Orrell's thing is "quantum economics". He's written several papers with an understanding of quantum mechanics that I can only guess was based on reading Michio Kaku's popular science books. There is no math in them despite most of the ideas in quantum physics being mathematical in nature (e.g. commuting and non-commuting observables, which is exactly what I noted here about "quantum psychology"). It's just that the word quantum sounds cool. The quantum finance he mentions (but doesn't engage with) is based on the path integral approach (in e.g. this book I used when I wanted to become a "quant" years ago), but path integrals are related to thermodynamic partition functions, making this effectively a sum over Brownian motion paths not entirely dissimilar from the statistical averages in the stochastic calculus of the Black-Scholes equation.

These papers also fail to make any empirical predictions or really engage with data at all. I get the impression that people aren't actually interested in making predictions or an actual scientific approach to macro- or micro-economics, but rather in simply using science as a rhetorical device. On this blog, my critiques of macro- or micro-economics as poor science get roughly 4 to 10 times the number of views as the predictions or empirical work I've done. The same goes for Twitter. This makes sense of Orrell's papers and articles: the stuff that gets views isn't actual science but rather talking about the metaphysics surrounding it.

The galling thing is that Orrell starts with a quote about making predictions and a claim that economics is bad science, but then closes promoting a book that (based on the source material in his papers) won't make any predictions or engage with empirical data either. Like Steve Keen's, this approach represents exactly what is wrong with parts of macro while claiming to decry it.


On Twitter, I apparently gave my thoughts on another of Orrell's articles (in Aeon) after being asked by Diane Coyle earlier this year.



[1] Godley makes a "prediction" of a household debt-led crisis in 1999 but then says in 2003 that process has been averted with government deficit spending becoming the bigger issue. The exact "eight years" quote in 1999 (that gives the purported 2007 date of crisis) is an opening line: "The US economy has now been expanding for nearly eight years ...". This represents the duration of the expansion after the 1991 recession up until the paper's release. This does not appear to be some sort of model.

In 2003, the Bush administration had started military operations, and the post-9/11 boost to military spending along with major tax cuts ended the budget surplus of the late 90s. Godley, using his sectoral balance approach, then said that government current account deficits, and not household debt (which had flattened in the aftermath of the dot-com bust and the early 2000s recession), were on an unsustainable path. As the two sectors (government and households) are linked by sectoral balance (one declines as the other increases), this interpretation makes sense. But it also means that Godley did not see a housing crisis again until 2006, by which time it had already partly begun and other more mainstream economists (like Paul Krugman) had already said the same thing in 2005. See here.

[2] FitzRoy apparently faced so much opposition from the scientific establishment that he was a protege of a member of the Royal Astronomical Society, recommended by the President of the Royal Society, and given sufficient budget to support himself and three employees [wikipedia]:
As the protégé of Francis Beaufort [of Beaufort wind scale fame], in 1854, FitzRoy was appointed, on the recommendation of the President of the Royal Society, as chief of a new department to deal with the collection of weather data at sea. His title was Meteorological Statist to the Board of Trade, and he had a staff of three. This was the forerunner of the modern Meteorological Office.

PCE inflation

The latest monthly PCE inflation data is out and so I'll continue to document the performance of the dynamic information equilibrium model forecast. Here's the basic version with post-forecast data in black (error bands are 70% and 90% confidence):

And here is a head-to-head with the FRB NY DSGE model (that I should have updated in this post alongside GDP because quarterly data came out last week; also note the post comparing with Minneapolis Fed VAR models):

The second graph has 50% and 90% confidence intervals (and is 4Q inflation instead of continuously compounded annual rate of change) to match with the DSGE model. As always, click for higher resolution graphs.

Saturday, April 28, 2018

Comparing my forecasts to VARs

On my Twitter link to this post on forecasts, @UnlearningEcon asked about the performance of the dynamic information equilibrium models versus Vector AutoRegressions or VARs (instead of DSGE models). The Minneapolis Fed has a VAR model with forecasts of various vintages available. Exact matches weren't available, but the dynamic equilibrium model over the past 10 years forecasts roughly constant PCE inflation and RGDP (and constantly declining unemployment rate) so these forecasts are sufficient to get a feeling for the relative performance. The dynamic equilibrium models are in red/red-orange and the MF VARs are in green.

First, let's look at RGDP:

The error bands of the MF VAR are larger (both show 70% and 90%), but the MF VAR gets the first couple of quarters after the forecast point pretty close because of the method it uses [1]. How about the unemployment rate? Like the FRB SF structural models (which are given without confidence levels), the MF VAR shows an eventual rise in the unemployment rate that hasn't appeared in the data (the vertical lines show the forecast starting points for the MF VAR and my model):

The forecast error bands are much smaller for the dynamic information equilibrium model. And finally, here is core PCE inflation:

This shows the MF VAR model alongside a constant PCE inflation = 1.7% model, which is equivalent to the dynamic information equilibrium model over the same period (see here). Again, the error bands are much smaller.

Overall, the error bands of the VAR models are larger, and the only improvement over the dynamic equilibrium models is in the AR process forecasts of the data points immediately after the forecast dates (which can be added to the dynamic equilibrium models as discussed in footnote [1]). As the dynamic equilibrium models are much simpler than macro VARs, this is a definite feather in their cap.



[1] One thing I did want to note is that I tend to give the confidence limits for a forecast that are independent of the most recent point in the data — effectively, an AR(0) process. Modeling using e.g. an ARMA(2,1) process gives you an estimate of the next point that tends to be better for some economic time series because of mean reversion as well as the details of the ARMA(2,1) model itself. But after a couple data points in most macro models, the confidence limits of the ARMA(2,1) process approach those of the AR(0) process. I'll show you with a graph (click to enlarge):

The 90% confidence band using just the standard deviation (σ) of the input data of the forecast is shown in dashed red (the "AR(0)" forecast). This is the model error band I usually give. An ARMA(2,1) process shows some interesting fluctuations, but over time the errors in the model parameter estimates average out into a measure of the standard deviation of the input data (1.645 σ for 90%). For a lot of macro models, the result is usually a good first point or two, followed by what is essentially the AR(0) forecast. It rarely seems worth the effort, but I have shown it for some forecasts like the one for the S&P 500:

That one is useful because the AR(0) band in red is based on data all the way back to the 1950s, while the more complicated AR process (in blue, but mixed with the red looks purple) is based only on the past few years. That the means and error both come close to converging is a sign of consistency between the approaches and a good check on my math/code.
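The convergence of the ARMA band to the AR(0) band can be sketched with known (made-up) parameters rather than fitted ones; the ψ-weight recursion below is the standard MA(∞) expansion of an ARMA(2,1) process.

```python
import numpy as np

# Toy ARMA(2,1): y_t = φ1·y_{t-1} + φ2·y_{t-2} + ε_t + θ1·ε_{t-1}
phi1, phi2, theta1, sigma = 0.5, 0.2, 0.3, 1.0

# ψ-weights of the MA(∞) representation
psi = [1.0, phi1 + theta1]
for j in range(2, 200):
    psi.append(phi1 * psi[-1] + phi2 * psi[-2])
psi = np.array(psi)

# h-step-ahead forecast standard error: σ·sqrt(Σ_{j<h} ψ_j²)
se = sigma * np.sqrt(np.cumsum(psi ** 2))

# The unconditional ("AR(0)") standard deviation is the h → ∞ limit
sd_uncond = sigma * np.sqrt(np.sum(psi ** 2))

# The 90% ARMA band (1.645·se) converges to the AR(0) band (1.645·σ_y)
ratio = se / sd_uncond
```

The h-step band starts narrower than the unconditional band and approaches it geometrically, which is why the ARMA forecast only buys you a better first point or two.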

Friday, April 27, 2018

Are prices transmitting or destroying information?

One of the most difficult aspects of this work is that economics already has some ideas about information. Well, knowledge — not information in the information theory sense, where information entropy is more a probability measure over data than meaningful data itself. The ability to transfer information is necessary to transmit knowledge, but not sufficient. The different terminology leads to difficulties like these Twitter threads with Roger Farmer and David Glasner when I say that a price cannot possibly convey the information it is claimed to convey in economics.

It is very true that I could fluctuate a single number and transmit a lot of data. If that number is an electric field, this is the basis for wireless communication. There are various schemes of modulating the amplitude, frequency, and phase of an electric field that pack a lot of bits into a short time (high bit rate). For example, Quadrature Amplitude Modulation (QAM) is a common technique for doing this. The amplitude and phase relative to a carrier wave are shifted to make the signal appear in one of several locations on a "constellation diagram" (or IQ plot, for In-phase and Quadrature) like this:

Each point can be assigned several bits like the second diagram (a Gray-coded QAM-16). The information entropy of the possible data in the above diagram is log(16) ≈ 2.8 nats. You can further use various error correcting codes to get arbitrarily close to the "Shannon limit" on the number of nats/bits per second in a given bandwidth in the presence of noise.
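As a quick check of that number (a generic 16-point constellation, not any particular standard's):

```python
import numpy as np
from itertools import product

# 16 equally likely constellation points (4 amplitude levels on each axis)
levels = [-3, -1, 1, 3]
constellation = [complex(i, q) for i, q in product(levels, levels)]

# Entropy of a uniform distribution over the 16 symbols, in nats
p = np.full(len(constellation), 1 / 16)
entropy_nats = -np.sum(p * np.log(p))       # ln(16) ≈ 2.77 nats
bits_per_symbol = entropy_nats / np.log(2)  # 4 bits per symbol
```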

This is not happening with prices.

The information content of prices (or price changes) is based on the information entropy of the distribution those changes are drawn from. The information entropy of a normal distribution is ½ log(2πeσ²), where σ² is its variance. No coding schemes, no bandwidths, no meaningful "Shannon limit". Prices derive their information carrying capacity from the processes underlying them, not from mathematical codes.
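A sanity check on that formula, comparing the closed form to a brute-force numerical integral (the σ value is arbitrary):

```python
import numpy as np

sigma = 2.0

# Differential entropy of N(0, σ²): ½·ln(2πeσ²)
analytic = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

# Brute-force check: -∫ p(x)·ln p(x) dx on a fine grid
x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
dx = x[1] - x[0]
p = np.exp(-x ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
numeric = -np.sum(p * np.log(p)) * dx
```

Note that the only knob is σ: there is no coding scheme to exploit, so the entropy is fixed by the variance of the underlying process.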

When I say that prices cannot possibly convey the information they are claimed to convey in economics, I am saying the underlying distribution of states for the objects being sold in a market transaction and the distribution of the prices do not have even remotely comparable information entropies (and often don't even have the same dimension, with the former's much larger than the latter's). A car that sells for 500 dollars less than its blue book value could have been in an accident or a flood — or both. There are some fascinating stories about people trying to do this kind of thing with some of the digits in bids for the FCC spectrum auctions [pdf] [1]. But by and large, details about the item being sold and the distribution it was drawn from are being destroyed by the price mechanism. A unique car is not specified by some unique price, but rather the irrelevant details of the car (was my Honda built in the US or Japan, and by which route did it arrive at the dealer) are usefully wiped away [2]. The price does not contain that kind of information — it's not even close.
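A toy example of the entropy mismatch (the attribute count and dollar amounts are invented for illustration): twenty binary attributes carry 20 bits, but a price that just tallies defects collapses roughly a million states onto 21 possible prices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy: a used car has 20 independent, equally likely binary
# attributes (accident, flood, ...) that are relevant to its value.
k = 20
states = rng.integers(0, 2, size=(200_000, k))

# Entropy of the state distribution: 20 bits = 20·ln(2) ≈ 13.9 nats
state_entropy = k * np.log(2)

# A price that knocks $250 off a $10,000 base value per defect collapses
# the 2^20 possible states onto at most 21 distinct prices.
price = 10_000 - 250 * states.sum(axis=1)
values, counts = np.unique(price, return_counts=True)
q = counts / counts.sum()

# Empirical entropy of the price distribution (a binomial): ≈ 2.2 nats
price_entropy = -np.sum(q * np.log(q))
```

Most of the state information is destroyed on the way to the price; only the aggregate "number of defects" survives.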

A simple illustration of this can be made using the model in Gary Becker's 1962 paper Irrational Behavior and Economic Theory. In that paper, he illustrated a demand curve by letting agents randomly, but uniformly, select a region of the opportunity set for two goods bounded by a budget constraint. As the price for one good changes, the shape of this opportunity set changes, causing the price to trace out a demand curve like this:

Because we have a uniform distribution P and undifferentiated widgets, the price can convey the information about the underlying degrees of freedom. However, the same price time series can be associated with an entirely different distribution:

Instead of a uniform distribution, we have a normal distribution over the same opportunity set. The difference in information entropy between the uniform distribution and the normal distribution Q (which still has the same price time series) is exactly the information that is lost by using the price mechanism to determine the underlying distribution. The Kullback–Leibler (KL) divergence D(P||Q) would measure the cost of assuming the distribution is P when it is actually Q.
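Here is a numerical sketch of that information loss in one dimension (the budget, price, and parameters of Q are illustrative, not fit to anything):

```python
import numpy as np

# Opportunity set for a single good: quantities from 0 to M/p
M, p = 100.0, 2.0                 # budget and price (illustrative)
x = np.linspace(0.0, M / p, 501)
dx = x[1] - x[0]

# P: Becker's uniform "irrational" agents over the opportunity set
P = np.full_like(x, 1.0 / (M / p))

# Q: a normal distribution truncated to the same opportunity set
Q = np.exp(-(x - 30.0) ** 2 / (2 * 8.0 ** 2))
Q /= Q.sum() * dx

# Discrete approximation to the Kullback–Leibler divergence D(P‖Q):
# the information lost by assuming P when the distribution is actually Q
kl = np.sum(P * np.log(P / Q)) * dx
```

Both distributions are consistent with the same price series, yet D(P‖Q) is strictly positive: the price alone cannot tell you which one you are looking at.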

In general information is lost. The actual economic situation contains expectations, future plans, constraints, and random events. The price just plays a game of "warmer/colder". This can easily be seen to work for some kind of binary search, such as looking for a hidden object in a room: "Colder. Warmer. Colder. Warmer. Warmer. Warmer." until another player calls out, "I found it!" The amazing thing is that this seems to work for markets, which face a much more complex problem. And I'm not sure we had really found the right framework to understand exactly how this works ... until recently. Of course, my blog post here is speculative; however, this is the sort of thing that is needed to understand how the price mechanism, which destroys so much information about the underlying distributions, manages to get those distributions to match when markets are working.

In machine learning, there is an algorithm that trains a neural network called a Generative Adversarial Network (GAN). A good explanation of how these GANs work was given here, and I'll reproduce their metaphor and diagram:

Imagine a forger (generator G) blindly trying to reproduce a real painting (R). The object of the game being played here is for the forger to fool a detective (D) into thinking a forged painting is real. The detective is using that aforementioned KL divergence, a measure that destroys about as much information about the distributions it is comparing (distributions of color in paintings) as the price mechanism does (distributions of supply and demand). That this works at all is surprising. That it works really well is astounding.

It has recently been conjectured that what is happening in some of these machine learning/deep learning algorithms is what is called the "information bottleneck". Here is a popular article in Quanta magazine; here's a more technical paper. The idea is that deep learning works by destroying irrelevant information, forcing it through a "bottleneck" (minimum of mutual information) [3].

The price mechanism may create precisely such a bottleneck. Instead of Hayek's "system of telecommunications" where prices are "communicating information" [4], we may have prices destroying irrelevant information. Or we may not. The information bottleneck and analogies with GANs may be completely wrong. Regardless, something more complicated than simple transmission of information must be going on [5]. It's simply impossible for a price to carry that much information.



[1] GTE put in some of its supposedly anonymous bids as something like 13,000,483 dollars or 20,000,483 dollars. The "483" spells "GTE" on a telephone keypad in the US. Others would contain codes for particular blocks of the frequency spectrum. There are some similar ideas with prices that end in ".99" representing "cheap" (my local liquor store says that its prices that end in ".97" represent an industry-wide lowest price). But by and large, people are not communicating the yields of soybean harvest using codes in the last digits of the price on a commodity exchange.

[2] Note this is separate from the asymmetric information (which is really asymmetric knowledge) in the case of Akerlof's The Market for Lemons. Regardless of whether the seller rationally holds back on negative knowledge about the car's state, the price itself still cannot convey as much information as there are possible car states. Only if you have undifferentiated widgets and uniform distributions does this become possible (discussed later in the post).

[3] Deep learning has also been connected to renormalization in physics (another Quanta article that links to the original paper), and it is possible both of these are related to the information bottleneck. Renormalization is a process by which the details of processes at one scale are integrated out to produce effective (and usually simpler) processes at another scale. In fact, in my thesis I used renormalization ("DGLAP evolution") instead of the ad hoc smoothing procedure used by the people who created the model to remove the effects of using a finite basis:

The finer details of the model — requiring a lot of information to specify — at one scale were removed, and only the relevant information — requiring less information — at the other scale was kept.

[4] The Use of Knowledge in Society (1945)

[5] Interestingly, Christopher Sims' work has found that agents do not respond to much of the information in prices either. This makes more sense in the information bottleneck picture (where the prices are destroying information) than Hayek's telecommunications picture.

New GDP numbers and validating some forecasts

The latest GDP numbers for the US are out, so it's time to see if there are any surprises in the forecasts. As usual, no. The dynamic information equilibrium models are all doing fine.

Here's NGDP growth:

Here's the RGDP forecast (red-orange) alongside a forecast from the FRB NY DSGE model (blue); the latest data is black (just like the rest of these graphs):

Here are two models of PCE inflation; the existence of a shock in 2013 is uncertain, but it doesn't really matter for the recent data:

The negative shock centered in 2013 was hypothesized on the basis of the relationship between inflation and the labor force, and a similar (and more well-defined) shock to CPI inflation. The Great Recession induced a decline in labor force participation that has subsequently showed up in inflation. As both these shocks fade, the "lowflation" in the aftermath of the Great Recession is ending (see links here and here).

And finally, here are the level and growth rate of NGDP/PAYEMS (which I also write NGDP/L or N/L) which is essentially Okun's law (see here):

As always, click for larger images.



Forgot the S&P 500 graph:

Wednesday, April 25, 2018

Info Eq and commonalities in successful Econ (+ gravity models)

Here's a gravity model for ya!
When I got into economics as a hobby, I went through three phases. First, I assumed, like I do with most fields, that the experts in the field are in general on the up and up. Disagreements on e.g. macro policy were the result of legitimate disagreements about how the economy worked. Second, after playing around with the data, reading Noah Smith on the subject, and having followed those macro disagreements long enough to find some of them to be completely baseless in terms of empirically successful theory, I concluded a lot of it (except for the empirical work) was useless. While you should generally distinguish between macro and micro theory, the macro theory was based on the micro theory, and the pieces of the macro theory that seemed most problematic were precisely those related to the micro theory. I was on board with the critics, but my personal preference is not to just lob criticism but rather produce viable alternatives. That's where my blog started five years ago. Recently I've entered my third phase, where I've realized there is an econ criticism industry repeating the same tired critiques — one that crucially also misses the actual problems (or just makes the same mistakes!) I set out to find alternatives to:

  • You're never going to create an economic model of a human being that is in any way accurate, and even if you got close, the aggregate model would be intractable. Be agnostic about human behavior.
  • Don't use your gut feelings to choose which things to include in models or how to include them. Money causing inflation sounds reasonable, but it's not empirically supported except for hyperinflation. (And don't use that empirical finding out of scope!)
  • Don't make those models more complex than the data can support. A macro model with tens of interacting variables at best will only be validated after 50 more years of time series data, putting it only slightly above philosophical speculation in meaning.

I made these points in a criticism of an "economic way of thinking" just yesterday. But that's also why I'm basically on board with Noah Smith's recent article about econ critics sounding like a broken record in both the metaphorical senses of repeating themselves and using an outdated method.

But also the litany of "successful economic theory" basically validates my itemized view above, and two of the examples can even be represented using the information equilibrium framework. Noah lists "[a]uction theory, random-utility models, matching theory, gravity-trade models ...". I haven't taken on auction theory yet, and have only noted the similarities between information equilibrium and random-utility discrete choice (which is more concerned with the state space of choices available to the agents than with their specific behavior). However, the other two I've already looked at on my blog (here and here, as well as my recent paper). These models don't make a lot of assumptions about human behavior ("random" behavior in one), don't include things that aren't empirically supported (gravity models include distance despite it being mysterious from a theoretical perspective), and aren't very complex.

All of this was prologue to another way to look at the gravity model that uses one bit of information from the paper Noah cites by Thomas Chaney [1], making it even easier to write down an information equilibrium model. The scales of the trade state space should be set by the value of the output of one country ($NGDP_{a}$), the value of the output of the other country ($NGDP_{b}$), and the number of firms engaged in trading ($K$). This establishes three information equilibrium relationships:

$$
\begin{aligned}
T_{a,b} & \rightleftarrows NGDP_{a} \\
T_{a,b} & \rightleftarrows NGDP_{b} \\
T_{a,b} & \rightleftarrows K
\end{aligned}
$$

The solution to the relevant set of differential equations is:

$$
T_{a,b} \sim \left( NGDP_{a}\right)^{\alpha} \left( NGDP_{b} \right)^{\beta} \left( K \right)^{\gamma}
$$

In the paper, Chaney notes that the distribution of firms engaged in trade essentially scales with inverse distance $D$ via Zipf's law (which is another information equilibrium relationship $K \rightleftarrows 1/D$ [2]) with fewer, larger firms engaged in long range trade, so that we finally obtain:

$$
T_{a,b} = c \frac{\left( NGDP_{a}\right)^{\alpha} \left( NGDP_{b} \right)^{\beta}}{D^{\gamma}}
$$

The precise values of the parameters are based on the relative information content of the elements of each state space (and are usually just fit empirically). What's additionally interesting is that the information transfer framework allows for non-ideal information transfer, meaning that in general this relationship is just a bound on trade:

$$
T_{a,b} \leq c \frac{\left( NGDP_{a}\right)^{\alpha} \left( NGDP_{b} \right)^{\beta}}{D^{\gamma}}
$$

Sometimes you receive less information than is transmitted, and here the cause is precisely the lack of information flowing per footnote [1].
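The bound can be illustrated numerically (all of the NGDPs, distances, exponents, and the efficiency factor below are invented for the sketch, not estimated from data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative gravity-model parameters
alpha, beta, gamma, c = 1.0, 1.0, 1.0, 1e-6

ngdp = np.array([2e13, 5e12, 3e12, 1e12])   # toy country outputs (dollars)
dist = np.array([[   1, 6000, 8000, 3000],  # toy distances (km); the
                 [6000,    1, 2000, 7000],  # diagonal is a placeholder
                 [8000, 2000,    1, 5000],
                 [3000, 7000, 5000,    1]], dtype=float)

# Ideal information transfer: the gravity relationship as an upper bound
bound = c * np.outer(ngdp ** alpha, ngdp ** beta) / dist ** gamma

# Non-ideal transfer: observed trade falls below the bound by some
# "information transfer efficiency" in (0, 1]
efficiency = rng.uniform(0.3, 1.0, size=bound.shape)
trade = efficiency * bound
```

With α = β and a symmetric distance matrix, the bound itself is symmetric in the two countries, while observed trade can fall anywhere below it.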


Update 26 April 2018

Per a comment from @unlearningecon, here is some data (digitized from here [pdf]) using OECD countries showing the gravity model as a bound:



[1] Also in the paper Chaney makes the same assumption about fully exploring the state space that information equilibrium does:
As long as the individuals that make up firms engage in direct communication with their clients and suppliers, and as long as information permeates through these direct interactions, one ought to expect that aggregate trade is close to proportional to country size and inversely proportional to distance.

Tuesday, April 24, 2018

The economic way of thinking?

Via David Andolfatto, I came across a pretty thorough blog post from Jim Rose (Twitter handle) aiming at not just describing but motivating the "economic way of thinking". Now I don't want to attribute this way of thinking to all economists, but it is common enough that I don't think it's entirely unrepresentative.

The thoroughness is a great way to illustrate how assumptions, perspectives, and worldviews (all plausible at each step) aggregate into a complex structure that I think many people inside and outside of economics think of as the "mainstream economics" they have a problem with. Since his blog post is organized by sections, I'm going to illustrate my point by listing some of the same section titles with my commentary below.

Thinking about thinking

We start with the self-evident truth that you cannot begin to study any problem from "gazing at a mass of unorganized data". However, this is then used to motivate the "selection of an analytical framework", and by implication a particular analytical framework (the "economic way of thinking"). The issue here is that this analytical framework covers the territory from the philosophy of Hume's uniformity of nature to the mathematics of utility maximization. You can easily do science with only the framework where you assume there are empirical regularities, without having to either put known facts about social systems into a hierarchy of importance or accept that these facts are the result of optimization by agents.

As a real world example, there was a lot of physics done before Newton came up with the first analytical framework. People knew things about magnets, materials, and motion as a collection of facts organized only by what would eventually be known as Hume's uniformity of nature. The framework does not need to be that complex, and simple usefulness may be sufficient.

The facts of the social sciences

At this point, the author basically states the central tenet of methodological individualism. However the author also makes an assumption about the place of economics relative to the social sciences in general. It is true that individual opinions and beliefs are important facts for e.g. sociology and psychology. However, it has not been conclusively demonstrated (via e.g. empirically successful theory) that these opinions and beliefs are critical elements of understanding a lot of economic phenomena. Where my blog criticizes economic methodology, it has generally been over the way this belief that agents matter is simply assumed — to the detriment of empirically accurate descriptions of economic phenomena. My recent paper takes Gary Becker's "irrational agents" to their logical conclusion, showing that some basic macroeconomic facts can be explained without pretending knowledge of human behavior. That approach also suggests that it is the space of possible agent behaviors rather than the behaviors of individual agents that is important — the individual beliefs and opinions are often irrelevant to economic observables except in highly correlated cases (e.g. panic in a financial crisis).

PS A lot of behavioral economics (which purports to get even closer to those beliefs of individuals) doesn't improve the empirical accuracy of economic theories either.

Measurement and theory

While anyone would agree with this premise in general, we again have a blanket statement covering the territory from "counting objects" to "optimizing agents" as the theory required in order to make sense of measurements. The unemployment rate needs very little theory (you can count people). A measurement of NAIRU or the natural rate of interest needs much more theory. Some macro theory even obviates measurements of the unemployment rate by declaring everyone to be "on vacation".

The author then says that the "rich tapestry" of the facts of history can only be isolated by theory. But this creates a problem that I discussed at length in a post about Dani Rodrik expressing this same view. The problem is that you have to have the correct theory to correctly isolate effects, but in order to build the correct theory you must have already correctly isolated the effects. It's a chicken or egg problem.

You first need a way to bootstrap yourself from potentially theory-contingent observations to a theory, and the natural sciences were incredibly lucky in this regard. Our intuitions about the natural world (e.g. that some effects diminish with distance, so physical separation isolates a system — or just the fact that we can't do much of anything about the sky) allowed experiments and observations to be done that correctly isolated the effects. That means a lot of empirical observations were already made correctly within what would become the theoretical framework you might use to isolate the systems. (A really good example is that modern thermodynamics is nearly entirely taught today using theoretically isolated systems that assume exactly the conditions that were used in the experiments to establish thermodynamics.)

Economics was really set up with some rotten luck in this regard. A lot of behavioral studies say that our human intuitions are woefully inadequate for addressing how to theoretically isolate the system. It is also possible that the utility-maximizing framework of microeconomics requires "macrofoundations": the system must be near a macro equilibrium for the utility-maximizing framework to be a good approximation. You might not be able to understand microeconomics without a complete macro theory!

For me, this makes economics fascinating. I also have no idea how to solve this problem. I try to look for empirical regularities using a framework that makes as few assumptions about human behavior as possible. Someone else might have a better idea, but assuming a theoretical framework that isn't motivated by empirical success is just mathematical philosophy.

Thinking about abstractions

I will just say that Feynman tells us that the fundamental issue of science is not fooling yourself — that you don't filter and abstract away the facts that might prove your theory or experiment wrong. Putting emphasis on a human's ability as an apparatus of classification is a well-worn route to self-deception.

The power and self-discipline of parsimonious analysis

While I would agree that some kind of theoretical structure is good to prevent "explaining everything and therefore nothing", the author then makes a decidedly unsupported leap to stating: "Complex human objectives are not assumed in economic analysis because everything could be explained and nothing could be falsified."

Gary Becker's "irrational" agents (Becker 1962) are an explicit counterexample to this statement. The random "irrational" agents are equivalent to algorithmically complex agents, and the results could easily be falsified (and in fact the approach reproduces some basic concepts of economics).

[To be continued?]

This post is getting long. I may return to it to address more of the sections, but I think I've made my point that there are a lot of underlying assumptions about how the world works buried in what seems like an objective view. A lot of it derives from a basic assumption that human decisions are important to understand economic observables. But given the lack of observables that have been successfully described, maybe we should take on this unchallenged assumption. Here's Noah Smith:
But the real reason you have this tradeoff [between macro and micro-economic validity] is because you have big huge unchallenged assumptions in the background governing your entire model-making process. By focusing on norms you ignore production costs, consumption utility, etc. You can tinker with the silly curve-fitting assumptions in the macro model all you like, but it won't do you any good, because you're using the wrong kind of model in the first place. 
So when we see this kind of tradeoff popping up a lot, I think it's a sign that there are some big deep problems with the modeling framework. [Emphasis in original]
Is human agency one of these unchallenged assumptions?



Here is Brad DeLong expressing some similar views to Jim Rose.

The KL divergence as a price

As part of the five year anniversary of this blog, I went back and re-read this blog post that re-derives the information equilibrium condition using some general scaling arguments (much in the manner that things are presented in e.g. The Feynman Lectures on Physics). Let me reproduce one of those arguments here (with a bit of mathjax) about the price as a "detector" of information flow:


If prices change, the two distributions [of supply and demand] would have to have been unequal. If they come back to the original stable price — or another stable price — the two distributions must have become equal again. That is to say prices represent information about the differences (or changes) in the distributions. Coming back to a stable price means information about the differences in one distribution must have flowed (through a communication channel) to the other distribution. We can call one distribution $D$ and the other $S$ for demand and supply. The price is then a function of changes in $D$ and changes in $S$, or

$$
p = f(\Delta D, \Delta S)
$$

Note that we observe that an increase in $S$ that is bigger than an increase in $D$ generally leads to a falling price, while an increase in $D$ that is bigger than the increase in $S$ generally leads to a rising price. That means we can try

$$
p = \frac{\Delta D}{\Delta S}
$$

for our initial guess. Instead of a price aggregating information, we have a price detecting the flow of information. Constant prices tell us nothing. Price changes tell us information has flowed (or been lost) between one distribution and the other.


I did want to note that we could have made a different choice — specifically the Kullback-Leibler (KL) divergence $D_{KL}(D, S)$ from information theory. For small perturbations $\Delta D$ and $\Delta S$ from an equilibrium $D_{0} = S_{0}$, we find

$$
\begin{align}
D_{KL}(D_{0} + \Delta D, S_{0} = D_{0}) & \sim \Delta D\\
D_{KL}(D_{0} = S_{0}, S_{0} + \Delta S) & \sim - \Delta S
\end{align}
$$

which reproduces the same properties above (prices go up for increasing demand, ceteris paribus, and down for increasing supply, ceteris paribus). The interesting piece is that the KL divergence is what is used as the "detector" in Generative Adversarial Networks, a class of machine learning algorithms that is formally similar to the information transfer framework — as discussed in that same blog post.
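A minimal numerical check of those limits (my own sketch, using the plain KL sum over unnormalized distributions, since the overall sizes of demand and supply matter here):

```python
import math

# Numerical check that the (unnormalized) KL divergence responds linearly,
# with opposite signs, to small demand and supply perturbations around an
# equilibrium where the two distributions are equal. My own sketch.

def kl(p, q):
    """Plain KL sum over unnormalized measures: sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

d0 = [4.0, 3.0, 2.0, 1.0]  # equilibrium: demand distribution = supply distribution
delta = 0.01               # small overall (1%) perturbation
d_up = [x * (1 + delta) for x in d0]

total = sum(d0)
# Demand perturbation: D_KL(D0 + dD, S0 = D0) ~ +dD (price rises)
assert abs(kl(d_up, d0) - delta * total) < 1e-3
# Supply perturbation: D_KL(D0, S0 + dS) ~ -dS (price falls)
assert abs(kl(d0, d_up) + delta * total) < 1e-3
print("linear response with opposite signs: check passed")
```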

We've got five years, what a surprise

We've got five years, stuck on my eyes 
We've got five years, what a surprise 
We've got five years, my brain hurts a lot 
We've got five years, that's all we've got
Five Years, David Bowie

The blog is five years old today. It started as essentially an online working paper with an informal abstract (and addendum). The paper I mention never saw the light of day in that form, but morphed into this paper.

In the past year, I wrote both an Evonomics article as well as a book. The two most viewed blog posts of the past year were related to the former (essentially a draft version of the Evonomics article along with a continuously updated list of presentations and papers).

Thank you to everyone for reading in whatever form!

Monday, April 23, 2018

Tractability for tractability's sake?

Beatrice Cherrier has a great blog post (and follow up) about the (macro)economic methodologies chosen for reasons of tractability and their continued use and influence on theories and models. A couple of core examples are the representative agent and log-linearization. The latter is just a very good numerical approximation of the type made all the time in science and engineering (and Eggertsson and Singh show that it is in fact a good approximation), but the former is more complex in its ramifications.
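To give a concrete sense of how good log-linearization typically is, here is a toy check with a made-up nonlinear function (my own example, not one from Cherrier or from Eggertsson and Singh):

```python
import math

# Toy check of how accurate log-linearization is near a steady state.
# The function f is made up for illustration.

def f(k):
    return k ** 0.3 + 0.9 * k

def f_prime(k):
    return 0.3 * k ** (-0.7) + 0.9

k_star = 2.0  # assumed steady state
eta = k_star * f_prime(k_star) / f(k_star)  # elasticity of f at k_star

def f_loglin(k):
    """Log-linear approximation: log f(k) ~ log f(k*) + eta * (log k - log k*)."""
    return f(k_star) * math.exp(eta * (math.log(k) - math.log(k_star)))

for shock in (0.95, 1.0, 1.05):  # deviations of +/- 5% from steady state
    k = k_star * shock
    rel_err = abs(f_loglin(k) - f(k)) / f(k)
    assert rel_err < 1e-3  # the approximation is very good near k*
    print(f"k = {k:.2f}: relative error {rel_err:.1e}")
```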

Sure, the representative agent makes some models computationally tractable, but it also abstracts away the concept of exchange that is fundamental to economics [1], and prohibits the study of economic inequality. More dangerously, the representative agent is a bit of sleight of hand that gets around the SMD theorem (q.v. Kirman). Cherrier politely refers to these as "aggregation problems" [2], but I think there are enough issues that I devoted a chapter of my book to them. In fact, most tractability choices for the microfoundations (agents) [3] in the case of economics (which lacks symmetry principles) cannot be assumed to map the tractable micro theory to the "true" (i.e. an accurate) macro theory. I tell this story in more detail in an imagined "dialog" with a microfoundations fundamentalist.

As a physicist and in my work building simulations of systems, I've encountered and made lots of choices in order to make models tractable. A famous example, the Monte Carlo method, was designed to break the intractability of complicated integrals; different methods work better for different cases [4]. However it is important to understand that there are only a few ways to justify tractability choices: 1) directly showing the choices themselves are mathematically justified (e.g. the log-linearization above, Monte Carlo and the work in footnote [4]), 2) showing the resulting model is empirically useful/accurate, 3) for pedagogical purposes [5], or 4) it is the first time you are looking at the model and you're making some educated guesses [6]. No other reason is justifiable from a scientific perspective [7].
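For instance, the Monte Carlo idea in its simplest form is to estimate an integral as an average over random samples:

```python
import random

# The Monte Carlo method in miniature: replace an integral with an average
# over random samples. Here we estimate the integral of x**2 on [0, 1],
# whose exact value is 1/3.

random.seed(42)  # fixed seed for reproducibility

def mc_integrate(f, n=100_000):
    """Estimate the integral of f over [0, 1] as a sample average."""
    return sum(f(random.random()) for _ in range(n)) / n

estimate = mc_integrate(lambda x: x * x)
print(f"Monte Carlo estimate: {estimate:.4f} (exact: 0.3333...)")
assert abs(estimate - 1 / 3) < 0.01  # error falls off like 1/sqrt(n)
```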

It is important to recognize that tractability itself is not a sufficient justification for making tractability assumptions. Saying you assume a representative agent to make a model tractable just shifts the burden to the model — what makes this model so important that it needs to be made tractable? Is it empirically accurate when made tractable? When tractability itself is given as a justification — tractability for tractability's sake — that's usually a sign that the theory (model) is being seen through the lens of bias: this theory is too important to not obtain results from it [8]. Important here could mean politically — national politics because of macro's topics, but even just academic politics. Whatever the reason, it is not a scientific one. And that's what makes Cherrier's questions about the influence of tractability assumptions so interesting from a history of economic thought perspective. Since so few macro models are empirically accurate (rendering the scientific reasons for making them tractable moot), tractability assumptions often could be a window on motivations other than science [9].



[1] While I find this hilarious in a discipline devoted to the study of transactions between two or more people, it is not as problematic mathematically as it seems, because it is similar to looking at a mean field approximation with a "dressed particle". In the language of Wilson renormalization, one can "integrate out" the microeconomic scale interactions, replacing them with a new effective degree of freedom (agent). The unfortunate part about macro's representative agent assumption is that it is just that: instead of being careful handling the micro interactions, it's just assumed. In a discussion with Jo Michell I go through actually integrating out the micro details of heterogeneous agents in quite a bit more mathematical detail. The other aspect of not making that

[2] I immediately thought of the scene in The Big Lebowski where a man in an iron lung is said to have "health problems".

[3] I find it telling how regularly agent-based modeling (ABM) is proffered as a solution to the "unrealistic assumptions" of DSGE models, only for the ABM to then make different but still unrealistic assumptions in order to make the model tractable. For example, this.

[4] I picked up this issue of Scientific American when I was in high school. This article [pdf] on tractability in simulations strongly influenced me over the years, including its idea about an "incompleteness theorem" for computation asking whether some questions might be intractable for any feasible amount of computational resources making their answers unknowable. I have a hunch that in the real world when these macro scale computations with micro scale degrees of freedom become intractable, new "emergent" macro degrees of freedom arise that are tractable. I discussed the issues around this at length when I talked about Erik Hoel's causal emergence.

[5] There are some artificial "toy" models in physics that are set up because they are more tractable for students. But it should always be clear that toy models that demonstrate principles come after empirical successes of those principles.

[6] Typically, these kind of tractability assumptions are made in the privacy of your own notebook, and only ever get published if they lead to an empirically accurate model.

[7] I have on occasion encountered the "but we need some answer" defense: a model is made tractable via possibly dubious assumptions in order to give some answer in the face of uncertainty. E.g. one could think of creating a model made tractable through some ad hoc assumptions to justify some course of policy in the midst of a global financial crisis. While this kind of decisiveness is emblematic of leadership (frequently used as a story device on Star Trek, where e.g. Captain Picard had to make a decision on the basis of speculative theory) and can be rational, it should be made clear that this is not a scientific approach.

[9] These are not necessarily nefarious; it could be as simple as really believing a theoretical approach will eventually yield results. But given the topics macroeconomics covers, there are countless ways for ideology to enter — especially given the lack of empirical support for a lot of macro models.

Thursday, April 19, 2018

An agnostic equilibrium

David Glasner has posted an excellent introduction to his forthcoming paper on intertemporal equilibrium that talks about varying definitions and understandings.

Equilibrium can be one of the more frustrating concepts to talk about in the econoblogosphere because everyone seems to have their own definition — people from physics and engineering (as well as a general audience) often think in terms of static equilibrium (Glasner's "at rest"), and so say things like "Obviously economies are not in equilibrium! They change!". Of course, definitions of economic equilibrium that never apply are useless definitions of economic equilibrium (like Steve Keen's definition here).

I had a rambling draft post that I published anyway three years ago that discussed several definitions of equilibrium, including Noah Smith's claim:
Economists have re-defined "equilibrium" to mean "the solution of a system of equations". Those criticizing econ should realize this fact.
This muddle of language is why I try to do my best to say information equilibrium or dynamic information equilibrium in my posts (at least for the first mention) in order to make it clear which idea of equilibrium I am talking about. And that basic idea is that, in information equilibrium, the distribution of planned purchases (demand) contains the same amount of information entropy as the distribution of planned sales (supply). In information equilibrium, which I contend is a good way to think about equilibrium in economics, knowing one side of all the planned exchanges [1] is knowing the other. This implies that information must have traveled from one side of the exchange to the other. If I write down a number between one and a million and you write down the same number, it is extremely unlikely that I didn't communicate it to you. If I give you six dollars and you give me a pint of blueberries, it is even more unlikely that exchange happened by random chance. Information is getting from one party to the other.
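The number-guessing example can be made quantitative in information-theoretic terms (a quick back-of-the-envelope calculation, not something from the original post):

```python
import math

# The number-guessing example in information-theoretic units: matching a
# number between one and a million by chance has probability 1e-6, so a
# match implies roughly 20 bits of information flowed between the parties.

n = 1_000_000
p_chance = 1 / n
bits = math.log2(n)

print(f"P(match by chance) = {p_chance:.0e}, implied information ~ {bits:.1f} bits")
assert 19.9 < bits < 20.0
```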

But just like a definition of equilibrium that never applies is useless, so is a definition of equilibrium that always applies. And in the case of "information disequilibrium" (non-ideal information transfer), there is information loss. If we consider demand the source of this economic information [2], information loss leads to lower prices and a deficiency in measured demand.

Glasner's mutually consistent intertemporal plans can easily be represented in terms of information equilibrium (the information required to specify one side of a series of transactions or expected transactions can be used to construct the other side). But the discussion in Glasner's post goes further, talking about the process of reaching an equilibrium of mutually consistent intertemporal plans. At this point, he discusses rational expectations, perfect foresight, and what he calls "correct foresight".

This is where the information equilibrium framework is agnostic, and represents an effective theory description. There aren't any assumptions about even the underlying agents, except that they eventually fully explore the available (intertemporal) opportunity set. Random choices can do this (random walks by a large number of agents, q.v. Jaynes' "dither"), making the observed states simply the most likely states (n.b. even for random exchanges, knowing one side of all the exchanges is still knowing the other side — i.e. information equilibrium). This is behind "maximum entropy" (which basically means "most likely") approaches to various problems in the physical sciences as well as in "econophysics". For me maximum entropy/information equilibrium provides a baseline, and the information transfer framework provides a way to understand non-equilibrium states as well. But random agents are just one tool in the toolbox, and really the only requirement (for equilibrium) is that agents fully explore the opportunity set.

Over the years I've had many people upset with me in comments, emails, twitter, etc for the behavior-agnostic aspect of the approach. People aren't random! Of course human behavior is important! I've been called arrogant, or been laughed at for my "hubris". However, I think it is even greater hubris to claim to know how human behavior works [3]. This modest approach comes from my physics background. There was a kind of "revolution" in the 70s and 80s where physicists went from thinking the "Standard Model" and other theories were akin to literal descriptions of reality to thinking they were effective theories that parameterized our ignorance of reality. I'm not sure this lesson has been fully absorbed, and many physicists think that e.g. string theory is the beginning of a 'final fundamental theory' instead of just a better effective theory at scales near the Planck scale. But nearly all particle physicists understand that the Standard Model is just an effective theory. That's actually a remarkable shift in perspective from the time of Einstein where his general theory of relativity was thought to be fundamental [4].

Effective theory has been a powerful tool for understanding physical systems. I like to think of the maximum entropy/information equilibrium approach as an effective theory of economics that's agnostic about the underlying agents and how they reach intertemporal equilibrium. It is true that being agnostic about agents or how equilibrium is reached limits the questions you can address, but so does assuming an incorrect model for these things. Still, it is good for addressing really basic questions like what economic growth rates can be or what "equilibrium" means when it comes to the unemployment rate. My recent paper using information equilibrium answers the latter in a way that I have not seen in the literature (which tends to focus on concepts like NAIRU or search and matching): "equilibrium" in the unemployment rate consists of the periods between recessions where unemployment falls at a constant logarithmic rate. The 4.1% unemployment (SA) from March represented the equilibrium, but so did the 4.5% unemployment rate from March of 2017. In the past 10 years, unemployment was only away from equilibrium from 2007 to 2010 and again for a brief period in 2014. But every other point, from the 9% in 2011 to the 4% today, represents information equilibrium [5].
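Here is a toy sketch of that notion of equilibrium; the rate and starting level are illustrative, not the actual fit from the paper:

```python
import math

# Toy version of the dynamic (information) equilibrium picture of the
# unemployment rate: between recessions, log u falls at a roughly constant
# rate, so every point on the decline counts as "equilibrium".

alpha = 0.09  # assumed annual rate of decline of log u (illustrative)
u0 = 9.0      # starting unemployment rate (percent, illustrative)

def u(t_years):
    """Unemployment on the equilibrium path: constant decline in log u."""
    return u0 * math.exp(-alpha * t_years)

# Equal drops in log u per year, i.e. constant logarithmic decline:
drops = [math.log(u(t)) - math.log(u(t + 1)) for t in range(5)]
assert all(abs(d - alpha) < 1e-9 for d in drops)
print(f"u(0) = {u(0):.1f}%  and  u(5) = {u(5):.1f}%: both equilibrium points")
```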

So while it is a simpler approach, information equilibrium allows for more complex ideas of what "equilibrium" means. I think that makes it useful for modeling economic systems, but it comes with a dose of modesty keeping you from pushing your own "gut feelings" about how humans behave.



[1] There is no reason to restrict these to binary exchanges or "barter"; it is fully general, but purely binary exchanges over a series of time steps represent a simpler example. Here's what I am talking about in a more visual representation. You have some supply (apples) and some demand (people who want to buy apples):

A set of (potentially intertemporal) plans of exchanges occurs (money flows the opposite direction of goods):

The resulting distribution of supply exchanged for money has the same information as the original distribution of demand:

This is information equilibrium. As a side note, if everything is in information equilibrium, the resulting distribution of money exchanged also contains exactly the same information under only two possible conditions: there is one good, or the economy has an effective description in terms of aggregate demand and aggregate supply (i.e. effectively one good). Otherwise, money exchange destroys information (you don't know what the money was spent on).

[2] This is a "sign convention" because it could easily be the other way (the math works out somewhat symmetrically). However, we buy things by giving people tokens we value instead of sellers divesting themselves of "bad" tokens so this sign choice is more user-friendly than, say, Ben Franklin's choice of negative and positive charges. This sign convention means that prices in terms of tokens with positive value go up when demand goes up and down when supply goes up, while the other is held constant.

[3] Not saying this of most economists, because most see e.g. rational agent models as approximations to reality. Of course, some don't. But a lot of other people out there seem to have very strong opinions about how humans behave that are "ignored" by mainstream economists.

[4] Einstein's special relativity in the case of string theory would be an effective description in the 4-D bulk with an underlying Poincare invariance on the full 10 or 11 dimensions. But the basic idea of special relativity as a symmetry principle is still considered fundamental. (I think! Haven't kept up with the literature!)

[5] This is different from Roger Farmer's idea that any unemployment rate can be an equilibrium unemployment rate, which I discuss here. According to Paul Krugman's interpretation, Farmer doesn't necessarily believe there is a tendency for unemployment to come down from a high level. In the information equilibrium version, this tendency to come down is itself the equilibrium (regardless of level).