In addition to the unemployment rate, there is also new data for the prime age civilian labor force participation rate, which we can use to track the performance of our forecast (last updated here):
Information Transfer Economics
A working paper exploring the idea that information equilibrium is a general principle for understanding economics. [Here] is an overview.
Friday, October 6, 2017
Latest unemployment data
New unemployment data is out, so it's time to check how the forecasts are doing compared to reality. First, I want to throw out two forecasts as rejected: one from me, and one from the FRB SF. I started putting them on the same graph with the dynamic equilibrium model here, but my original forecast was made as part of my effort to come up with a way to forecast recessions. With what I know now, I wouldn't have made this forecast — the tolerance for positing a recession was too low and would have choked on earlier data if used in this model.
The gray forecast assumed a recession was happening in the next few quarters, while the red dynamic equilibrium forecast assumes no shocks. The former is resoundingly rejected. Now how about a statement from the FRB SF rejecting their previous forecast?
Instead of rejecting their previous forecasts, the FRB SF has continually updated their forecasts over time as the future they predict fails to materialize (which I noted in this post, making the point that forecast instability is a sign you have the wrong model; it's also the point I am making with this gallery). I've also added the FOMC's forecast to the series of head-to-heads:
The FOMC does basically the same thing, which I've emphasized by adding in their December 2014 forecast in purple.
Update
The FRB SF has yet another forecast update, which I have added to the graph above:
This kind of forecast updating would be fine if it were (a) stable and (b) successful for a period that is long relative to the forecast length. If a forecast is made for a couple of years in the future but only works for a couple of months, you should stop forecasting further out than a couple of months.
The thing is that if a recession starts in the next couple of years, the latest forecast will be seen as correct despite the fact that nearly every prior forecast was wrong over this length of time. That is unscientific. Much like the perpetual pessimists who always forecast a recession and are credited by some as successful when one finally happens (hello, Steve Keen!), it is a failure of Feynman's "leaning over backwards" to reject your own theories and models.
Thursday, October 5, 2017
The price mechanism as information bottleneck
I've been reading and writing about the "information bottleneck" lately (e.g. this paper, or this post), focusing on how it might relate to the price mechanism. In that post, I argued that the price mechanism works by destroying information instead of aggregating or communicating it.
I thought this might be a neat example to try out Mathematica's Classify machine learning function. So I set up some training data for a simple system with three agents (1, 2, 3) and a price that could take on three values (1, 2, 3) for an allocation of three units of one good. Of course, on one hand all the threes make this confusing — but on the other hand this website is free.
There are ten different possible allocations of three widgets across three agents which I designate by a list of three numbers: e.g. {1, 2, 0}, {0, 0, 3}, {1, 1, 1}, etc. Each allocation is then related to a price in the training data; here's a graphical representation of that (noisy) training data (that we'll later relate to the information bottleneck):
The prices are on the right, and the various possible allocations are on the left, with the arrows showing when a price was related to a particular allocation (sometimes multiple times, and sometimes an allocation was related to two different prices). Running c = Classify[trainingData], we get a function c[.] that maps an allocation to a price p:
c[{3, 0, 0}] = 1
c[{2, 1, 0}] = 2
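For concreteness, here is a minimal sketch of that setup in Mathematica. The specific training pairs below are hypothetical stand-ins for the noisy training data described above, so the classifier outputs may differ from the ones shown:

trainingData = {
   {3, 0, 0} -> 1, {2, 1, 0} -> 2, {1, 1, 1} -> 2,
   {0, 1, 2} -> 3, {0, 0, 3} -> 3, {2, 0, 1} -> 1,
   {1, 2, 0} -> 2, {0, 2, 1} -> 2, {1, 0, 2} -> 3,
   {0, 3, 0} -> 2};                      (* hypothetical allocation -> price pairs *)

c = Classify[trainingData];              (* learn the allocation -> price map *)

c[{3, 0, 0}]                             (* e.g. 1 *)
c[{1, 1, 1}, "Probabilities"]            (* class probabilities, used for the "typical" allocation weighting below *)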
If we look at the various allocations related to each price (and weight them by their probabilities), we can get an idea of a "typical" allocation that yields each price:
Each price is represented by a different color. The horizontal line at 10% represents the probability of any particular allocation if we had a uniform distribution over the different allocations (since there are 10 of them). It's also the result when the machine learning algorithm fails, essentially choosing the least informative prior.
We can see when the price is p = 1, then agent 1 ends up with more of the stock of widgets. When p = 2, the distribution is more uniform (it was set up as the "equilibrium price" in the training data). Although each agent in this particular setup is a consumer, we can think of 1 as the "consumer" and 3 as the "producer". If the price is too high, agent 3 ends up with more of the goods on average (they don't sell); if the price is too low, agent 1 does (over-consumption).
We can look at the information entropy of these allocations, and it is indeed maximized for the equilibrium price p = 2 (by construction):
We have an information bottleneck where these three price values (1.6 bits) are destroying the irrelevant information and capturing relevant information about the opportunity set (3.3 bits, for a loss of 1.7 bits — more than half the information content) [1].
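As a quick check on those numbers, assuming roughly uniform distributions over the three prices and the ten allocations:

N[Log2[3]]      (* ≈ 1.58 bits: the most the price can carry *)
N[Log2[10]]     (* ≈ 3.32 bits: the information in the allocation state space *)
N[Log2[10/3]]   (* ≈ 1.74 bits destroyed in the bottleneck, more than half *)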
I borrowed this information bottleneck diagram from this paper:
In our case, $X$ is the allocation (state space), and $Y$ is the price. Our classify function c[state] represents $\hat{X}$, and $\hat{Y}$ is the output of that function. It was trained on the data (the diagram at the top of this post). Of course, Classify isn't really doing this with a deep neural network (there's actually just one hidden layer with 8 nodes), but what I'm trying to illustrate here are the formal similarities between destroying information in the price mechanism and the information bottleneck.
We can envision the price mechanism as setting up a primitive neural network machine learning algorithm: the price functions as an autoencoder of the state space information, destroying the irrelevant information in the information bottleneck, and then the flow of money reinforces the connections between neurons (i.e. exchanges between agents).
We can add a second state space defining the demand for widgets (the state space above defines the supply). If these state spaces match up, then both supply and demand will see the "equilibrium" price for the equilibrium allocation. Deviations on either side will mean the market price differs from the price derived from either the supply distribution or the demand distribution. Information will flow from supply to demand (or vice versa) via exchanges, and the price will change to represent the new state. This process will continue until the relevant information content of the supply distribution (captured via the bottleneck, with irrelevant information being destroyed) is equivalent to the information content of the demand distribution — i.e. information equilibrium.
If we take demand as constant (i.e. the real data we are trying to learn), this is identical to training a neural network with a Generative Adversarial Network (GAN) algorithm. Different supply distributions are created via exchanges, and the price (the bottleneck) discriminates between them, leading to what should eventually be identical distributions on both sides once the price can no longer discriminate between the supply distribution and the demand distribution (i.e. is constant).
Or at least that is how I am thinking about this at the moment. It is possible we need to look at the joint distribution of supply and demand as one big state space. More work to be done!
Footnotes:
[1] Additionally, I went through and did random trades among the agents (select two agents at random, and if one agent has more widgets than the other and the other has money at the price dictated by the future allocation — i.e. the allocation that would result from a trade — there's a trade). This eventually produces an equilibrium (an equilibrium price of 2 with a uniform allocation):
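Here is a minimal sketch of that trading loop in Mathematica. It assumes the classifier c from the post above, and the starting allocation and cash endowments are hypothetical since the footnote doesn't specify them:

alloc = {3, 0, 0};                         (* hypothetical initial widget allocation *)
cash  = {0, 5, 5};                         (* hypothetical money holdings *)
Do[
  {i, j} = RandomSample[Range[3], 2];      (* select two agents at random *)
  If[alloc[[i]] > alloc[[j]],
   newAlloc = alloc;
   newAlloc[[i]] -= 1; newAlloc[[j]] += 1; (* the allocation that would result from a trade *)
   p = c[newAlloc];                        (* price dictated by the future allocation *)
   If[cash[[j]] >= p,                      (* the buyer needs money at that price *)
    alloc = newAlloc;
    cash[[j]] -= p; cash[[i]] += p]],
  {200}];
{alloc, cash}                              (* allocation and cash after 200 attempted trades *)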
I want to eventually make the machine learning algorithm re-train on the new data that's produced from a transaction, which would likely reinforce some price probabilities and reduce others.
Sunday, October 1, 2017
The price mechanism and the information bottleneck
David Glasner has a nice post on "imperfect information" in economics. In it, he discusses how the idea of painting Hayek and Stiglitz as "polar opposites" generally gets it wrong, and that Hayek didn't think markets had "perfect information". What was interesting to me is that a significant number of the arguments with commenters and on Twitter that resulted from my Evonomics piece tried to make a similar point: that Hayek didn't say markets were always perfect. As I mention in my response, I never said that Hayek thought markets were perfect — in fact, I quoted precisely a passage where Hayek says they're not perfect [1].
My contention is that not only aren't markets perfect, but even when they work they are not working in the way Hayek says they work when he looks at the case of functioning markets. I will also argue that neither a central planner nor a market can actually receive or transmit the information claimed to be flowing, making Hayek's argument against central planning simultaneously an argument against markets — if they function the way Hayek claims they function. However, I will conclude with a discussion of how the price mechanism may actually function by destroying information.
Let's start with Glasner quoting Timothy Taylor quoting Hayek:
[The market is] a system of the utilization of knowledge which nobody can possess as a whole, which ... leads people to aim at the needs of people whom they do not know, make use of facilities about which they have no direct [knowledge]; all this condensed in abstract signals ...
Glasner responds to this (and the rest of the quoted section of Taylor's post):
Taylor, channeling Bowles, Kirman and Sethi, is here quoting from a passage in Hayek’s classic paper, “The Use of Knowledge in Society” in which he explained how markets accomplish automatically the task of transmitting and processing dispersed knowledge held by disparate agents who otherwise would have no way to communicate with each other to coordinate and reconcile their distinct plans into a coherent set of mutually consistent and interdependent actions, thereby achieving coincidentally a coherence and consistency that all decision-makers take for granted, but which none deliberately sought. The key point that Hayek was making is not so much that this “market order” is optimal in any static sense, but that if a central planner tried to replicate it, he would have to collect, process, and constantly update an impossibly huge quantity of [knowledge].
There is an issue that gets in the way of talking about this in terms of information theory: in economics the words "information" and "knowledge" are synonymous (just like the colloquial English definitions [2]). Therefore I traded "information" for "knowledge" in the quotes above (emphasizing with brackets). Knowledge is meaningful, whereas information represents a measure of the size of an available state space (weighted by probability of occupation) regardless of whether a state selected from it is meaningful. The phrases "The speed of light is a constant" and "Groop, I implore thee, my foonting" are drawn from state spaces of approximately the same amount of information (the latter actually requires more), but the former is more meaningful and represents more knowledge.
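A crude way to see the "approximately the same" claim is to count characters and multiply by a fixed per-character rate; the rate used here (log2 of a 26-letter alphabet) is just an illustrative assumption, since real English text has lower entropy per character:

phrases = {"The speed of light is a constant",
   "Groop, I implore thee, my foonting"};
StringLength /@ phrases                    (* {32, 34} characters *)
N[Log2[26] StringLength[#]] & /@ phrases   (* rough bit counts; the second phrase is slightly larger *)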
This measure of information was designed to understand how to build systems that enable you to transmit either message. I'm not trying to say that Claude Shannon's definition is "better" than the economics definition or anything — there's simply a technical meaning given to it in information theory because of a distinction that hasn't been necessary in economics. In defining it, Shannon had to emphasize "information must not be confused with meaning".
However, this semantic issue allows us to get a handle on the mathematical issue with Hayek's mechanism. There is no way for this "impossibly huge quantity of knowledge" to be condensed into a price (a single number), because the amount of information (e.g. the thousands of production numbers — including "expected" ones — [x1, x2, x3, ... ], where the "knowledge" of them represents a specific set [42, 6, 9, ... ]) is too great to be conveyed via that single number without an encoding scheme and drawing out the message over time. You could e.g. encode the numbers as Morse code and fluctuate the price over a few seconds, but the idea that there are messages like that in market prices is so laughable that we don't even need to discuss it. I'll continue to use brackets to emphasize the technical distinction below.
Therefore one thing that market prices are not doing is "condensing" or "transmitting and processing" dispersed knowledge. Prices are incapable of carrying such an information load. The information is largely being destroyed rather than processed or compressed.
When Stiglitz and others talk about imperfect [knowledge], they are actually talking about the fact that the information has been destroyed. The price of a used car isn't going to allow me to glean enough information about the state of that car — especially once you factor in the desire of the human used car salesperson to get a good price for it. Where an "honest" salesperson might price the car below Blue Book value because it has been flood damaged, the buyer's imperfect [knowledge] of the flood damage means the salesperson would rationally try to get Blue Book value. However, even a sub-Blue Book price cannot communicate the information state of the accident history, transmission, engine, etc. in addition to that flood damage.
There's already an [information] asymmetry between the available states the car could be in and the available states the price could take. There is the additional [knowledge] asymmetry made famous by Akerlof's The Market for Lemons on top of that.
But, you say, the price mechanism seems to function "as if" it is communicating information. I guess you could devise an effective theory where the state space information is actually really small (undifferentiated widgets that have some uniform production input). But that's basically just another way to describe the argument above: in order for the price to transmit dispersed knowledge, there mustn't be much knowledge to be transmitted. In a sense, this makes Hayek's argument against central planning a kind of straw man argument. Sure, a central planner can't collect and process all of this information, but the price mechanism can't do this either.
One of the reasons I belabor this particular point is because in trying to understand how information equilibrium relates to economics, I had to understand this myself. As I said in my "about me" blog post:
... I stumbled upon this paper by Fielitz and Borchardt and tried to apply the information transfer framework to what is essentially Hayek's description of the price mechanism. That didn't exactly work, but it did work if you thought about the problem differently.
The part that "didn't exactly work" was precisely Hayek's description of information being compressed into the price. You had to think about the problem differently: the price was a detector of information flow, but unlike a thermometer or a pressure gauge (which have a tiny interface in order not to influence what they are measuring) the price is maximally connected to the system. The massive amount of information required to specify an economy was actually flowing between the agents in the economy itself (i.e. the economic state space information), with the price representing only a small amount of information.
But if this is true, then we might ask: since the price mechanism frequently appears to work in practice, how does it work when it does?
I think the answer currently is that we don't know. However, I am under the impression that research into machine learning may yield some insights into this problem. What is interesting is that the price viewed not as a receiver but rather as a detector is reminiscent of a particular kind of machine learning algorithm called Generative Adversarial Networks (GANs). GANs are used to train neural nets. They start with essentially randomly generated data (the generative bit), which is then compared to the real data you want the neural net to learn. A "discriminator" (or "critic" in some similar methods) checks how well the generator's guesses match the real data.
Imagine art students trying to copy the style of van Gogh, with the art teacher simply saying whether they're doing well or not. It is amazing that this can actually work to train a neural net to copy the style of van Gogh (pictured above). A simpler but similar situation is a game of "warmer/cooler", where someone is looking for an object and the person who knows where it is tells them if they are getting warmer (closer) or cooler (farther). In that case, it is not as counterintuitive that this should work. Much like how it is not as problematic for Hayek's price mechanism to operate with generic widgets, what we have in a game of "warmer/cooler" is a very low dimensional state space, so the sequence of "warmer/cooler" measurements from the "discriminator" is much closer in information content to the actual state space. In the case of van Gogh style transfer, we have a massive state space. There is no way the sequence of art teacher comments could possibly come close to the amount of information required to specify a van Gogh-esque image in state space.
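To put rough (and entirely hypothetical) numbers on that dimension counting:

N[Log2[100]]   (* ≈ 6.6 bits locates an object among 100 hiding spots, so a handful of warmer/cooler answers suffices *)
64*64*8        (* 32768 bits to specify even a small 64x64 8-bit grayscale image, far beyond any stream of teacher comments *)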
However, information must be flowing from the actual van Gogh (real data) to the generator because otherwise we wouldn't be able to generate the van Gogh-esque image. The insight here is that information flows from the real data to the generator, and the quantity of information flowing will be indicated by the differences between the different discriminator scores. A constant score indicates no information flow. A really big improvement in the score indicates a lot of information has flowed.
Again, we don't know exactly how this works for high dimensional state spaces, but a recent article in Quanta magazine discusses a possible insight. It's called the "information bottleneck". In the information bottleneck, a bunch of information about the state space in the "real data" that doesn't generalize is destroyed (e.g. forgetting irrelevant correlations), leaving only "relevant" information about the state space.
To bring this back to economics, what might be happening is that the price mechanism is providing the bottleneck by destroying information. Once this information is destroyed, what is left is only the relevant information about the economic state space. My private information about a stock isn't aggregated via the price mechanism, but rather is almost entirely obliterated [3] when the market is functioning.
With most of this private information being obliterated in the bottleneck, measurements of the information content of trades should actually be almost zero if this view is correct. It is interesting that Christopher Sims has found that only a few bits of information in interest rates seem to be used by economic agents, and other research shows that most traders seem to be "noise traders". Is the information bottleneck destroying the remaining information?
This is speculation at this stage; I'm just thinking out loud with this post. However, the information bottleneck is an intriguing way to understand how the price mechanism can work despite a massive amount of information falling on the floor.
...
Footnotes:
[1] Hayek from The Use of Knowledge in Society:
Of course, these [price] adjustments are probably never "perfect" in the sense in which the economist conceives of them in his equilibrium analysis. But I fear that our theoretical habits of approaching the problem with the assumption of more or less perfect knowledge on the part of almost everyone has made us somewhat blind to the true function of the price mechanism and led us to apply rather misleading standards in judging its efficiency. The marvel is that in a case like that of a scarcity of one raw material, without an order being issued, without more than perhaps a handful of people knowing the cause, tens of thousands of people whose identity could not be ascertained by months of investigation, are made to use the material or its products more sparingly; i.e., they move in the right direction. This is enough of a marvel even if, in a constantly changing world, not all will hit it off so perfectly that their profit rates will always be maintained at the same constant or "normal" level.
[2] The definitions that come up from Google searching "define knowledge" and "define information":
knowledge: facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject.
information: facts provided or learned about something or someone.
The difference between these definitions is basically the inclusion of "skills". What's also interesting is that the second definition for information gets better:
information: what is conveyed or represented by a particular arrangement or sequence of things.
The information theory definition of information entropy, though, depends on the state space of possibilities that particular arrangement was selected from.
[3] In fact, the cases where my information isn't obliterated but rather amplified may well be the causes of market failures and recessions. Instead of my fear that a stock price is going to fall being averaged away among the optimistic and pessimistic traders, it becomes amplified in a stock market crash. The information transfer framework labels this as "non-ideal information transfer" (a visualization using a demand curve as an example is here).
Friday, September 29, 2017
Checking in on an inflation forecast
I made a forecast of PCE inflation using the dynamic information equilibrium model described in this post at the beginning of the year, and so far the model is doing well — new monthly core PCE data came out this morning:
Thursday, September 28, 2017
A forecast validation bonanza
New NGDP numbers are out today for the US, so that means I have to check several forecasts for accuracy. I would like to lead with a model that I seem to have forgotten to update all year: dynamic equilibrium for the ratio of nominal output to total employed (i.e. nominal output per employed person, which I write as N/L):
This one is particularly good because the forecast was made near what the model saw as a turnaround point in N/L (similar to the case of Bitcoin below, also forecast near a turnaround point), and it said we should expect a return towards the N/L trend growth rate of 3.8% per annum. This return appears to be on track.
The forecast of NGDP using the information equilibrium (IE) monetary model (i.e. a single factor of production where money — in this case physical currency — is that factor of production) is also "on track":
The interesting part of this forecast is that the log-linear models are basically rejected.
In addition to NGDP, quarterly [core] PCE inflation was updated today. The NY Fed DSGE model forecast (as well as the FOMC forecast) was for this data, and it's starting to do worse compared to the IE monetary model (now updated with the monthly core PCE number as well):
* * *
I've also checked my forecast for the Bitcoin exchange rate using the dynamic equilibrium model (which needs to be checked often because of how fast it evolves — its time scale is -2.6/y, so it should fall by about 1/2 over a quarter). It is also going well:
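That "about 1/2 over a quarter" is just the exponential decay implied by the quoted rate, assuming it acts as a continuous log-slope:

N[Exp[-2.6/4]]   (* ≈ 0.52 of the starting value after a quarter year *)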
Tuesday, September 26, 2017
Different unemployment rates do not contain different information
I saw this tweet today, and it just kind of frustrated me as a researcher. Why do we need yet another measure of unemployment? These measures all capture exactly the same information.
I mentioned this before; however, this time I thought I'd be much more thorough. I used the dynamic equilibrium model on the U3 (i.e. "headline"), U4, U5, and U6 unemployment rates and looked at the parameters.
Here are the model fits along with an indicator of where the different centers of the shocks appear as well as a set of lines showing the different dynamic equilibrium slopes (which are -0.085/y, -0.084/y, -0.082/y, and -0.078/y, respectively):
Within the error of estimating these parameters, they are all the same. How about shock magnitudes? Again, within the error of estimating the shock magnitude parameters, they are all the same:
Basically, you need one model of one rate plus three scale factors to describe the data. There appears to be no additional information in the U4, U5, or U6 rates. Actually, in information theory this is explicit. Let's call the U3 rate random variable U3 (and likewise for the others). Now U6 = f(U3), therefore:
H(U6) ≤ H(U3)
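A quick numerical illustration of that inequality, with a hypothetical U3-like random variable and an arbitrary coarse-graining function standing in for the U6 definition:

x = RandomInteger[{1, 6}, 10^4];           (* hypothetical U3-like random variable *)
f[u_] := Ceiling[u/2];                     (* arbitrary deterministic function of it *)
{N[Entropy[2, x]], N[Entropy[2, f /@ x]]}  (* H(f(X)) comes out no larger than H(X) *)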
Monday, September 25, 2017
Was the Phillips curve due to women entering the workforce?
Janet Yellen in her Fed briefing from last week said "Our understanding of the forces driving inflation is imperfect." At least one aspect that's proven particularly puzzling is the relationship between inflation and unemployment: the Phillips curve. In an IMF working paper from November of 2015, Blanchard, Cerutti, and Summers show the gradual fall in the slope of the Phillips curve from the 1960s to the present. I discussed it in January of 2016 in a post here. A figure is reproduced below:
Since that time, I've been investigating the dynamic equilibrium model and one thing that I noticed is that there appears to be a Phillips curve-like anti-correlation signal if you look at PCE inflation data and unemployment data:
See here for more about that graph. It was also consistent with a "fading" Phillips curve. While I was thinking about the unemployment model today, I realized that the Phillips curve might be directly connected with women entering the workforce and the impact it had on inflation via the employment population ratio. I put the fading Phillips curve on the dynamic equilibrium view of the employment population ratio for women:
We see the stronger Phillips curve signal in the second graph above (now marked with asterisks in this graph) follows the "non-equilibrium" shock of women entering the workforce. After that non-equilibrium shock fades, the employment population ratio for women starts to become highly correlated with the ratio for men — showing almost identical recession shocks.
This suggests that the Phillips curve is not just due to inflation resulting from increasing employment, but rather inflation resulting from new people entering the labor force. The Phillips curve disappears when we reach an employment-population ratio equilibrium. This would explain falling inflation since the 1990s as the employment-population ratio has been stable or falling.
Now I don't necessarily want to say the mechanism is the old "wage-price spiral" — or at least the old version of it. What if the reason is sexism? Let me explain.
A recent study showed that men self-cite more often than women in academic journals, but the interesting aspect for me was that this appears to increase right around the time of women entering the workforce:
What if the wage-price spirals of the strong Phillips curve era were due to men trying (and succeeding) to negotiate even higher salaries than women (who were now more frequently in similar jobs)? As the labor market tightens during the recovery from a recession, managers who gave a woman in the office a raise might then turn around and give a man an even larger raise. The effect of women in the workforce would be to amplify what might be an otherwise undetectable Phillips curve effect into a strong signal in the 1960s, 70s and 80s. While sexism hasn't gone away, this effect may be attenuated today from its height in that period. This "business cycle" component of inflation happens on top of an overall surge in inflation due to an increasing employment population ratio (see also Steve Randy Waldman on the demographic explanation of 1970s inflation).
Whether or not sexism is really the explanation, the connection between women entering the workforce and the Phillips curve is intriguing. It would also mean that the fading of the Phillips curve might be a more permanent feature of the economy until some other demographic phenomenon occurs.
Friday, September 22, 2017
Mutual information and information equilibrium
Natalie Wolchover has a nice article on the information bottleneck and how it might relate to why deep learning works. Originally described in this 2000 paper by Tishby et al, the information bottleneck was a new variational principle that optimized a functional of mutual information as a Lagrange multiplier problem (I'll use their notation):
$$
\mathcal{L}[p(\tilde{x} | x)] = I(\tilde{X} ; X) - \beta I(\tilde{X} ; Y)
$$
The interpretation is that we pass the information the random variable $X$ has about $Y$ (i.e. the mutual information $I(X ; Y)$) through the "bottleneck" $\tilde{X}$.
Now I've been interested in the connection between information equilibrium, economics, and machine learning (e.g. a GAN's real data, generated data, and discriminator have a formal similarity to information equilibrium's information source, information destination, and detector — the latter I use as a possible way of understanding demand, supply, and price on this blog). I'm always on the lookout for connections to information equilibrium. This is a work in progress, but I first thought it might be valuable to understand information equilibrium in terms of mutual information.
If we have two random variables $X$ and $Y$, then information equilibrium is the condition that:
$$
H(X) = H(Y)
$$
Without loss of generality, we can identify $X$ as the information source (effectively a sign convention) and say in general:
$$
H(X) \geq H(Y)
$$
We can say mutual information is maximized when $Y = f(X)$. The diagram above represents a "noisy" case where either noise (or another random variable) contributes to $H(Y)$ (i.e. $Y = f(X) + n$). Mutual information cannot be greater than the information in $X$ or $Y$. And if we assert a simple case of information equilibrium (with information transfer index $k = 1$), e.g.:
$$
p_{xy} = p_{x}\delta_{xy} = p_{y}\delta_{xy}
$$
then
$$
\begin{align}
I(X ; Y) & = \sum_{x} \sum_{y} p_{xy} \log \frac{p_{xy}}{p_{x}p_{y}}\\
& = \sum_{x} \sum_{y} p_{x} \delta_{xy} \log \frac{p_{x} \delta_{xy} }{p_{x}p_{y}}\\
& = \sum_{x} \sum_{y} p_{x} \delta_{xy} \log \frac{\delta_{xy} }{p_{y}}\\
& = \sum_{x} p_{x} \log \frac{1}{p_{x}}\\
& = -\sum_{x} p_{x} \log p_{x}\\
& = H(X)
\end{align}
$$
Note that in the above, the information transfer index accounts for the "mismatch" in dimensionality in the Kronecker delta (i.e. a die roll that determines the outcome of a coin flip such that a roll of 1, 2, or 3 yields heads and 4, 5, or 6 yields tails).
Basically, information equilibrium is the case where $H(X)$ and $H(Y)$ overlap, $Y = f(X)$, and mutual information is maximized.
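As a sanity check on the derivation, here is the delta-coupled case worked numerically, with a hypothetical uniform distribution over four states:

px  = Table[1/4, 4];                       (* p(x), uniform over four states *)
pxy = DiagonalMatrix[px];                  (* p(x,y) = p(x) on the diagonal, zero otherwise *)
py  = Total[pxy];                          (* marginal p(y), equal to p(x) here *)
mi  = Sum[If[pxy[[i, j]] > 0,
     pxy[[i, j]] Log2[pxy[[i, j]]/(px[[i]] py[[j]])], 0], {i, 4}, {j, 4}];
{N[mi], N[-Total[px Log2[px]]]}            (* I(X;Y) and H(X) both come out to 2 bits *)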
Thursday, September 21, 2017
My introductory chapter on economics
I was reading the new CORE economics textbook. It starts off with some bold type stating "capitalism revolutionized the way we live", effectively defining economics as the study of capitalism as well as "other economic systems". The first date in the first bullet point is the 1700s (the first sentence mentions an Islamic scholar of the 1300s discussing India). This started me thinking: how would I start an economics textbook based on the information-theoretic approach I've been working on?
...
An envelope-tablet (bulla) and tokens ca. 4000-3000 BCE from the Louvre.
© Marie-Lan Nguyen / Wikimedia Commons
Commerce and information
In the early 1970s, while she was studying the uses of clay before the development of pottery in Mesopotamian culture, Denise Schmandt-Besserat kept encountering small dried clay objects in various shapes and sizes. They were labelled with names like "enigmatic objects" at the time because there was no consensus theory of what they were. Schmandt-Besserat first cataloged them as "geometric objects" because they resembled cones and disks, until ones resembling animals and tools began to emerge. Realizing they might have symbolic purpose, she started calling them tokens. That is what they are called today.
The tokens appear in the archaeological record as far back as 8000 BCE, and there is evidence they were fired which would make them some of the earliest fired ceramics known. They appear all over Iran, Iraq, Syria, Turkey, and Israel. Most of this was already evident, but unexplained, when Schmandt-Besserat began her work. Awareness of the existence of tokens, in fact, went back almost all the way to the beginning of archaeology in the nineteenth century.
Tokens were found inside one particular "envelope-tablet" (a hollow ball or cylinder of clay — called a bulla) discovered in the 1920s at a site near ancient Babylon. It had a cuneiform inscription on the outside that read: "Counters representing small cattle: 21 ewes that lamb, 6 female lambs... " and so on until 49 animals were described. The bulla turned out to contain 49 tokens.
In a 1966 paper Pierre Amiet suggested the tokens represented specific commodities, citing this discovery of 49 tokens and the speculation that the objects were part of an accounting system or other record-keeping. Similar systems are used to this day. For example, in parts of Iraq pebbles are used as counters to keep track of sheep.
But because this was the only such envelope-tablet known, it seemed a stretch to reconstruct an entire system of token-counting from a single piece of evidence. As Schmandt-Besserat later noted, however, the existence of many tokens having the same shape but in different sizes suggests that they belonged to an accounting system of some sort. With no further evidence, this theory remained just one possible explanation for the function of the older tokens that pre-dated writing.
In 2013, fresh evidence emerged from bullae dated to ca. 3300 BCE. Through the use of CT scanning and 3D modelling to see inside unbroken clay balls, researchers discovered that the bullae contained a variety of geometric shapes consistent with Schmandt-Besserat's tokens.
CT scan of Choga Mish bulla and Denise Schmandt-Besserat
While these artifacts were of great interest for Schmandt-Besserat's hypothesis about the origin of writing, they are fundamentally economic archaeological artifacts.
Could a system function with just a single type of token? Cowrie shells seemed to provide a similar accounting function around the Pacific and Indian oceans (because of this, the Latin name of the specific species is Monetaria moneta, "the money cowrie"). The distinctive cowrie shape was even cast in copper and bronze in China as early as 700 BCE, making it an early form of metal coinage. The earliest known metal coins along the Mediterranean come from Lydia from before 500 BCE.
Bronze cowrie shells from the Shang dynasty (1600-1100 BCE)
The 2013 study of the Mesopotamian bullae was touted in the press as the "very first data storage system", and (given the probability distributions of finding various tokens) a bulla containing a particular set of tokens represents a specific amount of information by Claude Shannon's definition in his 1948 paper establishing information theory.
In this light, commerce can be seen as an information processing system whose emergence is deeply entwined with the emergence of civilization. It is also deeply entwined with modern mathematics.
Fibonacci today is most associated with the Fibonacci sequence of integers. His Liber Abaci ("Book of Calculation", 1202) introduced Europe to Hindu-Arabic numerals in its first section, and in the second section illustrated the usefulness of these numerals (instead of the Roman numerals used at the time) to businessmen in Pisa with examples of calculations involving currency, profit, and interest. In fact, Fibonacci's work spread back into Arabic commerce, as Arabic numerals had mostly been used by Arab astronomers and scientists.
Accounting systems and other economic data are often described using these numerals today — at least where they need to be accessed by humans. In reality, the vast majority of commerce is conducted using abstract bullae that either contain a token or not: bits. These on/off states are the fundamental units of information theory, and they also represent the billions of transactions and other information flowing from person to person or firm to firm.
At its heart, economics is the study of this information processing system.
...
Update: See also Eric Lonergan's fun blog experiment (click on the last link) on money and language. Additionally, Kocherlakota's Money is Memory [pdf] is relevant.
Credits:
I borrowed liberally (maybe too liberally) from the first link here, and added to it.
http://sites.utexas.edu/dsb/tokens/
https://oi.uchicago.edu/sites/oi.uchicago.edu/files/uploads/shared/docs/nn215.pdf