Information Transfer Economics: Wasserstein GAN and information equilibrium

Friday, June 16, 2017

Wasserstein GAN and information equilibrium

A figure from Arjovsky, Chintala, and Bottou showing the lack of a strong gradient in the Jensen-Shannon divergence, slightly edited for comedic effect.

This paper (Arjovsky, Chintala, and Bottou [ACB]) on Wasserstein Generative Adversarial Networks (WGANs) has been generating (no pun intended) a lot of buzz in the machine learning community. Earlier this year, I mentioned some intuition I had about a possible connection between GANs and information equilibrium as well as the potential for GANs to function as a model of markets.

ACB uses the Wasserstein distance (metric), also called the Earth Mover's Distance, instead of the Jensen-Shannon distance (divergence) used in the original GAN, which is a symmetrized version of the Kullback-Leibler divergence. One of the benefits of the W-metric for machine learning is that it tends to have non-zero gradients. As a result, gradient descent solvers don't run into as many of the issues that can occur with other metrics (illustrated in the figure at the top of this post). In an odd coincidence, the W-metric has come up two other times recently in unrelated contexts at my real job.
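The flat Jensen-Shannon curve in the figure can be reproduced numerically. The following is a minimal Python sketch. It assumes a one-dimensional stand-in for ACB's disjoint-support example: a reference point mass at 0 and a "generated" point mass at θ. The helper names (js_divergence, w1_point_masses) are illustrative and do not come from ACB's code.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete
    distributions given as dicts mapping support points to probabilities."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_mixture(a):
        # KL(a || m); points where a(x) = 0 contribute nothing
        return sum(a[x] * math.log(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

def w1_point_masses(x0, x1):
    """Wasserstein-1 distance between point masses at x0 and x1:
    all of the mass has to move a distance |x1 - x0|."""
    return abs(x1 - x0)

for theta in [0.0, 0.5, 1.0, 2.0]:
    p0 = {0.0: 1.0}          # reference distribution: all mass at 0
    p_theta = {theta: 1.0}   # "generated" distribution: all mass at theta
    print(theta, js_divergence(p0, p_theta), w1_point_masses(0.0, theta))
```

For every θ ≠ 0 the Jensen-Shannon divergence is stuck at log 2 ≈ 0.693, so its gradient with respect to θ is zero. The W-metric |θ| keeps shrinking as θ approaches 0, which gives a solver something to follow.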

I noticed that this approach also has some interesting connections to information equilibrium. For starters, the W-metric closely matches the intuitive picture I usually give for a supply distribution coming into equilibrium with a demand distribution. We have two distributions, and as part of our approach to equilibrium a lump of the supply distribution is moved to some place where there is an excess in the demand distribution. Here's a figure illustrating the concept from a nice discussion of WGAN:

There is even a happy accident of terminology: a "cost function" is involved. A further happy accident is that the exact solution for histogram distributions is obtained via linear programming, a topic discussed in the context of economics in this blog post. By the way, there is another nice overview of ACB here.
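For histograms on a line, the picture of moving lumps of supply to where demand is can be made concrete without a general linear program. Here is a minimal Python sketch. It assumes one-dimensional histograms with unit-spaced bins and equal total mass. In that special case, a greedy monotone ("north-west corner") plan is already optimal for the cost |i − j|. Higher dimensions or other cost functions still need the linear programming solution. The function names are hypothetical.

```python
def transport_plan_1d(supply, demand):
    """Greedy monotone ("north-west corner") transport plan between two
    histograms on bins 0..n-1 with equal total mass. In one dimension with
    cost |i - j| this monotone plan is optimal, so no general LP is needed."""
    assert abs(sum(supply) - sum(demand)) < 1e-12, "totals must match"
    s, d = list(supply), list(demand)
    i = j = 0
    moves = []  # list of (from_bin, to_bin, amount)
    while i < len(s) and j < len(d):
        amount = min(s[i], d[j])
        if amount > 0:
            moves.append((i, j, amount))
        s[i] -= amount
        d[j] -= amount
        # advance past any bin that is now empty (tiny float remainders count as empty)
        if s[i] <= 1e-15:
            i += 1
        if d[j] <= 1e-15:
            j += 1
    return moves

def earth_movers_distance(moves):
    # total cost = sum over moves of (amount moved) * (distance moved)
    return sum(amount * abs(i - j) for i, j, amount in moves)

supply = [3, 1, 0, 0]   # illustrative supply histogram
demand = [0, 0, 1, 3]   # illustrative demand histogram
plan = transport_plan_1d(supply, demand)
print(plan)                         # [(0, 2, 1), (0, 3, 2), (1, 3, 1)]
print(earth_movers_distance(plan))  # 10
```

The same total of 10 comes out of summing the absolute differences of the two cumulative histograms, which is the usual closed form for the one-dimensional W-metric.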

One of the other interesting aspects of WGANs is that the GAN "discriminator" is replaced by a WGAN "critic" (per ACB via the previous link):

The critic makes much more sense in the context of economics: at constant demand, a low price is a "critic" of excess supply and a high price is a "critic" of scarce supply.

The GAN analogy is still a one-sided model of supply and demand (it is a model for constant demand and varying supply or vice versa), rather than a full "general equilibrium" analogy (where supply and demand react to changes in each other).

I'm going to close (for now) with another observation about some of the math involved, namely Lipschitz functions. One of the problems with the abstract WGAN formulation is that the W-metric is generally intractable (like the linear programming economic allocation problem). However, one can rewrite the problem using Kantorovich-Rubinstein (KR) duality, which couches it in terms of Lipschitz functions: put simply, functions with bounded slope. The K-Lipschitz condition (slope bounded by K) is:

$$d_n\big(f(m_1), f(m_2)\big) \le K\, d_m(m_1, m_2) \quad \text{for all } m_1, m_2$$

where f : m → n and the d's are metrics on the manifolds m and n. Longtime readers of this blog may already know where this is going: this can be represented as an information equilibrium (transfer) condition. If we have information transfer from "demand" A to "supply" B, then:

$$\frac{dA}{A} \le k\,\frac{dB}{B}$$

$$d(\log A) \le k\, d(\log B)$$

which is the infinitesimal version of the K-Lipschitz condition (for log A as a function of log B), with the information transfer (IT) index k playing the role of the slope K. That is to say, information transfer relationships are "locally" k-Lipschitz. However, the K-Lipschitz condition is actually a global one: it must hold for all of m (here, all of log B). So the KR duality trick would only work for functions that are in information equilibrium. That is the case with the equal sign, because then log A = k log B + c, and the slope is exactly k for all log B, measured between any two points log b₁ and log b₂.
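The local-versus-global distinction can be checked numerically by measuring the slope of log A against log B between pairs of points. The following is a minimal Python sketch. The values k = 1.5 and a = exp(c) = 2 are assumed for illustration. The "non-ideal" relationship is a hypothetical example chosen so that d(log A)/d(log B) stays strictly below k.

```python
import math

def log_slope(f, b1, b2):
    """Slope of log f(B) against log B between the points B = b1 and B = b2."""
    return (math.log(f(b2)) - math.log(f(b1))) / (math.log(b2) - math.log(b1))

# Illustrative (assumed) values for the IT index k and the constant a = exp(c)
k, a = 1.5, 2.0

def equilibrium(b):
    # information equilibrium: log A = k log B + c, i.e. A = a * B**k
    return a * b ** k

def non_ideal(b):
    # hypothetical non-ideal transfer: d(log A)/d(log B) = k - B/(1 + B),
    # which stays strictly below k, so only the inequality holds
    return a * b ** k / (1.0 + b)

for b1, b2 in [(0.5, 2.0), (1.0, 10.0), (3.0, 7.0)]:
    print(b1, b2, log_slope(equilibrium, b1, b2), log_slope(non_ideal, b1, b2))
```

For the equilibrium relationship, the measured slope equals k (up to rounding) for every pair. For the non-ideal one, it is below k and changes with the pair of points, so no single slope describes it globally.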

[Update 19 June 2017: n.b. this also applies to the price p = dA/dB being locally K-Lipschitz, but with K = k − 1.]
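In the information equilibrium case (equal sign), the slope K = k − 1 for the price follows in a few lines. Integrating d(log A) = k d(log B) gives A = a B^k with a = exp(c), and the information equilibrium condition gives p = dA/dB = k A/B:

```latex
\begin{aligned}
p &= \frac{dA}{dB} = k\,\frac{A}{B} = k\,a\,B^{\,k-1} \\
\log p &= \log(k\,a) + (k-1)\log B \\
d(\log p) &= (k-1)\, d(\log B)
\end{aligned}
```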

The Lipschitz function representation of the WGAN also yields a solution up to an overall scale. I discuss the potential importance of scale invariance in economics in several posts (e.g. here or here).
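The KR dual and the scale point can both be checked on one-dimensional histograms. The following is a minimal Python sketch that assumes unit-spaced bins. On a line, E_p[f] − E_q[f] = Σ (f(i+1) − f(i))(F_q(i) − F_p(i)), where F is the cumulative histogram. A function whose steps are ±1 and follow the sign of F_q − F_p is therefore 1-Lipschitz and attains the W-metric. Allowing K-Lipschitz functions (steps of ±K) multiplies the dual value by K. ACB make the same point: a K-Lipschitz constraint yields K times the Wasserstein distance. The function names are hypothetical.

```python
def cdf(hist):
    # cumulative sums F(i) = hist[0] + ... + hist[i]
    out, total = [], 0.0
    for h in hist:
        total += h
        out.append(total)
    return out

def w1_primal(p, q):
    """W1 between histograms on unit-spaced bins: sum of |F_p - F_q|."""
    Fp, Fq = cdf(p), cdf(q)
    return sum(abs(a - b) for a, b in zip(Fp[:-1], Fq[:-1]))

def dual_witness(p, q, K=1.0):
    """A K-Lipschitz function on the bins with steps of +/-K following
    sign(F_q - F_p); it attains the supremum in the KR dual."""
    Fp, Fq = cdf(p), cdf(q)
    f = [0.0]
    for a, b in zip(Fp[:-1], Fq[:-1]):
        step = K if b > a else -K  # where a == b the sign does not matter
        f.append(f[-1] + step)
    return f

def dual_objective(f, p, q):
    # E_p[f] - E_q[f]
    return sum(fi * (pi - qi) for fi, pi, qi in zip(f, p, q))

p = [0.75, 0.25, 0.0, 0.0]   # illustrative "supply" histogram
q = [0.0, 0.0, 0.25, 0.75]   # illustrative "demand" histogram
print(w1_primal(p, q))                                # 2.5
print(dual_objective(dual_witness(p, q), p, q))       # 2.5
print(dual_objective(dual_witness(p, q, 2.0), p, q))  # 5.0 (K = 2 scales by K)
```

The primal value and the dual value agree, and widening the slope bound from 1 to K rescales the answer by K without changing which function is optimal. This is why the Lipschitz representation pins the solution down only up to an overall scale.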

I am not sure there is anything useful in this observation; it may be wildly off-base. I am still looking into the possible use of GANs as a model of the "market algorithm", possibly showing us how markets work as well as the conditions under which they don't (and ways to improve them).
