Monday, September 26, 2016

On using Taylor expansions in economics

Jo Mitchell put up a tweet about a conversation with a theoretical chemist:

I'm fairly sure that the chemist's response must have been based on little other information about macroeconomics because, after immersing myself in the subject, this physicist doesn't see anything wrong with keeping just the linear order terms.

One possibility is that the chemist misunderstood "first term" to mean just the zero-order polynomial (i.e. a constant), but I will take this to mean the first non-constant term (which may in fact be the quadratic one for reasons I'll go into below). For those unfamiliar with the idea, a Taylor expansion is a polynomial approximation to a more complex function, and the 'terms' are the pieces proportional to the powers of the variable. Basically, for any smooth enough function f(x) near x = a, we can say

f(x) ≈ f(a) + (df/dx|x=a) (x − a) + (1/2)(d²f/dx²|x=a) (x − a)² + ...

where "F|x=a" means "F evaluated at x = a". This shows the zero order, first order and second order terms. Note that the first and second derivatives are evaluated at the point a that you are approximating the function near, and can therefore be considered constants:

f(x) ≈ c₀ + c₁ (x − a) + c₂ (x − a)² + ...

At a given order, this approximation is usually only good inside a limited region where x ≈ a. Taylor expansions are used in lots of places -- and are typically useful if the variable x stays in some neighborhood of a or x itself is small. In the case where a = 0 (so x ≈ 0), it is technically called a Maclaurin series, which I only mention in order to post my picture of Colin Maclaurin's memorial in Greyfriars Kirkyard in Edinburgh (again).

Anyway, a few really useful Taylor (Maclaurin) series expansions (to order x²) are

sin(x) ≈ x
cos(x) ≈ 1 − x²/2
log(1+x) ≈ x − x²/2

That last one crops up in economics all the time; if you express your variable as a deviation from 100% (i.e. 1) and keep only the linear term, then the logarithm is approximately equal to that percent difference: log(100% + x%) ≈ x%. This is the basic idea behind log-linearization [pdf]. That also tells us that keeping only the linear terms isn't that big of a problem. For example, a bad recession involves a 10% shock to output or employment. The error in log(1+x) from keeping the linear term only is ~ x²/2, or about 0.1²/2 = 0.005 = 0.5%. Not bad. If you compound growth over many years, this starts to become an issue, though. For example, 2% inflation over 50 years leads to 50% error in the log-linear approximation.
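The size of the log-linear error is easy to check numerically. The snippet below is a minimal sketch, not part of the original argument: the function name `loglinear_error` is made up for illustration, and treating the 50-year example as a cumulative x = 50 × 2% = 1 is an assumption about how that figure was reached.

```python
import math

def loglinear_error(x):
    """Compare log(1 + x) with its linear approximation x.

    Returns (exact, actual_error, leading_estimate), where actual_error is
    x - log(1 + x) and leading_estimate is the first dropped term, x**2 / 2.
    """
    exact = math.log1p(x)          # log(1 + x), accurate for small x
    return exact, x - exact, x**2 / 2

# 2% change, a 10% recession-sized shock, and a cumulative x of order 1
# (assumed reading of the 50-year example: 50 * 0.02 = 1).
for x in (0.02, 0.10, 1.00):
    exact, err, est = loglinear_error(x)
    print(f"x = {x:4.2f}  log(1+x) = {exact:.4f}  error = {err:.4f}  x^2/2 = {est:.4f}")
```

For a 10% shock the actual error is about 0.0047, close to the x²/2 estimate of 0.005. At x = 1 the actual error is about 0.31 while x²/2 gives 0.5, which is a sign that the expansion itself is no longer trustworthy that far from zero.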

In addition to not being numerically useful to go beyond leading order in macroeconomics, there are also a couple of other issues that might give you pause when using Taylor expansions.

We usually choose f(x = a) near an equilibrium in the sciences

In economics, equilibrium isn't necessarily well defined (and even worse, just assumed), and the higher order terms in the Taylor expansion represent parameter space that is even further from that ill-defined equilibrium. Tread lightly in those dark corners! In physics, chemistry, and other sciences, this equilibrium is well-defined via some maximization/minimization principle (maximum entropy, minimum energy, etc) with an interior optimum, and you can use that fact to your advantage. Being near an optimum means the linear term is c₁ ≈ 0, leaving only the second order term. You may think that the utility maximum in economics is a local optimum; however, it is usually a utility maximum over a bounded region (e.g. a budget constraint), meaning the optimum is on the edge, so the linear term doesn't necessarily vanish (which is why I mentioned the interior optimum above).

Also, in the sciences, your degrees of freedom might change when you move away from the linear zone near x ≈ a. I am under the impression that rational agents are really only valid in a narrow region near macroeconomic equilibrium. In an ideal gas, the rotational or vibrational modes of your molecules might become important, or the thermal de Broglie wavelength may become on the order of the spacing between molecules (quantum effects become important).

The function f(x) is usually ad hoc in economics

The function f(x) in macroeconomics is usually some guess (ansatz) like a Cobb-Douglas function or CES function. Taylor expanding an ad hoc function is really just choosing the c₁'s and c₂'s to be some arbitrary parameters. This contrasts with the case in physics, chemistry, and other sciences where the function f(x) is usually not ad hoc (e.g. expanding the relativistic energy gives you the classical kinetic energy term at second order in v/c), or you are near an equilibrium in which case c₁ ≈ 0 and adding c₂ doesn't lead to your problem having more parameters than a general linear case.
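For reference, the relativistic example can be written out explicitly. This is the standard textbook expansion for a free particle of mass m moving at speed v, with c the speed of light; it is included here only as an illustration of a non-ad hoc f(x).

```latex
E = \frac{m c^2}{\sqrt{1 - v^2/c^2}}
  = m c^2 + \frac{1}{2} m v^2 + \frac{3}{8} \frac{m v^4}{c^2} + \cdots
```

The coefficients 1/2, 3/8, and so on are fixed by the form of the function rather than being free parameters to fit.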

It makes the identification problem even worse in economics

Identification at linear order already has m² versions of the coefficient c₁ for m macroeconomic observables (an m×m matrix). Going to second order adds on the order of m³ more parameters (the c₂'s). As mentioned by Paul Romer in the case of adding expectations, adding second order (nonlinear) terms makes the identification problem twice as bad (because the functions are ad hoc as mentioned above). The reason you have so many parameters is that the original m equations don't come from some theoretical framework like you have in physics and chemistry (where symmetries or conservation laws constrain the possible parameters).

And last but not least ...

Macroeconomics doesn't have a working linear theory yet

Some people might say there isn't a working linear theory yet because those second order terms are important. However, given that a major recession is a 10% fall in RGDP growth, this seems unlikely. In fact, RGDP per capita is a fairly straight line (log-linear). There are some exceptions, but they are not frequent (e.g. France and Japan transitioned from one log-linear path to a different one after WWII). That is to say, unless you are dealing with a WWII-level disruption, the data is pretty (log-)linear. Once we get that down, we can start to try to understand a more complicated nonlinear theory.


Anyway, that is why I think it's fine to keep only those first terms. It has nothing to do with the mathematics, but rather the theoretical and empirical state of macroeconomics. The field still needs its linear training wheels, so let's not laugh.


Update 2 November 2016

Gauti Eggertsson and Sanjay Singh demonstrate this explicitly with a New Keynesian model.


  1. "The function f(x) is usually ad hoc in economics"

    In physics the function usually comes from solving a differential equation, doesn't it? I wonder if anyone has tried to create economic models based on differential equations, and where that got him. I guess the IE model is kinda like that.

    1. Well, in a sense the DSGE models that are log-linearized are differential equations (finite difference equations that are solved numerically because they tend not to have closed form solutions) -- even the Cobb Douglas form can be written as a solution to a differential equation.

      Actually, marginalism (used to set up the Cobb Douglas function) is a great differential equation to use as a starting point -- and is equivalent to the IT/IE differential equation for IT index k = 1.

      In physics, the key is that the differential equations are set up using symmetry principles (Newton's laws can be expressed as time and space translation symmetries). In the case of relativity mentioned above, the kinetic energy (1/2) m v² is an expansion of p² = pᵦpᵝ, which is really just a statement that momentum p is a contravariant 4-vector that behaves in a certain way under Lorentz symmetry transformations (and that m² is invariant). These symmetry principles are absent in economics.

      And because those symmetry principles are absent, the matrix you get for m equations in m unknowns has ~ m² parameters ... instead of a smaller number because the matrix entries are related by symmetry.

      However, I try to point out that the IE differential equation is the simplest equation relating two variables consistent with scale invariance (a (subset of) conformal symmetry). Actually, I look at some possible generalizations of IE following this principle here.

      [Interestingly, I just checked and the IE equation is also invariant under inversion of the two variables A → 1/A and B → 1/B. This may have consequences I want to explore ... ]

      It's these symmetry principles that make the functions f(x) in physics models less ad hoc, therefore controlling the coefficients of the Taylor expansion.

    2. I see. Too much focus on the mathematical details has made me oblivious to how symmetry principles underlie the physical laws you mention. :/

      Uniform distribution is also an assumption of symmetries (invariance under permutations) when I come to think of it.

      This point of view is very interesting.

    3. Yes, uniform distributions definitely represent a symmetry principle.

      Actually, I've expanded on my note about invariance under inversion into a post ...

    4. By the way, I don't think it should be too difficult to find all the symmetries of the IE relation, or at least the ones that are linear (although the one that you mention here - inversion - isn't linear). After all, it's a relatively simple equation.

    5. Yeah, I'm pretty sure scale invariance and inversion pretty much exhaust the (interesting) symmetries of the IE equation.

      Now I have to find out if the inversion symmetry has ramifications in the real world. For instance, some of the terms I proposed in the expansion here (such as the constant term) are not consistent with inversion.

    6. What about power transformations? I think the IE equation is also invariant under the transformations $A \mapsto A^{\gamma}, B \mapsto B^{\gamma}$ for any nonzero $\gamma$. Example: A -> A^2, B-> B^2. Inversion is the special case where $\gamma = -1$.

    7. I played with the equation a bit more and I think I found pretty much all the symmetries. The equation is invariant under a transformation $A \mapsto f(A), B \mapsto g(B)$ if and only if the pair of functions f,g is of the form:

      f(A)= a*A^c
      g(B)= b*B^c

      where a,b,c is any triple of constants with the condition that b and c are nonzero (otherwise you run into division by zero). I can post the computation if you want.
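The symmetry claim in this thread can be spot-checked numerically. The sketch below assumes the IE equation has the form dA/dB = k·A/B with solutions A = C·B^k; that form is not written out anywhere in this thread, so treat it as an assumption. The helper name `ie_residual` and the sample constants are made up for illustration.

```python
def ie_residual(f, g, k=0.7, B=2.0, C=1.5, h=1e-6):
    """Relative mismatch in dA'/dB' = k*A'/B' after A' = f(A), B' = g(B).

    Assumes the untransformed equation is dA/dB = k*A/B, whose solutions
    are A = C * B**k. A result near zero means the transformed variables
    still satisfy the same equation at the sampled point.
    """
    def point(b_value):
        a_value = C * b_value**k            # a solution of the assumed equation
        return f(a_value), g(b_value)

    a_lo, b_lo = point(B - h)
    a_hi, b_hi = point(B + h)
    a_mid, b_mid = point(B)
    slope = (a_hi - a_lo) / (b_hi - b_lo)   # finite-difference dA'/dB'
    target = k * a_mid / b_mid
    return abs(slope - target) / abs(target)

# Power-law pair f(A) = a*A^c, g(B) = b*B^c with a = 3, b = 0.5, c = 2.
power = ie_residual(lambda A: 3.0 * A**2, lambda B: 0.5 * B**2)
# Inversion is the special case a = b = 1, c = -1.
inversion = ie_residual(lambda A: 1.0 / A, lambda B: 1.0 / B)
# A shift in A is not of the power-law form and should fail the check.
shift = ie_residual(lambda A: A + 1.0, lambda B: B)

print(power, inversion, shift)
```

The first two residuals come out at the level of numerical noise, while the shifted case does not, which is consistent with the power-law family proposed above. A check at one sample point is not a proof.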