Causal Analysis in Theory and Practice

May 17, 2007

More on Where Economic Modeling is Heading

Filed under: Discussion,Economics — judea @ 1:00 am

Judea Pearl writes:

My previous posting in this forum raised questions regarding Jim Heckman's analysis of causal effects, as described in his article, "The Scientific Model of Causality" (Sociological Methodology, Vol. 35 (1) page 40.)

To help answer these questions, Professor Heckman was kind enough to send me a more recent paper entitled: "Econometric Evaluation of Social Programs," by Heckman and Vytlacil (Draft of Dec. 12, 2006. Prepared for The Handbook of Econometrics, Vol. VI, ed by J. Heckman and E. Leamer, North Holland, 2006.)

This paper indeed clarifies some of my questions, yet raises others. I will share with readers my current thoughts on Heckman's approach to causality and on where causality is heading in econometrics.

(Post edited 5/4: revisions in red, thanks to feedback from David Pattison)
(Post edited 5/17: correction and new comments by LeRoy and Pearl)

A New Definition of Causal Effects
In their Handbook paper, Heckman and Vytlacil (HV) provide a new definition of causal effects, based on "external-variations," instead of shutting down equations (i.e., "surgery"). The definition is described semi-formally on page 77 and footnote 81 of their paper. The following is my extrapolation of their method as it applies to multi-equations and nonlinear systems.

Given a system of equations:

Y_i = f_i(Y, X, U) i = 1, 2,…, n

where X and U are sets of observed and unobserved external variables, respectively, the causal effect of Y_j on Y_k is computed in four steps:

1. Choose any member X_t of X that appears in f_j.
2. If X_t appears in any other equation as well, exclude it from that equation (e.g., set its coefficient to zero if the equation is linear or replace X_t by a constant).
3. Solve for the reduced form
   Y_i = g_i(X, U) i = 1, 2, …, n
   of the resulting system of equations.
4. Compute the partial derivative
   dY_k/dY_j using the ratio dg_k/dX_t : dg_j/dX_t

The resulting ratio gives "the causal effect of Y_j on Y_k."

(Note: In the two-equations model discussed in my previous posting

Y₁ = a₁ + c₁₂Y₂ + b₁₁X₁ + b₁₂ X₂ + U₁ (4.8a)
Y₂ = a₂ + c₂₁Y₁ + b₂₁X₁ + b₂₂ X₂ + U₂ (4.8b)

the external-variation definition yields c₁₂ as the causal effect of Y₂ on Y₁, which is identical to the result obtained by the surgery definition. ~~In general, the two results may differ, as demonstrated below~~.)
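
For readers who want the intermediate step, the calculation behind this note can be sketched as follows. The sketch assumes that X₂ is the chosen external variable, that it is excluded from (4.8a) by setting b₁₂ = 0, and that b₂₂ ≠ 0 and c₁₂c₂₁ ≠ 1; these choices are illustrative and are not spelled out in the note itself.

```latex
\begin{align*}
Y_1 &= \frac{a_1 + c_{12}a_2 + (b_{11} + c_{12}b_{21})X_1 + c_{12}b_{22}X_2 + U_1 + c_{12}U_2}{1 - c_{12}c_{21}} \\
Y_2 &= \frac{a_2 + c_{21}a_1 + (b_{21} + c_{21}b_{11})X_1 + b_{22}X_2 + U_2 + c_{21}U_1}{1 - c_{12}c_{21}} \\
\frac{\partial Y_1 / \partial X_2}{\partial Y_2 / \partial X_2} &= \frac{c_{12}b_{22}}{b_{22}} = c_{12}
\end{align*}
```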

External Variation vs. Surgery
In comparing their definition to the one provided by the surgery procedure, HV write (page 79): "Shutting down an equation or fiddling with the parameters … is not required to define causality in an interdependent, nonrecursive system or to identify causal parameters. The more basic idea is exclusion of different external variables from different equations which, when manipulated, allow the analyst to construct the desired causal quantities."

The following are my thoughts on this idea of HV.

In general, "exclusion" involves the removal of a variable from an equation and amounts to "fiddling with the parameters." It is, therefore, a form of "surgery" – a modification of the original system of equations — and would be subject to the same criticism one may raise against "surgery." I will refute such criticism in items 3 and 4 below, noting that if it ever has a grain of validity, the criticism would apply equally to both methods. I will then argue that "surgery" is a more basic idea than "exclusion", more solidly motivated and more appropriate for policy evaluation tasks.
The idea of relying exclusively on external variables to reveal internal cause-effect relationships has its roots in the literature on IDENTIFICATION (e.g., as in the studies of "instrumental variables") because such variables transmit the only manipulations present in observational studies. This restriction, however, is unjustified in the context of DEFINING causal effects, since "causal effects" are meant to quantify effects produced by NEW external manipulations, not necessarily those shown explicitly in the model. Moreover, every causal structural equation model, by its very nature, provides an implicit mechanism for emulating such external manipulations via surgery.
Indeed, most policy evaluation tasks are concerned with NEW external manipulations which exercise direct control over endogenous variables, namely, surgeries. Take for example a manufacturer deciding whether to double the current price of a given product after years of letting the price track the cost, i.e., price = f(cost). Such a decision amounts to removing the equation price = f(cost) from the model at hand (i.e., the one responsible for the available data) and replacing it with a constant equal to the new price. This removal is necessary for evaluating the decision at hand, and no external variables can help us avoid it.

Or take the example of evaluating the impact of terminating an educational program to which students are admitted based on a set of qualifications. The equation admission = f(qualifications) will no longer hold under program termination, and no external variable can simulate the new condition (i.e., admission = 0) save for one that actually neutralizes (or "ignores", or "shuts down") the equation admission = f(qualifications).
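
The pricing example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the functional forms (price = 1.5 × cost, demand = 100 − 2 × price), the numbers, and the helper names `solve_model` and `surgery` are all hypothetical and do not come from any model discussed here. The point is simply that the new policy is represented by replacing one equation with a constant while every other equation is left untouched.

```python
def solve_model(equations, exogenous):
    """Evaluate a recursive structural model.

    `equations` maps each endogenous variable to a function of the values
    computed so far; entries must be listed in causal order.
    """
    values = dict(exogenous)
    for name, fn in equations.items():
        values[name] = fn(values)
    return values


def surgery(equations, variable, value):
    """Return a copy of the model in which `variable` is set to a constant."""
    modified = dict(equations)  # other equations are left untouched
    modified[variable] = lambda values: value
    return modified


# Hypothetical model: price has tracked cost; demand responds to price.
equations = {
    "price": lambda v: 1.5 * v["cost"],        # price = f(cost), assumed form
    "demand": lambda v: 100 - 2 * v["price"],  # assumed demand response
}
exogenous = {"cost": 10}

observed = solve_model(equations, exogenous)

# Policy under evaluation: double the current price, regardless of cost.
new_price = 2 * observed["price"]
after_doubling = solve_model(surgery(equations, "price", new_price), exogenous)

print(observed)
print(after_doubling)
```

In this sketch no change to the exogenous `cost` could reproduce the policy, because under the original equation any price change would have to come with a cost change; the surgery breaks that link.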

(NOTE: the method used in Haavelmo (1943) to define causal effects is mathematically equivalent to surgery, not to external variation. Instead of replacing the equation Y_j = f_j(Y, X, U) with

Y_j = y_j

as would be required by surgery, Haavelmo writes Y_j = f_j(Y, X, U) + x_j, where x_j is chosen so as to make Y_j constant, Y_j = y_j. Thus, since x_j liberates Y_j from any residual influence of f_j(Y, X, U), Haavelmo's method is equivalent to that of surgery. Heckman's method of external variation leaves Y_j under the influence of f_j.)
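
Spelled out, the equivalence claimed in this note rests on a one-line substitution. The following is a sketch of that step, treating x_j as a free additive term that may depend on the other variables:

```latex
\begin{align*}
Y_j &= f_j(Y, X, U) + x_j, \qquad x_j := y_j - f_j(Y, X, U) \\
\Rightarrow \quad Y_j &= f_j(Y, X, U) + y_j - f_j(Y, X, U) = y_j
\end{align*}
```
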
Definitions based on external variation have the obvious flaw that the target equation may not contain any observable external variable. In fact, in many cases the set of observed external variables in the system is empty. Additionally, a definition based on a ratio of two partial derivatives does not generalize easily to non-linear systems with discrete variables. Thus, those who accept Heckman's restrictions would be deprived of the many identification techniques now available for instrument-less models (see Causality, chapters 3 and 4) and, more embarrassingly yet, they would be unable to even ask whether causal effects are identified in any such model: identification questions are meaningless for undefined quantities.
Fortunately, liberated by the understanding that definitions can be based on purely symbolic manipulations, we can modify Heckman's proposal and ADD fictitious external variables to any equation we desire. The added variables can then serve to define causal effects in a manner similar to the four steps in (2) (assuming continuous variables). This brings us closer to surgery, with one basic difference of leaving Y_j under the influence of f_j(Y,X,U).
Having argued that definitions based on "external variation" are conceptually ill-motivated, we now explore whether these definitions yield correct causal effects, identical to those defined by the surgery logic.
Consider a system of 3 equations:
Y₁ = aY₂ + cY₃ + U₁
Y₂ = bY₁ + X + U₂
Y₃ = dY₁ + U₃
Needed: the causal effect of Y₂ on Y₁.

The system has one external variable, X, which appears in the second equation alone, hence no exclusion is necessary. Applying the "external variation" procedure (following the 4-steps above), the reduced form yields:

dY₁/dX = a/(1-ab-cd)      dY₂/dX = (1-cd)/(1-ab-cd)

and the causal effect of Y₂ on Y₁ calculates to:

dY₁/dY₂ = a/(1-cd)

In comparison, the surgery procedure yields the following modified system of equations:
Y₁ = aY₂ + cY₃ + U₁
Y₂ = y₂
Y₃ = dY₁ + U₃

from which we obtain for the causal effect of Y₂ on Y₁:

dY₁/dy₂ = a/(1-cd)

an identical expression to that obtained from the "external variation" procedure.
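
The agreement between the two procedures in this example can be checked numerically. The following is a minimal sketch rather than a proof: the parameter values are arbitrary, the disturbances are set to zero, and the helper names (`solve`, `full_system`, `surgery_system`) are introduced here for illustration only. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the linear system A y = b exactly by Gauss-Jordan elimination.

    Assumes A is square and nonsingular (a singular A raises StopIteration).
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]


def full_system(a, b, c, d, x):
    """Return [Y1, Y2, Y3] for the original three equations with U1=U2=U3=0."""
    A = [[1, -a, -c],
         [-b, 1, 0],
         [-d, 0, 1]]
    return solve(A, [0, x, 0])


def surgery_system(a, c, d, y2):
    """Return [Y1, Y3] after the second equation is replaced by Y2 = y2."""
    A = [[1, -c],
         [-d, 1]]
    return solve(A, [a * y2, 0])


def external_variation_effect(a, b, c, d):
    """Ratio (dY1/dX) / (dY2/dX); unit differences are exact in a linear system."""
    at_0 = full_system(a, b, c, d, 0)
    at_1 = full_system(a, b, c, d, 1)
    return (at_1[0] - at_0[0]) / (at_1[1] - at_0[1])


def surgery_effect(a, c, d):
    """dY1/dy2 in the modified (post-surgery) system."""
    return surgery_system(a, c, d, 1)[0] - surgery_system(a, c, d, 0)[0]


# Arbitrary illustrative parameter values (not from the post).
a, b, c, d = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)
ev = external_variation_effect(a, b, c, d)
sg = surgery_effect(a, c, d)
print(ev, sg, a / (1 - c * d))
```

The sketch deliberately does not handle degenerate parameter choices such as cd = 1, where the post-surgery system is singular.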

It is highly probable that the results of the two procedures always coincide, though I can't see an easy proof. Perhaps readers can provide the answer.

Criticism 1: Parameter Stability

"Shutting down one equation might also affect the parameters of the other equations in the system and violate the requirement of parameter stability" (HV page 79).

In the physical world, creating the conditions dictated by a "surgery" may sometimes affect parameters in other equations. The same applies to exclusion, which is a form of surgery (see item 2 above). For example, some parameters may depend on the excluded variable or on the coefficient of the excluded variable. However, we are dealing here with symbolic, not physical, manipulations. Our task is to craft a meaningful mathematical definition of "the causal effect of one variable on another" from a symbolic system called a "model." This permits us to manipulate symbols at will, while ignoring the physical consequences of these manipulations. Physical considerations need not enter the discussion of DEFINITION.

Criticism 2: Equation ambiguity in non-causal systems

"In general, no single equation in a system of simultaneous equations uniquely determine any single outcome variable" (HV page 79). Heckman and Vytlacil refer here to systems containing non-directional equations, namely, equations in which the equality sign does not stand for the non-symmetrical relation "is determined by" or "is caused by" but for symmetrical algebraic equality. In econometrics, such non-causal equations usually convey equilibrium or resource constraints; they impose equality between the two sides of the equation but do not endow the variable on the left hand side with a special status of an "outcome" variable.

The presence of non-directional equations creates ambiguity in the surgical definition of the counterfactual Y_x, which calls for replacing the equation determining X with the constant equation X=x. If X appears in several equations, and if the position of X in the equation is arbitrary, then each one of those equations would be equally qualified for replacement by X=x, and the value of Y_x (i.e., the solution for Y after replacement) would be ambiguous.

(Note that this problem does not occur in directional nonrecursive systems (i.e., systems with feedbacks) since in such systems each variable is an "outcome" of precisely one equation.)

The HV paper creates the impression that equation ambiguity is a flaw of the surgery definition and does not plague the exclusion-based definition. However, this is not the case. In a system of non-directional equations, we have no way of knowing which external variable to exclude from which equation to get the right causal effect.

For example: Consider Eqs. (4.8a)-(4.8b) in HV page 75.

Y₁ = a₁ + c₁₂Y₂ + b₁₁X₁ + b₁₂ X₂ + U₁ (4.8a)
Y₂ = a₂ + c₂₁Y₁ + b₂₁X₁ + b₂₂ X₂ + U₂ (4.8b)

Suppose we move Y₁ to the lhs of (4.8b) and get:

Y₁ = [Y₂ - a₂ - b₂₁X₁ - b₂₂ X₂ - U₂]/c₂₁ (4.8b')

To define the causal effect of Y₂ on Y₁, we now have a choice of excluding X₂ from (4.8a) or (4.8b'). The former yields c₁₂, while the latter yields 1/c₂₁. We see that the ambiguity we have in choosing an equation for surgery now translates into ambiguity in choosing an equation and an external variable for manipulation.

Remark: Methods of breaking this ambiguity were proposed by Simon (1953) and are discussed in some detail in (Pearl 2000, Causality, pages 226-228).
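
The two candidate answers in this example can be traced through the reduced form. This is a sketch under the assumptions that c₁₂c₂₁ ≠ 1 and that the retained coefficient on X₂ is nonzero. In case (i) X₂ is excluded from (4.8a), i.e., b₁₂ = 0, and retained in the other equation; in case (ii) X₂ is excluded from (4.8b'), i.e., b₂₂ = 0, and retained in (4.8a).

```latex
\begin{align*}
\text{(i)}\quad b_{12} = 0:&\qquad
\frac{\partial Y_1}{\partial X_2} = \frac{c_{12}b_{22}}{1 - c_{12}c_{21}},\quad
\frac{\partial Y_2}{\partial X_2} = \frac{b_{22}}{1 - c_{12}c_{21}},\quad
\frac{\partial Y_1/\partial X_2}{\partial Y_2/\partial X_2} = c_{12} \\
\text{(ii)}\quad b_{22} = 0:&\qquad
\frac{\partial Y_1}{\partial X_2} = \frac{b_{12}}{1 - c_{12}c_{21}},\quad
\frac{\partial Y_2}{\partial X_2} = \frac{c_{21}b_{12}}{1 - c_{12}c_{21}},\quad
\frac{\partial Y_1/\partial X_2}{\partial Y_2/\partial X_2} = \frac{1}{c_{21}}
\end{align*}
```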

Summary
To summarize, the idea of constructing causal quantities by exclusion and manipulation of external variables, while soundly motivated in the context of identification problems, has no logical basis when it comes to model-based definitions. ~~It may yield erroneous results in nonrecursive systems, and suffers from problems of ambiguity in non-directional systems.~~ Definitions based on surgery, on the other hand, enjoy generality, semantic clarity and immunity from "parameter instability" concerns.

So, where does this leave econometric modeling? Is the failure of the "external variable" approach central or tangential to economic analysis and policy evaluation?

In almost every one of his recent articles Jim Heckman stresses the importance of counterfactuals as a necessary component of economic analysis and the hallmark of econometric achievement in the past century. For example, the first paragraph of the HV article reads: "they [policy comparisons] require that the economist construct counterfactuals. Counterfactuals are required to forecast the effects of policies that have been tried in one environment but are proposed to be applied in new environments and to forecast the effects of new policies." Likewise, in his Sociological Methodology article (2005), Heckman states: "Economists since the time of Haavelmo (1943, 1944) have recognized the need for precise models to construct counterfactuals … The econometric framework is explicit about how counterfactuals are generated and how interventions are assigned…"

I totally agree with Heckman on the centrality of counterfactuals in economic analysis. However, I am not aware of even one econometric article or textbook in the past 40 years in which counterfactuals or causal effects are properly defined. Economists working within the potential-outcome framework of the Neyman-Rubin model take counterfactuals as undefined primitives, totally detached from the knowledge encoded in structural equations models. Economists working within the structural equations framework are busy estimating parameters while treating counterfactuals as metaphysical ghosts that should not concern ordinary mortals. They trust leaders such as Heckman to define precisely what the policy implications are of the structural parameters they labor to estimate, and to relate them to what their colleagues in the potential-outcome camp are doing.

Fortunately, a simple and precise unification of the two approaches can be achieved using the mathematical properties of the surgery operation (see Causality, pages 98-102). Economists would do well to resurrect the basic surgery ideas of Haavelmo (1943), Marschak (1950), and Strotz and Wold (1960) and to reinvigorate them with the logic of graphs and counterfactuals developed in the past two decades.

4 Comments »

1. I agree that the Heckman procedure (described also in his 2000 Quarterly Journal of Economics article) should be modified to allow hypothetical manipulations of any variable in the system, regardless of where external variables enter the equations. For the equation in the system for variable i, yi=gi(y,x,u), insert a manipulating transformation Δi: yi=Δi(gi(y,x,u)). This can be envisioned as a sort of "in-line" manipulation, in which the unmanipulated system variable yi'=gi(y,x,u) is transformed into the manipulated yi=Δi(yi'). If the transformation Δi can be parameterized with a single continuous parameter, we can let Δi represent the parameter and consider the derivatives of Δi. For many purposes the manipulation can be a simple added term, yi=gi(y,x,u)+Δi. For linear systems the manipulation will then be on par with other external variables: yi=ayk+bxi+ui+Δi. But the manipulations don't have to be additive. The same result would be gotten from a multiplicative manipulation like yi=(1+Δi)(ayk+bxi+ui).

2. The causal parameter for the effect on yj of manipulations affecting yi is then given by (δyj/Δi)/(δyi/Δi). In a linear system in which a vector of Δs is added to the right hand side, these ratios can be read off the columns of the reduced-form matrix. This gives the same results as Heckman's procedure in equations for which there are external variables present. But it also defines causal parameters for equations with no external variables. The surgery procedure also works with any equation, but instead of adding a manipulation, it fixes the value of the equation's variable and looks at the effects of variations in the value of that fixed variable. In a simultaneous-equations system, indirect effects that feed back through the equation's own variable are barred from having an effect under the surgery procedure. As long as these indirect effects continue to change both the equation's own variable and the other variable in the same proportions as in the direct or "first round" changes, the surgery procedure and the Heckman procedure give the same results. There is a difference, however, that manifests itself when the full system is solvable but the amputated system is not. Consider, as an example, Y1=Y3-Y2, Y2=Y1, Y3=Y1, i.e., the 3-equation example in the posting with a=-1 and b=c=d=1 (and U1=U2=U3=0). Then 1-ab-cd=1 but 1-cd=0. (These are the determinants of the full system and the amputated system.) The full system solves to Y1=Y2=Y3=0, and if Y2 is manipulated to be Y1+h it solves to Y1=Y3=-h and Y2=0. The ratio of the changes is infinite, (δy1/Δ2)/(δy2/Δ2)=-1/0, but this is an appropriate description of a manipulation that affects the values of other variables without affecting the manipulated variable. Under the surgery procedure for the second equation, however, the system becomes the unsolvable Y1=Y3-Y2, Y2=y2, Y3=Y1. Any non-zero value for the fixing variable y2 forces the system into inconsistency.

3. I can see why Heckman's ratio (δy1/Δ2)/(δy2/Δ2) might be called the causal parameter for the effect of Y2 on Y1. It is the ratio of changes in Y1 to the effects of a manipulation of Y2. But still, it is the ratio of two effects, not the ratio of an effect to a cause. So I'm hesitant to call it the causal effect of Y2 on Y1. It is the effect on Y1 of changes that operate through Y2.

4. Defining the causal parameters through the hypothetical manipulations separates the definition problem from the identification problem.  Of course, to identify the causal parameters, some external variable X must be present whose observed variations can take the place of the hypothetical manipulations.  But the hypothetical manipulations are also different from surgery that directly manipulates Y2 by removing all influence of the inputs to Y2.  Any feedback to Y2 needs to be left in the system.

5. I suspect manipulations in the form of y=Δ(g(y,x,u)) can be defined for discrete systems and other non-linear systems.  A latent index model of a discrete variable would still allow the partial derivatives to be used, although the causal parameters will no longer be constant.  And I don't see why discrete or probabilistic structures can't be handled as well.  As long as the manipulation can be defined over the range of inputs (including feedback) that it will encounter, the system without and with the manipulation is defined.

6. I don't see the ambiguity in the simultaneous equations manipulations. As long as there is one equation per endogenous variable, won't the causal parameters be the same? In the example given of the ambiguity, there are two equations for Y1 and none for Y2. There should be one for each.

7. Causal interpretations haven't really fallen out of favor in econometrics. It's just that estimating the causal effects (as opposed to defining them) has proven to be extraordinarily difficult.

Comment by David Pattison — May 7, 2007 @ 10:03 am

To me the most interesting feature of the exchange between Professors Heckman and Pearl is that they agree that in the system Y₁ = a₁ + c₁₂Y₂ + b₁₁X₁ + b₁₂X₂, Y₂ = a₂ + c₂₁Y₁ + b₂₁X₁ + b₂₂X₂ the effect of Y₂ on Y₁ is measured by the coefficient c₁₂. Thus the answer to the question “What is the effect of Y₂ on Y₁?” is given by that coefficient. To me it seems obvious that, contrary to this, the question is misposed: the assumed change in Y₂ could have been caused either by X₁ or X₂ (or by a combination of the two), each of which would lead to a different value of Y₁. In the absence of further restrictions, Y₁ and Y₂ play symmetric roles in these equations, so neither is causally prior to the other. We aren’t given enough information about the intervention on Y₂ to be able to determine its effect on Y₁. Pearl’s justification via “surgery” for his identification of c₁₂ as the requisite causal coefficient is based on the assumption that economic models are modular. Many authors – James Heckman, Nancy Cartwright, most recently Damien Fennell – have pointed out that economic models rarely if ever have this property. Heckman, on the other hand, does not explain why his algorithm, which is summarized by Pearl on this page, gives the answer to the question “What is the effect of Y₂ on Y₁?” The coefficient will be c₁₂ only if the intervention takes the form of a particular pair of shifts of X₁ and X₂, and there is no reason to single out this pair from among all those that lead to a particular change in Y₂. Under Heckman’s algorithm the effect evidently depends on the independent validity of the particular structural form given above – if the original structural form were replaced by the reduced form the effect of Y₂ on Y₁ by Heckman’s criterion would be 0. Pearl would not find this a problem since under his characterization of structural equations the reduced form is not equivalent to the original structural equations.
Most economists, in contrast, starting with Herbert Simon, have sought definitions of causality that are invariant to algebraic operations on the individual equations.   It is depressing that we have to keep debating such apparently elementary questions half a century after the Cowles economists made their contributions.  One has to agree with Pearl’s judgment that we economists have lost our way in this area.

Comment by Stephen LeRoy — May 15, 2007 @ 3:40 pm

LeRoy's comment introduces a fresh new element into the discussion of causality in economics. His most refreshing statement, in my opinion, and the one with which I could not agree more, is his last statement:

"It is depressing that we have to keep debating such apparently elementary questions half a century after the Cowles economists made their contributions. One has to agree with Pearl's judgment that we economists have lost our way in this area."

Coming from a seasoned economist, it adds weight to our discussion on this blog, and it might perhaps jolt Heckman to reconsider some of his teachings about modeling, counterfactuals and what the key ingredients are in defining causal effects. LeRoy's main criticism of the surgery analysis is that it is based on a "causal model", namely, on the assumption that all equality signs in structural equations stand for causal, non-algebraic "assignment" operators, where the lhs of each equation is "determined by" the rhs, and not the other way around. According to LeRoy, "Most economists, in contrast, starting with Herbert Simon, have sought definitions of causality that are invariant to algebraic operations on the individual equations." Thus, unlike Simon who allowed for some equations to be causal and some algebraic, LeRoy takes an extreme position, seeking an interpretation of causal effect in a system of equations all of which are algebraic. (Note that he is still retaining the assumption of modularity, since the admissible algebraic operations that he permits are limited to those operating on individual equations, not those that combine equations.)

As stated in my original posting, when we admit non-causal equations in the system, some causal effects may be ambiguous (see Simon (1953) and Causality, pages 226-228). However, I do not agree that most economists would feel comfortable working with strictly algebraic equations, forbidding at least some equations from directly encoding causal relationships. I would argue that some, if not most, economic equations are distinctly causal, especially those that characterize agents' behavior. For example, consider the classical demand equation,

Q_d = a P_d + I

stating that the quantity Q of a commodity purchased by a consumer depends on the price P the seller asks for that commodity (plus other factors, I.) This equation is unquestionably causal; it is the price P that determines what quantity Q the consumer is willing to buy, and not the other way around. The reciprocal relation, describing the seller's willingness to adjust the price depending on the size of the buyer's order is described by a different equation in the system,   

P_s = b Q_s + W

and is also causal in nature. The same can be argued about any quantity that is determined by a human decision maker: it is determined by the information set available to the decision maker at decision time, and not the other way around. Of course, equations that express equilibrium conditions (e.g., Q_d = Q_s) or resource constraints are different, are symmetrical, and require a special treatment, because they summarize the result of elaborate temporal processes (e.g., bargaining strategy, inventory build-up) that are not part of the model. Thus, a proper economic model would be one that acknowledges the need to accommodate both causal and symmetrical relationships, distinguishes the former from the latter, and permits a meaningful definition of causal effect in such a hybrid system. Heckman seems to recognize the need for such accommodation, stating: "In general, no single equation in a system of simultaneous equations uniquely determine any single outcome variable" (Heckman and Vytlacil, 2006, page 79) but, then, he forgets about equation ambiguity in his definition of causal effect, where it is assumed that we know which external variable to exclude from which equation to get the causal effect of one variable on another. The question I have for LeRoy is whether he would endorse the surgery definition of causal effects in systems in which all equations are causal. From reading LeRoy's previous articles I assume the answer is: no (see Causality, page 136, and Cartwright, 2007, pages 244-46), but perhaps he has changed his mind since.

Comment by judea — May 17, 2007 @ 12:38 am

I am happy to have finally offered an opinion on some aspect of causality that Prof. Pearl agrees with (I’m referring to my observation that since the days of the Cowles Commission economists have not done a very good job in systematizing their thinking about causality, nor have they attached much importance to doing so). His posting raises several other issues that deserve discussion. In his first paragraph he quotes my sentence “Most economists … have sought definitions of causality that are invariant to algebraic operations on individual equations.” Including the phrase “on individual equations” was a slip: I didn’t mean to imply that causal orderings properly defined are altered by algebraic operations that combine equations. We cannot have a definition of causal orderings that implies that they are altered by arithmetic operations of any sort. Derivation of any model involves a series of such operations; if these alter the causal ordering there is no reason to attach significance to whatever causal ordering describes the final product. My point was that under Simon’s definition as set out in his 1953 paper, causal orderings are in fact invariant to any sort of arithmetic operations, including combining equations and reversing whichever variable is on the left-hand side of any equation. This is easy to verify from examples. In contrast to this, Prof. Pearl writes that Simon “allowed for some equations to be causal and some algebraic.” I would like to see a textual analysis in support of this point. I may have been wrong in saying that most economists take the view expressed above, which essentially requires that causal orderings be defined from the reduced form, not the structural form, of a system of simultaneous equations. Prof. Heckman, for example, does not share this view, as I suggested in my earlier posting. I exchanged a few emails with Prof. Heckman about this material a couple of years ago. I’m afraid neither of us jolted the other.
I still don’t agree that price causes quantity in demand equations, but quantity causes price in supply equations.  Prof. Pearl asks whether I would ever endorse the surgery characterization of causality. As I, among many, have observed, the surgery characterization of causality assumes modularity, and economic models are usually not modular. But in special cases the surgery representation is acceptable.  For example, if two internal variables have disjoint external sets (the external set for any internal variable is the set of external variables that determine it), then the analyst can perform surgery on either internal variable without altering the equation determining the other. But this is a very strong condition.

Comment by Stephen LeRoy — May 18, 2007 @ 1:06 pm
