# Causal Analysis in Theory and Practice

## July 22, 2009

### Resolution of a Debate on Covariate Selection in Causal Inference

Filed under: Discussion,Opinion — judea @ 6:00 pm

Judea Pearl writes:

Recently, there have been several articles and many blog entries concerning the question of which measurements should be incorporated in various methods of causal analysis. The statement below is offered by way of a resolution that (1) summarizes the discussion thus far, (2) settles differences of opinion, and (3) remains faithful to logic and facts as we know them today.

The resolution is reached by separating the discussion into three parts: (1) propensity score matching, (2) Bayes analysis, and (3) other techniques.

1. Propensity score matching. Everyone is of the opinion that one should screen variables before including them as predictors in the propensity-score function. We know that, theoretically, some variables are capable of increasing bias (over and above what it would be without their inclusion), and some are even guaranteed to increase such bias.

1.1 The identity of those bias-raising variables is hard to ascertain in practice. However, their general features can be described either in graphical terms or in terms of the "assignment mechanism", P(W|X, Y0, Y1), if such is assumed.

1.2 In light of 1.1, it is recommended that the practice of adjusting for as many measurements as possible be approached with great caution. While most available measurements are bias-reducing, some are bias-increasing. The criterion of producing a "balanced population" for matching should not be the only one in deciding whether a measurement should enter the propensity-score function.
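The claim that some pre-treatment measurements are bias-increasing can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: it uses a linear "M-structure" in which two unobserved factors `u1` and `u2` (hypothetical names) both influence a covariate `x`, while `u1` also drives the treatment `w` and `u2` the outcome `y`. It also swaps in ordinary regression adjustment with a continuous treatment in place of propensity-score matching, purely to keep the example short. All coefficients are made up.

```python
import numpy as np

def ols_coefs(y, *regressors):
    """Least-squares coefficients of y on the regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]

rng = np.random.default_rng(0)
n = 200_000
tau = 1.0  # assumed true effect of treatment w on outcome y

# Hypothetical M-structure: u1 -> w, u1 -> x, u2 -> x, u2 -> y, w -> y.
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
w = u1 + rng.normal(size=n)            # treatment
x = u1 + u2 + rng.normal(size=n)       # pre-treatment covariate (a collider)
y = tau * w + u2 + rng.normal(size=n)  # outcome

unadjusted = ols_coefs(y, w)[0]   # no adjustment
adjusted = ols_coefs(y, w, x)[0]  # adjusting for the covariate x

print(f"true effect      : {tau:.3f}")
print(f"without adjusting: {unadjusted:.3f}")
print(f"adjusting for x  : {adjusted:.3f}")
```

In this particular setup the unadjusted estimate is consistent for the true effect, while adjusting for `x` pulls the estimate toward 0.8 in the large-sample limit, even though `x` is measured before treatment.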

2. Bayes analysis. If the science behind the problem is properly formulated as constraints over the prior distribution of the "assignment mechanism" P(W|X, Y, Y0, Y1), then one need not exclude any measurement in advance; sequential updating will properly narrow the posteriors to reflect both the science and the available data.

2.1 If one can deduce from the "science" that certain covariates are "irrelevant" to the problem at hand, there is no harm in excluding them from the Bayesian analysis. Such deductions can be derived either analytically, from the algebraic description of the constraints, or graphically, from the diagrammatic description of those constraints.

2.2 The inclusion of irrelevant variables in the Bayesian analysis may be advantageous from certain perspectives (e.g., providing evidence for missing data) and disadvantageous from others (e.g., slow convergence, increase in problem dimensionality, sensitivity to misspecification).

2.3 The status of intermediate variables (and M-Bias) falls under these considerations. For example, if the chain Smoking -> Tar -> Cancer represents the correct specification of the problem, there are advantages (e.g., reduced variance (Cox, 1960?)) to including Tar in the analysis even though the causal effect (of smoking on cancer) is identifiable without measuring Tar, if Smoking is randomized. However, misspecification of the role of Tar may lead to bias.
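The variance advantage mentioned in 2.3 can be seen in a small Monte Carlo sketch. It is a frequentist illustration rather than a Bayesian analysis, and it assumes a linear chain Smoking -> Tar -> Cancer with randomized binary Smoking, no direct effect, no confounding, and made-up coefficients `a` and `b`. It compares the direct regression of Cancer on Smoking with the product of the two stage-wise slopes.

```python
import numpy as np

def row_slopes(x, y):
    """Simple-regression slope of y on x, computed separately for each row."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)

rng = np.random.default_rng(0)
reps, n = 2_000, 200  # number of simulated studies, subjects per study
a, b = 1.0, 1.0       # assumed Smoking -> Tar and Tar -> Cancer coefficients

# Each row is one simulated randomized study.
smoking = rng.integers(0, 2, size=(reps, n)).astype(float)
tar = a * smoking + rng.normal(size=(reps, n))
cancer = b * tar + rng.normal(size=(reps, n))

direct = row_slopes(smoking, cancer)                           # ignores Tar
via_tar = row_slopes(smoking, tar) * row_slopes(tar, cancer)   # uses Tar

print(f"true effect a*b        : {a * b:.3f}")
print(f"direct : mean {direct.mean():.3f}, variance {direct.var():.4f}")
print(f"via Tar: mean {via_tar.mean():.3f}, variance {via_tar.var():.4f}")
```

Both estimators center on the true effect here, but the one that uses Tar varies less across simulated studies. The gain relies on the chain being correctly specified; if Tar's role were misspecified, the product estimator could be biased, as 2.3 warns.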

3. Other methods. Instrumental variables, intermediate variables and confounders can be identified, and harnessed to facilitate effective causal inference using other methods, not involving propensity score matching or Bayes analysis. For example, the measurement of Tar in the example above can facilitate a consistent estimate of the causal effect (of Smoking on Cancer) even in the presence of unmeasured confounding factors affecting both smoking and cancer. Such analysis can be done by either graphical methods (Causality, pages 81-88) or counterfactual algebra (Causality, pages 231-234).
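How a measured intermediate variable can rescue identification under unmeasured confounding can be sketched in a linear setting. The example below is a simplified linear analogue of what is usually called front-door adjustment; that it corresponds to the analysis on the cited pages is an assumption here. The unobserved factor `u`, the coefficients `a` and `b`, and the premise that `u` does not affect Tar directly are all hypothetical choices made for illustration.

```python
import numpy as np

def ols_coefs(y, *regressors):
    """Least-squares coefficients of y on the regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]

rng = np.random.default_rng(0)
n = 200_000
a, b = 0.8, 0.5  # assumed Smoking -> Tar and Tar -> Cancer coefficients

u = rng.normal(size=n)                      # unmeasured confounder
smoking = u + rng.normal(size=n)            # u -> Smoking
tar = a * smoking + rng.normal(size=n)      # Smoking -> Tar (u does not touch Tar)
cancer = b * tar + u + rng.normal(size=n)   # Tar -> Cancer, u -> Cancer

naive = ols_coefs(cancer, smoking)[0]        # confounded by u
a_hat = ols_coefs(tar, smoking)[0]           # Smoking -> Tar is unconfounded
b_hat = ols_coefs(cancer, tar, smoking)[0]   # Smoking blocks the path Tar <- Smoking <- u -> Cancer
via_tar = a_hat * b_hat

print(f"true effect a*b    : {a * b:.3f}")
print(f"naive regression   : {naive:.3f}")
print(f"estimate using Tar : {via_tar:.3f}")
```

Under these assumptions the naive regression of Cancer on Smoking is badly biased by `u`, whereas the two-stage estimate that goes through Tar recovers the true effect without ever observing `u`.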

Thus far, I have not heard any objection to any of these conclusions, so I consider it a resolution of what seemed to be a major disagreement among experts. And this supports what Aristotle said (or should have said): Causality is simple.

Judea

1. Andrew Gelman comments (taken from his blog post):

I am not a causal inference expert in the way that Rosenbaum, Rubin, and Imbens are, but I will nonetheless give my thoughts on the above.

1. Propensity score matching is an important method, but I don't think it's fundamental in understanding causality. I think of propensity scores as a way of adjusting for large numbers of background variables. Again, I would point readers to the Dehejia and Wahba paper from 1999 which discusses the importance of controlling for key covariates. I think Pearl's discussion above is slightly confused by using the general term "adjusting for." Rubin, Imbens, etc., will adjust for all variables, but not necessarily by including them in the propensity score.

2. Pearl's statement about Bayesian analysis seems reasonable to me.

3. The 1996 Angrist, Imbens, and Rubin paper puts instrumental variables into a clean Bayesian framework. I'm sure there are non-Bayesian approaches that can solve these problems too.

Finally, I don't agree with Pearl that causality is simple! I don't see any easy answers for the sorts of problems where you want to estimate a causal pathway through intermediate outcomes. See here for a pointer to Michael Sobel's recent discussion of these issues. All of us in the social sciences have seen lots of talks where you see a big table of regression coefficients and then the speaker interprets one after the other causally, despite the difficulty of interpreting a change in each with all others held constant. Two useful principles for me are (1) understand the data descriptively, in any case, and (2) perform a separate analysis for each causal claim. I'm not saying these are general principles, but they've helped me keep my head when things get confusing.

Let me conclude the discussion by thanking Judea Pearl and the many commenters for a fascinating discussion. As I've said before, the various methods of Pearl, Imbens and Rubin, Greenland and Robins, and others have all been useful to many researchers in different settings. I think it's helpful to develop statistical methods in the context of applications, and also to work toward theoretical understanding, as Pearl has been doing.

Comment by andrew gelman — July 29, 2009 @ 1:09 pm

2. Causality is simple, if we do not bend it.

Andrew,

I am glad we have concluded our discussion with only one major disagreement: whether causality is simple or hard. In support of the latter, you have pointed me to an article by Michael Sobel (http://www.sociology.columbia.edu/pdf-files/msobel_text2.pdf) which, supposedly, finds special difficulties in defining and estimating mediation.

I have posted two quick responses to that paper, and now I have had the chance to read Sobel's paper in greater detail. I would like to share my reaction with you and your readers, and to relate it to our question: Is causality simple?

First, even if Sobel convinces us that mediation presents a special problem to SEM researchers, it does not mean that causal analysis, as a discipline, is hard. The fact that we cannot solve two equations with three unknowns does not make high school algebra a difficult subject; when we do not have the necessary information, we do not expect to produce a solution. Algebra is simple because it gives us the machinery to determine quickly whether we have enough information or not. The same applies to causal analysis: we now have that machinery at hand.

Now to Sobel's paper.

1. Background:
Researchers in the social sciences have been giving causal interpretation to structural coefficients. They have devised model-based criteria for identifying those coefficients and regression-based techniques for estimating them, and, once identified and estimated, they have considered the estimates as measuring direct causal effects among the corresponding variables.

2. Sobel's argument
Sobel argues against giving structural coefficients causal interpretation. His reason: These coefficients do not coincide, except in special cases, with the TRUE causal coefficients, where by "true causal coefficients" we mean those defined counterfactually.

Sobel further identifies an extra assumption (his equation (20)) that is needed to ensure equality between the structural and the "causal" coefficients (his Theorem 1) and recommends that SEM researchers use his criterion to "reexamine the validity of previous work and ask if it is reasonable or not to assume (20) in a particular application."

3. Critique of Sobel's argument
Sobel is wrong in defining structural coefficients in terms of regression, and in assuming that they are any different from the "causal coefficients" that he defines counterfactually. Early economists (Haavelmo, Marschak, Hurwicz, Simon, Fisher, Christ, even Goldberger and, of course, Heckman) have all given structural equations counterfactual interpretation (though not in formal notation). The definition of structural coefficients has nothing to do with regression, which is merely an estimation method that sometimes gives the correct magnitude (of the structural coefficient) and sometimes does not.

I have examined Sobel's extra assumption (20) and found that, as expected, it coincides precisely with the standard SEM condition for the identification of structural coefficients (specifically, that two error terms be uncorrelated, eps.(1) and eps.(2) in his Figure 1).

In general, it can be shown that IDENTIFIED structural coefficients always coincide with (the estimands of) their associated causal parameters and, moreover, the assumptions that justify the identification of structural coefficients are precisely those that are needed for consistent estimation of "causal coefficients." For that reason, it is safe to speak about the structural coefficients themselves as BEING the causal coefficients.

For example, consider the under-identified structural equations that Sobel uses to describe mediation:

M = a1*Z + eps1
Y = a2*Z + a3*M + eps2

Assume now that Z is NOT randomized. Rather, Z, eps1 and eps2 are highly correlated. It is still perfectly safe to confer causal interpretation on a1, a2, and a3, and proclaim the total effect of Z on Y to be T = a2 + a1*a3. It is also safe to equate a1*a3 with the indirect effect of Z on Y, counterfactually defined as:

a1*a3 = E[Y(z, M(z)) - Y(z, M(z'))] / (z - z')

as defined by Sobel. Our inability to identify a1*a3 given the information at hand should not tarnish its causal interpretation.
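The distinction between what the coefficients mean and whether data can identify them can be made concrete with a small simulation. The sketch below assumes specific values for a1, a2 and a3 and a hypothetical shared factor `common` that correlates Z, eps1 and eps2; none of these numbers come from Sobel's paper. It evaluates the counterfactual quantities by intervening on the two structural equations directly, and contrasts them with the regression slope obtained from observational data.

```python
import numpy as np

a1, a2, a3 = 0.5, 0.3, 0.7  # assumed structural coefficients

def mediator(z, eps1):
    """Structural equation M = a1*Z + eps1."""
    return a1 * z + eps1

def outcome(z, m, eps2):
    """Structural equation Y = a2*Z + a3*M + eps2."""
    return a2 * z + a3 * m + eps2

rng = np.random.default_rng(1)
n = 100_000
common = rng.normal(size=n)            # hypothetical shared omitted factor
eps1 = common + rng.normal(size=n)
eps2 = common + rng.normal(size=n)
z_obs = common + rng.normal(size=n)    # Z is NOT randomized

# Observational regression of Y on Z: not a consistent estimate of T here.
y_obs = outcome(z_obs, mediator(z_obs, eps1), eps2)
regression_slope = np.polyfit(z_obs, y_obs, 1)[0]

# Counterfactual contrasts: set Z by intervention, keep each unit's errors fixed.
z, z_prime = 1.0, 0.0
total = np.mean(outcome(z, mediator(z, eps1), eps2)
                - outcome(z_prime, mediator(z_prime, eps1), eps2)) / (z - z_prime)
indirect = np.mean(outcome(z, mediator(z, eps1), eps2)
                   - outcome(z, mediator(z_prime, eps1), eps2)) / (z - z_prime)

print(f"a2 + a1*a3 = {a2 + a1 * a3:.3f}, counterfactual total effect    = {total:.3f}")
print(f"a1*a3      = {a1 * a3:.3f}, counterfactual indirect effect = {indirect:.3f}")
print(f"observational regression slope of Y on Z = {regression_slope:.3f}")
```

In this setup the counterfactual contrasts reproduce a2 + a1*a3 and a1*a3, while the observational slope is far from either. The coefficients keep their causal meaning even though, with correlated errors, the data alone cannot identify them.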

In my paper on mediation (http://ftp.cs.ucla.edu/pub/stat_ser/R273-U.pdf) I show that, in linear systems, the counterfactual definition leads to the addition and multiplication rules for combining structural coefficients (Wright 1921). So, should SEM researchers panic and heed Sobel's warning to "reexamine the validity of previous work and ask if it is reasonable or not to assume (20) in a particular application"? Absolutely not; they have already done so when they justified the assumptions that render the structural coefficients identifiable. Moreover, they have done so in a language that is much more transparent and meaningful than that recommended by Sobel.

To wit, the assumption that two omitted factors be uncorrelated is many times more transparent than the same assumption articulated in the language of ignorability, e.g., that the potential outcome of the mediator, had assignment been zero, be independent of the potential value of the outcome, had treatment assignment and mediating variable been at different levels.

In Causality chapter 11, I show that the condition of ignorability is subsumed by the condition of independence among omitted factors. I have met many researchers arguing about omitted factors and none arguing about the validity of ignorability; it is too cryptic. Sobel himself, when attempting to show that ignorability can be violated in his example, resorts to arguments based on omitted factors (students' smartness), not to potential-outcome considerations. "Students' smartness", even when omitted, has a name, and anchors one's thoughts on the causal relationships that operate in the problem. Ignorability conditions anchor one's thoughts on outcomes of hypothetical "black box" experiments that are hard to envision and ascertain.

I prefer therefore to let SEM researchers continue to express knowledge in the communicable language of structural equations, and to educate ourselves about how the counterfactual logic of the 21st century supports what they set out to do in the 1950s, before their methods got messed up by bad economists and regression addicts.

My conclusion remains: Causality is simple, and I wish I knew why you thought that Sobel's article should spoil our optimism.

Judea

Comment by judea — July 29, 2009 @ 1:14 pm