Resolution of a Debate on Covariate Selection in Causal Inference
Judea Pearl writes:
Recently, there have been several articles and many blog entries concerning the question of what measurements should be incorporated in various methods of causal analysis. The statement below is offered by way of a resolution that (1) summarizes the discussion thus far, (2) settles differences of opinion, and (3) remains faithful to logic and facts as we know them today.
The resolution is reached by separating the discussion into three parts: (1) propensity score matching, (2) Bayes analysis, and (3) other techniques.
1. Propensity score matching. Everyone is of the opinion that one should screen variables before including them as predictors in the propensity-score function. We know that, theoretically, some variables are capable of increasing bias (over and above what it would be without their inclusion), and some are even guaranteed to increase such bias.
1.1 The identity of those bias-raising variables is hard to ascertain in practice. However, their general features can be described either in graphical terms or in terms of the "assignment mechanism", P(W|X, Y0, Y1), if such is assumed.
1.2 In light of 1.1, it is recommended that the practice of adjusting for as many measurements as possible be approached with great caution. While most available measurements are bias-reducing, some are bias-increasing. The criterion of producing a "balanced population" for matching should not be the only one in deciding whether a measurement should enter the propensity-score function.
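A minimal simulation sketch of a bias-increasing measurement, under assumptions that are not part of the statement above: a linear-Gaussian M-structure (hidden U1 drives treatment W and covariate X; hidden U2 drives X and outcome Y), a continuous treatment, and ordinary least-squares adjustment standing in for propensity-score matching. The function name `simulate_m_bias` and all coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_m_bias(n=200_000, tau=1.0, seed=0):
    """Compare unadjusted and X-adjusted estimates of the effect of W on Y.

    Assumed structure (illustrative only): U1 -> W, U1 -> X, U2 -> X, U2 -> Y,
    and W -> Y with true effect tau. X is pre-treatment but is a collider.
    """
    rng = np.random.default_rng(seed)
    u1 = rng.normal(size=n)                 # hidden cause of W and X
    u2 = rng.normal(size=n)                 # hidden cause of X and Y
    x = u1 + u2 + rng.normal(size=n)        # measured covariate
    w = u1 + rng.normal(size=n)             # treatment (continuous for simplicity)
    y = tau * w + u2 + rng.normal(size=n)   # outcome

    ones = np.ones(n)
    # Regress Y on W only: W and U2 are independent, so this is unbiased here.
    unadjusted = np.linalg.lstsq(np.column_stack([ones, w]), y, rcond=None)[0][1]
    # Regress Y on W and X: conditioning on the collider X links W to U2.
    adjusted = np.linalg.lstsq(np.column_stack([ones, w, x]), y, rcond=None)[0][1]
    return unadjusted, adjusted

unadjusted, adjusted = simulate_m_bias()
print(f"true effect 1.0, unadjusted {unadjusted:.3f}, adjusted for X {adjusted:.3f}")
```

With unit variances as assumed here, the population coefficient on W after adjusting for X works out to 0.8 rather than 1.0, even though X is measured before treatment and including it would improve balance on X.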
2. Bayes analysis. If the science behind the problem is properly formulated as constraints over the prior distribution of the "assignment mechanism" P(W|X, Y, Y0, Y1), then one need not exclude any measurement in advance; sequential updating will properly narrow the posteriors to reflect both the science and the available data.
2.1 If one can deduce from the "science" that certain covariates are "irrelevant" to the problem at hand, there is no harm in excluding them from the Bayesian analysis. Such deductions can be derived either analytically, from the algebraic description of the constraints, or graphically, from the diagrammatic description of those constraints.
2.2 The inclusion of irrelevant variables in the Bayesian analysis may be advantageous from certain perspectives (e.g., providing evidence for missing data) and disadvantageous from others (e.g., slow convergence, increase in problem dimensionality, sensitivity to misspecification).
2.3 The status of intermediate variables (and M-bias) falls under these considerations. For example, if the chain Smoking -> Tar -> Cancer represents the correct specification of the problem, there are advantages (e.g., reduced variance (Cox, 1960?)) to including Tar in the analysis, even though the causal effect (of Smoking on Cancer) is identifiable without measuring Tar if Smoking is randomized. However, misspecification of the role of Tar may lead to bias.
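The variance advantage mentioned in 2.3 can be illustrated with a small Monte Carlo sketch. It is a frequentist stand-in, not a Bayesian analysis, and it assumes a linear-Gaussian chain Smoking -> Tar -> Cancer with Smoking randomized as a continuous dose; the helper names `slope` and `compare_estimators` and the coefficients a = 0.8, b = 0.5 are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

def compare_estimators(reps=2000, n=200, a=0.8, b=0.5, seed=0):
    """Repeatedly estimate the Smoking -> Cancer effect (true value a*b) two ways."""
    rng = np.random.default_rng(seed)
    direct = np.empty(reps)    # ignores Tar
    via_tar = np.empty(reps)   # exploits the assumed chain through Tar
    for r in range(reps):
        smoking = rng.normal(size=n)              # randomized, so no confounding
        tar = a * smoking + rng.normal(size=n)
        cancer = b * tar + rng.normal(size=n)     # Smoking acts only through Tar
        direct[r] = slope(smoking, cancer)
        via_tar[r] = slope(smoking, tar) * slope(tar, cancer)
    return direct, via_tar

direct, via_tar = compare_estimators()
print(f"direct:  mean {direct.mean():.3f}, variance {direct.var():.5f}")
print(f"via Tar: mean {via_tar.mean():.3f}, variance {via_tar.var():.5f}")
```

Both estimators center on the assumed true effect a*b = 0.4, but the one that uses Tar has the smaller sampling variance in this setup. The gain depends entirely on the chain being correctly specified: if Cancer also depended on Smoking directly, or shared an unmeasured cause with Tar, the product estimator would be biased.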
3. Other methods. Instrumental variables, intermediate variables, and confounders can be identified and harnessed to facilitate effective causal inference using other methods, not involving propensity score matching or Bayes analysis. For example, the measurement of Tar in the example above can facilitate a consistent estimate of the causal effect (of Smoking on Cancer) even in the presence of unmeasured confounding factors affecting both Smoking and Cancer. Such analysis can be done by either graphical methods (Causality, pages 81-88) or counterfactual algebra (Causality, pages 231-234).
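A sketch of how Tar can be used in this way, assuming a linear-Gaussian version of the Smoking/Tar/Cancer example with an unmeasured confounder U of Smoking and Cancer, and no direct path from Smoking or U to Tar other than through Smoking. In that linear setting the effect is the product of two regression coefficients; the names `ols` and `front_door_demo` and the coefficients are hypothetical.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def front_door_demo(n=200_000, a=0.8, b=0.5, seed=0):
    """Estimate the Smoking -> Cancer effect (true value a*b) with U unobserved."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                        # unmeasured confounder
    smoking = u + rng.normal(size=n)
    tar = a * smoking + rng.normal(size=n)        # U does not affect Tar directly
    cancer = b * tar + u + rng.normal(size=n)     # Smoking acts only through Tar

    naive = ols(cancer, smoking)[1]               # confounded by U
    a_hat = ols(tar, smoking)[1]                  # Smoking -> Tar
    b_hat = ols(cancer, tar, smoking)[1]          # Tar -> Cancer, holding Smoking fixed
    return naive, a_hat * b_hat

naive, via_tar = front_door_demo()
print(f"true effect 0.4, naive {naive:.3f}, using Tar {via_tar:.3f}")
```

Under these assumptions the naive regression of Cancer on Smoking converges to 0.9, while the two-step estimate through Tar converges to the true 0.4, without U ever being measured.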
Thus far, I have not heard any objection to any of these conclusions, so I consider it a resolution of what seemed to be a major disagreement among experts. And this supports what Aristotle said (or should have said): Causality is simple.
Judea