# Causal Analysis in Theory and Practice

## November 30, 2009

### Measurement Cost and Estimator’s Variance

Sander Greenland from UCLA writes:

The machinery in your book addresses only issues of identification and unbiasedness. Of equal concern for practice is variance, which comes to the fore when (as usual) one has a lot of estimators with similar bias to choose from, for within that set of estimators the variance becomes the key driver of expected loss (usually taken as MSE, i.e., mean-squared error = variance + bias^2). Thus, for example, you may identify a lot of (almost-) sufficient subsets in a graph; but the minimum MSE attainable with each may span an order of magnitude. On top of that, the financial costs of obtaining each subset may span orders of magnitude. So your identification results, while important and useful, are just a start on working out which variables to spend the money to measure and adjust for. The math of the subsequent MSE and cost considerations is harder, but no less important.
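Greenland's point is easy to see in simulation. The following sketch uses a toy model of my own (not from the exchange): in the graph A → B → X → Y with A → Y, both {A} and {B} are sufficient adjustment sets, so adjusting for either yields an unbiased estimate of the effect of X on Y (here 1), yet the sampling variances of the two estimators differ severalfold.

```python
import numpy as np

rng = np.random.default_rng(0)
reps, n = 400, 300
est_A, est_B = [], []       # effect estimates under the two adjustment sets

for _ in range(reps):
    # A -> B -> X -> Y with A -> Y; both {A} and {B} block the back-door path
    A = rng.normal(size=n)
    B = A + rng.normal(size=n)
    X = B + rng.normal(size=n)
    Y = X + 3 * A + rng.normal(size=n)   # true effect of X on Y is 1
    for cov, out in ((A, est_A), (B, est_B)):
        M = np.column_stack([np.ones(n), X, cov])
        out.append(np.linalg.lstsq(M, Y, rcond=None)[0][1])

# both estimators are (nearly) unbiased, but their spread differs severalfold
print(np.mean(est_A), np.std(est_A))
print(np.mean(est_B), np.std(est_B))
```

Adjusting for B leaves little variation in X (X is B plus noise) and leaves A's large effect on Y in the residual, so that estimator pays a steep variance penalty even though its bias is the same.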

Judea Pearl replies:

You are absolutely right; it is just a start, as stated in Causality, page 95. The reason I did not emphasize the analysis of variance in this book was my assumption that, after a century of extremely fruitful statistical research, one would have little to add to this area.

My hypothesis was:

Once we identify a causal parameter, and produce an estimand of that parameter in closed mathematical form, a century of statistical research can be harnessed to the problem, rendering the estimation task a routine exercise in data analysis. Why spend energy on areas well researched when so much needs to be done in areas of neglect?

However, the specific problem you raised, that of choosing among competing sufficient sets, happens to be one that Tian, Paz and Pearl (1998) did tackle and solve. See Causality page 80, reading: “The criterion also enable the analyst to search for an optimal set of covariates — a set Z that minimizes measurement cost or sampling variability (Tian et al, 1998).” [Available at http://ftp.cs.ucla.edu/pub/stat_ser/r254.pdf] By “solution” I mean, of course, an analytical solution, assuming that cost is additive and well defined for each covariate. The paper provides a polynomial-time algorithm that identifies the minimal (or minimum-cost) sets of nodes that d-separate two nodes in a graph. When applied to a graph purged of outgoing arrows from the treatment node, the algorithm will enumerate all minimal sufficient sets, i.e., sets of measurements that de-confound the causal relation between treatment and outcome.
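For readers who want to experiment, here is a brute-force sketch of the same idea, not the polynomial-time algorithm of Tian et al.; the graph encoding (a child-adjacency dict) and all function names are my own illustration. It tests d-separation via the standard ancestral-moral-graph reduction, purges the treatment's outgoing arrows, and enumerates minimal sufficient sets over a candidate pool.

```python
from itertools import combinations

def d_separated(dag, xs, ys, zs):
    """Check d-separation of xs from ys given zs, via the ancestral moral graph."""
    # restrict attention to ancestors of xs | ys | zs
    relevant = set(xs) | set(ys) | set(zs)
    frontier = list(relevant)
    while frontier:
        node = frontier.pop()
        for parent, children in dag.items():
            if node in children and parent not in relevant:
                relevant.add(parent)
                frontier.append(parent)
    # moralize: keep edges within the ancestral set, marry co-parents, drop directions
    und = {node: set() for node in relevant}
    for parent, children in dag.items():
        if parent in relevant:
            for child in children & relevant:
                und[parent].add(child)
                und[child].add(parent)
    for child in relevant:
        parents = [p for p, cs in dag.items() if child in cs and p in relevant]
        for a, b in combinations(parents, 2):
            und[a].add(b)
            und[b].add(a)
    # xs, ys are d-separated given zs iff zs cuts them in the moral graph
    seen, stack = set(xs), list(xs)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nbr in und[node] - set(zs) - seen:
            seen.add(nbr)
            stack.append(nbr)
    return True

def minimal_sufficient_sets(dag, x, y, candidates):
    """Enumerate minimal sets that block all back-door paths from x to y."""
    # back-door graph: delete arrows emerging from the treatment node x
    g = {n: (set() if n == x else set(cs)) for n, cs in dag.items()}
    sufficient = [set(s)
                  for r in range(len(candidates) + 1)
                  for s in combinations(candidates, r)
                  if d_separated(g, {x}, {y}, set(s))]
    return [s for s in sufficient if not any(t < s for t in sufficient)]

# A -> B -> X -> Y with A -> Y: the back-door path X <- B <- A -> Y
# can be blocked by {A} or by {B}, so two minimal sets compete on cost.
dag = {"A": {"B", "Y"}, "B": {"X"}, "X": {"Y"}, "Y": set()}
print(minimal_sufficient_sets(dag, "X", "Y", ["A", "B"]))
```

Once each candidate covariate carries a measurement cost, picking among the enumerated minimal sets becomes the cost-minimization problem the paper solves analytically.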

Readers who deem such an algorithm useful should have no difficulty implementing it from the description given in the paper; the introduction of variance considerations, though, would require some domain-specific expertise.

## November 10, 2009

### The Intuition Behind Inverse Probability Weighting


Michael Foster from the University of North Carolina writes:

I’m an economist here in the UNC school of public health, trying to work out the intuition of MSM for my non-methodologist collaborators. My biostatistics and epidemiology colleagues can give me mechanical answers but are short on intuition at times. Here are two questions:

1. Consider a regressor that is a confounding variable but is also a victim of unobserved confounding itself. Why does weighting with this troublesome covariate not cause the bias that regression causes (collider bias)? In this case, I’m principally thinking about past exposures and how to handle them in an analysis of dynamic treatment. Marginal structural models (MSMs) include them in calculating the weights; Robins suggests that including them as covariates in the outcome equation produces the “null paradox”.

Here’s my answer. A confounding variable has two characteristics: it is related to the exposure and to the outcome. When we weight with that variable, we break the link between the exposure and that variable. However, other than the portion due to the exposure, we do not eliminate the relationship between the covariate and the outcome. In that way (by not breaking both links), we avoid the bias created by the collider issue.

2. How do I know what variables to include in the numerator of the MSM weight?

Here’s my answer: I would include in the weights those variables that will be included in the analysis of the outcome. Their presence in the denominator of the weight is essentially duplicative: we’re accounting for them there and in the outcome model.
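As a concrete illustration of what the numerator does, here is a minimal sketch (my own toy model, with the simplest choice of numerator, the marginal P(X=x)) for a single binary treatment: the stabilized weight sw = P(X=x) / P(X=x | Z) averages to one and is far less extreme than the ordinary inverse-probability weight 1 / P(X=x | Z).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Z = rng.binomial(1, 0.5, n)
p1 = 0.2 + 0.5 * Z                        # P(X=1 | Z): the denominator model
X = rng.binomial(1, p1)

denom = np.where(X == 1, p1, 1 - p1)      # P(X=x | Z) for the treatment received
numer = np.where(X == 1, X.mean(), 1 - X.mean())  # marginal P(X=x)

w_unstab = 1.0 / denom                    # ordinary IPW weight
w_stab = numer / denom                    # stabilized weight

print(w_stab.mean())                      # close to 1 by construction
print(w_unstab.max(), w_stab.max())       # stabilization tames extreme weights
```

Moving a variable from the weight's denominator into both numerator and outcome model, as suggested above, works on the same principle: the ratio cancels that variable's contribution to the weight while the outcome model continues to account for it.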

Judea Pearl replies (updated 11/19/2009):

Your question deals with the intuition behind “Inverse Probability Weighting” (IPW), an estimation technique used in several frameworks, among them Marginal Structural Models (MSM). However, the division by the propensity score P(X=1 | Z=z), the probability of treatment X = 1 given observed covariates Z = z, is more than a step taken by one estimation technique; it is dictated by the very definition of “causal effect,” and therefore appears, in various guises, in every method of effect estimation. It is a property of Nature, not of our efforts to unveil the secrets of Nature.

Let us first see how this probability ends up in the denominator of the effect estimand, and then deal with the specifics of your question, dynamic treatment and unobserved confounders.


P(Z1, Z2, Y | do(X1 = x1, X2 = x2)) = P(Z1, X1 = x1, Z2, X2 = x2, Y) / [ P(X1 = x1 | Z1) · P(X2 = x2 | Z2, X1 = x1) ]
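To see this division at work numerically, here is a small simulation sketch (an illustrative model of my own, not from the post): two treatment stages X1 and X2, with the covariate Z2 itself affected by the earlier treatment X1. Weighting each unit that received X1 = 1 and X2 = 1 by the inverse of the product P(X1=1 | Z1) · P(X2=1 | Z2, X1=1) recovers E[Y | do(X1=1, X2=1)], which equals 4.2 under the chosen parameters, while the unweighted conditional mean stays confounded by Z1 and Z2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative sequential-treatment model (all parameters made up):
Z1 = rng.binomial(1, 0.5, n)
p_x1 = 0.2 + 0.5 * Z1                   # P(X1=1 | Z1)
X1 = rng.binomial(1, p_x1)
Z2 = rng.binomial(1, 0.3 + 0.4 * X1)    # time-varying covariate, affected by X1
p_x2 = 0.2 + 0.4 * Z2 + 0.2 * X1        # P(X2=1 | Z2, X1)
X2 = rng.binomial(1, p_x2)
Y = X1 + 2 * X2 + Z1 + Z2 + rng.normal(0, 1, n)

# Analytically: E[Y | do(X1=1, X2=1)] = 1 + 2 + E[Z1] + E[Z2 | do(X1=1)] = 4.2
sel = (X1 == 1) & (X2 == 1)
w = 1.0 / (p_x1[sel] * p_x2[sel])       # inverse of the two propensity factors
ipw = np.sum(w * Y[sel]) / np.sum(w)    # weighted mean among the treated
naive = Y[sel].mean()                   # ignores confounding by Z1, Z2

print(ipw, naive)                       # ipw near 4.2; naive biased upward
```

The weights simply undo the two conditional probabilities that Nature inserted between covariates and treatment choices, which is exactly the division displayed in the formula.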