Causal Analysis in Theory and Practice

January 22, 2015

Flowers of the First Law of Causal Inference (2)

Flower 2 — Conditioning on post-treatment variables

In this second flower of the First Law, I share with readers interesting relationships among various ways of extracting information from post-treatment variables. These relationships came up in conversations with readers, students and curious colleagues, so I will present them in a question-answer format.

Question-1
Rule 2 of do-calculus does not distinguish post-treatment from pre-treatment variables. Thus, regardless of the nature of Z, it permits us to replace P(y|do(x), z) with P(y|x, z) whenever Z separates X from Y in the mutilated graph G_X (i.e., the causal graph from which the arrows emanating from X are removed). How can this rule be correct, when we know that one should be careful about conditioning on a post-treatment variable Z?

Example 1: Consider the simple causal chain X → Y → Z. We know that if we condition on Z (as in case-control studies), the selected units cease to be representative of the population, and we cannot identify the causal effect of X on Y even when X is randomized. Applying Rule 2, however, we get P(y|do(x), z) = P(y|x, z), since X and Y are separated in the mutilated graph X   Y → Z (the chain with the arrow from X to Y removed). This tells us that the causal effect of X on Y IS identifiable conditioned on Z. Something must be wrong here.
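The equality that Rule 2 asserts in this example can be checked numerically. The sketch below assumes binary variables and made-up conditional probabilities (the names `P_X`, `P_Y1_GIVEN_X` and `P_Z1_GIVEN_Y` are illustrative, not from the post) and computes each quantity by summing the joint distribution of the chain X → Y → Z. It only verifies that P(y|do(x), z) and P(y|x, z) coincide, and that both differ from the unconditioned P(y|do(x)); it is not meant as the resolution of the puzzle.

```python
from itertools import product

# Hypothetical parameters for a binary chain X -> Y -> Z (assumed for illustration).
P_X = {0: 0.5, 1: 0.5}            # X is randomized
P_Y1_GIVEN_X = {0: 0.2, 1: 0.7}   # P(Y=1 | X=x)
P_Z1_GIVEN_Y = {0: 0.1, 1: 0.8}   # P(Z=1 | Y=y)


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x, y, z, do_x=None):
    """P(x, y, z); under do(X=do_x) the factor P(x) becomes an indicator."""
    px = P_X[x] if do_x is None else float(x == do_x)
    return px * bern(P_Y1_GIVEN_X[x], y) * bern(P_Z1_GIVEN_Y[y], z)


def prob_y1(x=None, z=None, do_x=None):
    """P(Y=1 | X=x, Z=z), optionally in the model mutilated by do(X=do_x)."""
    num = den = 0.0
    for xv, yv, zv in product((0, 1), repeat=3):
        if x is not None and xv != x:
            continue
        if z is not None and zv != z:
            continue
        p = joint(xv, yv, zv, do_x)
        den += p
        if yv == 1:
            num += p
    return num / den


interventional_z = prob_y1(z=1, do_x=1)   # P(Y=1 | do(X=1), Z=1)
observational_z = prob_y1(x=1, z=1)       # P(Y=1 | X=1, Z=1)
effect = prob_y1(do_x=1)                  # P(Y=1 | do(X=1))

print(interventional_z, observational_z, effect)
```

With these numbers the first two quantities are both 0.56/0.59, while P(Y=1|do(X=1)) is 0.7: the z-specific expression that Rule 2 delivers is a different object from the causal effect of X on Y.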


November 30, 2009

Measurement Cost and Estimator’s Variance

Sander Greenland from UCLA writes:

The machinery in your book addresses only issues of identification and unbiasedness. Of equal concern for practice is variance, which comes to the fore when (as usual) one has a lot of estimators with similar bias to choose from, for within that set of estimators the variance becomes the key driver of expected loss (usually taken as MSE, i.e., mean-squared error = variance + bias^2). Thus, for example, you may identify a lot of (almost-) sufficient subsets in a graph, but the minimum MSE attainable with each may span an order of magnitude. On top of that, the financial costs of obtaining each subset may span orders of magnitude. So your identification results, while important and useful, are just a start on working out which variables to spend the money to measure and adjust for. The math of the subsequent MSE and cost considerations is harder, but no less important.
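The decomposition MSE = variance + bias^2 mentioned above can be illustrated with a small simulation. The sketch below is a generic example, not tied to any particular causal graph: it assumes normally distributed data and compares two estimators of the same quantity (sample mean and sample median) that have nearly identical bias but different variance. The function name `mse_decomposition` and all settings (sample size, number of replications, seed) are illustrative choices.

```python
import random
import statistics


def mse_decomposition(estimates, truth):
    """Return (bias, variance, mse) computed from replicated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - truth
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return bias, variance, mse


rng = random.Random(0)      # fixed seed so the run is reproducible
TRUTH = 0.0                 # true centre of the simulated distribution
N, REPLICATIONS = 25, 2000  # assumed sample size and number of repetitions

mean_estimates, median_estimates = [], []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(TRUTH, 1.0) for _ in range(N)]
    mean_estimates.append(statistics.fmean(sample))
    median_estimates.append(statistics.median(sample))

mean_bias, mean_var, mean_mse = mse_decomposition(mean_estimates, TRUTH)
med_bias, med_var, med_mse = mse_decomposition(median_estimates, TRUTH)

print("mean:   bias=%.4f var=%.4f mse=%.4f" % (mean_bias, mean_var, mean_mse))
print("median: bias=%.4f var=%.4f mse=%.4f" % (med_bias, med_var, med_mse))
```

Both estimators are essentially unbiased here, so the difference in MSE comes almost entirely from the variance term, which is the situation the letter describes.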

Judea Pearl replies:

You are absolutely right; it is just a start, as is stated in Causality, page 95. The reason I did not emphasize the analysis of variance in this book was my assumption that, after a century of extremely fruitful statistical research, one would have little to add to this area.

My hypothesis was:

Once we identify a causal parameter, and produce an estimand of that parameter in closed mathematical form, a century of statistical research can be harnessed to the problem, and render the estimation task a routine exercise in data analysis. Why spend energy on areas well researched when so much needs to be done in areas of neglect?

However, the specific problem you raised, that of choosing among competing sufficient sets, happens to be one that Tian, Paz and Pearl (1998) did tackle and solve. See Causality, page 80, reading: “The criterion also enables the analyst to search for an optimal set of covariates, a set Z that minimizes measurement cost or sampling variability (Tian et al., 1998).” [Available at http://ftp.cs.ucla.edu/pub/stat_ser/r254.pdf] By “solution”, I mean, of course, an analytical solution, assuming that cost is additive and well defined for each covariate. The paper provides a polynomial-time algorithm that identifies the minimal (or minimum-cost) sets of nodes that d-separate two nodes in a graph. When applied to a graph purged of outgoing arrows from the treatment node, the algorithm will enumerate all minimal sufficient sets, i.e., sets of measurements that de-confound the causal relation between treatment and outcome.
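To make the task concrete, here is a brute-force sketch of the search for a minimum-cost sufficient set. It is not the polynomial-time algorithm of the paper: it simply enumerates subsets of measurable nondescendants of the treatment and tests d-separation in the graph purged of arrows leaving the treatment, so it is only practical for small graphs. The example graph, the costs, and the function names (`ancestors`, `d_separated`, `sufficient_sets`) are assumptions made for illustration.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def d_separated(parents, x, y, z):
    """True if the set z d-separates x from y (moralized ancestral graph test)."""
    z = set(z)
    keep = ancestors(parents, {x, y} | z)
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                      # undirected copy of each arrow
            nbrs[child].add(p)
            nbrs[p].add(child)
        for a, b in combinations(ps, 2):  # "marry" parents of a common child
            nbrs[a].add(b)
            nbrs[b].add(a)
    seen, stack = {x}, [x]
    while stack:                          # look for a path from x to y avoiding z
        n = stack.pop()
        if n == y:
            return False
        for m in nbrs[n] - z - seen:
            seen.add(m)
            stack.append(m)
    return True


def sufficient_sets(parents, x, y, cost):
    """Yield (total_cost, set) for each admissible subset of measurable nodes."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    descendants = {n for n in nodes if n != x and x in ancestors(parents, {n})}
    # Graph purged of the arrows leaving the treatment node x.
    cut = {c: [p for p in ps if p != x] for c, ps in parents.items()}
    candidates = sorted(n for n in cost if n not in (descendants | {x, y}))
    for k in range(len(candidates) + 1):
        for zs in combinations(candidates, k):
            if d_separated(cut, x, y, zs):
                yield sum(cost[n] for n in zs), zs


# Hypothetical graph, given as child -> list of parents, with additive costs.
PARENTS = {"X": ["Z1", "Z3"], "Z3": ["Z1", "Z2"], "Y": ["X", "Z2", "Z3"]}
COST = {"Z1": 5, "Z2": 1, "Z3": 2}

all_sets = sorted(sufficient_sets(PARENTS, "X", "Y", COST))
best_cost, best_set = all_sets[0]
print(best_cost, best_set)
```

In this toy graph Z3 must be measured to block X ← Z3 → Y, but conditioning on it opens the collider path through Z1 and Z2, so one of those is needed as well; the cheaper choice is Z2, giving the set (Z2, Z3) at cost 3.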

Readers who deem such an algorithm useful should have no difficulty implementing it from the description given in the paper; the introduction of variance considerations, though, would require some domain-specific expertise.

May 4, 2008

Alternative Proof of the Back-Door Criterion

Filed under: Back-door criterion — judea @ 6:00 pm

Consider a Markovian model [tex]$G$[/tex] in which [tex]$T$[/tex] stands for the set of parents of [tex]$X$[/tex]. From Causality, Eq. (3.13), we know that the causal effect of [tex]$X$[/tex] on [tex]$Y$[/tex] is given by

[tex]\begin{equation} P(y|\hat{x}) = \sum_{t \in T} P(y|x,t) P(t) \end{equation}[/tex] (1).

Now assume some members of [tex]$T$[/tex] are unobserved, and we seek another set [tex]$Z$[/tex] of observed variables, to replace [tex]$T$[/tex] so that

[tex]\begin{equation} P(y|\hat{x}) = \sum_{z \in Z} P(y|x,z) P(z) \end{equation}[/tex] (2).

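It is easily verified that (2) follows from (1) if [tex]$Z$[/tex] satisfies:

([tex]$i$[/tex]) [tex]$(Y \perp\!\!\!\perp T \mid X, Z)$[/tex] and ([tex]$ii$[/tex]) [tex]$(X \perp\!\!\!\perp Z \mid T)$[/tex].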

Indeed, conditioning on [tex]$Z$[/tex], ([tex]$i$[/tex]) permits us to rewrite (1) as [tex]\[ P(y|\hat{x}) = \sum_{t} P(t) \sum_z P(y|z,x) P(z|t,x) \][/tex] and ([tex]$ii$[/tex]) further yields [tex]$P(z|t,x)=P(z|t)$[/tex], from which (2) follows. It is now a purely graphical exercise to prove that the back-door criterion implies ([tex]$i$[/tex]) and ([tex]$ii$[/tex]). Indeed, ([tex]$ii$[/tex]) follows directly from the fact that [tex]$Z$[/tex] consists of nondescendants of [tex]$X$[/tex], while the blockage of all back-door paths by [tex]$Z$[/tex] implies [tex]$(Y \perp\!\!\!\perp T \mid X, Z)_G$[/tex], hence ([tex]$i$[/tex]). This follows from observing that any path from [tex]$Y$[/tex] to [tex]$T$[/tex] in [tex]$G$[/tex] that is unblocked by [tex]$\{X,Z\}$[/tex] can be extended to a back-door path from [tex]$Y$[/tex] to [tex]$X$[/tex], unblocked by [tex]$Z$[/tex].
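For readers who want the intermediate steps, the passage from (1) to (2) can be sketched as the chain of equalities below. It reads condition ([tex]$i$[/tex]) as [tex]$P(y|x,t,z)=P(y|x,z)$[/tex] and condition ([tex]$ii$[/tex]) as [tex]$P(z|t,x)=P(z|t)$[/tex], which is what the two steps described in the text require; the layout is a reconstruction, not a quotation from the book.

[tex]\begin{eqnarray*}
P(y|\hat{x}) &=& \sum_{t} P(y|x,t) P(t) \\
&=& \sum_{t} P(t) \sum_{z} P(y|x,t,z) P(z|t,x) \\
&=& \sum_{t} P(t) \sum_{z} P(y|x,z) P(z|t,x) \\
&=& \sum_{t} P(t) \sum_{z} P(y|x,z) P(z|t) \\
&=& \sum_{z} P(y|x,z) \sum_{t} P(z|t) P(t) \\
&=& \sum_{z} P(y|x,z) P(z)
\end{eqnarray*}[/tex]

The second line expands over [tex]$z$[/tex] by the law of total probability, the third uses ([tex]$i$[/tex]), the fourth uses ([tex]$ii$[/tex]), and the last two exchange the order of summation and marginalize over [tex]$t$[/tex].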

February 22, 2007

Back-door criterion and epidemiology

Filed under: Back-door criterion,Book (J Pearl),Epidemiology — moderator @ 9:03 am

The definition of the back-door condition (Causality, page 79, Definition 3.3.1) seems to be contrived. The exclusion of descendants of X (Condition (i)) seems to be introduced as an afterthought, just because we get into trouble if we don't. Why can't we get it from first principles: first define sufficiency of Z in terms of the goal of removing bias and then show that, to achieve this goal, you neither want nor need descendants of X in Z?
