Causal Analysis in Theory and Practice

August 14, 2019

A Crash Course in Good and Bad Control

Filed under: Back-door criterion,Bad Control,Econometrics,Economics,Identification — Judea Pearl @ 11:26 pm

Carlos Cinelli, Andrew Forney and Judea Pearl

Update: check the updated and extended version of the crash course here.


If you were trained in traditional regression pedagogy, chances are that you have heard about the problem of “bad controls”. The problem arises when we need to decide whether the addition of a variable to a regression equation helps getting estimates closer to the parameter of interest. Analysts have long known that some variables, when added to the regression equation, can produce unintended discrepancies between the regression coefficient and the effect that the coefficient is expected to represent. Such variables have become known as “bad controls”, to be distinguished from “good controls” (also known as “confounders” or “deconfounders”) which are variables that must be added to the regression equation to eliminate what came to be known as “omitted variable bias” (OVB).

Recent advances in graphical models have produced a simple criterion to distinguish good from bad controls, and the purpose of this note is to provide practicing analysts a concise and visible summary of this criterion through illustrative examples. We will assume that readers are familiar with the notions of “path-blocking” (or d-separation) and back-door paths. For a gentle introduction, see d-Separation without Tears

In the following set of models,  the target of the analysis is the average causal effect (ACE) of a treatment X on an outcome Y, which stands for the expected increase of Y per unit of a controlled increase in X. Observed variables will be designated by black dots and unobserved variables by white empty circles. Variable Z (highlighted in red) will represent the variable whose inclusion in the regression is to be decided, with “good control” standing for bias reduction, “bad control” standing for bias increase and “netral control” when the addition of Z does not increase nor reduce bias. For this last case, we will also make a brief remark about how Z could affect the precision of the ACE estimate.


Models 1, 2 and 3 – Good Controls 

In model 1,  Z stands for a common cause of both X and Y. Once we control for Z, we block the back-door path from X to Y, producing an unbiased estimate of the ACE. 

In models 2 and 3, Z is not a common cause of both X and Y, and therefore, not a traditional “confounder” as in model 1. Nevertheless, controlling for Z blocks the back-door path from X to Y due to the unobserved confounder U, and again, produces an unbiased estimate of the ACE.

Models 4, 5 and 6 – Good Controls

When thinking about possible threats of confounding, one needs to keep in mind that common causes of X and any mediator (between X and Y) also confound the effect of X on Y. Therefore, models 4, 5 and 6 are analogous to models 1, 2 and 3 — controlling for Z blocks the backdoor path from X to Y and produces an unbiased estimate of the ACE.

Model 7 – Bad Control

We now encounter our first “bad control”. Here Z is correlated with the treatment and the outcome and it is also a “pre-treatment” variable. Traditional econometrics textbooks would deem Z a “good control”. The backdoor criterion, however, reveals that Z is a “bad control”. Controlling for Z will induce bias by opening the backdoor path X ← U1→ Z← U2→Y, thus spoiling a previously unbiased estimate of the ACE.

Model 8 – Neutral Control (possibly good for precision)

Here Z is not a confounder nor does it block any backdoor paths. Likewise, controlling for Z does not open any backdoor paths from X to Y. Thus, in terms of bias, Z is a “neutral control”. Analysis shows, however, that controlling for Z reduces the variation of the outcome variable Y, and helps improve the precision of the ACE estimate in finite samples.

Model 9 – Neutral control (possibly bad for precision)

Similar to the previous case, here Z is “neutral” in terms of bias reduction. However, controlling for Z will reduce the variation of treatment variable X and so may hurt the precision of the estimate of the ACE in finite samples.  

Model 10 – Bad control

We now encounter our second “pre-treatment” “bad control”, due to a phenomenon called “bias amplification” (read more here). Naive control for Z in this model will not only fail to deconfound the effect of X on Y, but, in linear models, will amplify any existing bias.

Models 11 and 12 – Bad Controls

If our target quantity is the ACE, we want to leave all channels through which the causal effect flows “untouched”.

In Model 11, Z is a mediator of the causal effect of X on Y. Controlling for Z will block the very effect we want to estimate, thus biasing our estimates. 

In Model 12, although Z is not itself a mediator of the causal effect of X on Y, controlling for Z is equivalent to partially controlling for the mediator M, and will thus bias our estimates.

Models 11 and 12 violate the backdoor criterion, which excludes controls that are descendants of the treatment along paths to the outcome.

Model 13 – Neutral control (possibly good for precision)

At first look, model 13 might seem similar to model 12, and one may think that adjusting for Z would bias the effect estimate, by restricting variations of the mediator M. However, the key difference here is that Z is a cause, not an effect, of the mediator (and, consequently, also a cause of Y). Thus, model 13 is analogous to model 8, and so controlling for Z will be neutral in terms of bias and may increase precision of the ACE estimate in finite samples.

Model 14 – Neutral controls (possibly helpful in the case of selection bias)

Contrary to econometrics folklore, not all “post-treatment” variables are inherently bad controls. In models 14 and 15 controlling for Z does not open any confounding paths between X and Y. Thus, Z is neutral in terms of bias. However, controlling for Z does reduce the variation of the treatment variable X and so may hurt the precision of the ACE estimate in finite samples. Additionally, in model 15, suppose one has only samples with W = 1 recorded (a case of selection bias). In this case, controlling for Z can help obtaining the W-specific effect of X on Y, by blocking the colliding path due to W.

Model 16 – Bad control

Contrary to Models 14 and 15, here controlling for Z is no longer harmless, since it opens the backdoor path X → Z ← U → Y and so biases the ACE.

Model 17 – Bad Control

Here, Z is not a mediator, and one might surmise that, as in Model 14, controlling for Z is harmless. However, controlling for the effects of the outcome Y will induce bias in the estimate of the ACE, making Z a “bad control”. A visual explanation of this phenomenon using “virtual colliders” can be found here.

Model 17 is usually known as a “case-control bias” or “selection bias”. Finally, although controlling for Z will generally bias numerical estimates of the ACE, it does have an exception when X has no causal effect on Y. In this scenario, X is still d-separated from Y even after conditioning on Z. Thus, adjusting for Z is valid for testing whether the effect of X on Y is zero.

January 22, 2015

Flowers of the First Law of Causal Inference (2)

Flower 2 — Conditioning on post-treatment variables

In this 2nd flower of the First Law, I share with readers interesting relationships among various ways of extracting information from post-treatment variables. These relationships came up in conversations with readers, students and curious colleagues, so I will present them in a question-answers format.

Rule 2 of do-calculus does not distinguish post-treatment from pre-treatment variables. Thus, regardless of the nature of Z, it permits us to replace P (y|do(x), z) with P (y|x, z) whenever Z separates X from Y in a mutilated graph GX (i.e., the causal graph, from which arrows emanating from X are removed). How can this rule be correct, when we know that one should be careful about conditioning on a post treatment variables Z?

Example 1 Consider the simple causal chain X → Y → Z. We know that if we condition on Z (as in case control studies) selected units cease to be representative of the population, and we cannot identify the causal effect of X on Y even when X is randomized. Applying Rule-2 however we get P (y|do(x), z) = P (y|x, z). (Since X and Y are separated in the mutilated graph X Y → Z). This tells us that the causal effect of X on Y IS identifiable conditioned on Z. Something must be wrong here.

To read more, click here.

November 30, 2009

Measurement Cost and Estimator’s Variance

Sander Greenland from UCLA writes:

The machinery in your book addresses only issues of identification and unbiasedness. Of equal concern for practice is variance, which comes to the fore when (as usual) one has a lot of estimators with similar bias to choose from, for within that set of estimators the variance becomes the key driver of expected loss (usually taken as MSE (mean-squared-error = variance+bias^2). Thus for example you may identify a lot of (almost-) sufficient subsets in a graph; but the minimum MSE attainable with each may span an order of magnitude. On top of that, the financial costs of obtaining each subset may span orders of magnitudes. So your identification results, while important and useful, are just a start on working out which variables to spend the money to measure and adjust for. The math of the subsequent MSE and cost considerations is harder, but no less important.

Judea Pearl replies:

You are absolutely right, it is just a start, as is stated in Causality page 95. The reason I did not  emphasize the analysis of variance in this book was my assumption that, after a century of extremely fruitful statistical research, one would have little to add to this area.

My hypothesis was:

Once we identify a causal parameter, and produce an estimand of that parameter in closed mathematical form, a century of statistical research can be harnessed to the problem, and render theestimation task a routine exercise in data analysis. Why spend energy on areas well researched when so much needs to be done in areas of neglect?

However, the specific problem you raised, that of choosing among competing sufficient sets, happens to be one that Tian, Paz and Pearl (1998) did tackle and solved. See Causality page 80, reading: “The criterion also enable the analyst to search for an optimal set of covariates — a set Z that minimizes measurement cost or sampling variability (Tian et al, 1998).” [Available at] By “solution”, I mean of course, an analytical solution, assuming that cost is additive and well defined for each covariate. The paper provides a polynomial time algorithm that identifies the minimal (or minimum cost) sets of nodes that d-separates two nodes in a graph. When applied to a graph purged of outgoing arrows from the treatment node, the algorithm will enumerate all minimal sufficient sets, i.e., sets of measurements that de-confound the causal relation between treatment and outcome.

Readers who deem such an algorithm useful, should have no difficulty implementing it from the description given in the paper; the introduction of variance considerations though would require some domain-specific expertise.

May 4, 2008

Alternative Proof of the Back-Door Criterion

Filed under: Back-door criterion — judea @ 6:00 pm

Consider a Markovian model [tex]$G$[/tex] in which [tex]$T$[/tex] stands for the set of parents of [tex]$X$[/tex].  From [tex]{em Causality}[/tex], Eq.~(3.13), we know that the causal effect of [tex]$X$[/tex] on [tex]$Y$[/tex] is given by

[tex]begin{equation} P(y|hat{x}) = sum_{t in T} P(y|x,t) P(t) %% eq 1  label{ch11-eq-a} end{equation}[/tex] (1).

Now assume some members of [tex]$T$[/tex] are unobserved, and we seek another set [tex]$Z$[/tex] of observed variables, to replace [tex]$T$[/tex] so that

[tex]begin{equation} P(y|hat{x}) = sum_{z in Z} P(y|x,Z) P(z) %% eq 2  label{ch11-eq-b} end{equation}[/tex] (2).

It is easily verified that (2) follow from (1) if [tex]$Z$[/tex] satisfies:


Indeed, conditioning on [tex]$Z$[/tex], ([tex]$i$[/tex]) permits us to rewrite (1) as [tex][ P(y|hat{x}) = sum_{t} P(t) sum_z P(y|z,x) P(z|t,x) ][/tex] and ([tex]$ii$[/tex]) further yields [tex]$P(z|t,x)=P(z|t)$[/tex] from which (2) follows. It is now a purely graphical exercize to prove that the back-door criterion implies ([tex]$i$[/tex]) and ([tex]$ii$[/tex]). Indeed, ([tex]$ii$[/tex]) follows directly from the fact that [tex]$Z$[/tex] consists of nondescendants of [tex]$X$[/tex], while the blockage of all back-door path by [tex]$Z$[/tex] implies  , hence ([tex]$i$[/tex]). This follows from observing that any path from [tex]$Y$[/tex] to [tex]$T$[/tex] in [tex]$G$[/tex] that is unblocked by [tex]${X,Z}$[/tex] can be extended to a back-door path from [tex]$Y$[/tex] to [tex]$X$[/tex], unblocked by [tex]$Z$[/tex].

February 22, 2007

Back-door criterion and epidemiology

Filed under: Back-door criterion,Book (J Pearl),Epidemiology — moderator @ 9:03 am

The definition of the back-door condition (Causality, page 79, Definition 3.3.1) seems to be contrived. The exclusion of descendants of X (Condition (i)) seems to be introduced as an after fact, just because we get into trouble if we dont. Why cant we get it from first principles; first define sufficiency of Z in terms of the goal of removing bias and, then, show that, to achieve this goal, you neither want nor need descendants of X in Z.

Powered by WordPress