# Causal Analysis in Theory and Practice

## August 14, 2019

### A Crash Course in Good and Bad Control

Filed under: Back-door criterion,Bad Control,Econometrics,Economics,Identification — Judea Pearl @ 11:26 pm

Carlos Cinelli, Andrew Forney and Judea Pearl

Update: check the updated and extended version of the crash course here.

## Introduction

If you were trained in traditional regression pedagogy, chances are that you have heard about the problem of “bad controls”. The problem arises when we need to decide whether the addition of a variable to a regression equation helps getting estimates closer to the parameter of interest. Analysts have long known that some variables, when added to the regression equation, can produce unintended discrepancies between the regression coefficient and the effect that the coefficient is expected to represent. Such variables have become known as “bad controls”, to be distinguished from “good controls” (also known as “confounders” or “deconfounders”) which are variables that must be added to the regression equation to eliminate what came to be known as “omitted variable bias” (OVB).

Recent advances in graphical models have produced a simple criterion to distinguish good from bad controls, and the purpose of this note is to provide practicing analysts a concise and visible summary of this criterion through illustrative examples. We will assume that readers are familiar with the notions of “path-blocking” (or d-separation) and back-door paths. For a gentle introduction, see d-Separation without Tears

In the following set of models,  the target of the analysis is the average causal effect (ACE) of a treatment X on an outcome Y, which stands for the expected increase of Y per unit of a controlled increase in X. Observed variables will be designated by black dots and unobserved variables by white empty circles. Variable Z (highlighted in red) will represent the variable whose inclusion in the regression is to be decided, with “good control” standing for bias reduction, “bad control” standing for bias increase and “netral control” when the addition of Z does not increase nor reduce bias. For this last case, we will also make a brief remark about how Z could affect the precision of the ACE estimate.

## Models

Models 1, 2 and 3 – Good Controls

In model 1,  Z stands for a common cause of both X and Y. Once we control for Z, we block the back-door path from X to Y, producing an unbiased estimate of the ACE.

In models 2 and 3, Z is not a common cause of both X and Y, and therefore, not a traditional “confounder” as in model 1. Nevertheless, controlling for Z blocks the back-door path from X to Y due to the unobserved confounder U, and again, produces an unbiased estimate of the ACE.

Models 4, 5 and 6 – Good Controls

When thinking about possible threats of confounding, one needs to keep in mind that common causes of X and any mediator (between X and Y) also confound the effect of X on Y. Therefore, models 4, 5 and 6 are analogous to models 1, 2 and 3 — controlling for Z blocks the backdoor path from X to Y and produces an unbiased estimate of the ACE.

We now encounter our first “bad control”. Here Z is correlated with the treatment and the outcome and it is also a “pre-treatment” variable. Traditional econometrics textbooks would deem Z a “good control”. The backdoor criterion, however, reveals that Z is a “bad control”. Controlling for Z will induce bias by opening the backdoor path X ← U1→ Z← U2→Y, thus spoiling a previously unbiased estimate of the ACE.

Model 8 – Neutral Control (possibly good for precision)

Here Z is not a confounder nor does it block any backdoor paths. Likewise, controlling for Z does not open any backdoor paths from X to Y. Thus, in terms of bias, Z is a “neutral control”. Analysis shows, however, that controlling for Z reduces the variation of the outcome variable Y, and helps improve the precision of the ACE estimate in finite samples.

Model 9 – Neutral control (possibly bad for precision)

Similar to the previous case, here Z is “neutral” in terms of bias reduction. However, controlling for Z will reduce the variation of treatment variable X and so may hurt the precision of the estimate of the ACE in finite samples.

We now encounter our second “pre-treatment” “bad control”, due to a phenomenon called “bias amplification” (read more here). Naive control for Z in this model will not only fail to deconfound the effect of X on Y, but, in linear models, will amplify any existing bias.

Models 11 and 12 – Bad Controls

If our target quantity is the ACE, we want to leave all channels through which the causal effect flows “untouched”.

In Model 11, Z is a mediator of the causal effect of X on Y. Controlling for Z will block the very effect we want to estimate, thus biasing our estimates.

In Model 12, although Z is not itself a mediator of the causal effect of X on Y, controlling for Z is equivalent to partially controlling for the mediator M, and will thus bias our estimates.

Models 11 and 12 violate the backdoor criterion, which excludes controls that are descendants of the treatment along paths to the outcome.

Model 13 – Neutral control (possibly good for precision)

At first look, model 13 might seem similar to model 12, and one may think that adjusting for Z would bias the effect estimate, by restricting variations of the mediator M. However, the key difference here is that Z is a cause, not an effect, of the mediator (and, consequently, also a cause of Y). Thus, model 13 is analogous to model 8, and so controlling for Z will be neutral in terms of bias and may increase precision of the ACE estimate in finite samples.

Model 14 – Neutral controls (possibly helpful in the case of selection bias)

Contrary to econometrics folklore, not all “post-treatment” variables are inherently bad controls. In models 14 and 15 controlling for Z does not open any confounding paths between X and Y. Thus, Z is neutral in terms of bias. However, controlling for Z does reduce the variation of the treatment variable X and so may hurt the precision of the ACE estimate in finite samples. Additionally, in model 15, suppose one has only samples with W = 1 recorded (a case of selection bias). In this case, controlling for Z can help obtaining the W-specific effect of X on Y, by blocking the colliding path due to W.

Contrary to Models 14 and 15, here controlling for Z is no longer harmless, since it opens the backdoor path X → Z ← U → Y and so biases the ACE.

Here, Z is not a mediator, and one might surmise that, as in Model 14, controlling for Z is harmless. However, controlling for the effects of the outcome Y will induce bias in the estimate of the ACE, making Z a “bad control”. A visual explanation of this phenomenon using “virtual colliders” can be found here.

Model 17 is usually known as a “case-control bias” or “selection bias”. Finally, although controlling for Z will generally bias numerical estimates of the ACE, it does have an exception when X has no causal effect on Y. In this scenario, X is still d-separated from Y even after conditioning on Z. Thus, adjusting for Z is valid for testing whether the effect of X on Y is zero.

## August 13, 2019

### Lord’s Paradox: The Power of Causal Thinking

Filed under: Uncategorized — Judea Pearl @ 9:41 pm

Background

This post aims to provide further insight to readers of “Book of Why” (BOW) (Pearl and Mackenzie, 2018) on Lord’s paradox and the simple way this decades-old paradox was resolved when cast in causal language. To recap, Lord’s paradox (Lord, 1967; Pearl, 2016) involves two statisticians, each using what seems to be a reasonable strategy of analysis, yet reaching opposite conclusions when examining the data shown in Fig. 1 (a) below.

Figure 1: Wainer and Brown’s revised version of Lord’s paradox and the corresponding causal diagram.

The story, in the form described by Wainer and Brown (2017) reads:

“A large university is interested in investigating the effects on the students of the diet provided in the university dining halls …. Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June (WF) are recorded.”

The first statistician (named John) looks at the weight gains associated with the two dining halls, find them equally distributed, and naturally concludes that Diet has no effect on Gain.  The second statistician (named Jane) uses the initial weight (WI) as a covariate and finds that, for every level of WI, the final weight (WF) distribution for Hall B is shifted above that of Hall A. Thus concluding Diet has an effect on Gain. Who is right?

The Book of Why resolved this paradox using causal analysis. First, noting that at issue is “the effect of Diet on weight Gain”, a causal model is postulated, in the form of the diagram of Fig. 1(b). Second, noting the WIis the only confounder of Diet and Gain, Jane was declared “unambiguously correct” and John “incorrect”.

The Critics

The simplicity of this solution invariably evokes skepticism among statisticians. “But how can we be sure of the diagram?” they ask. This kind of skepticism is natural since, statisticians are not trained in postulating causal assumptions, that is, assumptions that cannot be articulated in the language of mainstream statistics, and cannot therefore be tested using the available data.  However, after reminding the critics that the contention  between John and Jane surrounds the notion of “effect”, and that “effect” is a causal, not statistical notion, enlightened statisticians accept the idea that diagrams need to be drawn and that the one in Fig. 1(b) is reasonable; its main assumptions are: Diet does not affect the initial weight and the initial weight is the only factor affecting both Diet and final weight.

A series of recent posts by S. Senn, however, introduced a new line of criticism into our story (Senn, 2019). It focuses on the process by which the data of Fig. 1(a) was generated, and invokes RCT considerations such as block design, experiments with many halls, analysis of variance, standard errors, and more. Statisticians among my Twitter followers “liked” Senn’s critiques and I am not sure whether they were convinced by my argument that Lord’s paradox has nothing to do with  experimental procedures. In other words, the conflict between John and Jane persists even when the data is generated by clean and un-complicated process, as the one depicted in Fig. 1(b).

Senn’s critiques can be summarized thus (quoted):

“I applied John Nedler’s experimental calculus [5, 6] … and came to the conclusion that the second statistician’s solution is only correct given an untestable assumption and that even if the assumption were correct and hence the estimate were appropriate, the estimated standard error would almost certainly be wrong.”

My response was:

Lord’s paradox is about causal effects of Diet. In your words: “diet has no effect” according to John and “diet does have an effect” according to Jane. We know that, inevitably, every analysis of “effects” must rely on causal, hence “untestable assumptions”. So BOW did a superb job in calling the attention of analysts to the fact that the nature of Lord’s paradox is causal, hence outside the province of mainstream statistical analysis. This explains why I agree with your conclusion that “the second statistician’s solution is only correct given an untestable assumption”. Had you concluded that we can decide who is correct without relying on “an untestable assumption”, you and Nelder would have been the first mortals to demonstrate the impossible, namely, that assumption-free correlation does imply causation.

Now let me explain why your last conclusion also attests to the success of BOW. You conclude: “even if the assumption were correct, … the estimated standard error would almost certainly be wrong.”

The beauty of Lord’s paradox is that it demonstrates the surprising clash between John and Jane in purely qualitative terms, with no appeal to numbers, standard errors, or confidence intervals. Luckily, the surprising clash persists in the asymptotic limit where Lord’s ellipses represent infinite samples, tightly packed into those two elliptical clouds.

Some people consider this asymptotic abstraction to be a “limitation” of graphical models. I consider it a blessing and a virtue, enabling us, again, to separate things that matter (clash over causal effects) from those that don’t (sample variability, standard errors, p-values etc.). More generally, it permits us to separate issues of estimation, that is, going from samples to distributions, from those of identification, that is, going from distributions to cause-effect relationships. BOW goes to great length explaining why this last stage presented an insurmountable hurdle to analysts lacking the appropriate language of causation.

Note that BOW declares Jane to be “unambiguously correct” in the context of the causal assumptions displayed in the diagram (Fig.1 (b)) where Diet is shown NOT to influence initial weight, and the initial weight is shown to be the (only) factor that makes students prefer one diet or another. Changing these assumptions may lead to another problem and another resolution but, once we agree with the assumptions our choice of Jane as the correct statistician is “unambiguously correct”

As an example (requested on Twitter) if dining halls have their own effect on weight gain (say Hall-A provides free weight-watching instructions to diners) our model will change as depicted in Fig 2. In this setup, Wis no longer a sole confounder and both Wand Hall need to be adjusted to obtain the effect of Diet on Gain. In other words, Jane will no longer be “correct” unless she analyzes each stratum of the Diet-Hall combination and finds preference of Diet-A over Diet-B.

Figure 2:  Separating Diet from Hall in Lord’s Story

New Insights

The upsurge of interest in Lord’s paradox gives me an opportunity to elaborate on another interesting aspect of our Diet-weight model, Fig. 1.

Having concluded that Statistician-2 (Jane) is “unambiguously correct” and that Statistician-1 (John) is wrong, an astute reader would ask: “And what about the sure-thing principle? Isn’t the overall gain just an average of the stratum-specific gains?” (where each stratum represents a level of the initial weight WI). Previously, in the original version of the paradox (Fig. 6.8 of BOW) we dismissed this intuition by noting that Wwas affected by the causal variable (Sex) but, now, with the arrow pointing from Wto we can no longer use this argument. Indeed, the diagram tells us (using the back-door criterion) that the causal effect of on can be obtained by adjusting for the (only) confounder, WI, yielding:

P(Y|do(Diet)) = ∑WIP(Y|Diet,WI) P(WI)

In other words, the overall gain resulting from administering a given diet to everyone is none other but the gain observed in a given diet-weight group, averaged over the weight. How is it possible then for the latter to be positive (as seen from the shifted ellipses) and, simultaneously, for the former to be zero (as seen by the perfect alignment of the ellipses along the W= Wline)

One would be tempted to suggest that data matching the ellipses of Fig 6.9(a) can never be generated by the model of Fig. 6.9(b) , in which WIis the only confounder? But this could not possibly be the case, because we know that the model has no refuting implications, so it cannot be refuted by the position of the two ellipses.

The answer is that the sure-thing principle applies to causal effects, not to statistical associations. The perfect alignment of the ellipses does not mean that the effect of Diet on Gain is zero; it means only that the Gain is statistically independent of Diet:

P(Gain|Diet=A) = P(Gain|Diet=B)

not that Gain is causally unaffected by Diet. In other words, the equality above does not imply the equality

P(Gain|do(Diet=A)) = P(Gain|do(Diet=B))

which statistician-1 (John) wants us to believe.

Our astute student will of course question this explanation and, pointing to Fig. 1(b), will ask: How can Gain be independent of Diet when the diagram shows them connected? The answer is that the three paths connecting Diet and Gain cancel each other in such a way that an overall independence shows up in the data,

Conclusions

Lord’s paradox starts with a clash between two strong intuitions: (1) To get the effect we want, we must make “proper allowances” for uncontrolled preexisting differences between groups” (i.e. initial weights) and (2) The overall effect (of Diet on Gain) is just the average of the stratum-specific effects. Like the bulk of human intuitions, these two are CAUSAL. Therefore, to reconcile the apparent clash between them we need a causal language; statistics alone won’t do.

The difficulties that generations of statisticians have had in resolving this apparent clash stem from lacking a formal language to express the two intuitions as well as the conditions under which they are applicable. Missing were: (1) A calculus of “effects” and its associated causal sure-thing principle and (2) a criterion (back door) for deciding when “proper allowances for preexisting conditions” is warranted. We are now in possession of these two ingredients,  and we should enjoy the power of causal analysis to resolve this paradox, which generations of statisticians have found intriguing, if not vexing. We should also feel empowered to resolve all the paradoxes that surface from the causation-association confusion  that our textbooks have bestowed upon us.

References

Lord, F.M. “A paradox in the interpretation of group comparisons,” Psychological Bulletin, 68(5):304-305, 1967.

Pearl, J. “Lord’s Paradox Revisited — (Oh Lord! Kumbaya!)”, Journal of Causal Inference, Causal, Casual, and Curious Section, 4(2), September 2016. https://ftp.cs.ucla.edu/pub/stat_ser/r436.pdf

Pearl, J. and Mackenzie, D. Book of Why, NY: Basic Books, 2018. http://bayes.cs.ucla.edu/WHY/

Senn, S. “Red herrings and the art of cause fishing: Lord’s Paradox revisited” (Guest post) August 2, 2019. https://errorstatistics.com/2019/08/02/s-senn-red-herrings-and-the-art-of-cause-fishing-lords-paradox-revisited-guest-post/

Wainer and Brown, L.M., “Three statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data,” in C.R. Rao and S. Sinharay (Eds.), Handbook of Statistics 26: Psychometrics, North Holland: Elsevier B.V., pp. 893-918, 2007.