Causal Analysis in Theory and Practice

January 10, 2018

Facts and Fiction from the “Missing Data Framework”

Filed under: Missing Data — Judea Pearl @ 9:15 am

Last month, Karthika Mohan and I received a strange review from a prominent Statistical Journal. Among other comments, we found the following two claims about a conception called “missing data framework.”

Claim-1: “The role of missing data analysis in causal inference is well understood (eg causal inference theory based on counterfactuals relies on the missing data framework).
Claim-2: “While missing data methods can form tools for causal inference, the converse cannot be true.”

I am sure that you have seen similar claims made in the literature, in lecture notes, in reviews of technical papers, or informal conversations in the cafeteria. Oddly, based on everything that we have read and researched about missing data we came to believe that both statements are false. Still, these claims are being touted widely, routinely, and  unabashedly, with only scattered attempts to explicate their content in open discussions.

Below, we venture to challenge the two claims, hoping to elicit your comments, and to come to some understanding of what actually is meant by the phrase “missing data framework;” what is being “framed” and what remains “un-framed.”

Challenging Claim-1

It is incorrect to suppose that the role of missing data analysis in causal inference is “well understood.” Quite the opposite. Researchers adhering to missing data analysis invariably invoke an ad-hoc assumption called “conditional ignorability,” often decorated as “ignorable treatment assignment mechanism”, which is far from being “well understood” by those who make it, let alone those who need to judge its plausibility.

For readers versed in graphical modeling, “conditional ignorability” is none other than the back-door criterion that students learn in the second class on causal inference, and which “missing-data” advocates have vowed to avoid at all cost. As we know, this criterion can easily be interpreted and verified when background knowledge is presented in graphical form but, as you can imagine, it turns into a frightening enigma for those who shun the light of graphs. Still, the simplicity of reading this criterion off a graph makes it easy to test whether those who rely heavily on ignorability assumptions know what they are assuming. The results of this test are discomforting.

Marshall Joffe, at John Hopkins University, summed up his frustration with the practice and “understanding” of ignorability in these words: “Most attempts at causal inference in observational studies are based on assumptions that treatment assignment is ignorable. Such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed.” [Joffe, etal 2010, “Selective Ignorability Assumptions in Causal Inference,” The International Journal of Biostatistics: Vol. 6: Iss. 2, Article 11.  DOI: 10.2202/1557-4679.1199 Available at: ]

My personal conversations with leaders of the missing data approach to causation (these include seasoned researchers, educators and prolific authors) concluded with an even darker picture. None of those leaders was able to take a toy-example of 3-4 variables and determine whether conditional ignorability holds in the examples presented. It is not their fault, or course; determining
conditional ignorability is a hard cognitive and computational task that ordinary mortals cannot accomplish in their head, without the aids of graphs. (I base this assertion both on first-hand experience with students and colleagues and on intimate familiarity with issues of problem complexity and cognitive loads.)

Unfortunately, the mantra: “missing data analysis in causal inference is well understood” continues to be chanted at an ever increasing intensity, building faith among the faithful, and luring chanters to assume ignorability as self evident. Worse yet, the mantra blinds researchers from seeing how an improved level of understanding can emerge by abandoning the missing-data prism altogether, and conducting causal analysis in its natural habitat, using scientific models of reality rather than unruly patterns of missingness in the data.

A typical example of this trend is a recent article by Ding and Fan titled: “Causal Inference: A missing data perspective”.
Sure enough, already on the ninth line of the abstract, the authors assume away non-ignorable treatments and, then, having  reached the safety zone of classical statistics, launch statistical estimation exercises on a variety of estimands. This creates the impression that “missing data perspective” is sufficient for  conducting “causal inference” when, in fact, the entire analysis rests on the assumption of ignorability, the one assumption that the missing data perspective lacks the tools to address.

The second part of Claim-1 is equally false: “causal inference theory based on counterfactuals relies on the missing data framework”. This may be true for the causal inference theory developed
by Rubin (1974) and expanded in Imbens and Rubin book (2015), but certainly not for the causal inference theory developed in (Pearl, 2000 2009) which is also based on counterfactuals, yet in no way relies on “the missing data framework”. On the contrary, page after page of (Pearl, 2000, 2009) emphasizes that counterfactuals are natural derivatives of the causal model used, and do not
require the artificial interpolation tools (eg imputations or matching) advocated by the missing data paradigm. Indeed, model-blind imputation can be shown to invite disasters in the class of “non ignorable” problems, something that is rarely acknowledged in the imputation-addicted literature. The very idea that certain parameters are not estimable, regardless of how clever the imputation is foreign to the missing data way of thinking. The same goes for the idea that some parameters are estimable while others are not.

In the past five years, we have done extensive reading into the missing data literature. [For a survey, see:] It has become clear to us that this framework falls short of addressing three fundamental problems of modern causal analysis (1) To find if there exist sets of covariates that render treatments “ignorable”, (2) To estimate causal effects in cases where such sets do not exist, and (3) To decide if one’s modeling assumptions are compatible with the observed data.

It takes a theological leap of faith to imagine that a framework avoiding these fundamental problems can serve as an intellectual basis for a general theory of causal inference, a theory that has tackled those problems head on, and successfully so. Causal inference theory has advanced significantly beyond this stage – nonparametric estimability conditions have been established for causal and counterfactual relationships in both ignorable and non-ignorable problems. Can a framework bound to ignorability assumptions serve as a basis for one that has emancipated itself from such assumptions? We doubt it.

Challenging Claim 2.

We come now to claim (2), concerning the possibility of causality-free interpretation of missing data problems. It is possible indeed to pose a missing data problem in purely statistical terms, totally void of “missingness mechanism” vocabulary, void even of conditional independence assumptions. But this is rarely done, because the answer is trivial: none of the parameters of interest would be estimable without such assumptions (i.e, the likelihood function is flat). In theory, one can argue that there is really nothing causal about “missingness mechanism” as conceptualized by Rubin (1976), since it is defined in terms of conditional independence relations, a purely statistical notion that requires no reference to causation.

Not quite! The conditional independence relations that define missingness mechanisms are fundamentally different from those invoked in standard statistical analysis. In standard statistics, independence assumptions are presumed to hold in the distribution that governs the observed data, whereas in missing-data problems, the needed independencies are assumed to hold in the distribution of variables which are only partially observed. In other words, the independence assumptions invoked in missing data analysis are necessarily judgmental, and only rarely do they have
testable implications in the available data. [Fully developed in:]

This behooves us to ask what kind of knowledge is needed for making reliable conditional independence judgments about a specific, yet partially observed problem domain. The graphical models literature has an unambiguous answer to this question: our judgment about statistical dependencies stems from our knowledge about causal dependencies, and the latters are organized in graphical form. The non-graphical literature has thus far avoided this question, presumably because it is a psychological issue that resides outside the scope of statistical analysis.

Psychology or not, the evidence from behavioral sciences is overwhelming that judgments about statistical dependence emanate from causal intuition. [see D. Kahneman “Thinking, Fast and Slow”
Chapter 16: Causes Trump Statistics]

In light of these considerations we would dare call for re-examination of the received mantra: 2.  “while missing data methods can form tools for causal inference, the converse cannot be true.” and reverse it, to read:

2′.  “while causal inference methods provide tools for solving missing data problems, the converse cannot be true.”

We base this claim on the following observations: 1. The assumptions needed to define the various types of missing data mechanisms are causal in nature. Articulating those assumption in causal vocabulary is natural, and results therefore in model transparency and credibility. 2. Estimability analysis based on causal modeling of missing data problems has charted new territories, including problems in the MNAR category (ie, Missing Not At Random), which were inaccessible to conventional missing-data analysis. In comparison, imputation-based approaches to missing data
do not provide guarantees of convergence (to consistent estimates) except for the narrow and unrecognizable class of problems in which ignorability holds. 3. Causal modeling of missing data problems has uncovered new ways of testing assumptions, which are infeasible in conventional missing-data analysis.

Perhaps even more convincingly, we were able to prove that no algorithm exists which decides if a parameter is estimable, without examining the causal structure of the model; statistical information is insufficient.

We hope these arguments convince even the staunchest missing data enthusiast to switch mantras and treat missing data problems for what they are: causal inference problems.

Judea Pearl, UCLA,
Karthika Mohan, UC Berkeley

1 Comment »

  1. Thank you for bringing graph-based research on missing data to my attention! If I understood correctly, the results stated in the paper “Graphical Models for Processing Missing” are related to what survey practitioners call “item non-response”, i.e. some respondents fail to answer some questions (as in the example presented in Table 1). In survey research there is, however, also the problem of so-called “unit non-response”, i.e. respondents are unwilling to take part in the survey altogether. “Unit non-response” might also be missing completely at random, at random, or not at random, which might have severe consequences for the estimates based on such a survey.

    Survey practitioners usually tackle this problem by weighting the data (so-called response weights are created by predicting the response probability based in the scare information available on non-respondents as well as adhoc assumptions of this missingness-mechanism). This task is obviously very challenging because there usually is only little information regarding the individuals not participating in the survey. So I was wondering if and how m-graphs could be used to represent this second kind of missing data? In particular, how can assumptions on whether “unit non response” is MCAR, MAR, or MNAR be encoded in a m-graph? And would any of the results on recoverability and testability stated in the paper translate to this second problem?

    Comment by kg — February 5, 2018 @ 2:21 pm

RSS feed for comments on this post. TrackBack URI

Leave a comment

Powered by WordPress