Causal Analysis in Theory and Practice

January 24, 2018

Can DAGs Do the Un-doable?

Filed under: DAGs,Discussion — Judea Pearl @ 2:32 am

The following question was sent to us by Igor Mandel:

Separation of variables with zero causal coefficients from others
Here is a problem. Imagine we have a researcher who has some understanding of a particular problem, and this understanding is partly or completely wrong. Can DAGs, or any other causality theory (if one exists), convincingly establish this fact (that she is wrong)?

To be more specific, let’s consider a simple example with fairly indisputable causal variables (described in detail in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2984045 ). One wants to estimate how different food ingredients affect the energy (in calories) contained in different types of food. She takes many samples and measures different things. She does not know about the existence of fats and proteins, but she does know that there are carbohydrates, water and fiber. She builds a DAG as she feels it should be (diagram not reproduced here):

From our standpoint (i.e., that of educated people of the 21st century), the arrows from Fiber and Water to Calories have zero coefficients. But since the data bear significant correlations between Calories, Water and Fiber, any regression estimates would show non-zero values for these coefficients. Is there a way to say that these non-zero values are wrong, not just quantitatively but, in a sense, qualitatively?
An even brighter example is what is often called “spurious correlation.” It was “statistically proven” almost 20 years ago that storks deliver babies ( http://robertmatthews.org/wp-content/uploads/2016/03/RM-storks-paper.pdf ), while many women still believe they do not. How can we reconvince those statistically ignorant women? Or how can we strengthen their naïve, but statistically unconfirmed, beliefs just by looking at the data, without asking them for baby-related details? What kind of DAG may help?

My Response
This question, in a variety of settings, has been asked by readers of this blog since the beginning of the Causal Revolution. The idea that new tools are now available that can handle causal problems free of statistical dogmas has encouraged thousands of researchers to ask: Can you do this, or can you do that? The answers to such questions are often trivial and can be obtained directly from the logic of causal inference, without the details of the question. I am not surprised, however, that such questions surface again in 2018, since the foundations of causal inference are rarely emphasized in the technical literature, so they tend to be forgotten.

I will answer Igor’s question as a student of modern logic of causation.

1. Can a DAG distinguish variables with zero causal effects (on Y) from those having non-zero effects?

Of course not; no method in the world can do that without further assumptions. Here is why:
The question above concerns causal relations. We know from first principles that no causal query can be answered from data alone, without causal information that lies outside the data.
QED
[It does not matter if your query is quantitative or qualitative, if you address it to a story or to a graph. Every causal query needs causal assumptions. No causes in – no causes out (N. Cartwright)]
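Point 1 can be illustrated numerically. The Python sketch below is a hypothetical construction, not taken from Igor's paper: two linear-Gaussian models over Water (W) and Calories (C), with invented coefficients. In model A, a latent Fat variable drives both, and Water has zero effect on Calories; in model B, Water causes Calories directly, with coefficients chosen to reproduce model A's covariance matrix. Since a zero-mean Gaussian is determined by its covariance matrix, the two models generate the same observable data (and hence the same regression slope of C on W), yet they disagree about the causal effect of Water.

```python
import math
from dataclasses import dataclass


@dataclass
class ObservedGaussian:
    """Zero-mean bivariate Gaussian over Water (W) and Calories (C), plus the causal
    slope dE[C | do(W = w)]/dw implied by the (hypothetical) generating model."""
    var_w: float
    cov_wc: float
    var_c: float
    causal_effect_w_on_c: float


def model_a(a: float = -0.8, var_ew: float = 0.36, k: float = 9.0, var_ec: float = 1.0) -> ObservedGaussian:
    # Hypothetical model A: latent Fat F ~ N(0, 1) drives both observed variables.
    #   W = a*F + e_w,   C = k*F + e_c   (no arrow from Water into Calories)
    return ObservedGaussian(
        var_w=a * a + var_ew,
        cov_wc=a * k,
        var_c=k * k + var_ec,
        causal_effect_w_on_c=0.0,
    )


def model_b_matching(target: ObservedGaussian) -> ObservedGaussian:
    # Hypothetical model B: no latent variable; Water causes Calories directly.
    #   W = e_w',   C = b*W + u
    # b and Var(u) are chosen so that B reproduces the target's covariance matrix.
    b = target.cov_wc / target.var_w
    var_u = target.var_c - b * b * target.var_w
    if var_u < 0:
        raise ValueError("target covariance matrix is not positive semi-definite")
    return ObservedGaussian(
        var_w=target.var_w,
        cov_wc=b * target.var_w,
        var_c=b * b * target.var_w + var_u,
        causal_effect_w_on_c=b,
    )


def same_observational_distribution(m1: ObservedGaussian, m2: ObservedGaussian) -> bool:
    # Zero-mean Gaussians coincide exactly when their covariance matrices coincide.
    pairs = [(m1.var_w, m2.var_w), (m1.cov_wc, m2.cov_wc), (m1.var_c, m2.var_c)]
    return all(math.isclose(x, y) for x, y in pairs)


if __name__ == "__main__":
    a = model_a()
    b = model_b_matching(a)
    print("same data distribution:", same_observational_distribution(a, b))
    print("causal effect of Water, model A:", a.causal_effect_w_on_c)
    print("causal effect of Water, model B:", b.causal_effect_w_on_c)
```

No statistic computed from (W, C) samples can separate the two models; only causal knowledge from outside the data (for instance, that fats exist and that water carries no calories) can.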

2. Can DAG-based methods do anything more than just quit with failure?

Of course they can.

2.1 First, notice that the distinction between having and not having a causal effect is a property of nature (or of the data-generating process), not of the model that you postulate. We can therefore ignore the diagram that Igor describes above. Now, in addition to quitting for lack of information, DAG-based methods would tell you: “If you can give me some causal information, however qualitative, I will tell you whether it is sufficient for answering your query.” I hope readers will agree with me that this kind of answer, though weaker than the one expected by the naïve inquirer, is much more informative than just quitting in despair.

2.2 Note also that postulating a whimsical model like the one described by Igor above has no bearing on the answer. To do anything useful in causal inference we need to start with a model of reality, not with a model drawn by a confused researcher, for whom an arrow is nothing more than “data bears significant correlation” or “regression estimates show non-zero values.”

2.3 Once you start with a postulated model of reality, DAG-based methods can be very helpful. For example, they can take your postulated model and determine which of the arrows in the model should have a zero coefficient attached to it, which should have a non-zero coefficient attached to it, and which would remain undecided till the end of time.

2.4 Moreover, assume reality is governed by model M1 and you postulate model M2, different from M1. DAG-based methods can tell you which causal queries you will answer correctly and which you will answer incorrectly (see Section 4.3 of http://ftp.cs.ucla.edu/pub/stat_ser/r459-reprint-errata.pdf ). This is nice, because it offers us a kind of sensitivity analysis: how far should reality be from your assumed model before you start making mistakes?
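For linear models, the sensitivity question in 2.4 has a closed-form answer. In the hypothetical Python sketch below (the parametrization is an illustrative assumption, not the analysis of the cited paper), the postulated model M2 is X → Y with no confounding, so the regression slope of Y on X is read as the causal effect beta. Reality, M1, adds an unobserved U with U → X (coefficient a) and U → Y (coefficient g). Since Cov(X, Y) = beta·Var(X) + a·g, the slope equals beta + a·g/Var(X), and M2's answer is off by a·g/Var(X).

```python
def ols_slope_under_m1(beta: float, a: float, g: float, var_ex: float = 1.0) -> float:
    """Population regression slope of Y on X when reality is the hypothetical model M1:
        U ~ N(0, 1),  X = a*U + e_x,  Y = beta*X + g*U + e_y.
    Cov(X, Y) = beta*Var(X) + a*g, hence slope = beta + a*g / Var(X)."""
    var_x = a * a + var_ex
    return beta + a * g / var_x


def error_of_m2_answer(beta: float, a: float, g: float, var_ex: float = 1.0) -> float:
    # M2 (no confounding) equates the slope with the causal effect beta;
    # this difference is the mistake M2 makes when M1 is the truth.
    return ols_slope_under_m1(beta, a, g, var_ex) - beta


if __name__ == "__main__":
    beta = 0.5
    for a, g in [(0.0, 2.0), (1.0, 0.0), (0.3, 0.3), (1.0, 1.0)]:
        print(f"a={a}, g={g}: error of M2's answer = {error_of_m2_answer(beta, a, g):+.3f}")
```

In this parametrization the error vanishes whenever either arrow out of U is absent, so M2 answers the effect query correctly unless reality departs from it along both arrows at once.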

2.5 Finally, DAG-based methods identify for us the testable implications of our model, so that we can test models for compatibility with data.
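One standard way to list a model's testable implications is the local Markov property: in a DAG, each variable is independent of its non-descendants given its parents, and every other d-separation statement follows from these. The Python sketch below enumerates them for a small hypothetical DAG (Z → X → W → Y, plus Z → Y); the variable names are illustrative only. Each printed statement is a conditional independence against which data can be tested.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DAG, written as child -> list of parents.
PARENTS: Dict[str, List[str]] = {
    "Z": [],
    "X": ["Z"],
    "W": ["X"],
    "Y": ["W", "Z"],
}


def descendants(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    # Build child lists, then collect everything reachable from `node` along arrows.
    children: Dict[str, List[str]] = {v: [] for v in parents}
    for child, pas in parents.items():
        for p in pas:
            children[p].append(child)
    seen: Set[str] = set()
    stack = list(children[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen


def local_markov_implications(
    parents: Dict[str, List[str]],
) -> List[Tuple[str, Tuple[str, ...], Tuple[str, ...]]]:
    """For each node v, return (v, others, pa(v)), read as: v is independent of
    `others` given pa(v), where `others` are the non-descendants of v outside pa(v)."""
    result = []
    for v in sorted(parents):
        pa = set(parents[v])
        others = set(parents) - {v} - pa - descendants(v, parents)
        if others:  # an empty set yields no testable statement
            result.append((v, tuple(sorted(others)), tuple(sorted(pa))))
    return result


if __name__ == "__main__":
    for v, others, pa in local_markov_implications(PARENTS):
        print(f"{v} _||_ {{{', '.join(others)}}} | {{{', '.join(pa)}}}")
```

For this graph the sketch reports two testable claims, W independent of Z given X, and Y independent of X given {W, Z}; a model whose data violate either one is incompatible with the DAG.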

I am glad Igor raised the question that he did. There is a tendency to forget fundamentals, and it is healthy to rehearse them periodically.

– Judea

January 10, 2018

2018 Winter Update

Filed under: Announcement,General — Judea Pearl @ 10:07 pm

Dear friends in causality research,

Welcome to the 2018 Winter Greeting from the UCLA Causality Blog. This greeting discusses the following topics:

1. A report has been posted on the “What If” workshop at the NIPS conference (see the December 19, 2017 post below). It discusses my presentation, “Theoretical Impediments to Machine Learning,” a newly revised version of which can be viewed here: [http://ftp.cs.ucla.edu/pub/stat_ser/r475.pdf]

2. New posting: “Facts and Fiction from the Missing Data Framework”. We are inviting discussion of two familiar mantras:
Mantra-1. “The role of missing data analysis in causal inference is well understood (e.g., causal inference theory based on counterfactuals relies on the missing data framework).”
and
Mantra-2. “while missing data methods can form tools for causal inference, the converse cannot be true.”

We explain why we believe both mantras to be false, but we would like to hear your opinion before making up our minds.

3. A review paper titled “Graphical Models for Processing Missing Data” is available here:
http://ftp.cs.ucla.edu/pub/stat_ser/r473-L.pdf
It explains and demonstrates why missing data is a causal inference problem.

4. A new page is now up, providing information on “The Book of Why”
http://bayes.cs.ucla.edu/WHY/

5. Nominations are now open for the ASA Causality in Education Award. The nomination deadline is March 1, 2018. For more information, please see
http://www.amstat.org/education/causalityprize/.

6. For those of us who were waiting patiently for the Korean translation of the Primer: our long wait is finally over. The book is now available, in a colorful cover and an optimistic North Korean accent.
http://www.kyowoo.co.kr/02_sub/view.php?p_idx=1640&cate=0014_0017_

Don’t miss the gentlest introduction to causal inference.
http://bayes.cs.ucla.edu/PRIMER/

Enjoy, and have a productive 2018.
JP

Facts and Fiction from the “Missing Data Framework”

Filed under: Missing Data — Judea Pearl @ 9:15 am

Last month, Karthika Mohan and I received a strange review from a prominent statistical journal. Among other comments, we found the following two claims about a concept called the “missing data framework.”

Claim-1: “The role of missing data analysis in causal inference is well understood (e.g., causal inference theory based on counterfactuals relies on the missing data framework).”
and
Claim-2: “While missing data methods can form tools for causal inference, the converse cannot be true.”

I am sure that you have seen similar claims made in the literature, in lecture notes, in reviews of technical papers, or in informal conversations in the cafeteria. Oddly, based on everything that we have read and researched about missing data, we came to believe that both statements are false. Still, these claims are being touted widely, routinely, and unabashedly, with only scattered attempts to explicate their content in open discussions.

Below, we venture to challenge the two claims, hoping to elicit your comments, and to come to some understanding of what actually is meant by the phrase “missing data framework;” what is being “framed” and what remains “un-framed.”

Challenging Claim-1

It is incorrect to suppose that the role of missing data analysis in causal inference is “well understood.” Quite the opposite. Researchers adhering to missing data analysis invariably invoke an ad-hoc assumption called “conditional ignorability,” often decorated as “ignorable treatment assignment mechanism”, which is far from being “well understood” by those who make it, let alone those who need to judge its plausibility.

For readers versed in graphical modeling, “conditional ignorability” is none other than the back-door criterion that students learn in the second class on causal inference, and which “missing-data” advocates have vowed to avoid at all cost. As we know, this criterion can easily be interpreted and verified when background knowledge is presented in graphical form but, as you can imagine, it turns into a frightening enigma for those who shun the light of graphs. Still, the simplicity of reading this criterion off a graph makes it easy to test whether those who rely heavily on ignorability assumptions know what they are assuming. The results of this test are discomforting.

Marshall Joffe, at Johns Hopkins University, summed up his frustration with the practice and “understanding” of ignorability in these words: “Most attempts at causal inference in observational studies are based on assumptions that treatment assignment is ignorable. Such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed.” [Joffe et al., 2010, “Selective Ignorability Assumptions in Causal Inference,” The International Journal of Biostatistics, Vol. 6, Iss. 2, Article 11. DOI: 10.2202/; available at: http://www.bepress.com/ijb/vol6/iss2/11 ]

My personal conversations with leaders of the missing data approach to causation (these include seasoned researchers, educators and prolific authors) yielded an even darker picture. None of those leaders was able to take a toy example of 3-4 variables and determine whether conditional ignorability holds in the examples presented. It is not their fault, of course; determining conditional ignorability is a hard cognitive and computational task that ordinary mortals cannot accomplish in their heads without the aid of graphs. (I base this assertion both on first-hand experience with students and colleagues and on intimate familiarity with issues of problem complexity and cognitive load.)
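With the graph in hand, the check becomes mechanical. Taking the post's identification of conditional ignorability with the back-door criterion, the Python sketch below tests candidate adjustment sets on a hypothetical five-node toy graph (the classic “M-graph”; node names are illustrative). D-separation is decided by the standard moralization method; a set S qualifies if it contains no descendant of X and d-separates X from Y once the arrows leaving X are deleted.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

# Hypothetical toy "M-graph", written as child -> set of parents:
#   U1 -> X,  U1 -> Z,  U2 -> Z,  U2 -> Y,  X -> Y
PARENTS: Dict[str, Set[str]] = {
    "U1": set(),
    "U2": set(),
    "X": {"U1"},
    "Z": {"U1", "U2"},
    "Y": {"X", "U2"},
}


def ancestors(nodes: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """The given nodes together with all of their ancestors."""
    result: Set[str] = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents[v])
    return result


def descendants(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All proper descendants of `node`."""
    result: Set[str] = set()
    stack = [c for c, pas in parents.items() if node in pas]
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(c for c, pas in parents.items() if v in pas)
    return result


def d_separated(x: str, y: str, given: Set[str], parents: Dict[str, Set[str]]) -> bool:
    """Moralization test: x and y are d-separated by `given` iff they are disconnected
    in the moral graph of the ancestral set of {x, y} | given once `given` is removed."""
    anc = ancestors({x, y} | given, parents)
    adj: Dict[str, Set[str]] = {v: set() for v in anc}
    for child in anc:
        pas = parents[child]  # parents of an ancestral node are themselves in anc
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(sorted(pas), 2):  # "marry" parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in adj[v] - given:
            if w == y:
                return False
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


def satisfies_back_door(x: str, y: str, s: Set[str], parents: Dict[str, Set[str]]) -> bool:
    # (i) no member of s may be a descendant of x
    if s & descendants(x, parents):
        return False
    # (ii) s must block every path into x: delete x's outgoing arrows, then test d-separation
    mutilated = {v: pas - {x} for v, pas in parents.items()}
    return d_separated(x, y, s, mutilated)


if __name__ == "__main__":
    for s in [set(), {"Z"}, {"Z", "U1"}, {"Z", "U2"}]:
        print(sorted(s), satisfies_back_door("X", "Y", s, PARENTS))
```

The empty set passes, while adjusting for the pre-treatment variable Z alone fails because Z is a collider between U1 and U2; adding U1 or U2 repairs the set, though that is usable only if the added variable is actually measured.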

Unfortunately, the mantra “missing data analysis in causal inference is well understood” continues to be chanted with ever-increasing intensity, building faith among the faithful and luring chanters to assume ignorability as self-evident. Worse yet, the mantra blinds researchers to how an improved level of understanding can emerge by abandoning the missing-data prism altogether and conducting causal analysis in its natural habitat, using scientific models of reality rather than unruly patterns of missingness in the data.

A typical example of this trend is a recent article by Ding and Li titled “Causal Inference: A Missing Data Perspective”:
https://arxiv.org/pdf/1712.06170.pdf
Sure enough, already on the ninth line of the abstract, the authors assume away non-ignorable treatments and then, having reached the safety zone of classical statistics, launch statistical estimation exercises on a variety of estimands. This creates the impression that the “missing data perspective” is sufficient for conducting “causal inference” when, in fact, the entire analysis rests on the assumption of ignorability, the one assumption that the missing data perspective lacks the tools to address.

The second part of Claim-1 is equally false: “causal inference theory based on counterfactuals relies on the missing data framework.” This may be true for the causal inference theory developed by Rubin (1974) and expanded in the Imbens and Rubin book (2015), but certainly not for the causal inference theory developed in (Pearl, 2000, 2009), which is also based on counterfactuals yet in no way relies on “the missing data framework.” On the contrary, page after page of (Pearl, 2000, 2009) emphasizes that counterfactuals are natural derivatives of the causal model used and do not require the artificial interpolation tools (e.g., imputation or matching) advocated by the missing data paradigm. Indeed, model-blind imputation can be shown to invite disaster in the class of “non-ignorable” problems, something that is rarely acknowledged in the imputation-addicted literature. The very idea that certain parameters are not estimable, regardless of how clever the imputation, is foreign to the missing data way of thinking. The same goes for the idea that some parameters are estimable while others are not.

In the past five years, we have done extensive reading in the missing data literature. [For a survey, see: http://ftp.cs.ucla.edu/pub/stat_ser/r473-L.pdf] It has become clear to us that this framework falls short of addressing three fundamental problems of modern causal analysis: (1) to find whether there exist sets of covariates that render treatments “ignorable,” (2) to estimate causal effects in cases where such sets do not exist, and (3) to decide whether one’s modeling assumptions are compatible with the observed data.

It takes a theological leap of faith to imagine that a framework avoiding these fundamental problems can serve as an intellectual basis for a general theory of causal inference, a theory that has tackled those problems head-on, and successfully so. Causal inference theory has advanced significantly beyond this stage: nonparametric estimability conditions have been established for causal and counterfactual relationships in both ignorable and non-ignorable problems. Can a framework bound to ignorability assumptions serve as a basis for one that has emancipated itself from such assumptions? We doubt it.

Challenging Claim-2

We come now to Claim-2, concerning the possibility of a causality-free interpretation of missing data problems. It is indeed possible to pose a missing data problem in purely statistical terms, totally void of “missingness mechanism” vocabulary, void even of conditional independence assumptions. But this is rarely done, because the answer is trivial: none of the parameters of interest would be estimable without such assumptions (i.e., the likelihood function is flat). In theory, one can argue that there is really nothing causal about “missingness mechanism” as conceptualized by Rubin (1976), since it is defined in terms of conditional independence relations, a purely statistical notion that requires no reference to causation.

Not quite! The conditional independence relations that define missingness mechanisms are fundamentally different from those invoked in standard statistical analysis. In standard statistics, independence assumptions are presumed to hold in the distribution that governs the observed data, whereas in missing-data problems, the needed independencies are assumed to hold in the distribution of variables which are only partially observed. In other words, the independence assumptions invoked in missing data analysis are necessarily judgmental, and only rarely do they have testable implications in the available data. [Fully developed in: http://ftp.cs.ucla.edu/pub/stat_ser/r473-L.pdf]
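The role of such an assumption can be seen in a small exact computation. In the hypothetical Python sketch below (all probabilities invented), X is always observed, Y is sometimes missing, and R = 1 marks Y as observed. The adjusted estimate, the sum over x of P(Y=1 | x, R=1)·P(x), uses only observable quantities; it recovers P(Y=1) when missingness depends on X alone (so that Y is independent of R given X, a statement about a partially observed variable), and misses it when missingness depends on Y itself.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple

# Hypothetical full-data distribution: binary X (always observed), binary Y (sometimes missing).
P_X: Dict[int, Fraction] = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_Y1_GIVEN_X: Dict[int, Fraction] = {0: Fraction(1, 5), 1: Fraction(7, 10)}  # P(Y=1 | X=x)


def true_p_y1() -> Fraction:
    return sum(P_X[x] * P_Y1_GIVEN_X[x] for x in (0, 1))


def joint(x: int, y: int) -> Fraction:
    # Full-data probability P(X=x, Y=y)
    p_y = P_Y1_GIVEN_X[x] if y == 1 else 1 - P_Y1_GIVEN_X[x]
    return P_X[x] * p_y


def estimates(p_observed: Callable[[int, int], Fraction]) -> Tuple[Fraction, Fraction]:
    """Return (complete-case estimate, adjusted estimate) of P(Y=1), computed exactly.
    p_observed(x, y) is the missingness mechanism P(R=1 | X=x, Y=y)."""
    # P(X=x, Y=y, R=1): the only cells in which Y is actually seen
    obs = {(x, y): joint(x, y) * p_observed(x, y) for x in (0, 1) for y in (0, 1)}
    # Complete-case analysis: P(Y=1 | R=1)
    complete_case = (obs[(0, 1)] + obs[(1, 1)]) / sum(obs.values())
    # Adjustment: sum_x P(Y=1 | x, R=1) P(x); valid only if Y is independent of R given X
    adjusted = sum(P_X[x] * obs[(x, 1)] / (obs[(x, 0)] + obs[(x, 1)]) for x in (0, 1))
    return complete_case, adjusted


if __name__ == "__main__":
    mar = lambda x, y: Fraction(9, 10) if x == 0 else Fraction(3, 10)    # depends on X only
    mnar = lambda x, y: Fraction(9, 10) if y == 0 else Fraction(3, 10)   # depends on Y itself
    print("true P(Y=1):", true_p_y1())
    print("MAR  (complete case, adjusted):", estimates(mar))
    print("MNAR (complete case, adjusted):", estimates(mnar))
```

The same observed-data formula is applied in both scenarios; whether its answer is right depends entirely on which missingness mechanism one is prepared to assume.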

This behooves us to ask what kind of knowledge is needed for making reliable conditional independence judgments about a specific, yet partially observed problem domain. The graphical models literature has an unambiguous answer to this question: our judgment about statistical dependencies stems from our knowledge about causal dependencies, and the latter are organized in graphical form. The non-graphical literature has thus far avoided this question, presumably because it is a psychological issue that resides outside the scope of statistical analysis.

Psychology or not, the evidence from the behavioral sciences is overwhelming that judgments about statistical dependence emanate from causal intuition. [See D. Kahneman, “Thinking, Fast and Slow,” Chapter 16: Causes Trump Statistics.]

In light of these considerations, we dare call for a re-examination of the received mantra, 2. “while missing data methods can form tools for causal inference, the converse cannot be true,” and for its reversal, to read:

2′.  “while causal inference methods provide tools for solving missing data problems, the converse cannot be true.”

We base this claim on the following observations: 1. The assumptions needed to define the various types of missing data mechanisms are causal in nature. Articulating those assumptions in causal vocabulary is natural and therefore results in model transparency and credibility. 2. Estimability analysis based on causal modeling of missing data problems has charted new territories, including problems in the MNAR category (i.e., Missing Not At Random), which were inaccessible to conventional missing-data analysis. In comparison, imputation-based approaches to missing data do not provide guarantees of convergence (to consistent estimates) except for the narrow and unrecognizable class of problems in which ignorability holds. 3. Causal modeling of missing data problems has uncovered new ways of testing assumptions, which are infeasible in conventional missing-data analysis.

Perhaps even more convincingly, we were able to prove that no algorithm exists which decides if a parameter is estimable, without examining the causal structure of the model; statistical information is insufficient.

We hope these arguments convince even the staunchest missing data enthusiast to switch mantras and treat missing data problems for what they are: causal inference problems.

Judea Pearl, UCLA,
Karthika Mohan, UC Berkeley
———————————————–