Causal Analysis in Theory and Practice

January 1, 2000


Filed under: Uncategorized — bryantc @ 12:01 am

Thank you for visiting the Causal Analysis in Theory and Practice blog. We welcome participants from all backgrounds and views to post questions, opinions, or results for other visitors to chew on and respond to. Specific topics of interest include:

  • Questions regarding the basic principles of causal analysis including its meaning and historical development.
  • Views on the controversial status of causation (if any).
  • Reviews of current books and papers related to causal inference and its applications.
  • Discussion and comparison of various approaches and representations.
  • Development of practical applications in economics, social science, health sciences, political science, law, and other disciplines based on understanding of cause-effect relationships.

Submissions will be reviewed and posted on this blog anonymously unless the author gives permission to include his or her name. The purpose of moderation is not to censor differing views, but rather to ensure that the discussion remains relevant and professional.

To submit a topic or question for discussion, please complete a simple form. A reply will be sent upon receipt of a submission, and if a submission is not posted, a reason will be given. We appreciate your interest in causality and hope to hear your views on this subject.

April 28, 2018

Causal Inference Workshop at UAI 2018

Filed under: Announcement,Conferences — Judea Pearl @ 12:42 am

Dear friends in causality research,

You may find an upcoming workshop at UAI to be of interest; see the details below for more information:

7th Causal Inference Workshop at UAI 2018 – Intercontinental, Monterey, CA; August 2018

In recent years, causal inference has seen important advances, especially through a dramatic expansion in its theoretical and practical domains. By assuming a central role in decision making, causal inference has attracted interest from computer science, statistics, and machine learning, each field contributing a fresh and unique perspective.

More specifically, computer science has focused on the algorithmic understanding of causality and the general conditions under which causal structures may be inferred. Machine learning methods have focused on high-dimensional models and non-parametric methods, whereas more classical causal inference has been guiding policy in complex domains involving economics, social and health sciences, and business. Through such advances, a powerful cross-pollination has emerged as a new set of methodologies promising to deliver more robust data analysis than each field could achieve individually; examples include recently introduced concepts such as doubly-robust methods, targeted learning, double machine learning, and causal trees.

This workshop is aimed at facilitating more interaction between researchers in machine learning, statistics, and computer science working on questions of causal inference. In particular, it is an opportunity to bring together highly technical individuals who are strongly motivated by the practical importance and real-world impact of their work. Cultivating such interactions will lead to the development of theory, methodology, and, most importantly, practical tools that better target causal questions across different domains.

Important Dates
May 20 — Paper submission deadline; submission page:
June 20 — Author notification
July 20 — Camera ready version
August 10 — Workshop

Bryant Chen, IBM
Panos Toulis, University of Chicago
Alexander Volfovsky, Duke University

March 10, 2018

Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright

Filed under: Data Fusion,RCTs — Judea Pearl @ 12:20 am

I was asked to comment on a recent article by Angus Deaton and Nancy Cartwright (D&C), which touches on the foundations of causal inference. The article is titled: “Understanding and misunderstanding randomized controlled trials,” and can be viewed here:

My comments are a mixture of a welcome and a puzzle; I welcome D&C’s stand on the status of randomized trials, and I am puzzled by how they choose to articulate the alternatives.

D&C’s main theme is as follows: “We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” (Quoted from their introduction.)

As a veteran challenger of the supremacy of the RCT, I welcome D&C’s challenge wholeheartedly. Indeed, “The Book of Why” (forthcoming, May 2018) quotes me as saying:
“If our conception of causal effects had anything to do with randomized experiments, the latter would have been invented 500 years before Fisher.” In this, as well as in my other writings, I go so far as to claim that the RCT earns its legitimacy by mimicking the do-operator, not the other way around. In addition, considering the practical difficulties of conducting an ideal RCT, observational studies have a definite advantage: they interrogate populations in their natural habitats, not in artificial environments choreographed by experimental protocols.

Deaton and Cartwright’s challenge of the supremacy of the RCT consists of two parts:

  1. The first (internal validity) deals with the curse of dimensionality and argues that, in any single trial, the outcome of the RCT can be quite distant from the target causal quantity, which is usually the average treatment effect (ATE). In other words, this part concerns imbalance due to finite samples, and reflects the traditional bias-precision tradeoff in statistical analysis and machine learning.
  2. The second part (external validity) deals with biases created by inevitable disparities between the conditions and populations under study versus those prevailing in the actual implementation of the treatment program or policy. Here, Deaton and Cartwright propose alternatives to the RCT, calling for the integration of a web of multiple information sources, including observational, experimental, quasi-experimental, and theoretical inputs, all collaborating towards the goal of estimating “what we are trying to discover”.

My only qualm with D&C’s proposal is that, in their passion to advocate the integration strategy, they have failed to notice that, in the past decade, a formal theory of integration strategies has emerged from the brewery of causal inference and is currently ready and available for empirical researchers to use. I am referring, of course, to the theory of Data Fusion, which formalizes the integration scheme in the language of causal diagrams and provides theoretical guarantees of feasibility and performance. (see )

Let us examine closely D&C’s main motto: “Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” Clearly, to cast this advice in practical settings, we must devise notation, vocabulary, and logic to represent “what we are trying to discover” as well as “what is already known,” so that we can infer the former from the latter. To accomplish this nontrivial task we need tools, theorems, and algorithms to assure us that what we conclude from our integrated study indeed follows from those precious pieces of knowledge that are “already known.” D&C are notably silent about the language and methodology in which their proposal should be carried out. One is left wondering, therefore, whether they intend their proposal to remain an informal, heuristic guideline, similar to Bradford Hill’s criteria of the 1960s, or to be explicated in some theoretical framework that can distinguish valid from invalid inference. If they aspire to embed their integration scheme within a coherent framework, then they should celebrate: such a framework has been worked out and is now fully developed.

To be more specific, the Data Fusion theory described in provides us with notation to characterize the nature of each data source, the nature of the population interrogated, whether the source is an observational or experimental study, which variables are randomized and which are measured and, finally, the theory tells us how to fuse all these sources together to synthesize an estimand of the target causal quantity at the target population. Moreover, if we feel uncomfortable about the assumed structure of any given data source, the theory tells us whether an alternative source can furnish the needed information and whether we can weaken any of the model’s assumptions.

Those familiar with Data Fusion theory will find it difficult to understand why D&C have not utilized it as a vehicle to demonstrate the feasibility of their proposed alternatives to RCTs. This enigma stands out in D&C’s description of how modern analysis can rectify the deficiencies of RCTs, especially those pertaining to generalizing across populations, extrapolating across settings, and controlling for selection bias.

Here is what D&C’s article says about extrapolation (quoting from their Section 3.5, “Re-weighting and stratifying”): “Pearl and Bareinboim (2011, 2014) and Bareinboim and Pearl (2013, 2014) provide strategies for inferring information about new populations from trial results that are more general than re-weighting. They suppose we have available both causal information and probabilistic information for population A (e.g. the experimental one), while for population B (the target) we have only (some) probabilistic information, and also that we know that certain probabilistic and causal facts are shared between the two and certain ones are not. They offer theorems describing what causal conclusions about population B are thereby fixed. Their work underlines the fact that exactly what conclusions about one population can be supported by information about another depends on exactly what causal and probabilistic facts they have in common.”

The text is accurate up to this point, but then it changes gears and states: “But as Muller (2015) notes, this, like the problem with simple re-weighting, takes us back to the situation that RCTs are designed to avoid, where we need to start from a complete and correct specification of the causal structure. RCTs can avoid this in estimation which is one of their strengths, supporting their credibility but the benefit vanishes as soon as we try to carry their results to a new context.” I believe D&C miss the point about re-weighting and stratifying.

First, it is not the case that “this takes us back to the situation that RCTs are designed to avoid.” It actually takes us to a more manageable situation. RCTs are designed to neutralize the confounding of treatments, whereas our methods are designed to neutralize differences between populations. Researchers may be totally ignorant of the structure of the former and quite knowledgeable about the structure of the latter. To neutralize selection bias, for example, we need to make assumptions about the process of recruiting subjects for the trial, a process over which we have some control. There is a fundamental difference, therefore, between assumptions about covariates that determine patients’ choice of treatment and those that govern the selection of subjects — the latter is (partially) under our control. Replacing one set of assumptions with another, more defensible set does not “take us back to the situation that RCTs are designed to avoid.” It actually takes us forward, towards the ultimate goal of causal inference — to base conclusions on scrutinizable assumptions, and to base their plausibility on scientific or substantive grounds.

Second, D&C overlook the significance of the “completeness” results established for transportability problems (see ). Completeness tells us, in essence, that one cannot do any better. In other words, it delineates precisely the minimum set of assumptions that are needed to establish a consistent estimate of causal effects in the target population. If any of those assumptions is violated, we know that we can only do worse. From a mathematical (and philosophical) viewpoint, this is the most one can expect analysis to do for us and, therefore, completeness renders the generalizability problem “solved.”
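For readers who would like to see the transport machinery in action, here is a minimal numerical sketch of the simplest transport formula, P*(y|do(x)) = Σ_z P(y|do(x), z) P*(z), which applies when a covariate Z accounts for the disparity between the two populations. All numbers below are hypothetical, chosen purely for illustration:

```python
import numpy as np

# Hypothetical stratum-specific causal effects measured in the source RCT:
#   P(Y=1 | do(X=1), Z=z) and P(Y=1 | do(X=0), Z=z), for Z in {0, 1}.
p_y1_do_x1 = np.array([0.30, 0.70])
p_y1_do_x0 = np.array([0.20, 0.40])

p_z_source = np.array([0.8, 0.2])   # P(Z) in the trial population
p_z_target = np.array([0.3, 0.7])   # P*(Z) in the target population

# Transport formula (assuming Z accounts for the population difference):
#   ATE* = sum_z [P(Y=1|do(X=1),z) - P(Y=1|do(X=0),z)] * P*(z)
ate_source = np.sum((p_y1_do_x1 - p_y1_do_x0) * p_z_source)
ate_target = np.sum((p_y1_do_x1 - p_y1_do_x0) * p_z_target)

print(f"ATE in trial population:  {ate_source:.3f}")   # 0.140
print(f"ATE in target population: {ate_target:.3f}")   # 0.240
```

The stratum-specific effects are identical in both populations; only the covariate distribution differs, yet the two average effects differ substantially. This is precisely the kind of bookkeeping that the Data Fusion calculus automates, and that its completeness results certify as the best one can do.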

Finally, the completeness result highlights the broad implications of the Data Fusion theory, and how it brings D&C’s desiderata closer to becoming a working methodology. Completeness tells us that any envisioned strategy of study integration is either embraceable in the structure-based framework of Data Fusion, or it is not workable in any framework. This means that one cannot dismiss the conclusions of Data Fusion theory on the grounds that “its assumptions are too strong.” If a set of assumptions is deemed necessary in the Data Fusion analysis, then it is necessary, period; it cannot be avoided or relaxed unless it is supplemented by other assumptions elsewhere, and the algorithm can tell you where.

It is hard to see therefore why any of D&C’s proposed strategies would resist formalization, analysis and solution within the current logic of modern causal inference.

It took more than a dozen years for researchers to accept the notion of completeness in the context of internal validity. Utilizing the tools of the do-calculus (Pearl, 1995; Tian and Pearl, 2001; Shpitser and Pearl, 2006), completeness tells us what assumptions are absolutely needed for nonparametric identification of causal effects, how to tell if they are satisfied in any specific problem description, and how to use them to extract causal parameters from non-experimental studies. Completeness in the external validity context is a relatively new result (see ), which will probably take a few more years for enlightened researchers to accept, appreciate, and fully utilize. One purpose of this post is to urge the research community, especially Deaton and Cartwright, to study the recent mathematization of external validity and to benefit from its implications.

I would be very interested in seeing other readers’ reactions to D&C’s article, as well as to my optimistic assessment of what causal inference can do for us in this day and age. I have read the reactions of Andrew Gelman (on his blog) and Stephen J. Senn (on Deborah Mayo’s blog), but they seem to be unaware of the latest developments in Data Fusion analysis. I also invite Angus Deaton and Nancy Cartwright to share a comment or two on these issues. I hope they respond positively.

Looking forward to your comments,

Addendum to “Challenging the Hegemony of RCTs”
Upon re-reading the post above I realized that I have assumed readers to be familiar with Data Fusion theory. This Addendum aims at readers who are not familiar with the theory, who would probably be asking: “Who needs a new theory to do what statistics does so well?” “Once we recognize the importance of diverse sources of data, statistics can be helpful in making decisions and quantifying uncertainty.” [Quoted from Andrew Gelman’s blog.] The reason I question the sufficiency of statistics to manage the integration of diverse sources of data is that statistics lacks the vocabulary needed for the job. Let us demonstrate this with a couple of toy examples, taken from BP-2015 (see ).

Example 1
Suppose we wish to estimate the average causal effect of X on Y, and we have two diverse sources of data:

  1. An RCT in which Z, not X, is randomized, and
  2. An observational study in which X, Y, and Z are measured.

What substantive assumptions are needed to facilitate a solution to our problem? Put another way, how can we be sure that, once we make those assumptions, we can solve our problem?

Example 2
Suppose we wish to estimate the average causal effect ACE of X on Y, and we have two diverse sources of data:

  1. An RCT in which the effect of X on both Y and Z is measured, but the recruited subjects had non-typical values of Z.
  2. An observational study conducted in the target population, in which both X and Z (but not Y) were measured.

What substantive assumptions would enable us to estimate ACE, and how should we combine data from the two studies so as to synthesize a consistent estimate of ACE?

The nice thing about a toy example is that the solution is known to us in advance, so we can check any proposed solution for correctness. Curious readers can find the solutions to these two examples in BP-2015. More ambitious readers will probably try to solve them using statistical techniques, such as meta-analysis or partial pooling. The reason I am confident that the second group will end up disappointed comes from a profound statement made by Nancy Cartwright in 1989: “No causes in, no causes out.” It means not only that you need substantive assumptions to derive causal conclusions; it also means that the vocabulary of statistical analysis, since it is built entirely on properties of distribution functions, is inadequate for expressing those substantive assumptions that are needed for getting causal conclusions. In our examples, although part of the data is provided by an RCT, and hence carries causal information, one can still show that the needed assumptions must invoke causal vocabulary; distributional assumptions are insufficient. As someone versed in both graphical modeling and counterfactuals, I would go even further and state that it would be a miracle if anyone succeeded in translating the needed assumptions into a comprehensible language other than causal diagrams. (See Appendix, Scenario 3.)
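To give a taste of how causal assumptions enter, here is a simulation of Example 1 under one possible causal story, with the added (and restrictive) assumption of linearity; the diagram and all coefficients are my own illustrative choices, not the nonparametric solution of the paper. Under this story, the randomized Z acts as an instrument for X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed causal story (one of many compatible with the problem
# statement): Z -> X -> Y, with an unobserved confounder U affecting
# both X and Y.  All relations are linear.
u = rng.normal(size=n)                        # unobserved confounder
z = rng.binomial(1, 0.5, size=n)              # Z is randomized (the RCT)
x = 1.0 * z + 1.5 * u + rng.normal(size=n)
y = 2.0 * x - 2.0 * u + rng.normal(size=n)    # true effect of X on Y is 2.0

# The observational study alone: naive regression of Y on X is biased.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Fusing the two sources: the randomized Z serves as an instrument, so
#   effect = cov(Z, Y) / cov(Z, X)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"naive estimate: {naive:.2f}")   # pulled away from 2.0 by U
print(f"IV estimate:    {iv:.2f}")      # close to 2.0
```

Notice that the fusion step, equating the causal effect with cov(Z,Y)/cov(Z,X), is licensed by the assumed diagram, not by any distributional property; a different diagram over the same three variables would dictate a different estimator, or none at all.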

Armed with these examples and findings, we can go back and re-examine why D&C do not embrace the Data Fusion methodology in their quest for integrating diverse sources of data. The answer, I conjecture, is that D&C were not intimately familiar with what this methodology offers us, and how vastly different it is from previous attempts to operationalize Cartwright’s dictum: “No causes in, no causes out”.

March 1, 2018

Special Greeting from the UCLA Causality Blog

Filed under: Announcement — Judea Pearl @ 10:34 pm

Dear friends in causality research,

This greeting is somewhat different from those you have been receiving in the past 18 years (yes, it has been that long; see January 1, 2000). Instead of new results, passionate discussions, breakthroughs, controversies, and question-and-answer sessions, this greeting brings you a musical offering: The Book of Why. It is a new book that I have co-authored recently with Dana MacKenzie, forthcoming May 15, 2018. The book tells the story, in layman’s terms, of the new science of cause and effect, the one we have been nourishing, playing with, and marveling at on this blog.

By “the new science” I mean going back, not merely to the causal revolution of the past few decades, but all the way to the day when scientists first assigned a mathematical symbol to a causal relation.

Joining me in this journey you will see how leaders in your own field managed to cope with the painful transition from statistical to causal thinking.

Despite my personal obsession with mathematical tools, this book has taught me that the story of causal inference looks totally different from the conceptual, non-technical viewpoint of our intended readers. So different, in fact, that I occasionally catch myself tuning to the music of The Book of Why when seeking a deeper understanding of a dry equation. I hope you and your students find it useful and enjoyable.

The publisher’s description can be viewed here: while the Table of Contents and sample chapters can be viewed here:

Our publisher also assures us that the book can be pre-ordered at no extra cost on your favorite website.

And may our story be inscribed in the book of worthy causes.


January 24, 2018

Can DAGs Do the Un-doable?

Filed under: DAGs,Discussion — Judea Pearl @ 2:32 am

The following question was sent to us by Igor Mandel:

Separation of variables with zero causal coefficients from others
Here is a problem. Imagine we have a researcher who has some understanding of a particular problem, and this understanding is partly or completely wrong. Can DAGs, or any other theory of causality, convincingly establish this fact (that she is wrong)?

To be more specific, let’s consider a simple example with rather indisputable causal variables (described in detail in ). One wants to estimate how different food ingredients affect the energy (in calories) contained in different types of food. She takes many samples and measures different things. But she doesn’t know about the existence of fats and proteins; she only knows that there are carbohydrates, water, and fiber. She builds a DAG as she feels it should be:

From our (i.e., educated people of the 21st century) standpoint, the arrows from Fiber and Water to Calories have zero coefficients. But since the data bear significant correlations between Calories, Water, and Fiber, any regression estimates would show non-zero values for these coefficients. Is there a way to say that these non-zero values are wrong, not just quantitatively, but qualitatively?
An even brighter example is what is often called “spurious correlation.” It was “statistically proven” almost 20 years ago that storks deliver babies ( ), while many women still believe they do not. How can we re-convince those statistically ignorant women? Or, conversely, how can we strengthen their naïve, but statistically unconfirmed, beliefs, just by looking at the data and not asking them for baby-related details? What kind of DAG may help?

My Response
This question, in a variety of settings, has been asked by readers of this blog since the beginning of the Causal Revolution. The idea that new tools are now available that can handle causal problems free of statistical dogmas has encouraged thousands of researchers to ask: Can you do this, or can you do that? The answers to such questions are often trivial and can be obtained directly from the logic of causal inference, without the details of the question. I am not surprised, however, that such questions surface again in 2018, since the foundations of causal inference are rarely emphasized in the technical literature, so they tend to be forgotten.

I will answer Igor’s question as a student of modern logic of causation.

1. Can a DAG distinguish variables with zero causal effects (on Y) from those having non-zero effects?

Of course not; no method in the world can do that without further assumptions. Here is why:
The question above concerns causal relations, and we know from first principles that no causal query can be answered from data alone, without causal information that lies outside the data.
[It does not matter if your query is quantitative or qualitative, or whether you address it to a story or to a graph. Every causal query needs causal assumptions. No causes in, no causes out (N. Cartwright).]

2. Can DAG-based methods do anything more than just quit with failure?

Of course they can.

2.1 First, notice that the distinction between having or not having a causal effect is a property of nature (or the data-generating process), not of the model that you postulate. We can therefore ignore the diagram that Igor describes above. Now, in addition to quitting for lack of information, DAG-based methods would tell you: “If you can give me some causal information, however qualitative, I will tell you whether it is sufficient for answering your query.” I hope readers will agree with me that this kind of answer, though weaker than the one expected by the naïve inquirer, is much more informative than just quitting in despair.

2.2 Note also that postulating a whimsical model like the one described by Igor above has no bearing on the answer. To do anything useful in causal inference we need to start with a model of reality, not with a model drawn by a confused researcher, for whom an arrow means nothing more than “the data bear significant correlation” or “regression estimates show non-zero values.”
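Igor’s calories example can be reproduced in a few lines. In the sketch below (the data-generating process and every number are invented for illustration), calories are caused by carbohydrates and fat alone, while water is merely correlated with fat; a researcher unaware of fat will nevertheless obtain a large, stable, and entirely non-causal regression coefficient for water:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed reality: calories are caused by carbs and fat only; water
# content is negatively correlated with fat but has no causal effect.
carbs = rng.normal(size=n)
fat = rng.normal(size=n)
water = -0.8 * fat + rng.normal(scale=0.6, size=n)   # correlated, not causal
fiber = rng.normal(size=n)
calories = 4.0 * carbs + 9.0 * fat + rng.normal(size=n)

# The researcher, unaware of fat, regresses calories on what she measured:
X = np.column_stack([carbs, water, fiber])
coef, *_ = np.linalg.lstsq(X, calories, rcond=None)
print(dict(zip(["carbs", "water", "fiber"], np.round(coef, 2))))
```

No amount of additional data will shrink the water coefficient toward its true causal value of zero; only a causal assumption (e.g., “water has no direct effect on calories, and an unmeasured ingredient confounds the water-calories association”) can license that conclusion. This is Cartwright’s dictum in action.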

2.3 Once you start with a postulated model of reality, DAG-based methods can be very helpful. For example, they can take your postulated model and determine which of the arrows in the model should have a zero coefficient attached to it, which should have a non-zero coefficient attached to it, and which would remain undecided till the end of time.

2.4 Moreover, assume reality is governed by model M1 and you postulate model M2, different from M1. DAG-based methods can tell you which causal queries you will answer correctly and which you will answer incorrectly (see Section 4.3 of ). This is nice, because it offers us a kind of sensitivity analysis: how far can reality be from your assumed model before you start making mistakes?

2.5 Finally, DAG-based methods identify for us the testable implications of our model, so that we can test models for compatibility with data.
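Point 2.5 can be sketched in code. A chain model X → Z → Y implies the testable claim X ⊥ Y | Z; a model with a direct X → Y edge does not. The toy test below is my own illustration, using a linear-Gaussian partial correlation as the independence test; it shows how data can refute one postulated model while remaining compatible with another:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(2)
n = 50_000

# Model M1, a chain X -> Z -> Y, implies the testable claim X _||_ Y | Z.
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)
print(f"chain model, pcorr(X,Y|Z) = {partial_corr(x, y, z):+.3f}")   # near 0

# Model M2 adds a direct edge X -> Y; the same test now fails.
y2 = z + 0.5 * x + rng.normal(size=n)
print(f"direct-edge model         = {partial_corr(x, y2, z):+.3f}")  # far from 0
```

Of course, a vanishing partial correlation cannot by itself fix the direction of the arrows; several distinct DAGs share the same testable implications, which is why point 2.3 speaks of arrows that “remain undecided till the end of time.”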

I am glad Igor raised the question that he did. There is a tendency to forget fundamentals, and it is healthy to rehearse them periodically.

– Judea

January 10, 2018

2018 Winter Update

Filed under: Announcement,General — Judea Pearl @ 10:07 pm

Dear friends in causality research,

Welcome to the 2018 Winter Greeting from the UCLA Causality Blog. This greeting discusses the following topics:

1. A report is posted on the “What If” workshop at the NIPS conference (see the December 19, 2017 post below). It discusses my presentation of “Theoretical Impediments to Machine Learning,” a newly revised version of which can be viewed here.

2. New posting: “Facts and Fiction from the Missing Data Framework”. We are inviting discussion of two familiar mantras:
Mantra-1. “The role of missing data analysis in causal inference is well understood (e.g., causal inference theory based on counterfactuals relies on the missing data framework).”
Mantra-2. “While missing data methods can form tools for causal inference, the converse cannot be true.”

We explain why we believe both mantras to be false, but we would like to hear your opinion before firming up our minds.

3. A review paper, titled “Graphical Models for Processing Missing Data,” is available here: It explains and demonstrates why missing data is a causal inference problem.

4. A new page is now up, providing information on “The Book of Why”
It contains the Table of Contents and excerpts from the book.

5. Nominations are now open for the ASA Causality in Education Award. The nomination deadline is March 1, 2018. For more information, please see

6. For those of us who were waiting patiently for the Korean translation of Primer — our long wait is finally over. The book is available now in colorful cover and in optimistic North Korean accent.

Don’t miss the gentlest introduction to causal inference.

Enjoy, and have a productive 2018.

Facts and Fiction from the “Missing Data Framework”

Filed under: Missing Data — Judea Pearl @ 9:15 am

Last month, Karthika Mohan and I received a strange review from a prominent statistical journal. Among other comments, we found the following two claims about a conception called the “missing data framework.”

Claim-1: “The role of missing data analysis in causal inference is well understood (e.g., causal inference theory based on counterfactuals relies on the missing data framework).”
Claim-2: “While missing data methods can form tools for causal inference, the converse cannot be true.”

I am sure that you have seen similar claims made in the literature, in lecture notes, in reviews of technical papers, or in informal conversations in the cafeteria. Oddly, based on everything that we have read and researched about missing data, we have come to believe that both statements are false. Still, these claims are being touted widely, routinely, and unabashedly, with only scattered attempts to explicate their content in open discussions.

Below, we venture to challenge the two claims, hoping to elicit your comments and to come to some understanding of what is actually meant by the phrase “missing data framework”: what is being “framed” and what remains “un-framed.”

Challenging Claim-1

It is incorrect to suppose that the role of missing data analysis in causal inference is “well understood.” Quite the opposite. Researchers adhering to missing data analysis invariably invoke an ad-hoc assumption called “conditional ignorability,” often decorated as “ignorable treatment assignment mechanism”, which is far from being “well understood” by those who make it, let alone those who need to judge its plausibility.

For readers versed in graphical modeling, “conditional ignorability” is none other than the back-door criterion that students learn in the second class on causal inference, and which “missing-data” advocates have vowed to avoid at all cost. As we know, this criterion can easily be interpreted and verified when background knowledge is presented in graphical form but, as you can imagine, it turns into a frightening enigma for those who shun the light of graphs. Still, the simplicity of reading this criterion off a graph makes it easy to test whether those who rely heavily on ignorability assumptions know what they are assuming. The results of this test are discomforting.
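For readers who want to see what “conditional ignorability” amounts to operationally, here is a minimal simulation (all parameters invented for illustration) of a model in which a single covariate Z satisfies the back-door criterion; adjusting for Z recovers the true effect, while the unadjusted contrast is badly biased:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Toy model in which treatment assignment is ignorable given Z
# (Z satisfies the back-door criterion): Z -> X, Z -> Y, X -> Y.
# The true causal effect of X on Y is 0.10.
z = rng.binomial(1, 0.5, size=n)
p_x = np.where(z == 1, 0.8, 0.2)          # treatment depends on Z
x = rng.binomial(1, p_x)
p_y = 0.2 + 0.10 * x + 0.5 * z            # outcome depends on X and Z
y = rng.binomial(1, p_y)

# The naive contrast ignores Z and is badly biased:
naive = y[x == 1].mean() - y[x == 0].mean()

# Back-door adjustment: stratify on Z, weight strata by P(Z=z).
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
    * np.mean(z == v)
    for v in (0, 1)
)
print(f"naive:    {naive:.3f}")     # far from 0.10
print(f"adjusted: {adjusted:.3f}")  # close to 0.10
```

The point of the graphical criterion is that the validity of this adjustment can be read off the postulated diagram in seconds, whereas asserting “ignorable treatment assignment” directly requires reasoning about counterfactual independencies in one’s head.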

Marshall Joffe, at Johns Hopkins University, summed up his frustration with the practice and “understanding” of ignorability in these words: “Most attempts at causal inference in observational studies are based on assumptions that treatment assignment is ignorable. Such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed.” [Joffe et al., 2010, “Selective Ignorability Assumptions in Causal Inference,” The International Journal of Biostatistics, Vol. 6, Iss. 2, Article 11. DOI: 10.2202/1557-4679.1199. Available at: ]

My personal conversations with leaders of the missing data approach to causation (these include seasoned researchers, educators, and prolific authors) concluded with an even darker picture. None of those leaders was able to take a toy example of 3-4 variables and determine whether conditional ignorability holds in the examples presented. It is not their fault, of course; determining conditional ignorability is a hard cognitive and computational task that ordinary mortals cannot accomplish in their heads, without the aid of graphs. (I base this assertion both on first-hand experience with students and colleagues and on intimate familiarity with issues of problem complexity and cognitive load.)

Unfortunately, the mantra “missing data analysis in causal inference is well understood” continues to be chanted at ever increasing intensity, building faith among the faithful, and luring chanters to assume ignorability as self-evident. Worse yet, the mantra blinds researchers to the improved level of understanding that can emerge by abandoning the missing-data prism altogether and conducting causal analysis in its natural habitat, using scientific models of reality rather than unruly patterns of missingness in the data.

A typical example of this trend is a recent article by Ding and Li titled “Causal Inference: A Missing Data Perspective.” Sure enough, already on the ninth line of the abstract, the authors assume away non-ignorable treatments and then, having reached the safety zone of classical statistics, launch statistical estimation exercises on a variety of estimands. This creates the impression that the “missing data perspective” is sufficient for conducting “causal inference” when, in fact, the entire analysis rests on the assumption of ignorability, the one assumption that the missing data perspective lacks the tools to address.

The second part of Claim-1 is equally false: “causal inference theory based on counterfactuals relies on the missing data framework.” This may be true for the causal inference theory developed by Rubin (1974) and expanded in Imbens and Rubin’s book (2015), but certainly not for the causal inference theory developed in (Pearl, 2000, 2009), which is also based on counterfactuals yet in no way relies on “the missing data framework.” On the contrary, page after page of (Pearl, 2000, 2009) emphasizes that counterfactuals are natural derivatives of the causal model used, and do not require the artificial interpolation tools (e.g., imputation or matching) advocated by the missing data paradigm. Indeed, model-blind imputation can be shown to invite disasters in the class of “non-ignorable” problems, something that is rarely acknowledged in the imputation-addicted literature. The very idea that certain parameters are not estimable, regardless of how clever the imputation, is foreign to the missing data way of thinking. The same goes for the idea that some parameters are estimable while others are not.

In the past five years, we have done extensive reading into the missing data literature. [For a survey, see:] It has become clear to us that this framework falls short of addressing three fundamental problems of modern causal analysis: (1) to find whether there exist sets of covariates that render treatments “ignorable”; (2) to estimate causal effects in cases where such sets do not exist; and (3) to decide whether one’s modeling assumptions are compatible with the observed data.
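To make problem (1) concrete, here is a toy simulation of my own (the variable names and coefficients are invented, not taken from any paper discussed here). A single confounder Z affects both treatment X and outcome Y; once Z is measured, the treatment is ignorable, so the naive contrast is biased while back-door adjustment over Z recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.binomial(1, 0.5, n)                  # confounder
x = rng.binomial(1, 0.2 + 0.6 * z)           # treatment depends on Z
y = 2.0 * x + 3.0 * z + rng.normal(0, 1, n)  # true effect of X on Y is 2.0

# Naive contrast ignores Z and absorbs its influence:
naive = y[x == 1].mean() - y[x == 0].mean()

# Back-door adjustment: average the Z-stratified contrasts over P(Z):
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
    * (z == v).mean()
    for v in (0, 1)
)
print(round(naive, 2), round(adjusted, 2))   # naive ≈ 3.8, adjusted ≈ 2.0
```

Nothing in the data alone announces that Z (rather than some other covariate set) suffices for adjustment; that is precisely what a causal model must supply.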

It takes a theological leap of faith to imagine that a framework avoiding these fundamental problems can serve as an intellectual basis for a general theory of causal inference, a theory that has tackled those problems head on, and successfully so. Causal inference theory has advanced significantly beyond this stage – nonparametric estimability conditions have been established for causal and counterfactual relationships in both ignorable and non-ignorable problems. Can a framework bound to ignorability assumptions serve as a basis for one that has emancipated itself from such assumptions? We doubt it.

Challenging Claim 2.

We come now to claim (2), concerning the possibility of a causality-free interpretation of missing data problems. It is possible indeed to pose a missing data problem in purely statistical terms, totally void of “missingness mechanism” vocabulary, void even of conditional independence assumptions. But this is rarely done, because the answer is trivial: none of the parameters of interest would be estimable without such assumptions (i.e., the likelihood function is flat). In theory, one can argue that there is really nothing causal about the “missingness mechanism” as conceptualized by Rubin (1976), since it is defined in terms of conditional independence relations, a purely statistical notion that requires no reference to causation.

Not quite! The conditional independence relations that define missingness mechanisms are fundamentally different from those invoked in standard statistical analysis. In standard statistics, independence assumptions are presumed to hold in the distribution that governs the observed data, whereas in missing-data problems, the needed independencies are assumed to hold in the distribution of variables which are only partially observed. In other words, the independence assumptions invoked in missing data analysis are necessarily judgmental, and only rarely do they have
testable implications in the available data. [Fully developed in:]
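A toy numerical sketch of this point (my own, with invented numbers): when missingness depends on the very value that went missing (MNAR), the complete cases are silently biased, and nothing in the observed data alone reveals it; the needed assumption concerns a distribution we only partially observe.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
y = rng.normal(10.0, 2.0, n)          # full population; true mean is 10

# MNAR: the chance that y is recorded depends on y itself:
# large values go missing more often (logistic selection).
p_observed = 1 / (1 + np.exp(y - 10))
observed = rng.random(n) < p_observed

cc_mean = y[observed].mean()          # complete-case estimate, biased low
```

The complete cases look like a perfectly ordinary sample, yet their mean sits well below the true mean of 10; only a judgment about the missingness mechanism, not any statistic of the observed data, can tell us whether and how to correct.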

This behooves us to ask what kind of knowledge is needed for making reliable conditional independence judgments about a specific, yet partially observed, problem domain. The graphical models literature has an unambiguous answer to this question: our judgment about statistical dependencies stems from our knowledge about causal dependencies, and the latter is organized in graphical form. The non-graphical literature has thus far avoided this question, presumably because it is a psychological issue that resides outside the scope of statistical analysis.

Psychology or not, the evidence from the behavioral sciences is overwhelming that judgments about statistical dependence emanate from causal intuition. [See D. Kahneman, “Thinking, Fast and Slow,” Chapter 16: “Causes Trump Statistics.”]

In light of these considerations, we would dare call for a re-examination of the received mantra: 2. “while missing data methods can form tools for causal inference, the converse cannot be true,” and reverse it to read:

2′.  “while causal inference methods provide tools for solving missing data problems, the converse cannot be true.”

We base this claim on the following observations: 1. The assumptions needed to define the various types of missing data mechanisms are causal in nature. Articulating those assumptions in causal vocabulary is natural, and therefore results in model transparency and credibility. 2. Estimability analysis based on causal modeling of missing data problems has charted new territories, including problems in the MNAR category (i.e., Missing Not At Random), which were inaccessible to conventional missing-data analysis. In comparison, imputation-based approaches to missing data do not provide guarantees of convergence (to consistent estimates) except for the narrow and unrecognizable class of problems in which ignorability holds. 3. Causal modeling of missing data problems has uncovered new ways of testing assumptions, which are infeasible in conventional missing-data analysis.

Perhaps even more convincingly, we were able to prove that no algorithm exists that can decide whether a parameter is estimable without examining the causal structure of the model; statistical information alone is insufficient.

We hope these arguments convince even the staunchest missing data enthusiast to switch mantras and treat missing data problems for what they are: causal inference problems.

Judea Pearl, UCLA,
Karthika Mohan, UC Berkeley

December 19, 2017

NIPS 2017: Q&A Follow-up

Filed under: Conferences,General — Judea Pearl @ 6:42 am
Dear friends in causal research,
Last week I spoke at a workshop on machine learning and causality, which followed the NIPS conference in Long Beach. Below please find my response to several questions I was asked
after my talk. I hope you will find the questions and answers to be of relevance to issues discussed on this blog.
To: Participants at the NIPS “What If” workshop
Dear friends,
Some of you asked me for copies of my slides. I am attaching them with this message, and you can get the accompanying paper by clicking here:

NIPS 17 – What If? Workshop Slides (PDF)

NIPS 17 – What If? Workshop Slides (PPT [zipped])

I also received several interesting questions at the end of my talk, which I could not fully answer in the short break we had. I will try to answer them below.

Q.1. What do you mean by the “Causal Revolution”?
Ans.1: “Revolution” is a poetic word to summarize Gary King’s observation:  “More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history” (see cover of Morgan and Winship’s book, 2015). It captures the miracle that only three decades ago we could not write a formula for: “Mud does not
cause Rain” and, today, we can formulate and estimate every causal or counterfactual statement.

Q2: Are the estimates produced by graphical models the same as those produced by the potential outcome approach?
Ans.2: Yes, provided the two approaches start with the same set of assumptions. The assumptions in the graphical approach are advertised in the graph, while those in the potential outcome approach are articulated separately by the investigator, using counterfactual vocabulary.

Q3: The method of imputing potential outcomes to individual units in a table appears totally different from the methods used in the graphical approach. Why the difference?
Ans.3: Imputation works only when certain assumptions of conditional ignorability hold. The table itself does not show us what the assumptions are, nor what they mean. To see what they mean we need a graph, since no mortal can process such assumptions in his/her head. The apparent difference in procedures reflects the insistence (in the graphical framework) on seeing the assumptions, rather than wishing them away.
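To illustrate the kind of mechanical support a graph provides (this is my own sketch, not code from the talk), d-separation can be checked algorithmically with the standard moralization criterion: X and Y are independent given Z whenever Z separates them in the moralized ancestral graph.

```python
from collections import defaultdict

def d_separated(edges, xs, ys, zs):
    """Lauritzen's moralization criterion: X _||_ Y | Z holds in the DAG
    iff Z separates X from Y in the moral graph of the ancestral
    subgraph over X, Y and Z."""
    parents = defaultdict(set)
    for a, b in edges:                       # directed edge a -> b
        parents[b].add(a)

    # 1. ancestral subgraph of X u Y u Z
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    # 2. moralize: link each node to its parents and co-parents to each
    #    other, then drop edge directions
    adj = defaultdict(set)
    for v in anc:
        for p in parents[v]:
            adj[v].add(p)
            adj[p].add(v)
        for p in parents[v]:
            adj[p].update(parents[v] - {p})

    # 3. remove Z, then test whether X can still reach Y
    seen, stack = set(zs), list(xs - zs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False                     # connected: not d-separated
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return True

chain    = {("A", "B"), ("B", "C")}          # A -> B -> C
collider = {("A", "B"), ("C", "B")}          # A -> B <- C
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(chain, {"A"}, {"C"}, set()))     # False
print(d_separated(collider, {"A"}, {"C"}, set()))  # True
print(d_separated(collider, {"A"}, {"C"}, {"B"}))  # False
```

The chain implies A and C are dependent marginally but independent given B; the collider implies the opposite. These are exactly the judgments that no mortal should be asked to juggle mentally once the graph grows past a handful of nodes.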

Q4: Some say that economists do not use graphs because their problems are different, and they cannot afford to model the entire economy. Do you agree with this explanation?
Ans.4: No way! Mathematically speaking, economic problems are no different from those faced by epidemiologists (or other social scientists), for whom graphical models have become a second language. Moreover, epidemiologists have never complained that graphs force them to model the entirety of the human anatomy. Graph-avoidance among (some) economists is a cultural phenomenon, reminiscent of telescope-avoidance among Church astronomers in 17th century Italy. Bottom line: epidemiologists can judge the plausibility of their assumptions — graph-avoiding economists cannot. (I have offered them many opportunities to demonstrate it in public, and I don’t blame them for remaining silent; it is not a problem that can be managed by an unaided intellect.)

Q.5: Isn’t deep learning more than just glorified curve fitting? After all, the objective of curve fitting is to maximize “fit,” while in deep learning much effort goes into minimizing “overfit.”
Ans.5: No matter what acrobatics you go through to minimize overfitting or other flaws in your learning strategy, you are still optimizing some property of the observed data while making no reference to the world outside the data. This puts you right back on rung 1 of the Ladder of Causation, with all the limitations that rung 1 entails.

If you have additional questions on these or other topics, feel free to post them here on our blog, (anonymity will be respected), and I will try my best to answer them.


August 2, 2017

2017 Mid-Summer Update

Filed under: Counterfactual,Discussion,Epidemiology — Judea Pearl @ 12:55 am

Dear friends in causality research,

Welcome to the 2017 mid-summer greeting from the UCLA Causality Blog.

This greeting discusses the following topics:

1. “The Eight Pillars of Causal Wisdom” and the WCE 2017 Virtual Conference website
2. A discussion panel: “Advances in Deep Neural Networks”
3. Comments on “The Tale Wagged by the DAG”
4. A new book: “The Book of Why”
5. A new paper: Disjunctive Counterfactuals
6. Causality in Statistics Education Award
7. News on “Causal Inference: A Primer”

1. “The Eight Pillars of Causal Wisdom”

The tenth annual West Coast Experiments Conference was held at UCLA on April 24-25, 2017, preceded by a training workshop on April 23.

You will be pleased to know that the WCE 2017 Virtual Conference Website is now available here:
It provides videos of the talks as well as some of the papers and presentations.

The conference brought together scholars and graduate students in economics, political science and other social sciences who share an interest in causal analysis. Speakers included:

1. Angus Deaton, on understanding and misunderstanding randomized controlled trials.
2. Chris Auld, on the ongoing confusion between regression and structural equations in the econometric literature.
3. Clark Glymour, on explanatory research vs. confirmatory research.
4. Elias Bareinboim, on the solution to the external validity problem.
5. Adam Glynn, on front-door approaches to causal inference.
6. Karthika Mohan, on missing data from a causal modeling perspective.
7. Judea Pearl, on “The Eight Pillars of Causal Wisdom.”
8. Adnan Darwiche, on model-based vs. model-blind approaches to artificial intelligence.
9. Niall Cardin, on causal inference for machine learning.
10. Karim Chalak, on measurement error without exclusion.
11. Ed Leamer, on “Causality Complexities Example: Supply and Demand.”
12. Rosa Matzkin, on identification in simultaneous equations.
13. Rodrigo Pinto, on randomized biased-controlled trials.

The video of my lecture “The Eight Pillars of Causal Wisdom” can be watched here:
A transcript of the talk can be found here:

2. “Advances in Deep Neural Networks”

As part of its celebration of 50 years of the Turing Award, the ACM has organized several discussion sessions on selected topics in computer science. I participated in a panel discussion on “Advances in Deep Neural Networks,” which gave me an opportunity to share thoughts on whether learning methods based solely on data fitting can ever achieve human-level intelligence. The discussion video can be viewed here:
A position paper that defends these thoughts is available here:

3. The Tale Wagged by the DAG

An article by this title, authored by Nancy Krieger and George Davey Smith has appeared in the International Journal of Epidemiology, IJE 2016 45(6) 1787-1808.
It is part of a special IJE issue on causal analysis which, for the reasons outlined below, should be of interest to readers of this blog.

As the title tell-tales us, the authors are unhappy with the direction that modern epidemiology has taken, which in their view is too wedded to a two-language framework:
(1) Graphical models (DAGs) — to express what we know, and
(2) Counterfactuals (or potential outcomes) — to express what we wish to know.

The specific reasons for the authors’ unhappiness are still puzzling to me, because the article does not demonstrate concrete alternatives to current methodologies. I can only speculate, however, that it is the dazzling speed with which epidemiology has modernized its tools that lies behind the authors’ discomfort. If so, it would be safe for us to assume that the discomfort will subside as soon as researchers gain greater familiarity with the capabilities and flexibility of these new tools. I nevertheless recommend that the article, and the entire special issue of IJE, be studied by our readers, because they reflect an interesting soul-searching attempt by a forward-looking discipline to assess its progress in the wake of a profound paradigm shift.

Epidemiology, as I have written on several occasions, has been a pioneer in accepting the DAG-counterfactuals symbiosis as a ruling paradigm — way ahead of mainstream statistics and its other satellites. (The social sciences, for example, are almost there, with the exception of the model-blind branch of econometrics. See Feb. 22 2017 posting)

In examining the specific limitations that Krieger and Davey Smith perceive in DAGs, readers will be amused to note that these limitations coincide precisely with the strengths for which DAGs are praised.

For example, the article complains that DAGs provide no information about variables that investigators chose not to include in the model. In their words: “the DAG does not provide a comprehensive picture. For example, it does not include paternal factors, ethnicity, respiratory infections or socioeconomic position…” (taken from the editorial introduction). I have never considered this to be a limitation of DAGs or of any other scientific modelling. Quite the contrary; it would be a disaster if models were permitted to provide information unintended by the modeller. Instead, I have learned to admire the ease with which DAGs enable researchers to incorporate knowledge about new variables, or new mechanisms, which the modeller wishes to embrace.

Model misspecification, after all, is a problem that plagues every exercise in causal inference, no matter what framework one chooses to adopt. It can only be cured by careful model-building strategies and by enhancing the modeller’s knowledge. Yet, when it comes to minimizing misspecification errors, DAGs have no match. The transparency with which DAGs display the causal assumptions in the model, and the ease with which they identify the testable implications of those assumptions, are incomparable; together they facilitate speedy model diagnosis and repair.

Or, to take another example, the authors call repeatedly for an ostensibly unavailable methodology which they label “causal triangulation” (it appears 19 times in the article). In their words: “In our field, involving dynamic populations of people in dynamic societies and ecosystems, methodical triangulation of diverse types of evidence from diverse types of study settings and involving diverse populations is essential.” Ironically, however, the task of treating “diverse types of evidence from diverse populations” has been accomplished quite successfully in the DAG-counterfactual framework. See, for example, the formal and complete results of Bareinboim and Pearl (2016), which have emerged from a DAG-based perspective and invoke the do-calculus. While it is inconceivable to me that anyone could pool data from two different designs (say, experimental and observational) without resorting to DAGs or (equivalently) potential outcomes, I am open to learn.

Another conceptual paradigm which the authors hope would liberate us from the tyranny of DAGs and counterfactuals is Lipton’s (2004) romantic aspiration for “Inference to the Best Explanation.” It is a compelling, century-old mantra, going back at least to Charles Peirce’s theory of abduction (Pragmatism and Pragmaticism, 1870), which, unfortunately, has never operationalized its key terms: “explanation,” “best,” and “inference to.” Again, I know of only one framework in which this aspiration has been explicated with sufficient precision to produce tangible results — it is the structural framework of DAGs and counterfactuals. See, for example, “Causes of Effects and Effects of Causes” and Halpern and Pearl (2005), “Causes and Explanations: A Structural-Model Approach.”

In summary, what Krieger and Davey Smith aspire to achieve by abandoning the structural framework has already been accomplished with the help and grace of that very framework.
More generally, what we learn from these examples is that the DAG-counterfactual symbiosis is far from being a narrow “ONE approach to causal inference” which “may potentially lead to spurious causal inference” (their words). It is in fact a broad and flexible framework within which a plurality of tasks and aspirations can be formulated, analyzed, and implemented. The quest for metaphysical alternatives is not warranted.

I was pleased to note that, by and large, commentators on Krieger and Davey Smith’s paper seemed to be aware of the powers and generality of the DAG-counterfactual framework, albeit not exactly for the reasons that I have described here. [Footnote: I have many disagreements with the other commentators as well, but I wish to focus here on “The Tale Wagged by the DAG,” where the problems appear more glaring.] My talk on “The Eight Pillars of Causal Wisdom” provides a concise summary of those reasons and explains why I take the poetic liberty of calling these pillars “The Causal Revolution.”

All in all, I believe that epidemiologists should be commended for the incredible progress they have made in the past two decades. They will no doubt continue to develop and benefit from the new tools that the DAG-counterfactual symbiosis has spawned. At the same time, I hope that the discomfort that Krieger and Davey Smith have expressed will be temporary, and that it will inspire a greater understanding of the modern tools of causal inference.

Comments on this special issue of IJE are invited on this blog.

4. The Book of WHY

As some of you know, I am co-authoring another book, titled “The Book of Why: The New Science of Cause and Effect.” It will attempt to present the eight pillars of causal wisdom to the general public, using words, intuition, and examples to replace equations. My co-author is science writer Dana Mackenzie, and our publishing house is Basic Books. If all goes well, the book will reach your shelf by March 2018. Selected sections will appear periodically on this blog.

5. Disjunctive Counterfactuals

The structural interpretation of counterfactuals as formulated in Balke and Pearl (1994) excludes disjunctive conditionals, such as “had X been x1 or x2,” as well as disjunctive actions such as do(X = x1 or X = x2). In contrast, the closest-world interpretation of Lewis (1973) assigns truth values to all counterfactual sentences, regardless of the logical form of the antecedent. The next issue of the Journal of Causal Inference will include a paper that extends the vocabulary of structural counterfactuals with disjunctions, and clarifies the assumptions needed for the extension. An advance copy can be viewed here:

6. ASA Causality in Statistics Education Award

Congratulations go to Ilya Shpitser, Professor of Computer Science at Johns Hopkins University, who is the 2017 recipient of the ASA Causality in Statistics Education Award. Funded by Microsoft Research and Google, the $5,000 award will be presented to Shpitser at the 2017 Joint Statistical Meetings (JSM 2017) in Baltimore.

Professor Shpitser has developed Master’s-level graduate course material that takes causal inference from the ivory towers of research to the level of students with a machine learning and data science background. It combines techniques of graphical and counterfactual models and provides both an accessible coverage of the field and excellent conceptual, computational, and project-oriented exercises for students.

These winning materials and those of the previous Causality in Statistics Education Award winners are available to download online at

Information concerning nominations, criteria and previous winners can be viewed here:
and here:

7. News on “Causal Inference: A Primer”

Wiley, the publisher of our latest book “Causal Inference in Statistics: A Primer” (2016, Pearl, Glymour and Jewell), informs us that the book is now in its 4th printing, corrected for all the errors we (and others) caught since the first publication. To buy a corrected copy, make sure you get the “4th printing.” The trick is to look at the copyright page and make sure
the last line reads: 10 9 8 7 6 5 4

If you already have a copy, look up our errata page,
where all corrections are marked in red. The publisher also tells us that the Kindle version is much improved. I hope you concur.

Happy Summer-end, and may all your causes
produce healthy effects.

May 1, 2017

UAI 2017 Causality Workshop

Filed under: Announcement — Judea Pearl @ 8:35 pm

Dear friends in causality research,

We would like to promote an upcoming causality workshop at UAI 2017. See the details below for more information:

Causality in Learning, Inference, and Decision-Making: Causality shapes how we view, understand, and react to the world around us. It is a key ingredient in building AI systems that are autonomous and can act efficiently in complex and uncertain environments. It is also important to scientific discovery, since it underpins how explanations are constructed and how the scientific method proceeds.

Not surprisingly, the tasks of learning and reasoning with causal-effect relationships have attracted great interest in the artificial intelligence and machine learning communities. This effort has led to a very general theoretical and algorithmic understanding of what causality means and under what conditions it can be inferred. These results have started to percolate through more applied fields that generate the bulk of the data currently available, ranging from genetics to medicine, from psychology to economics.

This one-day workshop will explore causal inference in a broad sense through a set of invited talks, open problems sessions, presentations, and a poster session. In this workshop, we will focus on the foundational side of causality on the one hand, and on challenges presented by practical applications on the other. By and large, we welcome contributions from all areas relating to the study of causality.

We encourage co-submission of (full) papers that have been submitted to the main UAI 2017 conference. This workshop is a sequel to a successful predecessor at UAI 2016.

Dates/Locations: August 15, 2017; Sydney, Australia.

Speakers: TBA

Registration and additional information:

April 14, 2017

West Coast Experiments Conference, UCLA 2017

Filed under: Announcement — Judea Pearl @ 9:05 pm

Hello friends in causality research!

UCLA is proud to host the 2017 West Coast Experiments Conference. See the details below for more information:

West Coast Experiments Conference: The WCE is an annual conference that brings together leading scholars and graduate students in economics, political science and other social sciences who share an interest in causal identification broadly speaking. Now in its tenth year, the WCE is a venue for methodological instruction and debate over design-based and observational methods for causal inference, both theory and applications.

Speakers: Judea Pearl, Rosa Matzkin, Niall Cardin, Angus Deaton, Chris Auld, Jeff Wooldridge, Ed Leamer, Karim Chalak, Rodrigo Pinto, Clark Glymour, Elias Bareinboim, Adam Glynn, and Karthika Mohan.

Dates/Location: The tenth annual West Coast Experiments Conference will be held at UCLA on Monday, April 24 and Tuesday, April 25, 2017, preceded by in-depth methods training workshops on Sunday, April 23. Events will be held in the Covel Commons Grand Horizon Ballroom, 200 De Neve Drive, Los Angeles, CA 90095.

Fees: Attendance is free!

Registration and details: Space is limited; for a detailed schedule of events and registration, please visit:

