Causal Analysis in Theory and Practice

August 2, 2017

2017 Mid-Summer Update

Filed under: Counterfactual,Discussion,Epidemiology — Judea Pearl @ 12:55 am

Dear friends in causality research,

Welcome to the 2017 Mid-summer greeting from the UCLA Causality Blog.

This greeting discusses the following topics:

1. “The Eight Pillars of Causal Wisdom” and the WCE 2017 Virtual Conference Website.
2. A discussion panel: “Advances in Deep Neural Networks.”
3. Comments on “The Tale Wagged by the DAG.”
4. A new book: “The Book of Why.”
5. A new paper: Disjunctive Counterfactuals.
6. Causality in Education Award.
7. News on “Causal Inference: A Primer.”

1. “The Eight Pillars of Causal Wisdom”

The tenth annual West Coast Experiments Conference was held at UCLA on April 24-25, 2017, preceded by a training workshop on April 23.

You will be pleased to know that the WCE 2017 Virtual Conference Website is now available here:
It provides videos of the talks as well as some of the papers and presentations.

The conference brought together scholars and graduate students in economics, political science and other social sciences who share an interest in causal analysis. Speakers included:

1. Angus Deaton, on Understanding and misunderstanding randomized controlled trials.
2. Chris Auld, on the ongoing confusion between regression and structural equations in the econometric literature.
3. Clark Glymour, on Explanatory Research vs Confirmatory Research.
4. Elias Bareinboim, on the solution to the External Validity problem.
5. Adam Glynn, on Front-door approaches to causal inference.
6. Karthika Mohan, on Missing Data from a causal modeling perspective.
7. Judea Pearl, on “The Eight Pillars of Causal Wisdom.”
8. Adnan Darwiche, on Model-based vs. Model-Blind Approaches to Artificial Intelligence.
9. Niall Cardin, Causal inference for machine learning.
10. Karim Chalak, Measurement Error without Exclusion.
11. Ed Leamer, on “Causality Complexities Example: Supply and Demand.”
12. Rosa Matzkin, on “Identification in Simultaneous Equations.”
13. Rodrigo Pinto, on “Randomized Biased-controlled Trials.”

The video of my lecture “The Eight Pillars of Causal Wisdom” can be watched here:
A transcript of the talk can be found here:

2. “Advances in Deep Neural Networks”

As part of its celebration of 50 years of the Turing Award, the ACM has organized several discussion sessions on selected topics in computer science. I participated in a panel discussion on “Advances in Deep Neural Networks”, which gave me an opportunity to share thoughts on whether learning methods based solely on data fitting can ever achieve human-level intelligence. The discussion video can be viewed here:
A position paper that defends these thoughts is available here:

3. The Tale Wagged by the DAG

An article by this title, authored by Nancy Krieger and George Davey Smith has appeared in the International Journal of Epidemiology, IJE 2016 45(6) 1787-1808.
It is part of a special IJE issue on causal analysis which, for the reasons outlined below, should be of interest to readers of this blog.

As the title tell-tales us, the authors are unhappy with the direction that modern epidemiology has taken, which is too wedded to a two-language framework:
(1) Graphical models (DAGs) — to express what we know, and
(2) Counterfactuals (or potential outcomes) — to express what we wish to know.

The specific reasons for the authors’ unhappiness are still puzzling to me, because the article does not demonstrate concrete alternatives to current methodologies. I can only speculate, however, that it is the dazzling speed with which epidemiology has modernized its tools that lies behind the authors’ discomfort. If so, it would be safe for us to assume that the discomfort will subside as soon as researchers gain greater familiarity with the capabilities and flexibility of these new tools. I nevertheless recommend that the article, and the entire special issue of IJE, be studied by our readers, because they reflect an interesting soul-searching attempt by a forward-looking discipline to assess its progress in the wake of a profound paradigm shift.

Epidemiology, as I have written on several occasions, has been a pioneer in accepting the DAG-counterfactuals symbiosis as a ruling paradigm — way ahead of mainstream statistics and its other satellites. (The social sciences, for example, are almost there, with the exception of the model-blind branch of econometrics. See Feb. 22 2017 posting)

In examining the specific limitations that Krieger and Davey Smith perceive in DAGs, readers will be amused to note that these limitations coincide precisely with the strengths for which DAGs are praised.

For example, the article complains that DAGs provide no information about variables that investigators chose not to include in the model.  In their words: “the DAG does not provide a comprehensive picture. For example, it does not include paternal factors, ethnicity, respiratory infections or socioeconomic position…” (taken from the Editorial introduction). I have never considered this to be a limitation of DAGs or of any other scientific modelling. Quite the contrary. It would be a disaster if models were permitted to provide information unintended by the modeller. Instead, I have learned to admire the ease with which DAGs enable researchers to incorporate knowledge about new variables, or new mechanisms, which the modeller wishes
to embrace.

Model misspecification, after all, is a problem that plagues every exercise in causal inference, no matter what framework one chooses to adopt. It can only be cured by careful model-building strategies, and by enhancing the modeller’s knowledge. Yet, when it comes to minimizing misspecification errors, DAGs have no match. The transparency with which DAGs display the causal assumptions in the model, and the ease with which they identify the testable implications of those assumptions, are unmatched; together they facilitate speedy model diagnosis and repair.

Or, to take another example, the authors call repeatedly for an ostensibly unavailable methodology which they label “causal triangulation” (it appears 19 times in the article). In their words: “In our field, involving dynamic populations of people in dynamic societies and ecosystems, methodical triangulation of diverse types of evidence from diverse types of study settings and involving diverse populations is essential.” Ironically, however, the task of treating “diverse types of evidence from diverse populations” has been accomplished quite successfully in the DAG-counterfactual framework. See, for example, the formal and complete results of Bareinboim and Pearl (2016), which have emerged from the DAG-based perspective and invoke the do-calculus. While it is inconceivable for me to imagine anyone pooling data from two different designs (say experimental and observational) without resorting to DAGs or (equivalently) potential outcomes, I am open to learn.

Another conceptual paradigm which the authors hope would liberate us from the tyranny of DAGs and counterfactuals is Lipton’s (2004) romantic aspiration for “Inference to the Best Explanation.” It is a compelling, century-old mantra, going back at least to Charles Peirce’s theory of abduction (Pragmatism and Pragmaticism, 1870) which, unfortunately, has never operationalized its key terms: “explanation,” “best” and “inference to.” Again, I know of only one framework in which this aspiration has been explicated with sufficient precision to produce tangible results — it is the structural framework of DAGs and counterfactuals. See, for example, “Causes of Effects and Effects of Causes” and Halpern and Pearl (2005), “Causes and explanations: A structural-model approach.”

In summary, what Krieger and Davey Smith aspire to achieve by abandoning the structural framework has already been accomplished with the help and grace of that very framework.
More generally, what we learn from these examples is that the DAG-counterfactual symbiosis is far from being a narrow “ONE approach to causal inference” which “may potentially lead to spurious causal inference” (their words). It is in fact a broad and flexible framework within which a plurality of tasks and aspirations can be formulated, analyzed and implemented. The quest for metaphysical alternatives is not warranted.

I was pleased to note that, by and large, commentators on Krieger and Davey Smith’s paper seemed to be aware of the powers and generality of the DAG-counterfactual framework, albeit not exactly for the reasons that I have described here. [footnote: I have many disagreements with the other commentators as well, but I wish to focus here on “The Tale Wagged by the DAG,” where the problems appear more glaring.] My talk on “The Eight Pillars of Causal Wisdom” provides a concise summary of those reasons and explains why I take the poetic liberty of calling these pillars “The Causal Revolution.”

All in all, I believe that epidemiologists should be commended for the incredible progress they have made in the past two decades. They will no doubt continue to develop and benefit from the new tools that the DAG-counterfactual symbiosis has spawned. At the same time, I hope that the discomfort that Krieger and Davey Smith have expressed will be temporary, and that it will inspire a greater understanding of the modern tools of causal inference.

Comments on this special issue of IJE are invited on this blog.

4. The Book of WHY

As some of you know, I am co-authoring another book, titled “The Book of Why: The New Science of Cause and Effect.” It will attempt to present the eight pillars of causal wisdom to the general public, using words, intuition and examples to replace equations. My co-author is science writer Dana Mackenzie, and our publishing house is Basic Books. If all goes well, the book will reach your shelf by March 2018. Selected sections will appear periodically on this blog.

5. Disjunctive Counterfactuals

The structural interpretation of counterfactuals as formulated in Balke and Pearl (1994) excludes disjunctive conditionals, such as “had X been x1 or x2,” as well as disjunctive actions such as do(X = x1 or X = x2). In contrast, the closest-world interpretation of Lewis (1973) assigns truth values to all counterfactual sentences, regardless of the logical form of the antecedent. The next issue of the Journal of Causal Inference will include a paper that extends the vocabulary of structural counterfactuals with disjunctions, and clarifies the assumptions needed for the extension. An advance copy can be viewed here:

6.  ASA Causality in Statistics Education Award

Congratulations go to Ilya Shpitser, Professor of Computer Science at Johns Hopkins University, who is the 2017 recipient of the ASA Causality in Statistics Education Award. Funded by Microsoft Research and Google, the $5,000 award will be presented to Shpitser at the 2017 Joint Statistical Meetings (JSM 2017) in Baltimore.

Professor Shpitser has developed Masters level graduate course material that takes causal inference from the ivory towers of research to the level of students with a machine learning and data science background. It combines techniques of graphical and counterfactual models and provides both an accessible coverage of the field and excellent conceptual, computational and project-oriented exercises for students.

These winning materials and those of the previous Causality in Statistics Education Award winners are available to download online at

Information concerning nominations, criteria and previous winners can be viewed here:
and here:

7. News on “Causal Inference: A Primer”

Wiley, the publisher of our latest book “Causal Inference in Statistics: A Primer” (2016, Pearl, Glymour and Jewell), informs us that the book is now in its 4th printing, corrected for all the errors we (and others) caught since the first publication. To buy a corrected copy, make sure you get the 4th printing. The trick is to look at the copyright page and make sure the last line reads: 10 9 8 7 6 5 4

If you already have a copy, look up our errata page,
where all corrections are marked in red. The publisher also tells us that the Kindle version is much improved. I hope you concur.

Happy Summer-end, and may all your causes
produce healthy effects.

May 1, 2017

UAI 2017 Causality Workshop

Filed under: Announcement — Judea Pearl @ 8:35 pm

Dear friends in causality research,

We would like to promote an upcoming causality workshop at UAI 2017. See the details below for more information:

Causality in Learning, Inference, and Decision-making: Causality shapes how we view, understand, and react to the world around us. It’s a key ingredient in building AI systems that are autonomous and can act efficiently in complex and uncertain environments. It’s also important to the process of scientific discovery, since it underpins both how explanations are constructed and the scientific method itself.

Not surprisingly, the tasks of learning and reasoning with causal-effect relationships have attracted great interest in the artificial intelligence and machine learning communities. This effort has led to a very general theoretical and algorithmic understanding of what causality means and under what conditions it can be inferred. These results have started to percolate through more applied fields that generate the bulk of the data currently available, ranging from genetics to medicine, from psychology to economics.

This one-day workshop will explore causal inference in a broad sense through a set of invited talks,  open problems sessions, presentations, and a poster session. In this workshop, we will focus on the foundational side of causality on the one hand, and challenges presented by practical applications on the other. By and large, we welcome contributions from all areas relating to the study of causality.

We encourage co-submission of (full) papers that have been submitted to the main UAI 2017 conference. This workshop is a sequel to a successful predecessor at UAI 2016.

Dates/Locations: August 15, 2017; Sydney, Australia.

Speakers: TBA

Registration and additional information:

April 14, 2017

West Coast Experiments Conference, UCLA 2017

Filed under: Announcement — Judea Pearl @ 9:05 pm

Hello friends in causality research!

UCLA is proud to host the 2017 West Coast Experiments Conference. See the details below for more information:

West Coast Experiments Conference: The WCE is an annual conference that brings together leading scholars and graduate students in economics, political science and other social sciences who share an interest in causal identification broadly speaking. Now in its tenth year, the WCE is a venue for methodological instruction and debate over design-based and observational methods for causal inference, both theory and applications.

Speakers: Judea Pearl, Rosa Matzkin, Niall Cardin, Angus Deaton, Chris Auld, Jeff Wooldridge, Ed Leamer, Karim Chalak, Rodrigo Pinto, Clark Glymour, Elias Bareinboim, Adam Glynn, and Karthika Mohan.

Dates/Location: The tenth annual West Coast Experiments Conference will be held at UCLA on Monday, April 24 and Tuesday, April 25, 2017, preceded by in-depth methods training workshops on Sunday, April 23. Events will be held in the Covel Commons Grand Horizon Ballroom, 200 De Neve Drive, Los Angeles, CA 90095.

Fees: Attendance is free!

Registration and details: Space is limited; for a detailed schedule of events and registration, please visit:

April 13, 2017

Causal Inference with Directed Graphs – Seminar

Filed under: Announcement — Judea Pearl @ 5:27 am


We would like to promote another causal inference short course. This 2-day seminar won the 2013 Causality in Statistics Education Award, given by the American Statistical Association. See the details below for more information:

Causal Inference with Directed Graphs: This seminar offers an applied introduction to directed acyclic graphs (DAGs) for causal inference. DAGs are a powerful new tool for understanding and resolving causal problems in empirical research. DAGs are useful for social and biomedical researchers, business and policy analysts who want to draw causal inferences from non-experimental data. The chief advantage of DAGs is that they are “algebra-free,” relying instead on intuitive yet rigorous graphical rules.

Instructor: Felix Elwert, Ph.D.

Who should attend: If you want to understand under what circumstances you can draw causal inferences from non-experimental data, this course is for you. Participants should have a good working knowledge of multiple regression and basic concepts of probability. Some prior exposure to causal inference (counterfactuals, propensity scores, instrumental variables analysis) will be helpful but is not essential.

Tuition: The fee of $995.00 includes all seminar materials.

Date/Location: The seminar meets Friday, April 28 and Saturday, April 29 at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103.

Details and registration:

April 8, 2017

Causal Inference Short Course at Harvard

Filed under: Announcement — Judea Pearl @ 2:31 am


We’ve received news that Harvard is offering a short course on causal inference that may be of interest to readers of this blog. See the details below for more information:

An Introduction to Causal Inference: This 5-day course introduces concepts and methods for causal inference from observational data. Upon completion of the course, participants will be prepared to further explore the causal inference literature. Topics covered include the g-formula, inverse probability weighting of marginal structural models, g-estimation of structural nested models, causal mediation analysis, and methods to handle unmeasured confounding. The last day will end with a “capstone” open Q&A session.

Instructors: Miguel Hernán, Judith Lok, James Robins, Eric Tchetgen Tchetgen & Tyler VanderWeele

Prerequisites: Participants are expected to be familiar with basic concepts in epidemiology and biostatistics, including linear and logistic regression and survival analysis techniques.

Tuition: $450/person, to be paid at the time of registration. Tuition will be waived for up to 2 students with primary affiliation at an institution in a developing country.

Date/Location: June 12-16, 2017 at the Harvard T.H. Chan School of Public Health

Details and registration:

February 22, 2017

Winter-2017 Greeting from UCLA Causality Blog

Filed under: Announcement,Causal Effect,Economics,Linear Systems — bryantc @ 6:03 pm

Dear friends in causality research,

In this brief greeting I would like to first call attention to an approaching deadline and then discuss a couple of recent articles.

Causality in Education Award – March 1, 2017

We are informed that the deadline for submitting a nomination for the ASA Causality in Statistics Education Award is March 1, 2017. For purpose, criteria and other information, please see:

The next issue of the Journal of Causal Inference (JCI) is scheduled to appear in March 2017. See

My contribution to this issue includes a tutorial paper entitled “A Linear ‘Microscope’ for Interventions and Counterfactuals.” An advance copy can be viewed here:

Overturning Econometrics Education (or, do we need a “causal interpretation”?)

My attention was called to a recent paper by Josh Angrist and Jorn-Steffen Pischke titled “Undergraduate econometrics instruction” (an NBER working paper).

This paper advocates a pedagogical paradigm shift that has methodological ramifications beyond econometrics instruction; as I understand it, the shift stands contrary to the traditional teachings of causal inference, as defined by Sewall Wright (1920), Haavelmo (1943), Marschak (1950), Wold (1960), and other founding fathers of econometrics methodology.

In a nutshell, Angrist and Pischke start with a set of favorite statistical routines, such as IV, regression, and differences-in-differences, and then search for “a set of control variables needed to insure that the regression-estimated effect of the variable of interest has a causal interpretation”. Traditional causal inference (including economics) teaches us that asking whether the output of a statistical routine “has a causal interpretation” is the wrong question to ask, for it misses the direction of the analysis. Instead, one should start with the target causal parameter itself, and ask whether it is ESTIMABLE (and if so, how), be it by IV, regression, differences-in-differences, or perhaps by some new routine that is yet to be discovered and ordained by name. Clearly, no “causal interpretation” is needed for parameters that are intrinsically causal; for example, “causal effect,” “path coefficient,” “direct effect,” “effect of treatment on the treated,” or “probability of causation.”

In practical terms, the difference between the two paradigms is that estimability requires a substantive model while interpretability appears to be model-free. A model exposes its assumptions explicitly, while statistical routines give the deceptive impression that they run assumptions-free (hence their popular appeal). The former lends itself to judgmental and statistical tests, the latter escapes such scrutiny.
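To make the two directions concrete, here is a minimal simulation sketch in Python. The structural model, its coefficients, and the variable names are my own illustrative choices, not taken from Angrist and Pischke's paper. The estimand-first route starts from the causal parameter P(y | do(x)), notes that Z satisfies the backdoor criterion, reduces the estimand to sum_z P(y | x, z) P(z), and estimates that; the naive regression-style contrast, by comparison, is confounded:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural model: Z confounds the effect of X on Y.
z = rng.binomial(1, 0.5, n)                    # confounder
x = rng.binomial(1, 0.2 + 0.6 * z)             # treatment depends on Z
y = rng.binomial(1, 0.1 + 0.2 * x + 0.5 * z)   # true causal effect of X is 0.2

# Naive "interpretation-seeking" quantity: E[Y|X=1] - E[Y|X=0].
naive = y[x == 1].mean() - y[x == 0].mean()

# Estimand-first route: P(y|do(x)) = sum_z P(y|x,z) P(z) (backdoor formula),
# so the causal effect is sum_z (E[Y|X=1,z] - E[Y|X=0,z]) P(z).
effect = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
    * (z == v).mean()
    for v in (0, 1)
)

print(round(naive, 2), round(effect, 2))  # naive is biased upward; adjusted ~ 0.2
```

The point of the sketch is the order of operations: the adjustment set falls out of the causal estimand and the model, rather than being searched for after the regression is run.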

In conclusion, if an educator needs to choose between the “interpretability” and “estimability” paradigms, I would go for the latter. If traditional econometrics education is tailored to support the estimability track, I do not believe a paradigm shift is warranted towards an “interpretation-seeking” paradigm such as the one proposed by Angrist and Pischke.

I would gladly open this blog for additional discussion on this topic.

I tried to post a comment on NBER (National Bureau of Economic Research), but was rejected for not being an approved “NBER family member.” If any of our readers is an “NBER family member,” feel free to post the above. Note: “NBER working papers are circulated for discussion and comment purposes” (page 1).

September 15, 2016

Summer-end Greeting from the UCLA Causality Blog

Filed under: Uncategorized — bryantc @ 4:39 am

Dear friends in causality research,
This greeting from UCLA Causality blog contains news and discussion on the following topics:

1. Reflections on 2016 JSM meeting.
2. The question of equivalent representations.
3. Simpson’s Paradox (Comments on four recent papers)
4. News concerning Causal Inference Primer
5. New books, blogs and other frills.

1. Reflections on JSM-2016
For those who missed the JSM 2016 meeting, my tutorial slides can be viewed here:

As you can see, I argue that current progress in causal inference should be viewed as a major paradigm shift in the history of statistics and, accordingly, nuances and disagreements are merely linguistic realignments within a unified framework. To support this view, I chose for discussion six specific achievements (called GEMS) that should make anyone connected with causal analysis proud, empowered, and mighty motivated.

The six gems are:
1. Policy Evaluation (Estimating “Treatment Effects”)
2. Attribution Analysis (Causes of Effects)
3. Mediation Analysis (Estimating Direct and Indirect Effects)
4. Generalizability (Establishing External Validity)
5. Coping with Selection Bias
6. Recovering from Missing Data

I hope you enjoy the slides and appreciate the gems.

2. The question of equivalent representations
One challenging question that came up from the audience at JSM concerned the unification of the graphical and potential-outcome frameworks. “How can two logically equivalent representations be so different in actual use?”. I elaborate on this question in a separate post titled “Logically equivalent yet way too different.”

3. Simpson’s Paradox: The riddle that would not die
(Comments on four recent papers)
If you search Google for “Simpson’s paradox,” as I did yesterday, you would get 111,000 results, more than any other statistical paradox that I could name. What elevates this innocent reversal of associations to “paradoxical” status, and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century, are questions that we discussed at length on this (and other) blogs. The reason I am back to this topic is the publication of four recent papers that give us a panoramic view of how the understanding of causal reasoning has progressed in communities that do not usually participate in our discussions.

4. News concerning Causal Inference – A Primer
We are grateful to Jim Grace for his in-depth review on Amazon:

For those of you awaiting the solutions to the study questions in the Primer, I am informed that the Solution Manual is now available (to instructors) from Wiley. To obtain a copy, see page 2 of: However, rumor has it that a quicker way to get it is through your local Wiley representative, at

If you encounter difficulties, please contact us at and we will try to help. Readers tell me that the solutions are more enlightening than the text. I am not surprised; there is nothing more invigorating than seeing a non-trivial problem solved from A to Z.

5. New books, blogs and other frills
We are informed that a new book by Joseph Halpern, titled “Actual Causality,” is available now from MIT Press. Readers familiar with Halpern’s fundamental contributions to causal reasoning will not be surprised to find here a fresh and comprehensive solution to the age-old problem of actual causality. Not to be missed.

Adam Kelleher writes about an interesting math-club and causal-minded blog that he is orchestrating. See his post,

Glenn Shafer just published a review paper, “A Mathematical Theory of Evidence turns 40,” celebrating the 40th anniversary of the publication of his 1976 book “A Mathematical Theory of Evidence.” I have enjoyed reading this article for nostalgic reasons; it reminds me of the stormy days in the 1980s, when everyone was arguing for another calculus of evidential reasoning. My last contribution to that storm, just before sailing off to causality land, was this paper: Section 10 of Shafer’s article deals with his 1996 book “The Art of Causal Conjecture.” My thought: now that the causal inference field has matured, perhaps it is time to take another look at the way Shafer views causation.

Wishing you a super productive Fall season.

J. Pearl

September 12, 2016

Logically equivalent yet way too different

Filed under: Uncategorized — bryantc @ 2:50 am

Contributor: Judea Pearl

In comparing the tradeoffs between the structural and potential-outcome frameworks, I often state that the two are logically equivalent yet poles apart in terms of transparency and computational efficiency (see Slide #34 of the JSM tutorial). Indeed, anyone who examines how the two frameworks solve a specific problem from beginning to end (see, e.g., Slides #35-36) would find the differences astonishing.

The question naturally arises: how can two equivalent frameworks differ so substantially in actual use?

The answer is that epistemic equivalence does not mean representational equivalence. Two representations of the same information may highlight different aspects of the problem, and thus differ substantially in how easy it is to solve a given problem. This is a recurrent theme in complexity analysis, but is not generally appreciated outside computer science. We saw it in our discussions with Guido Imbens, who could not accept the fact that the use of graphical models is a mathematical necessity, not just a matter of taste.

The examples usually cited in complexity analysis are combinatorial problems whose solution times depend critically on the initial representation. I hesitated to bring up these examples, fearing that they would not be compelling to readers of this blog who are more familiar with classical mathematics.

Last week I stumbled upon a very simple example that demonstrates representational differences in unambiguous terms; I would like to share it with readers.

Consider the age-old problem of finding a solution to an algebraic equation, say
y(x) = x^3 + ax^2 + bx + c = 0

This is a tough problem for those of us who do not remember Tartaglia’s solution of the cubic. (It can be made much tougher once we go to quintic equations.)

But there are many syntactic ways of representing the same function y(x) . Here is one equivalent representation:
y(x) = x(x^2 + ax) + b(x + c/b) = 0
and here is another:
y(x) = (x-x1)(x-x2)(x-x3) = 0,
where x1, x2, and x3 are some functions of a, b, c.

The last representation permits an immediate solution, which is:
x=x1, x=x2, x=x3.

The example may appear trivial, and some may even call it cheating, saying that finding x1, x2, and x3 is as hard as solving the original problem. This is true, but the purpose of the example was not to produce an easy solution to the cubic. The purpose was to demonstrate that different syntactic ways of representing the same information (i.e., the same polynomial) may lead to substantial differences in the complexity of computing an answer to a query (i.e., find a root).

A preferred representation is one that makes certain desirable aspects of the problem explicit, thus facilitating a speedy solution. Complexity theory is full of such examples.

Note that the complexity is query-dependent. Had our goal been to find a value x that makes the polynomial y(x) equal 4, not zero, the representation above y(x) = (x-x1)(x-x2)(x-x3) would offer no help at all. For this query, the representation
y(x) = (x-z1)(x-z2)(x-z3) + 4  
would yield an immediate solution
x=z1, x=z2, x=z3,
where z1, z2, and z3 are the roots of another polynomial:
x^3 + ax^2 + bx + (c-4) = 0
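For readers who like to see the arithmetic run, here is a small numerical sketch of the two queries. The coefficients a, b, c are arbitrary illustrative values of my own, and numpy’s numerical root-finder stands in for Tartaglia’s formula; the point is only that each query is answered immediately by the representation tailored to it:

```python
import numpy as np

# Arbitrary illustrative coefficients for y(x) = x^3 + a x^2 + b x + c.
a, b, c = 2.0, -5.0, 1.0

# Query 1: solve y(x) = 0. In the factored representation
# y(x) = (x - x1)(x - x2)(x - x3), the answer is read off directly;
# here numpy does the factoring work for us.
roots = np.roots([1, a, b, c])
assert all(abs(r**3 + a*r**2 + b*r + c) < 1e-8 for r in roots)

# Query 2: solve y(x) = 4. The factored form above offers no help for this
# query; the helpful representation is (x - z1)(x - z2)(x - z3) + 4, where
# z1, z2, z3 are the roots of the shifted polynomial x^3 + a x^2 + b x + (c - 4).
shifted_roots = np.roots([1, a, b, c - 4])
assert all(abs(r**3 + a*r**2 + b*r + c - 4) < 1e-8 for r in shifted_roots)
```

Same polynomial, two syntactic forms, and the cost of each query depends entirely on which form you hold.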

This simple example demonstrates nicely the principle that makes graphical models more efficient than alternative representations of the same causal information, say a set of ignorability assumptions. What makes graphical models efficient is the fact that they make explicit the logical ramifications of the conditional-independencies conveyed by the model. Deriving those ramifications by algebraic or logical means takes substantially more work. (See for the logic of counterfactual independencies)
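As an illustration of what “explicit” means here, the following is a minimal, self-contained sketch of the standard d-separation test (my own toy implementation, not taken from any package), using the classic moralized-ancestral-graph criterion. Each query the graph answers in a few set operations would otherwise require an axiomatic derivation:

```python
from itertools import combinations

def d_separated(dag, xs, ys, zs):
    """Test xs _||_ ys | zs in a DAG given as {child: list_of_parents}:
    restrict to the ancestors of xs|ys|zs, moralize (marry co-parents,
    drop arrow directions), delete zs, and check disconnection."""
    parents = {v: set(ps) for v, ps in dag.items()}

    # 1. Ancestral subgraph of xs | ys | zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))

    # 2. Moralize: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = parents.get(v, set()) & anc
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)

    # 3. Remove the conditioning set and search for a path from xs to ys.
    reach, stack = set(), [v for v in xs if v not in zs]
    while stack:
        v = stack.pop()
        if v not in reach:
            reach.add(v)
            stack.extend(w for w in adj[v] if w not in zs)
    return not (reach & ys)

# Collider X -> W <- Y: X and Y are marginally independent,
# but conditioning on the common effect W makes them dependent.
dag = {"W": ["X", "Y"]}
print(d_separated(dag, {"X"}, {"Y"}, set()))  # True
print(d_separated(dag, {"X"}, {"Y"}, {"W"}))  # False
```

The graph hands back every implied independence by inspection; the algebraic route must re-derive each one from the axioms.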

A typical example of how nasty such derivations can get is given in Heckman and Pinto’s paper on “Causal Inference after Haavelmo” (Econometric Theory, 2015). Determined to avoid graphs at all cost, Heckman and Pinto derived conditional independence relations directly from Dawid’s axioms and the Markov condition. The results are pages upon pages of derivations of independencies that are displayed explicitly in the graph.

Of course, this and other difficulties alone will not persuade econometricians to use graphs; that would take a scientific revolution of Kuhnian proportions. Still, awareness of these complexity issues should give inquisitive students the ammunition to hasten the revolution and equip econometrics with modern tools of causal analysis.

They eventually will.

September 11, 2016

An interesting math and causality-minded club

Filed under: Announcement — bryantc @ 6:08 pm

from Adam Kelleher:

The math and algorithm reading group is based in NYC, and was founded when I moved here three years ago. It’s a very casual group that grew out of a reading group I was in during graduate school. Some friends who were math graduate students were interested in learning more about general relativity, and I (a physicist) was interested in learning more math. Together, we read about differential geometry, with the goal of bringing our knowledge together. We reasoned that we could learn more as a group, by pooling our different perspectives and experience, than we could individually. That’s the core motivation of our reading group: not only are we there to help each other get through the material when anyone gets stuck, but we’re also there to add what else we know (in the format of a group discussion) to the content of the material.

We’re currently reading Causality cover to cover. We’ve paused to implement some of the algorithms, and plan on pausing again soon for a review session. We intend to do a “hacking session”, to try our hands at causal inference and analysis on some open data sets.

Inspired by reading Causality, and realizing that the best open implementations of causal inference were packaged in the (old, relatively inaccessible) Tetrad package, I’ve started a modern implementation of some tools for causal inference and analysis in the causality package in Python. It’s on pypi (pip install causality), but it’s still a work in progress. The IC* algorithm is implemented, along with a small suite of conditional independence tests. I’m adding some classic methods for causal inference and causal effects estimation, aimed at making the package more general-purpose. I invite new contributions to help build out the package. Just open an issue, and label it an “enhancement” to kick off the discussion!
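Leaving the package’s actual API aside, the kind of conditional independence test such a package is built on is easy to sketch. The sketch below is illustrative only (hypothetical counts, a bare Pearson chi-squared statistic), not code from the causality package; it shows two variables that are exactly independent within each stratum of Z, yet dependent once the strata are pooled:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: within each stratum of Z, X and Y are exactly independent
strata = [(40, 10, 20, 5), (5, 20, 10, 40)]
cond_stat = sum(chi2_2x2(*t) for t in strata)     # 0.0: X independent of Y given Z

# Pooling the strata induces a marginal dependence (Simpson-style mixing)
pooled = tuple(sum(col) for col in zip(*strata))  # (45, 30, 30, 45)
marg_stat = chi2_2x2(*pooled)                     # 6.0, above the 3.84 critical value (df=1, 5%)
```

A real test suite would of course add degrees-of-freedom bookkeeping and p-values, but the stratify-and-compare pattern is the core of constraint-based algorithms like IC*.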

Finally, to make all of the work more accessible to people without a more advanced math background, I’ve been writing a series of blog posts aimed at introducing anyone with an intermediate background in probability and statistics to the material in Causality! It’s aimed especially at practitioners, like data scientists. The hope is that more people, managers included (the intended audience for the first 3 posts), will understand the issues that come up when you’re not thinking causally. I’d especially recommend the article about understanding bias; the whole series is still in progress.

August 24, 2016

Simpson’s Paradox: The riddle that would not die. (Comments on four recent papers)

Filed under: Simpson's Paradox — bryantc @ 12:06 am

Contributor: Judea Pearl

If you search Google for “Simpson’s paradox,” as I did yesterday, you will get 111,000 results, more than for any other statistical paradox that I could name. What elevates this innocent reversal of association to “paradoxical” status, and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century, are questions that we have discussed at length on this (and other) blogs. The reason I am back to this topic is the publication of four recent papers that give us a panoramic view of how the understanding of causal reasoning has progressed in communities that do not usually participate in our discussions.

As readers of this blog recall, I have been trying since the publication of Causality (2000) to convince statisticians, philosophers and other scientific communities that Simpson’s paradox (1) is a product of wrongly applied causal principles, and (2) can be fully resolved using modern tools of causal inference.

The four papers to be discussed do not fully agree with the proposed resolution.

To reiterate my position, Simpson’s paradox is (quoting Lord Russell) “another relic of a bygone age,” an age when we believed that every peculiarity in the data can be understood and resolved by statistical means. Ironically, Simpson’s paradox has actually become an educational tool for demonstrating the limits of statistical methods, and why causal, rather than statistical, considerations are necessary to avoid paradoxical interpretations of data. For example, our recent book Causal Inference in Statistics: A Primer uses Simpson’s paradox at the very beginning (Section 1.1) to show students the inevitability of causal thinking and the futility of trying to interpret data using statistical tools alone.

Thus, my interest in the four recent articles stems primarily from curiosity to gauge the penetration of causal ideas into communities that were not intimately involved in the development of graphical or counterfactual models. Discussions of Simpson’s paradox provide a sensitive litmus test to measure the acceptance of modern causal thinking. “Talk to me about Simpson,” I often say to friendly colleagues, “and I will tell you how far you are on the causal trail.” (Unfriendly colleagues balk at the idea that there is a trail they might have missed.)

The four papers for discussion are the following:

Malinas, G. and Bigelow, J., “Simpson’s Paradox,” The Stanford Encyclopedia of Philosophy (Summer 2016 Edition), Edward N. Zalta (ed.).

Spanos, A., “Revisiting Simpson’s Paradox: a statistical misspecification perspective,” ResearchGate article, online May 2016.

Memetea, S., “Simpson’s Paradox in Epistemology and Decision Theory,” Ph.D. Thesis, Department of Philosophy, The University of British Columbia (Vancouver), May 2015.

Bandyopadhyay, P.S., Raghavan, R.V., Deruz, D.W., and Brittan, Jr., G., “Truths about Simpson’s Paradox: Saving the Paradox from Falsity,” in Mohua Banerjee and Shankara Narayanan Krishna (eds.), Logic and Its Applications, Proceedings of the 6th Indian Conference ICLA 2015, LNCS 8923, Berlin Heidelberg: Springer-Verlag, pp. 58-73, 2015.

——————- Discussion ——————-

1. Malinas and Bigelow 2016 (MB)

I will start the discussion with Malinas and Bigelow 2016 (MB) because the Stanford Encyclopedia of Philosophy enjoys both high visibility and an aura of authority. MB’s new entry is a welcome revision of their previous (2004) article on “Simpson’s Paradox,” which was written almost entirely from the perspective of “probabilistic causality,” echoing Reichenbach, Suppes, Cartwright, Good, Hesslow and Eells, to cite a few.

Whereas the previous version characterizes Simpson’s reversal as “A Logically Benign, Empirically Treacherous Hydra,” the new version downplays the dangers of that Hydra and correctly states that Simpson’s paradox poses a problem only for “philosophical programs that aim to eliminate or reduce causation to regularities and relations between probabilities.” Now, since the “probabilistic causality” program has been all but abandoned over the past two decades, we can safely conclude that Simpson’s reversal poses no problem to us mortals. This is reassuring.

MB also acknowledge the role that graphical tools play in deciding whether one should base a decision on the aggregate population or on the partitioned subpopulations, and in testing one’s hypothesized model.

My only disagreement with MB’s article is that it does not go all the way towards divorcing the discussion from the molds, notation and examples of the “probabilistic causation” era and, naturally, proclaiming the paradox “resolved.” By shunning modern notation like do(x), Yx, or their equivalents, the article gives the impression that Bayesian conditionalization, as in P(y|x), is still adequate for discussing Simpson’s paradox, its ramifications and its resolution. It is not.

In particular, this notational orthodoxy makes the discussion of the Sure Thing Principle (STP) incomprehensible and obscures the reason why Simpson’s reversal does not constitute a counterexample to the STP. Specifically, it does not tell readers that causal independence is a necessary condition for the validity of the STP (i.e., actions should not change the size of the subpopulations), and that this independence is violated in the counterexample that Blyth contrived in 1972.
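The role of that provision can be seen with a few lines of arithmetic. In the sketch below (hypothetical numbers, not taken from Blyth’s paper), an action that raises the probability of success in both subpopulations raises it overall so long as the subpopulation weights stay fixed, and can lower it overall once the action is allowed to shift people into the worse stratum:

```python
# P(B) as a mixture of subpopulation probabilities, weighted by subpopulation sizes
def aggregate(p_b_given_z, weights):
    return sum(p * w for p, w in zip(p_b_given_z, weights))

baseline = [0.5, 0.1]   # P(B | Z=z) without the action (illustrative numbers)
acted    = [0.6, 0.2]   # P(B | A, Z=z): the action helps in BOTH subpopulations
w_fixed  = [0.5, 0.5]   # subpopulation sizes unchanged by the action

# STP holds: better in each stratum + fixed weights => better overall (0.4 > 0.3)
assert aggregate(acted, w_fixed) > aggregate(baseline, w_fixed)

# Blyth-style violation: the action also shifts mass into the worse stratum,
# and the aggregate drops (0.24 < 0.3) despite stratum-wise improvement
w_shifted = [0.1, 0.9]
assert aggregate(acted, w_shifted) < aggregate(baseline, w_fixed)
```

The second scenario is exactly what the causal-independence condition forbids: the action changed the distribution of Z, so the STP no longer applies.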

I will end with a humble recommendation to the editors of the Stanford Encyclopedia of Philosophy. Articles concerning causation should be written in a language that permits authors to distinguish causal from statistical dependence. I am sure future authors in this series would enjoy the freedom of saying “treatment does not change gender,” something they cannot say today, using Bayesian conditionalization. However, they will not do so on their own, unless you tell them (and their reviewers) explicitly that it is ok nowadays to deviate from the language of Reichenbach and Suppes and formally state: P(gender|do(treatment)) = P(gender).

Editorial guidance can play an incalculable role in the progress of science.

2. Comments on Spanos (2016)

In 1988, the British econometrician John Denis Sargan gave the following definition of an “economic model”: “A model is the specification of the probability distribution for a set of observations. A structure is the specification of the parameters of that distribution.” (Lectures on Advanced Econometric Theory (1988, p.27))

This definition, still cited in advanced econometric books (e.g., Cameron and Trivedi (2009), Microeconometrics), has served as a credo to a school of economics that has never elevated itself above the data-first paradigm of statistical thinking. Other prominent leaders of this school include Sir David Hendry, who wrote: “The joint density is the basis: SEMs (Structural Equation Models) are merely an interpretation of that.” Members of this school are unable to internalize the hard fact that statistics, however refined, cannot provide the information that economic models must encode to be of use to policy making. For them, a model is just a compact encoding of the density function underlying the data, so two models encoding the same density function are deemed interchangeable.

Spanos’s article is a vivid example of how this statistics-minded culture copes with causal problems. Naturally, Spanos attributes the peculiarities of Simpson’s reversal to what he calls “statistical misspecification,” not to causal shortsightedness. “Causal” relationships do not exist in the models of Sargan’s school, so, if anything goes wrong, it must be “statistical misspecification,” what else? But what is this “statistical misspecification” that Spanos hopes would allow him to distinguish valid from invalid inference? I have read the paper several times and, for the life of me, I cannot explain how the conditions that Spanos posits as necessary for “statistical adequacy” have anything to do with Simpson’s paradox. Specifically, I cannot see how “misspecified” data, which wrongly claim “good for men, good for women, bad for people,” suddenly become “well-specified” when we replace “gender” with “blood pressure.”

Spanos’ conditions for “statistical adequacy” are formulated in the context of the Linear Regression Model and invoke strictly statistical notions such as normality, linearity, independence etc. None of them applies to the binary case of {treatment, gender, outcome} in which Simpson’s paradox is usually cast. I therefore fail to see why replacing “gender” with “blood pressure” would turn an association from “spurious” to “trustworthy”.

Perhaps one of our readers can enlighten the rest of us as to how to interpret this new proposal. I am at a total loss.

For fairness, I should add that most economists I know have second thoughts about Sargan’s definition, and claim to understand the distinction between structural and statistical models. This distinction, unfortunately, is still badly missing from econometric textbooks. I am sure it will get there some day; Lady Science is forgiving, but what about economics students?

3. Memetea (2015)

Among the four papers under consideration, the one by Memetea is by far the most advanced, comprehensive and forward-thinking. As a thesis written in a philosophy department, Memetea’s treatise is unique in that it makes a serious and successful effort to break away from the cocoon of “probabilistic causality” and examines Simpson’s paradox in the light of modern causal inference, including graphical models, do-calculus, and counterfactual theories.

Memetea agrees with our view that the paradox is causal in nature, and that the tools of modern causal analysis are essential for its resolution. She disagrees, however, with my provocative claim that the paradox is “fully resolved.” The areas where she finds the resolution wanting are mediation cases in which the direct effect (DE) differs in sign from the total effect (TE). The classical example of such cases (Hesslow 1976) tells of a birth control pill that is suspected of producing thrombosis in women and, at the same time, has a negative indirect effect on thrombosis by reducing the rate of pregnancies (pregnancy is known to encourage thrombosis).

I have always argued that Hesslow’s example has nothing to do with Simpson’s paradox because it compares apples and oranges, namely, direct vs. total effects, where reversals are commonplace. In other words, Simpson’s reversal evokes no surprise in such cases. For example, I wrote: “we are not at all surprised when smallpox inoculation carries risks of fatal reaction, yet reduces overall mortality by eradicating smallpox. The direct effect (fatal reaction) in this case is negative for every subpopulation, yet the total effect (on mortality) is positive for the population as a whole.” When a conflict arises between the direct and total effects, the investigator need only decide what research question represents the practical aspects of the case in question and, once this is done, the appropriate graphical tools should be invoked to properly assess DE or TE. [Recall, complete algorithms are available for both, going beyond simple adjustment, and extending to other counterfactually defined effects (e.g., ETT, causes-of-effect, and more).]
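In a linear structural model, the DE/TE contrast of the smallpox story reduces to two numbers. The coefficients below are illustrative stand-ins, not estimates from any data; X is inoculation, M is smallpox infection (the mediator), and Y is mortality:

```python
# Hypothetical linear SCM:  M = a*X,  Y = c*X + b*M
a = -0.9   # inoculation sharply reduces infection
b = 0.5    # infection raises mortality
c = 0.01   # small direct harm (fatal reactions to the vaccine)

DE = c            # direct effect of X on Y, holding M fixed
TE = c + a * b    # total effect: direct path plus the mediated path

assert DE > 0 and TE < 0   # harmful directly, beneficial overall
```

The sign reversal between DE (+0.01) and TE (-0.44) is exactly the “apples and oranges” comparison: nothing paradoxical, just two different causal quantities answering two different research questions.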

Memetea is not satisfied with this answer. Her condition for resolving Simpson’s paradox requires that the analyst be told whether it is the direct or the total effect that should be the target of investigation. This would require, of course, that the model includes information about the investigator’s ultimate aims, whether alternative interventions are available (e.g. to prevent pregnancy), whether the study result will be used by a policy maker or a curious scientist, whether legal restrictions (e.g., on sex discrimination) apply to the direct or the total effect, and so on. In short, the entire spectrum of scientific and social knowledge should enter into the causal model before we can determine, in any given scenario, whether it is the direct or indirect effect that warrants our attention.

This is a rather tall order to satisfy, given that our investigators are fairly good at determining what their research problem is. It should perhaps serve as a realizable goal for the artificial intelligence researchers among us, who aim to build an automated scientist some day, capable of reasoning like our best investigators. I do not believe, though, that we need to wait for that day to declare Simpson’s paradox “resolved.” Alternatively, we can declare it resolved modulo the ability of investigators to define their research problems.

4. Comments on Bandyopadhyay et al. (2015)

There are several motivations behind the resistance to characterize Simpson’s paradox as a causal phenomenon. Some resist because causal relationships are not part of their scientific vocabulary, and some because they think they have discovered a more cogent explanation, which is perhaps easier to demonstrate or communicate.

Spanos’s article represents the first group, while Bandyopadhyay et al.’s represents the second. They simulated Simpson’s reversal using urns and balls and argued that, since there are no interventions involved in this setting, merely judgments of conditional probabilities, the fact that people tend to make wrong judgments in this setting proves that Simpson’s surprise is rooted in arithmetic illusion, not in causal misinterpretation.

I have countered this argument before, and I think it is appropriate to repeat the argument here.

“In explaining the surprise, we must first distinguish between ‘Simpson’s reversal’ and ‘Simpson’s paradox’; the former being an arithmetic phenomenon in the calculus of proportions, the latter a psychological phenomenon that evokes surprise and disbelief. A full understanding of Simpson’s paradox should explain why an innocent arithmetic reversal of an association, albeit uncommon, came to be regarded as ‘paradoxical,’ and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century (though it was first labeled ‘paradox’ by Blyth (1972)).

“The arithmetic of proportions has its share of peculiarities, no doubt, but these tend to become objects of curiosity once they have been demonstrated and explained away by examples. For instance, naive students of probability may expect the average of a product to equal the product of the averages, but quickly learn to guard against such expectations, given a few counterexamples. Likewise, students expect an association measured in a mixture distribution to equal a weighted average of the individual associations. They are surprised, therefore, when ratios of sums, (a+b)/(c+d), are found to be ordered differently than individual ratios, a/c and b/d.[1] Again, such arithmetic peculiarities are quickly accommodated by seasoned students as reminders against simplistic reasoning.

“In contrast, an arithmetic peculiarity becomes ‘paradoxical’ when it clashes with deeply held convictions that the peculiarity is impossible, and this occurs when one takes seriously the causal implications of Simpson’s reversal in decision-making contexts.  Reversals are indeed impossible whenever the third variable, say age or gender, stands for a pre-treatment covariate because, so the reasoning goes, no drug can be harmful to both males and females yet beneficial to the population as a whole. The universality of this intuition reflects a deeply held and valid conviction that such a drug is physically impossible.  Remarkably, such impossibility can be derived mathematically in the calculus of causation in the form of a ‘sure-thing’ theorem (Pearl, 2009, p. 181):

‘An action A that increases the probability of an event B in each subpopulation (of C) must also increase the probability of B in the population as a whole, provided that the action does not change the distribution of the subpopulations.’[2]

“Thus, regardless of whether effect size is measured by the odds ratio or other comparisons, regardless of whether Z is a confounder or not, and regardless of whether we have the correct causal structure on hand, our intuition should be offended by any effect reversal that appears to accompany the aggregation of data.

“I am not aware of another condition that rules out effect reversal with comparable assertiveness and generality, requiring only that Z not be affected by our action, a requirement satisfied by all treatment-independent covariates Z. Thus, it is hard, if not impossible, to explain the surprise part of Simpson’s reversal without postulating that human intuition is governed by causal calculus together with a persistent tendency to attribute causal interpretation to statistical associations.”

1. In Simpson’s paradox we witness the simultaneous orderings: (a1+b1)/(c1+d1) > (a2+b2)/(c2+d2), while (a1/c1) < (a2/c2) and (b1/d1) < (b2/d2).
2. The no-change provision is probabilistic; it permits the action to change the classification of individual units so long as the relative sizes of the subpopulations remain unaltered.
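The simultaneous orderings in footnote 1 are easy to verify with exact rational arithmetic. The counts below are hypothetical (a/c for males, b/d for females; group 1 treated, group 2 control), chosen so that the treated group is mostly male and the control group mostly female:

```python
from fractions import Fraction as F

# Hypothetical recovery counts (recovered / total)
a1, c1, b1, d1 = 18, 30, 2, 10     # treated: mostly males
a2, c2, b2, d2 = 7, 10, 9, 30      # control: mostly females

assert F(a1, c1) < F(a2, c2)                       # males: 0.6 < 0.7, control better
assert F(b1, d1) < F(b2, d2)                       # females: 0.2 < 0.3, control better
assert F(a1 + b1, c1 + d1) > F(a2 + b2, c2 + d2)   # pooled: 0.5 > 0.4, treated "better"
```

The reversal is driven entirely by the lopsided gender composition of the two groups: gender is associated with both treatment choice and recovery, which is precisely the kind of causal asymmetry the sure-thing theorem rules out when the action leaves the subpopulation sizes unchanged.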

Final Remarks

I used to be extremely impatient with the slow pace at which causal ideas have been penetrating scientific communities that are not used to talking cause-and-effect. Recently, however, I re-read Thomas Kuhn’s classic The Structure of Scientific Revolutions and found there a quote that made me calm, content, even humorous and hopeful. Here it is:

—————- Kuhn —————-

“The transfer of allegiance from paradigm to paradigm is a conversion experience that cannot be forced. Lifelong resistance, particularly from those whose productive careers have committed them to an older tradition of normal science, is not a violation of scientific standards but an index to the nature of scientific research itself.”
p. 151

“Conversions will occur a few at a time until, after the last holdouts have died, the whole profession will again be practicing under a single, but now a different, paradigm.”
p. 152

We are now seeing the last holdouts.



Addendum: Simpson and the Potential-Outcome Camp

My discussion of the four Simpson’s papers would be incomplete without mentioning another paper, which represents the thinking within the potential-outcome camp. The paper in question is “A Fruitful Resolution to Simpson’s Paradox via Multiresolution Inference,” by Keli Liu and Xiao-Li Meng (2014), which appeared in the same issue of Statistical Science as my “Understanding Simpson’s Paradox.”

The intriguing feature of Liu and Meng’s paper is that they, too, do not see any connection to causality. In their words: “Peeling away the [Simpson’s] paradox is as easy (or hard) as avoiding a comparison of apples and oranges, a concept requiring no mention of causality” (p. 17), and again: “The central issues of Simpson’s paradox can be addressed adequately without necessarily invoking causality” (p. 18). Two comments:

  1. Liu and Meng fail to see that the distinction between apples and oranges must be made with causal considerations in mind — statistical criteria alone cannot help us avoid a comparison of apples and oranges. This has been shown again and again, even by Simpson himself.
  2. Liu and Meng do not endorse the resolution offered by causal modeling and, as a result, they end up with the wrong conclusion. Quoting: “Simpson’s Warning: less conditioning is most likely to lead to serious bias when Simpson’s Paradox appears” (p. 17). Again, Simpson himself gives an example where conditioning leads to more bias, not less.

Thus, in contrast to the data-only economists (Spanos), the potential-outcome camp does not object to causal reasoning per se; that is their specialty. What they object to are attempts to resolve Simpson’s paradox formally and completely, namely, to explicate formally what the differences are between “apples and oranges” and to deal squarely with the decision problem: “What to do in case of reversal?”

Why are they resisting the complete solution? Because (and this is speculation) the complete solution requires graphical tools, and we all know the attitude of potential-outcome enthusiasts towards graphs. We have dealt with this cultural peculiarity before, so, at this point, we should just add Simpson’s paradox to their list of challenges, and resign ourselves humbly to the slow pace with which Kuhn’s paradigms shift.

