Causal Analysis in Theory and Practice

March 19, 2019

CAUSAL INFERENCE SUMMER SHORT COURSE AT HARVARD

Filed under: Uncategorized — Judea Pearl @ 5:37 am

We are informed of the following short course at Harvard. Readers of this blog will probably wonder what this Harvard-specific jargon is all about, and whether it has a straightforward translation into Structural Causal Models. It has! And one of the challenges of contemporary causal inference is to navigate the literature despite its seeming diversity, and to work towards convergence of ideas, tools and terminology.

Summer Short Course “An Introduction to Causal Inference”

Date: June 3-7, 2019

Instructors: Miguel Hernán, Judith Lok, James Robins, Eric Tchetgen Tchetgen & Tyler VanderWeele

This 5-day course introduces concepts and methods for causal inference from observational data. Upon completion of the course, participants will be prepared to further explore the causal inference literature. Topics covered include the g-formula, inverse probability weighting of marginal structural models, g-estimation of structural nested models, causal mediation analysis, and methods to handle unmeasured confounding. The last day will end with a “capstone” open Q&A session with the instructors.

Prerequisites: Participants are expected to be familiar with basic concepts in epidemiology and biostatistics, including linear and logistic regression and survival analysis techniques.

Tuition: $600/person, to be paid at the time of registration. A limited number of tuition waivers are available for students.

Date/Location: June 3-7, 2019 at the Harvard T.H. Chan School of Public Health.

Details and registration: https://www.hsph.harvard.edu/causal/shortcourse/


February 12, 2019

Lion Man – Ulm Museum

Filed under: Uncategorized — Judea Pearl @ 6:25 am

Stefan Conrady, Managing Partner of Bayesia, was kind enough to send us an interesting selfie he took with the Lion Man that is featured in Chapter 1 of The Book of Why.

He also added that the Ulm Museum (where the Lion Man is on exhibit) is situated near the house where Albert Einstein was born in 1879.

This makes Ulm a home to two revolutions of human cognition.


January 15, 2019

More on Gelman’s views of causal inference

Filed under: Uncategorized — Judea Pearl @ 5:37 pm

In the past two days I have been engaged in discussions regarding Andrew Gelman’s review of Book of Why.

These discussions unveil some of our differences as well as some agreements. I am posting some of the discussions below, because Gelman’s blog represents the thinking of a huge segment of practicing statisticians who are, by and large, not very talkative about causation. It is interesting therefore to understand how they think, and what makes them tick.

Judea Pearl says: January 12, 2019 at 8:24 am

Andrew,
I appreciate your kind invitation to comment on your blog. Let me start with a Tweet that I posted on https://twitter.com/yudapearl

(updated 1.10.19)
1.8.19 @11:59pm – Gelman’s review of #Bookofwhy should be of interest because it represents an attitude that paralyzes wide circles of statistical researchers. My initial reaction is now posted on https://bit.ly/2H3BH3b Related posts: https://ucla.in/2sgzkPZ and https://ucla.in/2v72QK5

These postings speak for themselves but I would like to respond here to your recommendation: “Similarly, I’d recommend that Pearl recognize that the apparatus of statistics, hierarchical regression modeling, interactions, post-stratification, machine learning, etc etc solves real problems in causal inference.”

It sounds like a mild and friendly recommendation, and your readers would probably get upset at anyone who would be so stubborn as to refuse it.

But I must. Because, from everything I know about causation, the apparatus you mentioned does NOT, and CANNOT solve any problem known as “causal” by the causal-inference community (which includes your favorites Rubin, Angrist, Imbens, Rosenbaum, etc etc.). Why?

Because the solution to any causal problem must rest on causal assumptions and the apparatus you mentioned has no representation for such assumptions.

1. Hierarchical models are based on set-subset relationships, not causal relationships.

2. “interactions” is not an apparatus unless you represent them in some model, and act upon them.

3. “post-stratification” is valid only after you decide what you stratify on, and this requires a causal structure (which you claim above to be an unnecessary “wrapping” and “complication”)

4. “Machine learning” is just fancy curve fitting of data; see https://ucla.in/2umzd65

Thus, what you call “statistical apparatus” is helpless in solving causal problems. We came to this juncture several times in the past and, invariably, you pointed me to books, articles, and elaborated works which, in your opinion, do solve “real life causal problems”. So, how are we going to resolve our disagreement on whether those “real life” problems are “causal” and, if they are, whether your solution of them is valid. I suggested applying your methods to toy problems whose causal character is beyond dispute. You did not like this solution, and I do not blame you, because solving ONE toy problem will turn your perception of causal analysis upside down. It is frightening. So I would not press you. But I will add another Tweet before I depart:

1.9.19 @2:55pm – An ounce of advice to readers who comment on this “debate”: Solving one toy problem in causal inference tells us more about statistics and science than ten debates, no matter who the debaters are. #Bookofwhy

Addendum. Solving ONE toy problem will tell you more than a dozen books and articles and multi-cited reports. You can find many such toy problems (solved in R) here: https://ucla.in/2KYYviP, and a sample of the solution manual here: https://ucla.in/2G11xUE

For your readers’ convenience, I have provided free access to chapter 4 here: https://ucla.in/2G2rWBv It is about counterfactuals and, if I were not inhibited by modesty, I would confess that it is the best text on counterfactuals and their applications that you can find anywhere.

I hope you take advantage of my honesty.
Enjoy
Judea

Andrew says: January 12, 2019 at 11:37 am

Judea:

We are in agreement. I agree that data analysis alone cannot solve any causal problems. Substantive assumptions are necessary too. To take a familiar sort of example, there are people out there who just think that if you fit a regression of the form, y = a + bx + cz + error, that the coefficients b and c can be considered as causal effects. At the level of data analysis, there are lots of ways of fitting this regression model. In some settings with good data, least squares is just fine. In more noisy problems, you can do better with regularization. If there is bias in the measurements of x, z, and y, that can be incorporated into the model also. But none of this legitimately gives us a causal interpretation until we make some assumptions. There are various ways of expressing such assumptions, and these are talked about in various ways in your books, in the books by Angrist and Pischke, in the book by Imbens and Rubin, in my book with Hill, and in many places. Your view is that your way of expressing causal assumptions is better than the expositions of Angrist and Pischke, Imbens and Rubin, etc., that are more standard in statistics and econometrics. You may be right! Indeed, I think that for some readers your formulation of this material is the best thing out there.

Anyway, just to say it again: We agree on the fundamental point. This is what I call in the above post the division of labor, quoting Frank Sinatra etc. To do causal inference requires (a) assumptions about causal structure, and (b) models of data and measurement. Neither is enough. And, as I wrote above:

I agree with Pearl and Mackenzie that typical presentations of statistics, econometrics, etc., can focus way too strongly on the quantitative without thinking at all seriously about the qualitative aspects of the problem. It’s usually all about how to get the answer given the assumptions, and not enough about where the assumptions come from. And even when statisticians write about assumptions, they tend to focus on the most technical and least important ones, for example in regression focusing on the relatively unimportant distribution of the error term rather than the much more important concerns of validity and additivity.
If all you do is set up probability models, without thinking seriously about their connections to reality, then you’ll be missing a lot, and indeed you can make major errors in causal reasoning . . .

Where we disagree is just on terminology, I think. I wrote, “the apparatus of statistics, hierarchical regression modeling, interactions, poststratification, machine learning, etc etc., solves real problems in causal inference.” When I speak of this apparatus, I’m not just talking about probability models; I’m also talking about assumptions that map those probability models to causality. I’m talking about assumptions such as those discussed by Angrist and Pischke, Imbens and Rubin, etc.—and, quite possibly, mathematically equivalent in these examples to assumptions expressed by you.

So, to summarize: To do causal inference, we need (a) causal assumptions (assumptions of causal structure), and (b) models or data analysis. The statistics curriculum spends much more time on (b) than (a). Econometrics focuses on (a) as well as (b). You focus on (a). When Angrist, Pischke, Imbens, Rubin, Hill, me, and various others do causal inference, we do both (a) and (b). You argue that if we were to follow your approach on (a), we’d be doing better work for those problems that involve causal inference. You may be right, and in any case I’m glad you and Mackenzie wrote this book which so many people have found helpful, just as I’m glad that the aforementioned researchers wrote their books on causal inference which so many have found helpful. A framework for causal inference—whatever that framework may be—is complementary to, not in competition with, data-analysis tools such as hierarchical modeling, poststratification, machine learning, etc.

P.S. I’ll ignore the bit in your comment where you say you know what is “frightening” to me.

Judea Pearl says: January 13, 2019 at 6:59 am

Andrew,

I would love to believe that where we disagree is just on terminology. Indeed, I see sparks of convergence in your last post, where you enlighten me to understand that by “the apparatus of statistics, …” you include the assumptions that PO folks (Angrist and Pischke, Imbens and Rubin etc.) are making, namely, assumptions of conditional ignorability. This is a great relief, because I could not see how the apparatus of regression, interaction, post-stratification or machine learning alone could elevate you from rung-1 to rung-2 of the Ladder of Causation. Accordingly, I will assume that whenever Gelman and Hill talk about causal inference they tacitly or explicitly make the ignorability assumptions that are needed to take them from associations to causal conclusions. Nice. Now we can proceed to your summary and see if we still have differences beyond terminology.

I almost agree with your first two sentences: “So, to summarize: To do causal inference, we need (a) causal assumptions (assumptions of causal structure), and (b) models or data analysis. The statistics curriculum spends much more time on (b) than (a)”.

But we need to agree that just making “causal assumptions” and leaving them hanging in the air is not enough. We need to do something with the assumptions, listen to them, and process them so as to properly guide us in the data analysis stage.

I believe that by (a) and (b) you meant to distinguish identification from estimation. Identification indeed takes the assumptions and translates them into a recipe with which we can operate on the data so as to produce a valid estimate of the research question of interest. If my interpretation of your (a) and (b) distinction is correct, permit me to split (a) into (a1) and (a2), where (a2) stands for identification.

With this refined taxonomy, I have strong reservations about your third sentence: “Econometrics focuses on (a) as well as (b).” Not all of econometrics. The economists you mentioned, while commencing causal analysis with “assumptions” (a1), vehemently resist organizing these assumptions in any “structure”, be it a DAG or structural equations (some even pride themselves on being “model-free”). Instead, they restrict their assumptions to conditional ignorability statements so as to justify familiar estimation routines. [In https://ucla.in/2mhxKdO, I labeled them: “experimentalists” or “structure-free economists” to be distinguished from “structuralists” like Heckman, Sims, or Matzkin.]

It is hard to agree therefore that these “experimentalists” focus on (a2) — identification. They actually assume (a2) away rather than use it to guide data analysis.

Continuing with your summary, I read: “You focus on (a).” Agree. I interpret (a) to mean (a) = (a1) + (a2) and I let (b) be handled by smart statisticians, once they listen to the guidance of (a2).

Continuing, I read:
“When Angrist, Pischke, Imbens, Rubin, Hill, me, and various others do causal inference, we do both (a) and (b).” Not really. And it is not a matter of choosing “an approach”. By resisting structure, these researchers a priori deprive themselves of answering causal questions that are identifiable by do-calculus and not by a single conditional ignorability assumption. Each of those questions may require a different estimand, which means that you cannot start the “data analysis” phase before completing the identification phase.

[Currently, even questions that are identifiable by conditional ignorability assumption cannot be answered by structure-free PO folks, because deciding on the conditioning set of covariates is intractable without the aid of DAGs, but this is a matter of efficiency not of essence.]

But your last sentence is hopeful:
“A framework for causal inference, whatever that framework may be, is complementary to, not in competition with, data-analysis tools such as hierarchical modeling, post-stratification, machine learning, etc.”

Totally agree, with one caveat: the framework has to be a genuine “framework,” i.e., capable of leveraging identification to guide data analysis.

Let us look now at why a toy problem would be frightening; not only to you, but to anyone who believes that the PO folks are offering a viable framework for causal inference.

Let’s take the simplest causal problem possible, say a Markov chain X → Z → Y, with X standing for Education, Z for Skill and Y for Salary. Let Salary be determined by Skill only, regardless of Education. Our research problem is to find the causal effect of Education on Salary given observational data of (perfectly measured) X, Y, Z.

To appreciate the transformative power of a toy example, please try to write down how Angrist, Pischke, Imbens, Rubin, and Hill would go about doing (a) and (b) according to your understanding of their framework. You are busy, I know, so let me ask any of your readers to try and write down, step by step, how the graph-less school would go about it. Any reader who tries this exercise ONCE will never be the same. It is hard to believe unless you actually go through this frightening exercise; please try.

Repeating my sage-like advice: Solving one toy problem in causal inference tells us more about statistics and science than ten debates, no matter who the debaters are.
Try it.
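A minimal sketch of the Education → Skill → Salary chain above, assuming (for illustration only) linear structural equations with Gaussian noise and hypothetical coefficients a = 2.0 and b = 1.5. Because nothing confounds X and Y in this chain, the causal effect of X on Y coincides with the slope of Y regressed on X (here a·b), whereas additionally “controlling” for the mediator Z drives the X coefficient toward zero and would wrongly suggest that Education has no effect on Salary. What to condition on is dictated by the assumed graph, not by the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a, b = 2.0, 1.5  # hypothetical structural coefficients (assumed, not estimated)

x = rng.normal(size=n)          # X: Education (exogenous)
z = a * x + rng.normal(size=n)  # Z: Skill, caused by Education
y = b * z + rng.normal(size=n)  # Y: Salary, caused by Skill only

def ols_slopes(target, *regressors):
    """Least-squares slopes of target on the regressors (intercept dropped)."""
    design = np.column_stack([np.ones_like(target), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]

# Regressing Y on X alone recovers the causal effect a * b (no confounding here).
total_effect = ols_slopes(y, x)[0]
# Adding the mediator Z blocks the only X -> Y path, so the X slope shrinks to ~0.
adjusted_for_z = ols_slopes(y, x, z)[0]
print(f"Y on X: {total_effect:.3f}   Y on X and Z: {adjusted_for_z:.3f}")
```

The same data support both regressions equally well; only the causal assumptions tell us which coefficient answers the question “What is the effect of Education on Salary?”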

[Judea Pearl added in editing: I have received no solution thus far, not even an attempt. For readers of this blog, the chain is part of the front-door model which is treated in Causality pp. 232-4, in both graphical and potential outcome frameworks. I have yet to meet a PO researcher who can formulate this toy story in PO, let alone solve it. Not because they can’t, but because the very idea of listening to their understanding of a problem and translating that understanding to formal assumptions is foreign to them, having been conditioned to assume ignorability and estimate a quantity that is easily estimable.]
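The front-door model mentioned in this note extends the chain with an unobserved confounder U of X and Y. The sketch below is a simulation with hypothetical binary variables and made-up probabilities; it applies the front-door formula P(y | do(x)) = Σ_z P(z | x) Σ_x′ P(y | x′, z) P(x′) using only the observed X, Z, Y. Under these assumed numbers, the naive contrast P(Y=1 | X=1) − P(Y=1 | X=0) should come out near 0.53, while the true causal effect, which the front-door estimate should approximate, is 0.35.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
# Hypothetical data-generating process (all probabilities invented for illustration).
u = (rng.random(n) < 0.5).astype(int)                      # U: unobserved confounder
x = (rng.random(n) < 0.2 + 0.6 * u).astype(int)            # X depends on U
z = (rng.random(n) < 0.1 + 0.7 * x).astype(int)            # Z depends on X only
y = (rng.random(n) < 0.1 + 0.5 * z + 0.3 * u).astype(int)  # Y depends on Z and U

def front_door(x_val):
    """Estimate P(Y=1 | do(X=x_val)) from observed (x, z, y); u is never used."""
    estimate = 0.0
    for z_val in (0, 1):
        p_z_given_x = np.mean(z[x == x_val] == z_val)
        # Inner sum: sum over x' of P(Y=1 | x', z) * P(x')
        inner = sum(
            y[(x == xp) & (z == z_val)].mean() * np.mean(x == xp)
            for xp in (0, 1)
        )
        estimate += p_z_given_x * inner
    return estimate

naive = y[x == 1].mean() - y[x == 0].mean()  # biased by U
fd = front_door(1) - front_door(0)           # front-door estimate of the causal effect
print(f"naive contrast: {naive:.3f}   front-door estimate: {fd:.3f}")
```

The formula is licensed only by the assumed structure (Z intercepts every X → Y path and is unconfounded with X and with Y given X); with a different graph the same data would call for a different estimand.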

Andrew says: January 13, 2019 at 8:26 pm

Judea:

I think we agree on much of the substance. And I agree with you regarding “not all econometrics” (and, for that matter, not all of statistics, not all of sociology, etc.). As I wrote in my review of your book with Mackenzie, and in my review of Angrist and Pischke’s book, causal identification is an important topic and worth its own books.

In practice, our disagreement is, I think, that we focus on different sorts of problems and different sorts of methods. And that’s fine! Division of labor. You have toy problems that interest you, I have toy problems that interest me. You have applied problems that interest you, I have applied problems that interest me. I would not expect you to come up with methods of solving the causal inference problems that I work on, but that’s OK: your work is inspirational to many people and I can well believe it has been useful in certain applications as well as in developing conceptual understanding. I consider toy problems of my own for that same reason. I’m not particularly interested in your toy problems, but that’s fine; I doubt you’re particularly interested in the problems I focus on. It’s a big world out there.

In the meantime, you continue to characterize me as being frightened or lacking courage. I wish you’d stop doing that.

[Judea Pearl added in editing: Gelman wants to move identification to separate books, because it is important, but the fact that one cannot start estimation before having an identifiable estimand is missing from his comment. Is he aware of it? Does he really do estimation before identification? I do not know, it is a foreign culture to me.]

Judea Pearl says: January 13, 2019 at 10:51 pm

Andrew,
Convergence is in sight, modulo two corrections:
1. You say:
“You [Pearl] have toy problems that interest you, I [Andrew] have toy problems that interest me. …I doubt you’re particularly interested in the problems I focus on. ”
Wrong! I am very interested in your toy problems, especially those with causal flavor. Why? Because I love to challenge the SCM framework with new tasks and new angles that other researchers found to be important, and see if SCM can be enriched with expanded scope. So, by all means, if you have a new twist, shoot. I have not been able to do it in the past, because your shots were not toy-like, e.g., 3-4 variables, clear task, with correct answer known.

2. You say:
“you continue to characterize me as being frightened or lacking courage” This was not my intention. My last remark on frightening toys was general, everyone is frightened by the honesty and transparency of toys — the adequacy of one’s favorite method is undergoing a test of fire. Who wouldn’t be frightened? But, since you prefer, I will stop using this metaphor.

3. Starting afresh, and for the sake of good spirit: How about attacking a toy problem? Just for fun, just for sport.

Andrew says: January 13, 2019 at 11:24 pm

Judea:

I’ve attacked a lot of toy problems.

For an example of a toy problem in causality, see pages 962-963 of this article.

But most of the toy problems I’ve looked at do not involve causality; see for example this paper, item 4 in this post, and this paper. This article on experimental design is simple enough that I think it could count as a toy problem: it’s a simple example without data which allows us to compare different methods. And here’s a theoretical paper I wrote a while ago that has three toy examples. Not involving causal inference, though.

I’ve written lots of papers with causal inference, but they’re almost all applied work. This may be because I consider myself much more of a practitioner of causal inference than a researcher on causal inference. To the extent I’ve done research on causal inference, it’s mostly been to resolve some confusions in my mind (as in this paper).

This gets back to the division-of-labor thing. I’m happy for you and Imbens and Hill and Robins and VanderWeele and others to do research on fundamental methods for causal inference, while I do research on statistical analysis. The methods that I’ve learned have allowed my colleagues and me to make progress on a lot of applied problems in causal inference, and have given me some clarity in understanding problems with some naive formulations of causal reasoning (as in the first reference above in this comment).

[Judea Pearl. Added in editing: Can one really make progress on a lot of applied problems in causal inference without dealing with identification? Evidently, PO folks think so, at least those in Gelman’s circles.]

As I wrote in my above post, I think your book with Mackenzie has lots of great things in it; I just can’t go with a statement such as, “Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave”—because scientists have been answering such questions before Pearl came along, and scientists continue to answer such questions using methods other than Pearl’s. For what it’s worth, I don’t think the methods that my colleagues and I have developed are necessary for solving these or any problems. Our methods are helpful in some problems, some of the time, at least until something better comes along—I think that’s pretty much all that any of us can hope for! That, and we can hope that our writings inspire new researchers to come up with new methods that are useful in the future.

Judea Pearl says: January 14, 2019 at 2:18 am

Andrew,
Agree to division of labor: causal inference on one side and statistical analysis on the other.

Assuming that you give me some credibility on the first, let me try and show you that even the publisher advertisement that you mock with disdain is actually true and carefully expressed. It reads: “Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave”.

First, note that it includes “Pearl and others”, which theoretically might include the people you have in mind. But it does not; it refers to those who developed mathematical formulation and mathematical tools to answer such questions. So let us examine the first question: “whether a drug cured an illness”. This is a counterfactual “cause of effect” type question. Do you know when it was first formulated mathematically? [Don Rubin declared it non-scientific].

Now let’s go to the second: “when discrimination is to blame for disparate outcomes.” This is a mediation problem. Care to guess when this problem was first formulated (see Book of Why, Chapter 9) and what the solution is? Bottom line: Pearl is not as thoughtless as your review portrays him to be and, if you advise your readers to control their initial reaction (“Hey, statisticians have been doing it for centuries”), they would value learning how things were first formulated, first solved, and why statisticians were not always the first.
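As a hedged illustration of the mediation question, the sketch below assumes a hypothetical binary attribute X (randomized, hence unconfounded), a binary mediator Z (for instance, a qualification) and a binary outcome Y (for instance, a hiring decision), with invented probabilities and no confounding between Z and Y. Under those assumptions the mediation formula splits the effect of X into a natural direct effect, NDE = Σ_z [E(Y | x₁, z) − E(Y | x₀, z)] P(z | x₀), and a natural indirect effect, NIE = Σ_z E(Y | x₀, z) [P(z | x₁) − P(z | x₀)]. With the chosen numbers the direct effect is −0.10, the indirect effect +0.12, and the total effect only +0.02, so a small overall disparity can hide a sizable direct effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
# Hypothetical model: X randomized, Z mediates part of X's effect, no Z-Y confounding.
x = (rng.random(n) < 0.5).astype(int)                      # X: attribute
z = (rng.random(n) < 0.3 + 0.4 * x).astype(int)            # Z: mediator
y = (rng.random(n) < 0.5 - 0.1 * x + 0.3 * z).astype(int)  # Y: outcome

def e_y(xv, zv):
    """Empirical E(Y | X=xv, Z=zv)."""
    return y[(x == xv) & (z == zv)].mean()

def p_z(zv, xv):
    """Empirical P(Z=zv | X=xv)."""
    return np.mean(z[x == xv] == zv)

# Mediation formula (valid only under the no-confounding assumptions stated above).
nde = sum((e_y(1, zv) - e_y(0, zv)) * p_z(zv, 0) for zv in (0, 1))
nie = sum(e_y(0, zv) * (p_z(zv, 1) - p_z(zv, 0)) for zv in (0, 1))
total = y[x == 1].mean() - y[x == 0].mean()
print(f"NDE: {nde:.3f}   NIE: {nie:.3f}   total: {total:.3f}")
```

Because this assumed model has no X×Z interaction, NDE + NIE should add up to the total effect; with interactions the decomposition is no longer additive.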

Andrew says: January 14, 2019 at 6:46 pm

Judea:

I disagree with your implicit claim that, before your methods were developed, scientists were not able to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave. I doubt much will be gained by discussing this particular point further so I’m just clarifying that this is a point of disagreement.

Also, I don’t think in my review I portrayed you as thoughtless. My message was that your book with Mackenzie is valuable and interesting even though it has some mistakes. In my review I wrote about the positive part as well as the mistakes. Your book is full of thought!

[Judea Pearl. Added in edit: Why can’t Gelman “go with a statement such as, ‘Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave’”? His answer is: “because scientists have been answering such questions before Pearl came along.” True, by trial and error, but not by mathematical analysis. And my statement marvels at the ability of doing it analytically. So why can’t Gelman acknowledge that marvelous progress has been made, not by me, but by several researchers who realized that graph-less PO is a dead end?]


January 9, 2019

Can causal inference be done in statistical vocabulary?

Filed under: Uncategorized — Judea Pearl @ 6:59 am

Andrew Gelman has just posted a review of The Book of Why (https://andrewgelman.com/2019/01/08/book-pearl-mackenzie/), my answer to some of his comments follows below:

“Andrew,

The hardest thing for people to snap out of is the bubble of their own language. You say: “I find it baffling that Pearl and his colleagues keep taking statistical problems and, to my mind, complicating them by wrapping them in a causal structure (see, for example, here).”

No way! and again: No way! There is no way to answer causal questions without snapping out of statistical vocabulary. I have tried to demonstrate it to you in the past several years, but was not able to get you to solve ONE toy problem from beginning to end.

This will remain a perennial stumbling block until one of your readers tries honestly to solve ONE toy problem from beginning to end. No links to books or articles, no naming of fancy statistical techniques, no global economics problems, just a simple causal question whose answer we know in advance. (e.g. take Simpson’s paradox: Which data should be consulted? The aggregated or the disaggregated?)

Even this group of 73 Editors found it impossible, and have issued the following guidelines for reporting observational studies: https://www.atsjournals.org/doi/pdf/10.1513/AnnalsATS.201808-564PS

To readers of your blog: Please try it. The late Dennis Lindley was the only statistician I met who had the courage to admit: “We need to enrich our language with a do-operator”. Try it, and you will see why he came to this conclusion, and perhaps you will also see why Andrew is unable to follow him.”

Addendum:

In his response to my comment above, Andrew Gelman suggested that we agree to disagree, since science is full of disagreements and there is lots of room for progress using different methods. Unfortunately, the need to enrich statistics with new vocabulary is a mathematical fact, not an opinion. This need cannot be resolved by “there are many ways to skin a cat” without snapping out of traditional statistical language and enriching it with causal vocabulary. Neyman-Rubin’s potential outcomes vocabulary is an example of such enrichment, since it goes beyond joint distributions of observed variables.

Andrew further refers us to three chapters in his book (with Jennifer Hill) on causal inference. I am craving instead for one toy problem, solved from assumptions to conclusions, so that we can follow precisely the role played by the extra-statistical vocabulary, and why it is absolutely needed. The Book of Why presents a dozen such examples, but readers would do well to choose their own.
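Taking the Simpson’s paradox question raised in the comment above (“Which data should be consulted?”) as a toy problem, here is a minimal sketch under an assumed causal story: a binary covariate Z (hypothetically, gender) influences both who takes the treatment X and who recovers (Y). Because Z is a confounder in this story, the disaggregated data should be consulted, and the back-door adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z) gives the answer. With the invented probabilities below, the aggregated contrast suggests the treatment is harmful (about −0.14), while the adjusted contrast shows it helps (+0.10 in each stratum).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Hypothetical causal story: Z -> X, Z -> Y, X -> Y (probabilities invented).
z = (rng.random(n) < 0.5).astype(int)                      # Z: confounder
x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)            # X: treatment, more common when Z=1
y = (rng.random(n) < 0.6 + 0.1 * x - 0.4 * z).astype(int)  # Y: recovery

def mean_y(mask):
    """Empirical recovery rate among the rows selected by mask."""
    return y[mask].mean()

# Aggregated data: compares treated vs untreated, ignoring Z.
aggregated = mean_y(x == 1) - mean_y(x == 0)
# Back-door adjustment over Z: stratum-specific contrasts weighted by P(z).
adjusted = sum(
    np.mean(z == zv) * (mean_y((x == 1) & (z == zv)) - mean_y((x == 0) & (z == zv)))
    for zv in (0, 1)
)
print(f"aggregated: {aggregated:+.3f}   adjusted: {adjusted:+.3f}")
```

Had Z instead been a consequence of the treatment (say, blood pressure measured after treatment), the same table would call for consulting the aggregated data; that decision comes from the causal story, not from the frequencies.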


December 13, 2018

Winter Greetings from the UCLA Causality Blog

Filed under: Announcement,Book (J Pearl),General — Judea Pearl @ 11:37 pm

Dear friends in causality research,

In the past 5 months, since the publication of The Book of Why http://bayes.cs.ucla.edu/WHY/ I have been involved in conversations with many inquisitive readers on Twitter @yudapearl and have not been able to update our blog as frequently as I should. I am glad to return to this forum and update it with the major developments since July, 2018.

1. Initial reviews of the Book of Why are posted on its trailer page, http://bayes.cs.ucla.edu/WHY/. They vary from technical discussions to philosophical speculations, from relationships to machine learning to debates about the supremacy of randomized controlled trials.

2. A searchable file of all my 750 tweets is available here: https://ucla.in/2Kz0FoY. It can be used for (1) extracting talking points, adages and arguments in the defense of causal inference, and (2) understanding the thinking of neighboring cultures, e.g., statistics, epidemiology, economics, deep learning and reinforcement learning, primarily on issues of transparency, testability, manipulability, do-expressions and counterfactuals.

3. The 6th printing of the Book of Why is now available, with corrections to all errors and typos discovered up to Oct. 29, 2018. To check that you have the latest printing, make sure the last line on the copyright page ends with … 8 7 6

4. Please examine the latest papers and reports from our brewery:

R-484 Pearl, “Causal and Counterfactual Inference,” Forthcoming section in The Handbook of Rationality, MIT Press. https://ucla.in/2Iz9myt

R-484 Pearl, “A note on oxygen, matches and fires, On Non-manipulable Causes,” September 2018. https://ucla.in/2Qb1h6v

R-483 Pearl, “Does Obesity Shorten Life? Or is it the Soda? On Non-manipulable Causes,” https://ucla.in/2EpxcNU Journal of Causal Inference, 6(2), online, September 2018.

R-481 Pearl, “The Seven Tools of Causal Inference with Reflections on Machine Learning,” July 2018 https://ucla.in/2umzd65 Forthcoming, Communications of ACM.

R-479 Cinelli and Pearl, “On the utility of causal diagrams in modeling attrition: a practical example,” April 2018. https://ucla.in/2L8KAWw Forthcoming, Journal of Epidemiology.

R-478 Pearl and Bareinboim, “A note on `Generalizability of Study Results’,” April 2018. Forthcoming, Journal of Epidemiology. https://ucla.in/2NIsI6B

Earlier papers can be found here: http://bayes.cs.ucla.edu/csl_papers.html

5. I wish in particular to call attention to the introduction of R-478, https://ucla.in/2NIsI6B. It provides a “three bullets” recipe for comparing the structural and potential outcome frameworks:

* To determine if there exist sets of covariates $W$ that satisfy “conditional exchangeability”
** To estimate causal parameters at the target population in cases where such sets $W$ do not exist, and
*** To decide if one’s modeling assumptions are compatible with the available data.

I have listed the “three bullets” above in the hope that they serve to facilitate and concretize future conversations with our neighbors from the potential outcome framework.

6. We are informed of a most relevant workshop: AAAI-WHY 2019, March 26-27, Stanford, CA. The 2019 AAAI Spring Symposium will host a new workshop: Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI. See https://why19.causalai.net. Submissions due December 17, 2018.

Greetings and Happy Holidays
Judea


June 15, 2018

A Statistician’s Re-Reaction to The Book of Why

Filed under: Book (J Pearl),Discussion,Simpson's Paradox — Judea Pearl @ 2:29 am

Responding to my June 11 comment, Kevin Gray posted a reply on kdnuggets.com in which he doubted the possibility that the Causal Revolution has solved problems that generations of statisticians and philosophers have labored over and could not solve. Below is my reply to Kevin’s Re-Reaction, which I have also submitted to kdnuggets.com:

Dear Kevin,
I am not suggesting that you are only superficially acquainted with my works. You actually show much greater acquaintance than most statisticians in my department, and I am extremely appreciative that you are taking the time to comment on The Book of Why. You are showing me what other readers with your perspective would think about the Book, and what they would find unsubstantiated or difficult to swallow. So let us go straight to these two points (i.e., unsubstantiated and difficult to swallow) and give them an in-depth examination.

You say that I have provided no evidence for my claim: “Even today, only a small percentage of practicing statisticians can solve any of the causal toy problems presented in the Book of Why.” I believe that I did provide such evidence, in each of the Book’s chapters, and that the claim is valid once we agree on what is meant by “solve.”

Let us take the first example that you bring, Simpson’s paradox, which is treated in Chapter 6 of the Book, and which is familiar to every red-blooded statistician. I characterized the paradox in these words: “It has been bothering statisticians for more than sixty years – and it remains vexing to this very day” (p. 201). This was, as you rightly noticed, a polite way of saying: “Even today, the vast majority of statisticians cannot solve Simpson’s paradox,” a fact which I strongly believe to be true.

You find this statement hard to swallow because “generations of researchers and statisticians have been trained to look out for it [Simpson’s Paradox],” an observation that seems to contradict my claim. But I beg you to note that “trained to look out for it” does not make the researchers capable of “solving it,” namely, capable of deciding what to do when the paradox shows up in the data.

This distinction appears vividly in the debate that took place in 2014 on the pages of The American Statistician, which you and I cite. However, whereas you see the disagreements in that debate as evidence that statisticians have several ways of resolving Simpson’s paradox, I see it as evidence that they did not even come close. In other words, none of the other participants presented a method for deciding whether the aggregated data or the segregated data give the correct answer to the question: “Is the treatment helpful or harmful?”

Please pay special attention to the article by Keli Liu and Xiao-Li Meng, both of whom are from Harvard’s Department of Statistics (Xiao-Li is a senior professor and a Dean), so they cannot be accused of misrepresenting the state of statistical knowledge in 2014. Please read their paper carefully and judge for yourself whether it would help you decide whether treatment is helpful or not, in any of the examples presented in the debate.

It would not!! And how do I know? I am listening to their conclusions:

They disavow any connection to causality (p.18), and
They end up with the wrong conclusion. Quoting: “less conditioning is most likely to lead to serious bias when Simpson’s Paradox appears.” (p.17) Simpson himself brings an example where conditioning leads to more bias, not less.

I don’t blame Liu and Meng for erring on this point; it is not entirely their fault (Rosenbaum and Rubin made the same error). The correct solution to Simpson’s dilemma rests on the back-door criterion, which is almost impossible to articulate without the aid of DAGs. And DAGs, as you are probably aware, are forbidden from entering a 5-mile no-fly zone around Harvard [North side, where the statistics department is located].
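Here is a minimal Python sketch of what “solving” means in this setting. It uses illustrative counts in which treatment A beats B within each stone-size stratum but loses in the aggregate. The code assumes stone size $Z$ is a pre-treatment cause of both treatment choice and recovery (Z → X, Z → Y), so $\{Z\}$ satisfies the back-door criterion and the adjustment formula $P(y \mid do(x)) = \sum_z P(y \mid x, z)\,P(z)$ applies. Under a different diagram, for example one in which $Z$ is an effect of treatment, the aggregated figures would be the right ones; the data alone cannot tell the two cases apart.

```python
# Illustrative counts: (successes, patients) by treatment and stone size.
DATA = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}
STRATA = ("small", "large")
TREATMENTS = ("A", "B")


def aggregated_rate(t):
    """P(success | treatment t), ignoring stone size."""
    successes = sum(DATA[(t, z)][0] for z in STRATA)
    patients = sum(DATA[(t, z)][1] for z in STRATA)
    return successes / patients


def stratum_rate(t, z):
    """P(success | treatment t, stone size z)."""
    successes, patients = DATA[(t, z)]
    return successes / patients


def backdoor_adjusted_rate(t):
    """P(success | do(t)) = sum_z P(success | t, z) P(z); valid only if Z is back-door admissible."""
    total = sum(patients for _, patients in DATA.values())
    p_z = {z: sum(DATA[(tt, z)][1] for tt in TREATMENTS) / total for z in STRATA}
    return sum(stratum_rate(t, z) * p_z[z] for z in STRATA)


for t in TREATMENTS:
    print(t, "aggregated:", round(aggregated_rate(t), 3),
          "adjusted:", round(backdoor_adjusted_rate(t), 3))
```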

So, here we are. Most statisticians believe that everyone knows how to “watch for” Simpson’s paradox, while those who seek an answer to “Should we treat or not?” realize that “watching” is far from “solving.” They also realize that there is no solution without stepping outside the comfort zone of statistical analysis and entering the forbidden city of causation and graphical models.

There is one thing on which I do agree with you: your warning about the implausibility of the Causal Revolution. Quoting: “to this day, philosophers disagree about what causation is, thus to suggest he has found the answer to it is not plausible”. It is truly not plausible that someone, especially a semi-outsider, has found a Silver Bullet. It is hard to swallow. That is why I am so excited about the Causal Revolution and that is why I wrote the book. The Book does not offer a Silver Bullet to every causal problem in existence, but it offers a solution to a class of problems that centuries of statisticians and philosophers tried and could not crack. It is implausible, I agree, but it happened. It happened not because I am smarter but because I took Sewall Wright’s idea seriously and milked it to its logical conclusions as much as I could.

I took quite a risk of sounding pretentious by calling this development a Causal Revolution, but I thought it was necessary. Now I am asking you to take a few minutes and judge for yourself whether the evidence justifies such a risky characterization.

It would be nice if we could alert practicing statisticians, deeply invested in the language of statistics, to the possibility that paradigm shifts can occur even in the 21st century, and that centuries of unproductive debates do not make such shifts impossible.

You were right to express doubt and disbelief in the need for a paradigm shift, as would any responsible scientist in your place. The next step is to let the community explore:

How many statisticians can actually answer Simpson’s question, and
How to make that number reach 90%.

I believe The Book of Why has already doubled that number, which is some progress. It is in fact something that I was not able to do in the past thirty years through laborious discussions with the leading statisticians of our time.

It is some progress, let’s continue,
Judea


June 11, 2018

A Statistician’s Reaction to The Book of Why

Filed under: Book (J Pearl) — Judea Pearl @ 12:37 am

Carlos Cinelli brought to my attention a review of The Book of Why, written by Kevin Gray, who disagrees with my claim that statistics has been delinquent in neglecting causality. See https://www.kdnuggets.com/2018/06/gray-pearl-book-of-why.html. I have received similar reactions from statisticians in the past, and I expect more in the future. These reactions reflect a linguistic dissonance which The Book of Why describes thus: “Many scientists have been quite traumatized to learn that none of the methods they learned in statistics is sufficient even to articulate, let alone answer, a simple question like ‘What happens if we double the price?'” p 31.

I have asked Carlos to post the following response on Kevin’s blog:

————————————————
Kevin’s prediction that many statisticians may find my views “odd or exaggerated” is accurate. This is exactly what I have found in numerous conversations I have had with statisticians in the past 30 years. However, if you examine my views closely, you will find that they are not as thoughtless or exaggerated as they may appear at first sight.

Of course, many statisticians will scratch their heads and ask: “Isn’t this what we have been doing for years, though perhaps under a different name or no name at all?” And here lies the essence of my views. Doing it informally, under various names, while refraining from doing it mathematically under uniform notation has had a devastating effect on progress in causal inference, both in statistics and in the many disciplines that look to statistics for guidance. The best evidence for this lack of progress is the fact that, even today, only a small percentage of practicing statisticians can solve any of the causal toy problems presented in the Book of Why.

Take for example:

Selecting a sufficient set of covariates to control for confounding
Articulating assumptions that would enable consistent estimates of causal effects
Finding if those assumptions are testable
Estimating causes of effect (as opposed to effects of cause)
More and more.
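As one example of the fourth item, a “causes of effects” question can be posed as the probability of necessity, $PN = P(Y_{x'} = y' \mid X = x, Y = y)$: the probability that the outcome would not have occurred without the exposure, given that both did occur. The minimal sketch below, assuming binary $X$ and $Y$, computes the Tian-Pearl bounds on PN that combine observational probabilities with one experimental quantity; all input probabilities are hypothetical.

```python
def pn_bounds(p_y, p_y_do_xp, p_x_y, p_xp_yp):
    """
    Bounds on PN = P(Y_{x'} = y' | X = x, Y = y) from mixed data.

    p_y       : P(y), observational
    p_y_do_xp : P(y | do(x')), experimental
    p_x_y     : P(x, y), observational
    p_xp_yp   : P(x', y'), observational
    """
    lower = max(0.0, (p_y - p_y_do_xp) / p_x_y)
    upper = min(1.0, ((1.0 - p_y_do_xp) - p_xp_yp) / p_x_y)
    return lower, upper


# Hypothetical inputs: P(x,y)=0.3, P(x,y')=0.2, P(x',y)=0.1, P(x',y')=0.4, so P(y)=0.4;
# a hypothetical experiment reports P(y | do(x')) = 0.2.
print(pn_bounds(p_y=0.4, p_y_do_xp=0.2, p_x_y=0.3, p_xp_yp=0.4))
```

With these made-up numbers PN lies between about 0.67 and 1, an answer that neither the experimental nor the observational data could supply on its own.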

Every chapter of The Book of Why brings with it a set of problems that statisticians were deeply concerned about, and have been struggling with for years, albeit under the wrong name (e.g., ANOVA or MANOVA) “or no name at all.” The results are many deep concerns but no solution.

A valid question to be asked at this point is what gives humble me the audacity to state so sweepingly that no statistician (in fact no scientist) was able to properly solve those toy problems prior to the 1980s. How can one be so sure that some bright statistician or philosopher did not come up with the correct resolution of Simpson’s paradox or a correct way to distinguish direct from indirect effects? The answer is simple: we can see it in the syntax of the equations that scientists used in the 20th century. To properly define causal problems, let alone solve them, requires a vocabulary that resides outside the language of probability theory. This means that all the smart and brilliant statisticians who used joint density functions, correlation analysis, contingency tables, ANOVA, Entropy, Risk Ratios, etc., etc., and did not enrich them with either diagrams or counterfactual symbols have been laboring in vain, orthogonally to the question: you can’t answer a question if you have no words to ask it. (Book of Why, page 10)

It is this notational litmus test that gives me the confidence to stand behind each one of the statements that you were kind enough to cite from the Book of Why. Moreover, if you look closely at this litmus test, you will find that it is not just notational but conceptual and practical as well. For example, Fisher’s blunder of using ANOVA to estimate direct effects is still haunting the practices of present-day mediation analysts. Numerous other examples are described in the Book of Why, and I hope you weigh seriously the lesson that each of them conveys.
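The following minimal sketch shows why adjusting for a mediator is not a safe route to a direct effect. It assumes a hypothetical linear model in which treatment $X$ is randomized, $M$ mediates part of its effect, and an unmeasured $U$ affects both $M$ and $Y$. Regressing $Y$ on $X$ and $M$, a covariance adjustment of the kind criticized above, returns a coefficient that is not the direct effect $c$, because conditioning on $M$ opens the path $X \to M \leftarrow U \to Y$.

```python
import random


def simulate(n=100_000, a=1.0, b=1.0, c=0.5, seed=0):
    """Hypothetical linear model: X -> M -> Y, X -> Y, and hidden U -> M, U -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)                      # randomized treatment
        u = rng.gauss(0, 1)                      # unmeasured mediator-outcome confounder
        m = a * x + u + rng.gauss(0, 1)          # mediator
        y = c * x + b * m + u + rng.gauss(0, 1)  # c is the true direct effect
        rows.append((x, m, y))
    return rows


def cov(xs, ys):
    """Sample covariance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / (len(xs) - 1)


def coef_on_x_adjusting_for_m(rows):
    """OLS coefficient of X in the regression of Y on X and M (with intercept)."""
    xs, ms, ys = zip(*rows)
    sxx, smm, sxm = cov(xs, xs), cov(ms, ms), cov(xs, ms)
    sxy, smy = cov(xs, ys), cov(ms, ys)
    return (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)


rows = simulate()
print("true direct effect: 0.5")
print("mediator-adjusted estimate:", round(coef_on_x_adjusting_for_m(rows), 3))
```

With unit-variance noise, $E[U \mid X, M] = (M - aX)/2$, so the adjusted coefficient tends to $c - a/2$, which is 0 for the defaults here rather than the true 0.5.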

Yes, many of your friends and colleagues will be scratching their heads, saying: “Hmmm… Isn’t this what we have been doing for years, though perhaps under a different name or no name at all?” What I hope you will be able to do after reading “The Book of Why” is to catch some of the head-scratchers and tell them: “Hey, before you scratch further, can you solve any of the toy problems in the Book of Why?” You will be surprised by the results; I was!
————————————————

To me, solving problems is the test of understanding, not head scratching. That is why I wrote this Book.

Judea


June 7, 2018

Updates on The Book of Why

Filed under: Announcement,Book (J Pearl) — Judea Pearl @ 11:54 pm

Dear friends in causality research,

Three months ago, I sent you a special greeting, announcing the forthcoming publication of The Book of Why (Basic Books, co-authored with Dana MacKenzie). Below please find an update.

The Book came out on May 15, 2018, and has since been featured by the Wall Street Journal, Quanta Magazine, and The Times of London. You can view these articles here:
http://bayes.cs.ucla.edu/WHY/

Eager to allay public fears of the dangers of artificial intelligence, these three articles interpreted my critique of model-blind learning as a claim of general impediments to AI and machine learning. This has probably helped put the Book on Amazon’s #1 bestseller lists in several categories.

However, the limitations of current machine learning techniques are only part of the message conveyed in the Book of Why. The second, and more important part of the Book describes how these limitations are circumvented through the use of causal models, however qualitative or incomplete. The impacts that causal modeling has had on the social and health sciences make it only natural that a similar ‘revolution’ will soon be sweeping machine learning research, and liberate it from its current predicaments of opaqueness, forgetfulness and lack of explainability. (See, for example, http://www.sciencemag.org/news/2018/05/ai-researchers-allege-machine-learning-alchemy and https://arxiv.org/pdf/1801.00631.pdf)

I was happy therefore to see that this positive message was understood by many readers who wrote to me about the book, especially readers coming from traditional machine learning background (See, for example, www.inference.vc/untitled). It was also recognized by a more recent review in the New York Times
https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html which better reflects my optimism about what artificial intelligence can achieve.

I am hoping that you and your students will find inspiration in the optimistic message of the Book of Why, and that you take active part in the on-going development of “model-assisted machine learning.”

Sincerely,

Judea


April 28, 2018

Causal Inference Workshop at UAI 2018

Filed under: Announcement,Conferences — Judea Pearl @ 12:42 am

Dear friends in causality research,

You may find an upcoming workshop at UAI to be of interest; see the details below for more information:

7th Causal Inference Workshop at UAI 2018 – Intercontinental, Monterey, CA; August 2018

Description
In recent years, causal inference has seen important advances, especially through a dramatic expansion in its theoretical and practical domains. By assuming a central role in decision making, causal inference has attracted interest from computer science, statistics, and machine learning, each field contributing a fresh and unique perspective.

More specifically, computer science has focused on the algorithmic understanding of causality, and general conditions under which causal structures may be inferred. Machine learning methods have focused on high-dimensional models and non-parametric methods, whereas more classical causal inference has been guiding policy in complex domains involving economics, social and health sciences, and business. Through such advances a powerful cross-pollination has emerged as a new set of methodologies promising to deliver more robust data analysis than each field could deliver individually; examples include doubly-robust methods, targeted learning, double machine learning, and causal trees, all of which have been introduced recently.

This workshop is aimed at facilitating more interactions between researchers in machine learning, statistics, and computer science working on questions of causal inference. In particular, it is an opportunity to bring together highly technical individuals who are strongly motivated by the practical importance and real-world impact of their work. Cultivating such interactions will lead to the development of theory, methodology, and – most importantly – practical tools, that better target causal questions across different domains.

Important Dates
May 20 — Paper submission deadline; submission page: https://easychair.org/conferences/?conf=causaluai2018
June 20 — Author notification
July 20 — Camera ready version
August 10 — Workshop

Organizers
Bryant Chen, IBM
Panos Toulis, University of Chicago
Alexander Volfovsky, Duke University


March 10, 2018

Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright

Filed under: Data Fusion,RCTs — Judea Pearl @ 12:20 am

I was asked to comment on a recent article by Angus Deaton and Nancy Cartwright (D&C), which touches on the foundations of causal inference. The article is titled: “Understanding and misunderstanding randomized controlled trials,” and can be viewed here: https://goo.gl/x6s4Uy

My comments are a mixture of a welcome and a puzzle; I welcome D&C’s stand on the status of randomized trials, and I am puzzled by how they choose to articulate the alternatives.

D&C’s main theme is as follows: “We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” (Quoted from their introduction)

As a veteran challenger of the supremacy of the RCT, I welcome D&C’s challenge wholeheartedly. Indeed, “The Book of Why” (forthcoming, May 2018, http://bayes.cs.ucla.edu/WHY/) quotes me as saying:
“If our conception of causal effects had anything to do with randomized experiments, the latter would have been invented 500 years before Fisher.” In this, as well as in my other writings, I go so far as to claim that the RCT earns its legitimacy by mimicking the do-operator, not the other way around. In addition, considering the practical difficulties of conducting an ideal RCT, observational studies have a definite advantage: they interrogate populations in their natural habitats, not in artificial environments choreographed by experimental protocols.

Deaton and Cartwright’s challenge of the supremacy of the RCT consists of two parts:

The first (internal validity) deals with the curse of dimensionality and argues that, in any single trial, the outcome of the RCT can be quite distant from the target causal quantity, which is usually the average treatment effect (ATE). In other words, this part concerns imbalance due to finite samples, and reflects the traditional bias-precision tradeoff in statistical analysis and machine learning.
The second part (external validity) deals with biases created by inevitable disparities between the conditions and populations under study versus those prevailing in the actual implementation of the treatment program or policy. Here, Deaton and Cartwright propose alternatives to RCTs, calling for the integration of a web of multiple information sources, including observational, experimental, quasi-experimental, and theoretical inputs, all collaborating towards the goal of estimating “what we are trying to discover”.
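The internal-validity point can be illustrated with a small simulation, a sketch under assumed numbers: many hypothetical trials of 20 subjects each, complete randomization, a true ATE of 1.0, and a prognostic covariate that is not adjusted for. Averaged over replications, the difference in means centers on the ATE, yet any single trial can land far from it.

```python
import random
import statistics


def one_trial(n, ate, rng):
    """Difference-in-means estimate from one completely randomized trial of size n."""
    z = [rng.gauss(0, 1) for _ in range(n)]          # prognostic covariate (unadjusted)
    arms = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(arms)                                 # exactly half are treated
    y = [ate * t + 2.0 * zi + rng.gauss(0, 1) for t, zi in zip(arms, z)]
    treated = [yi for yi, t in zip(y, arms) if t == 1]
    control = [yi for yi, t in zip(y, arms) if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(1)
estimates = [one_trial(n=20, ate=1.0, rng=rng) for _ in range(5000)]
print("mean over trials:", round(statistics.mean(estimates), 2))
print("spread of single trials (sd):", round(statistics.stdev(estimates), 2))
```

With these assumed parameters the standard deviation across trials is about 1.0 (outcome variance 5 per subject, 10 subjects per arm), as large as the effect itself.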

My only qualm with D&C’s proposal is that, in their passion to advocate the integration strategy, they have failed to notice that, in the past decade, a formal theory of integration strategies has emerged from the brewery of causal inference and is currently ready and available for empirical researchers to use. I am referring of course to the theory of Data Fusion which formalizes the integration scheme in the language of causal diagrams, and provides theoretical guarantees of feasibility and performance. (see http://www.pnas.org/content/pnas/113/27/7345.full.pdf )

Let us examine closely D&C’s main motto: “Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” Clearly, to cast this advice in practical settings, we must devise notation, vocabulary, and logic to represent “what we are trying to discover” as well as “what is already known” so that we can infer the former from the latter. To accomplish this nontrivial task, we need tools, theorems and algorithms to assure us that what we conclude from our integrated study indeed follows from those precious pieces of knowledge that are “already known.” D&C are notably silent about the language and methodology in which their proposal should be carried out. One is left wondering, therefore, whether they intend their proposal to remain an informal, heuristic guideline, similar to Bradford Hill’s Criteria of the 1960s, or to be explicated in some theoretical framework that can distinguish valid from invalid inference. If they aspire to embed their integration scheme within a coherent framework, then they should celebrate: such a framework has been worked out and is now fully developed.

To be more specific, the Data Fusion theory described in http://www.pnas.org/content/pnas/113/27/7345.full.pdf provides us with notation to characterize the nature of each data source, the nature of the population interrogated, whether the source is an observational or experimental study, which variables are randomized and which are measured and, finally, the theory tells us how to fuse all these sources together to synthesize an estimand of the target causal quantity at the target population. Moreover, if we feel uncomfortable about the assumed structure of any given data source, the theory tells us whether an alternative source can furnish the needed information and whether we can weaken any of the model’s assumptions.

Those familiar with Data Fusion theory will find it difficult to understand why D&C have not utilized it as a vehicle to demonstrate the feasibility of their proposed alternatives to RCTs. This enigma stands out in D&C’s description of how modern analysis can rectify the deficiencies of RCTs, especially those pertaining to generalizing across populations, extrapolating across settings, and controlling for selection bias.

Here is what D&C’s article says about extrapolation (Quoting from their Section 3.5, “Re-weighting and stratifying”): “Pearl and Bareinboim (2011, 2014) and Bareinboim and Pearl (2013, 2014) provide strategies for inferring information about new populations from trial results that are more general than re-weighting. They suppose we have available both causal information and probabilistic information for population A (e.g. the experimental one), while for population B (the target) we have only (some) probabilistic information, and also that we know that certain probabilistic and causal facts are shared between the two and certain ones are not. They offer theorems describing what causal conclusions about population B are thereby fixed. Their work underlines the fact that exactly what conclusions about one population can be supported by information about another depends on exactly what causal and probabilistic facts they have in common.”

The text is accurate up to this point, but then it changes gears and states: “But as Muller (2015) notes, this, like the problem with simple re-weighting, takes us back to the situation that RCTs are designed to avoid, where we need to start from a complete and correct specification of the causal structure. RCTs can avoid this in estimation which is one of their strengths, supporting their credibility but the benefit vanishes as soon as we try to carry their results to a new context.” I believe D&C miss the point about re-weighting and stratifying.

First, it is not the case that “this takes us back to the situation that RCTs are designed to avoid.” It actually takes us to a more manageable situation. RCTs are designed to neutralize the confounding of treatments, whereas our methods are designed to neutralize differences between populations. Researchers may be totally ignorant of the structure of the former and quite knowledgeable about the structure of the latter. To neutralize selection bias, for example, we need to make assumptions about the process of recruiting subjects for the trial, a process over which we have some control. There is a fundamental difference, therefore, between assumptions about covariates that determine patients’ choice of treatment and those that govern the selection of subjects: the latter is (partially) under our control. Replacing one set of assumptions with another, more defensible set, does not “take us back to the situation that RCTs are designed to avoid.” It actually takes us forward, towards the ultimate goal of causal inference: to base conclusions on scrutinizable assumptions, and to base their plausibility on scientific or substantive grounds.
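For the simplest case of such re-weighting, suppose, as an assumption, that the trial and target populations differ only in the distribution of a pre-treatment covariate $Z$, so that the $z$-specific causal effects are shared. The transport formula of Pearl and Bareinboim then reads $P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\,P^*(z)$. A minimal Python sketch with hypothetical numbers:

```python
def transport(p_y_do_x_given_z, p_star_z):
    """P*(y | do(x)) = sum_z P(y | do(x), z) * P*(z): re-weight trial strata by target P*(z)."""
    assert abs(sum(p_star_z.values()) - 1.0) < 1e-9, "P*(z) must sum to 1"
    return sum(p_y_do_x_given_z[z] * p_star_z[z] for z in p_star_z)


# Hypothetical z-specific results from the trial (source population).
treated = {"young": 0.6, "old": 0.3}    # P(y | do(x1), z)
control = {"young": 0.4, "old": 0.25}   # P(y | do(x0), z)
trial_z = {"young": 0.8, "old": 0.2}    # P(z) among recruited subjects
target_z = {"young": 0.3, "old": 0.7}   # P*(z) in the target population

effect_in_trial = transport(treated, trial_z) - transport(control, trial_z)
effect_in_target = transport(treated, target_z) - transport(control, target_z)
print(round(effect_in_trial, 3), round(effect_in_target, 3))
```

The structural assumption (where the populations differ, and that $Z$ is not affected by treatment) is explicit and can be checked against substantive knowledge of how subjects were recruited; if $Z$ were affected by treatment, a different formula would be needed.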

Second, D&C overlook the significance of the “completeness” results established for transportability problems (see http://ftp.cs.ucla.edu/pub/stat_ser/r390-L.pdf). Completeness tells us, in essence, that one cannot do any better. In other words, it delineates precisely the minimum set of assumptions that are needed to establish a consistent estimate of causal effects in the target population. If any of those assumptions are violated, we know that we can do only worse. From a mathematical (and philosophical) viewpoint, this is the most one can expect analysis to do for us and, therefore, completeness renders the generalizability problem “solved.”

Finally, the completeness result highlights the broad implications of the Data Fusion theory, and how it brings D&C’s desiderata closer to becoming a working methodology. Completeness tells us that any envisioned strategy of study integration is either embraceable in the structure-based framework of Data Fusion, or it is not workable in any framework. This means that one cannot dismiss the conclusions of Data Fusion theory on the grounds that: “Its assumptions are too strong.” If a set of assumptions is deemed necessary in the Data Fusion analysis, then it is necessary, period; it cannot be avoided or relaxed unless it is supplemented by other assumptions elsewhere, and the algorithm can tell you where.

It is hard to see therefore why any of D&C’s proposed strategies would resist formalization, analysis and solution within the current logic of modern causal inference.

It took more than a dozen years for researchers to accept the notion of completeness in the context of internal validity. Utilizing the tools of the do-calculus (Pearl, 1995, Tian and Pearl, 2001, Shpitser & Pearl, 2006), completeness tells us what assumptions are absolutely needed for nonparametric identification of causal effects, how to tell if they are satisfied in any specific problem description, and how to use them to extract causal parameters from non-experimental studies. Completeness in the external validity context is a relatively new result (See: http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf) which will probably take a few more years for enlightened researchers to accept, appreciate and fully utilize. One purpose of this post is to urge the research community, especially Deaton and Cartwright, to study the recent mathematization of external validity and to benefit from its implications.
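As a small illustration of identification from non-experimental data, consider the front-door criterion, one of the results the do-calculus delivers. The sketch assumes a hypothetical binary model with an unmeasured confounder $U$ of $X$ and $Y$ and a mediator $Z$ that carries the entire effect of $X$ on $Y$; all parameter values are made up. Using only the observed joint of $(X, Z, Y)$, the front-door formula $P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z)\,P(x')$ recovers the true interventional probability, while the naive conditional $P(y \mid x)$ does not.

```python
from itertools import product

# Hypothetical binary model: hidden U -> X, U -> Y, and X -> Z -> Y.
P_U1 = 0.5


def p_x1(u):
    return 0.8 if u else 0.2


def p_z1(x):
    return 0.9 if x else 0.1


def p_y1(z, u):
    return 0.1 + 0.6 * z + 0.2 * u


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v else 1.0 - p1


# Observational joint P(x, z, y): U is summed out because it is never measured.
JOINT = {}
for u, x, z, y in product((0, 1), repeat=4):
    p = bern(P_U1, u) * bern(p_x1(u), x) * bern(p_z1(x), z) * bern(p_y1(z, u), y)
    JOINT[(x, z, y)] = JOINT.get((x, z, y), 0.0) + p


def prob(x=None, z=None, y=None):
    """Marginal probability of the given values under the observational joint."""
    return sum(p for (xx, zz, yy), p in JOINT.items()
               if (x is None or xx == x) and (z is None or zz == z) and (y is None or yy == y))


def naive(x):
    """P(y=1 | x): confounded by the hidden U."""
    return prob(x=x, y=1) / prob(x=x)


def front_door(x):
    """P(y=1 | do(x)) via the front-door formula, using observed data only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(x=x, z=z) / prob(x=x)
        inner = sum(prob(x=xp, z=z, y=1) / prob(x=xp, z=z) * prob(x=xp) for xp in (0, 1))
        total += p_z_given_x * inner
    return total


def truth(x):
    """Ground truth computed from the model itself (requires knowing U)."""
    return sum(bern(P_U1, u) * bern(p_z1(x), z) * p_y1(z, u)
               for u, z in product((0, 1), repeat=2))


for x in (0, 1):
    print(x, round(naive(x), 3), round(front_door(x), 3), round(truth(x), 3))
```

In this model the front-door values are 0.26 and 0.74 for x = 0 and x = 1, matching the ground truth, while $P(y \mid x)$ gives 0.20 and 0.80.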

I would be very interested in seeing other readers’ reactions to D&C’s article, as well as to my optimistic assessment of what causal inference can do for us in this day and age. I have read the reactions of Andrew Gelman (on his blog) and Stephen J. Senn (on Deborah Mayo’s blog https://errorstatistics.com/2018/01/), but they seem to be unaware of the latest developments in Data Fusion analysis. I also invite Angus Deaton and Nancy Cartwright to share a comment or two on these issues. I hope they respond positively.

Looking forward to your comments,
Judea

Addendum to “Challenging the Hegemony of RCTs”
Upon re-reading the post above I realized that I have assumed readers to be familiar with Data Fusion theory. This Addendum aims at readers who are not familiar with the theory, who would probably be asking: “Who needs a new theory to do what statistics does so well?” “Once we recognize the importance of diverse sources of data, statistics can be helpful in making decisions and quantifying uncertainty.” [Quoted from Andrew Gelman’s blog]. The reason I question the sufficiency of statistics to manage the integration of diverse sources of data is that statistics lacks the vocabulary needed for the job. Let us demonstrate it in a couple of toy examples, taken from BP-2015 (http://ftp.cs.ucla.edu/pub/stat_ser/r450-reprint.pdf).

Example 1
Suppose we wish to estimate the average causal effect of X on Y, and we have two diverse sources of data:

An RCT in which Z, not X, is randomized, and
An observational study in which X, Y, and Z are measured.

What substantive assumptions are needed to facilitate a solution to our problem? Put another way, how can we be sure that, once we make those assumptions, we can solve our problem?

Example 2
Suppose we wish to estimate the average causal effect ACE of X on Y, and we have two diverse sources of data:

An RCT in which the effect of X on both Y and Z is measured, but the recruited subjects had non-typical values of Z.
An observational study conducted in the target population, in which both X and Z (but not Y) were measured.

What substantive assumptions would enable us to estimate ACE, and how should we combine data from the two studies so as to synthesize a consistent estimate of ACE?

The nice thing about a toy example is that the solution is known to us in advance, and so, we can check any proposed solution for correctness. Curious readers can find the solutions for these two examples in http://ftp.cs.ucla.edu/pub/stat_ser/r450-reprint.pdf. More ambitious readers will probably try to solve them using statistical techniques, such as meta-analysis or partial pooling. The reason I am confident that the second group will end up disappointed comes from a profound statement made by Nancy Cartwright in 1989: “No Causes In, No Causes Out”. It means not only that you need substantive assumptions to derive causal conclusions; it also means that the vocabulary of statistical analysis, since it is built entirely on properties of distribution functions, is inadequate for expressing those substantive assumptions that are needed for getting causal conclusions. In our examples, although part of the data comes from an RCT and hence carries causal information, one can still show that the needed assumptions must invoke causal vocabulary; distributional assumptions are insufficient. As someone versed in both graphical modeling and counterfactuals, I would go even further and state that it would be a miracle if anyone succeeded in translating the needed assumptions into a comprehensible language other than causal diagrams. (See http://ftp.cs.ucla.edu/pub/stat_ser/r452-reprint.pdf Appendix, Scenario 3.)
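Cartwright’s dictum can be illustrated with two hypothetical models, sketched below. In model A, $X$ causes $Y$; in model B, a common cause $U$ drives both and $X$ has no effect on $Y$. They induce exactly the same joint distribution of $(X, Y)$ yet disagree about $P(Y = 1 \mid do(X = 1))$, so no analysis of that distribution alone can decide between them. This illustrates the general point only; it is not the solution to Example 1 or Example 2.

```python
from itertools import product


def model_a(do_x=None):
    """X -> Y. Returns the distribution over (x, y), optionally under do(X = do_x)."""
    dist = {}
    for x_nat, flip in product((0, 1), repeat=2):
        x = x_nat if do_x is None else do_x
        p = 0.5 * (0.1 if flip else 0.9)
        y = x ^ flip                       # Y copies X, except with probability 0.1
        dist[(x, y)] = dist.get((x, y), 0.0) + p
    return dist


def model_b(do_x=None):
    """U -> X and U -> Y; X has no effect on Y."""
    dist = {}
    for u, flip in product((0, 1), repeat=2):
        x = u if do_x is None else do_x
        p = 0.5 * (0.1 if flip else 0.9)
        y = u ^ flip                       # Y copies U, except with probability 0.1
        dist[(x, y)] = dist.get((x, y), 0.0) + p
    return dist


def p_y1(dist):
    """Marginal P(Y = 1) under a distribution over (x, y)."""
    return sum(p for (_, y), p in dist.items() if y == 1)


print("observational A:", model_a())
print("observational B:", model_b())
print("P(Y=1 | do(X=1)):", p_y1(model_a(do_x=1)), "vs", p_y1(model_b(do_x=1)))
```

Observationally both models give $P(X = Y) = 0.9$; under $do(X = 1)$, model A predicts $P(Y = 1) = 0.9$ and model B predicts 0.5.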

Armed with these examples and findings, we can go back and re-examine why D&C do not embrace the Data Fusion methodology in their quest for integrating diverse sources of data. The answer, I conjecture, is that D&C were not intimately familiar with what this methodology offers us, and how vastly different it is from previous attempts to operationalize Cartwright’s dictum: “No causes in, no causes out”.
Judea

