Causal Analysis in Theory and Practice

September 15, 2016

Summer-end Greeting from the UCLA Causality Blog

Filed under: Uncategorized — bryantc @ 4:39 am

Dear friends in causality research,
This greeting from UCLA Causality blog contains news and discussion on the following topics:

1. Reflections on 2016 JSM meeting.
2. The question of equivalent representations.
3. Simpson’s Paradox (Comments on four recent papers)
4. News concerning Causal Inference Primer
5. New books, blogs and other frills.

1. Reflections on JSM-2016
For those who missed the JSM 2016 meeting, my tutorial slides can be viewed here:

As you can see, I argue that current progress in causal inference should be viewed as a major paradigm shift in the history of statistics and, accordingly, nuances and disagreements are merely linguistic realignments within a unified framework. To support this view, I chose for discussion six specific achievements (called GEMS) that should make anyone connected with causal analysis proud, empowered, and mighty motivated.

The six gems are:
1. Policy Evaluation (Estimating “Treatment Effects”)
2. Attribution Analysis (Causes of Effects)
3. Mediation Analysis (Estimating Direct and Indirect Effects)
4. Generalizability (Establishing External Validity)
5. Coping with Selection Bias
6. Recovering from Missing Data

I hope you enjoy the slides and appreciate the gems.

2. The question of equivalent representations
One challenging question that came up from the audience at JSM concerned the unification of the graphical and potential-outcome frameworks. “How can two logically equivalent representations be so different in actual use?”. I elaborate on this question in a separate post titled “Logically equivalent yet way too different.”

3. Simpson’s Paradox: The riddle that would not die
(Comments on four recent papers)
If you search Google for “Simpson’s paradox”, as I did yesterday, you would get 111,000 results, more than any other statistical paradox that I could name. What elevates this innocent reversal of associations to “paradoxical” status, and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century are questions that we discussed at length on this (and other) blogs. The reason I am back to this topic is the publication of four recent papers that give us a panoramic view at how the understanding of causal reasoning has progressed in communities that do not usually participate in our discussions.

4. News concerning Causal Inference – A Primer
We are grateful to Jim Grace for his in-depth review on Amazon:

For those of you awaiting the solutions to the study questions in the Primer, I am informed that the Solution Manual is now available (to instructors) from Wiley. To obtain a copy, see page 2 of: However, rumor has it that a quicker way to get it is through your local Wiley representative, at

If you encounter difficulties, please contact us at and we will try to help. Readers tell me that the solutions are more enlightening than the text. I am not surprised, there is nothing more invigorating than seeing a non-trivial problem solved from A to Z.

5. New books, blogs and other frills
We are informed that a new book by Joseph Halpern, titled “Actual Causality”, is available now from MIT Press. ( Readers familiar with Halpern’s fundamental contributions to causal reasoning will not be surprised to find here a fresh and comprehensive solution to the age-old problem of actual causality. Not to be missed.

Adam Kelleher writes about an interesting math-club and causal-minded blog that he is orchestrating. See his post,

Glenn Shafer just published a review paper: “A Mathematical Theory of Evidence turn 40” celebrating the 40th anniversary of the publication of his 1976 book “A Mathematical Theory of Evidence” I have enjoyed reading this article for nostalgic reasons, reminding me of the stormy days in the 1980’s, when everyone was arguing for another calculus of evidential reasoning. My last contribution to that storm, just before sailing off to causality land, was this paper: Section 10 of Shafer’s article deals with his 1996 book “The Art of Causal Conjecture” My thought: Now, that the causal inference field has matured, perhaps it is time to take another look at the way Shafer views causation.

Wishing you a super productive Fall season.

J. Pearl

September 12, 2016

Logically equivalent yet way too different

Filed under: Uncategorized — bryantc @ 2:50 am

Contributor: Judea Pearl

In comparing the tradeoffs between the structural and potential outcome frameworks, I often state that the two are logically equivalent yet poles apart in terms of transparency and computational efficiency. (See Slide #34 of the JSM tutorial). Indeed, anyone who examines how the two frameworks solve a specific problem from begining to end (See, e.g., Slides #35-36 ) would find the differences astonishing.

The question naturally arises: How can two equivalent frameworks differ so substantially in actual use.

The answer is that epistemic equivalence does not mean representational equivalence. Two representations of the same information may highlight different aspects of the problem and thus differ substantially in how easy it is to solve a given problem.  This is a recurrent theme in complexity analysis, but is not generally appreciated outside computer science. We saw it in our discussions with Guido Imbens who could not accept the fact that the use of graphical models is a mathematical necessity not just a matter of taste. (

The examples usually cited in complexity analysis are combinatorial problems whose solution times depend critically on the initial representation. I hesitated from bringing up these examples, fearing that they will not be compelling to readers on this blog who are more familiar with classical mathematics.

Last week I stumbled upon a very simple example that demonstrates representational differences in no ambiguous terms; I would like to share it with readers.

Consider the age-old problem of finding a solution to an algebraic equation, say
y(x) = x3 + ax2 + bx + c = 0

This is a tough problem for those of us who do not remember Tartalia’s solution of the cubic.  (It can be made much tougher once we go to quintic equation.)

But there are many syntactic ways of representing the same function y(x) . Here is one equivalent representation:
y(x) = x(x2+ax) + b(x+c/b) = 0
and here is another:
y(x) = (x-x1)(x-x2)(x-x3) = 0,
where x1, x2, and x3 are some functions of a, b, c.

The last representation permits an immediate solution, which is:
x=x1, x=x2, x=x3.

The example may appear trivial, and some may even call it cheating, saying that finding x1, x2, and x3 is as hard as solving the original problem. This is true, but the purpose of the example was not to produce an easy solution to the cubic. The purpose was to demonstrate that different syntactic ways of representing the same information (i.e., the same polynomial) may lead to substantial differences in the complexity of computing an answer to a query (i.e., find a root).

A preferred representation is one that makes certain desirable aspects of the problem explicit, thus facilitating a speedy solution. Complexity theory is full of such examples.

Note that the complexity is query-dependent. Had our goal been to find a value x that makes the polynomial y(x) equal 4, not zero, the representation above y(x) = (x-x1)(x-x2)(x-x3) would offer no help at all. For this query, the representation
y(x) = (x-z1)(x-z2)(x-z3) + 4  
would yield an immediate solution
x=z1, x=z2, x=z3,
where z1, z2, and z3 are the roots of another polynomial:
x3 + ax2 + bx + (c-4) = 0

This simple example demonstrates nicely the principle that makes graphical models more efficient than alternative representations of the same causal information, say a set of ignorability assumptions. What makes graphical models efficient is the fact that they make explicit the logical ramifications of the conditional-independencies conveyed by the model. Deriving those ramifications by algebraic or logical means takes substantially more work. (See for the logic of counterfactual independencies)

A typical example of how nasty such derivations can get is given in Heckman and Pinto’s paper on “Causal Inference after Haavelmo” (Econometric Theory, 2015). Determined to avoid graphs at all cost, Heckman and Pinto derived conditional independence relations directly from Dawid’s axioms and the Markov condition (See The results are pages upon pages of derivations of independencies that are displayed explicitly in the graph.

Of course, this and other difficulties will not dissuade econometricians to use graphs; that would rake a scientific revolution of Kuhnian proportions. (see Still, awareness of these complexity issues should give inquisitive students the ammunition to hasten the revolution and equip econometrics with modern tools of causal analysis.

They eventually will.

September 11, 2016

An interesting math and causality-minded club

Filed under: Announcement — bryantc @ 6:08 pm

from Adam Kelleher:

The math and algorithm reading group ( is based in NYC, and was founded when I moved here three years ago. It’s a very casual group that grew out of a reading group I was in during graduate school. Some friends who were math graduate students were interested in learning more about general relativity, and I (a physicist) was interested in learning more math. Together, we read about differential geometry, with the goal of bringing our knowledge together. We reasoned that we could learn more as a group, by pooling our different perspectives and experience, than we could individually. That’s the core motivation of our reading group: not only are we there to help resolve each other get through the material if anyone gets stuck, but we’re also there to add what else we know (in the format of a group discussion) to the content of the material.

We’re currently reading Causality cover to cover. We’ve paused to implement some of the algorithms, and plan on pausing again soon for a review session. We intend to do a “hacking session”, to try our hands at causal inference and analysis on some open data sets.

Inspired by reading Causality, and realizing that the best open implementations of causal inference were packaged in the (old, relatively inaccessible) Tetrad package, I’ve started a modern implementation of some tools for causal inference and analysis in the causality package in Python. It’s on pypi (pip install causality, or check the tutorial on, but it’s still a work in progress. The IC* algorithm is implemented, along with a small suite of conditional independence tests. I’m adding some classic methods for causal inference and causal effects estimation, aimed at making the package more general-purpose. I invite new contributions to help build out the package. Just open an issue, and label it an “enhancement” to kick of the discussion!

Finally, to make all of the work more accessible to people without more advanced math background, I’ve been writing a series of blog posts aimed at introducing anyone with an intermediate background in probability and statistics to the material in Causality! It’s aimed especially at practitioners, like data scientists. The hope is that more people, managers included (the intended audience for the first 3 posts), will understand the issues that come up when you’re not thinking causally. I’d especially recommend the article about understanding bias, but the whole series (still in progress) is indexed here:

August 24, 2016

Simpson’s Paradox: The riddle that would not die. (Comments on four recent papers)

Filed under: Simpson's Paradox — bryantc @ 12:06 am

Contributor: Judea Pearl

If you search Google for “Simpson’s paradox,” as I did yesterday, you will get 111,000 results, more than any other statistical paradox that I could name. What elevates this innocent reversal of association to “paradoxical” status, and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century are questions that we discussed at length on this (and other) blogs. The reason I am back to this topic is the publication of four recent papers that give us a panoramic view at how the understanding of causal reasoning has progressed in communities that do not usually participate in our discussions.

As readers of this blog recall, I have been trying since the publication of Causality (2000) to convince statisticians, philosophers and other scientific communities that Simpson’s paradox is: (1) a product of wrongly applied causal principles, and (2) that it can be fully resolved using modern tools of causal inference.

The four papers to be discussed do not fully agree with the proposed resolution.

To reiterate my position, Simpson’s paradox is (quoting Lord Russell) “another relic of a bygone age,” an age when we believed that every peculiarity in the data can be understood and resolved by statistical means. Ironically, Simpson’s paradox has actually become an educational tool for demonstrating the limits of statistical methods, and why causal, rather than statistical considerations are necessary to avoid paradoxical interpretations of data. For example, our recent book Causal Inference in Statistics: A Primer, uses Simpson’s paradox at the very beginning (Section 1.1), to show students the inevitability of causal thinking and the futility of trying to interpret data using statistical tools alone. See

Thus, my interest in the four recent articles stems primarily from curiosity to gauge the penetration of causal ideas into communities that were not intimately involved in the development of graphical or counterfactual models. Discussions of Simpson’s paradox provide a sensitive litmus test to measure the acceptance of modern causal thinking. “Talk to me about Simpson,” I often say to friendly colleagues, “and I will tell you how far you are on the causal trail.” (Unfriendly colleagues balk at the idea that there is a trail they might have missed.)

The four papers for discussion are the following:

Malinas, G. and Bigelow, J. “Simpson’s Paradox,” The Stanford Encyclopedia of Philosophy (Summer 2016 Edition), Edward N. Zalta (ed.), URL = <>.

Spanos, A., “Revisiting Simpson’s Paradox: a statistical misspecification perspective,” ResearchGate Article, <>, online May 2016.

Memetea, S. “Simpson’s Paradox in Epistemology and Decision Theory,” The University of British Columbia (Vancouver), Department of Philosophy, Ph.D. Thesis, May 2015.

Bandyopadhyay, P.S., Raghavan, R.V., Deruz, D.W., and Brittan, Jr., G. “Truths about Simpson’s Paradox Saving the Paradox from Falsity,” in Mohua Banerjee and Shankara Narayanan Krishna (Eds.), Logic and Its Applications, Proceedings of the 6th Indian Conference ICLA 2015 , LNCS 8923, Berlin Heidelberg: Springer-Verlag, pp. 58-73, 2015 .

——————- Discussion ——————-

1. Molina and Bigelow 2016 (MB)

I will start the discussion with Molina and Bigelow 2016 (MB) because the Stanford Encyclopedia of Philosophy enjoys both high visibility and an aura of authority. MB’s new entry is a welcome revision of their previous article (2004) on “Simpson’s Paradox,” which was written almost entirely from the perspective of “probabilistic causality,” echoing Reichenbach, Suppes, Cartwright, Good, Hesslow, Eells, to cite a few.

Whereas the previous version characterizes Simpson’s reversal as “A Logically Benign, empirically Treacherous Hydra,” the new version dwarfs the dangers of that Hydra and correctly states that Simpson’s paradox poses problem only for “philosophical programs that aim to eliminate or reduce causation to regularities and relations between probabilities.” Now, since the “probabilistic causality” program is fairly much abandoned in the past two decades, we can safely conclude that Simpson’s reversal poses no problem to us mortals. This is reassuring.

MB also acknowledge the role that graphical tools play in deciding whether one should base a decision on the aggregate population or on the partitioned subpopulations, and in testing one’s hypothesized model.

My only disagreement with the MB’s article is that it does not go all the way towards divorcing the discussion from the molds, notation and examples of the “probabilistic causation” era and, naturally, proclaim the paradox “resolved.” By shunning modern notation like do(x), Yx, or their equivalent, the article gives the impression that Bayesian conditionalization, as in P(y|x), is still adequate for discussing Simpson’s paradox, its ramifications and its resolution. It is not.

In particular, this notational orthodoxy makes the discussion of the Sure Thing Principle (STP) incomprehensible and obscures the reason why Simpson’s reversal does not constitute a counter example to STP. Specifically, it does not tell readers that causal independence is a necessary condition for the validity of the STP, (i.e., actions should not change the size of the subpopulations) and this independence is violated in the counterexample that Blyth contrived in 1972. (See

I will end with a humble recommendation to the editors of the Stanford Encyclopedia of Philosophy. Articles concerning causation should be written in a language that permits authors to distinguish causal from statistical dependence. I am sure future authors in this series would enjoy the freedom of saying “treatment does not change gender,” something they cannot say today, using Bayesian conditionalization. However, they will not do so on their own, unless you tell them (and their reviewers) explicitly that it is ok nowadays to deviate from the language of Reichenbach and Suppes and formally state: P(gender|do(treatment)) = P(gender).

Editorial guidance can play an incalculable role in the progress of science.

2. Comments on Spanos (2016)

In 1988, the British econometrician John Denis Sargan gave the following definition of an “economic model”: “A model is the specification of the probability distribution for a set of observations. A structure is the specification of the parameters of that distribution.” (Lectures on Advanced Econometric Theory (1988, p.27))

This definition, still cited in advanced econometric books (e.g., Cameron and Trivdi (2009) Microeconometrics) has served as a credo to a school of economics that has never elevated itself from the data-first paradigm of statistical thinking. Other prominent leaders of this school include Sir David Hendry, who wrote: “The joint density is the basis: SEMs (Structural Equation Models) are merely an interpretation of that.” Members of this school are unable to internalize the hard fact that statistics, however refined, cannot provide the information that economic models must encode to be of use to policy making. For them, a model is just a compact encoding of the density function underlying the data, so, two models encoding the same density function are deemed interchangeable.

Spanos article is a vivid example of how this statistics-minded culture copes with causal problems. Naturally, Spanos attributes the peculiarities of Simpson’s reversal to what he calls “statistical misspecification,” not to causal shortsightedness. “Causal” relationships do not exist in the models of Sargan’s school, so, if anything goes wrong, it must be “statistical misspecification,” what else? But what is this “statistical misspecification” that Spanos hopes would allow him to distinguish valid from invalid inference? I have read the paper several times, and for the life of me, it is beyond my ability to explain how the conditions that Spanos posits as necessary for “statistical adequacy” have anything to do with Simpson’s paradox. Specifically, I cannot see how “misspecified” data, which wrongly claims: “good for men, good for women, bad for people” suddenly becomes “well-specified” when we replace “gender” with “blood pressure”.

Spanos’ conditions for “statistical adequacy” are formulated in the context of the Linear Regression Model and invoke strictly statistical notions such as normality, linearity, independence etc. None of them applies to the binary case of {treatment, gender, outcome} in which Simpson’s paradox is usually cast. I therefore fail to see why replacing “gender” with “blood pressure” would turn an association from “spurious” to “trustworthy”.

Perhaps one of our readers can illuminate the rest of us how to interpret this new proposal. I am at a total loss.

For fairness, I should add that most economists that I know have second thoughts about Sargan’s definition, and claim to understand the distinction between structural and statistical models. This distinction, unfortunately, is still badly missing from econometric textbooks, see I am sure it will get there some day; Lady Science is forgiving, but what about economics students?

3. Memetea (2015)

Among the four papers under consideration, the one by Memetea, is by far the most advanced, comprehensive and forward thinking. As a thesis written in a philosophy department, Memetea treatise is unique in that it makes a serious and successful effort to break away from the cocoon of “probabilistic causality” and examines Simpson’s paradox to the light of modern causal inference, including graphical models, do-calculus, and counterfactual theories.

Memetea agrees with our view that the paradox is causal in nature, and that the tools of modern causal analysis are essential for its resolution. She disagrees however with my provocative claim that the paradox is “fully resolved”. The areas where she finds the resolution wanting are mediation cases in which the direct effect (DE) differs in sign from the total effect (TE). The classical example of such cases (Hesslow 1976) tells of a birth control pill that is suspected of producing thrombosis in women and, at the same time, has a negative indirect effect on thrombosis by reducing the rate of pregnancies (pregnancy is known to encourage thrombosis).

I have always argued that Hesslow’s example has nothing to do with Simpson’s paradox because it compares apples and oranges, namely, it compare direct vs. total effects where reversals are commonplace. In other words, Simpson’s reversal evokes no surprise in such cases. For example, I wrote, “we are not at all surprised when smallpox inoculation carries risks of fatal reaction, yet reduces overall mortality by irradicating smallpox. The direct effect (fatal reaction) in this case is negative for every subpopulation, yet the total effect (on mortality) is positive for the population as a whole.” (Quoted from When a conflict arises between the direct and total effects, the investigator need only decide what research question represents the practical aspects of the case in question and, once this is done, the appropriate graphical tools should be invoked to properly assess DE or TE. [Recall, complete algorithms are available for both, going beyond simple adjustment, and extending to other counterfactually defined effects (e.g., ETT, causes-of-effect, and more).]

Memetea is not satisfied with this answer. Her condition for resolving Simpson’s paradox requires that the analyst be told whether it is the direct or the total effect that should be the target of investigation. This would require, of course, that the model includes information about the investigator’s ultimate aims, whether alternative interventions are available (e.g. to prevent pregnancy), whether the study result will be used by a policy maker or a curious scientist, whether legal restrictions (e.g., on sex discrimination) apply to the direct or the total effect, and so on. In short, the entire spectrum of scientific and social knowledge should enter into the causal model before we can determine, in any given scenario, whether it is the direct or indirect effect that warrants our attention.

This is a rather tall order to satisfy given that our investigators are fairly good in determining what their research problem is. It should perhaps serve as a realizable goal for artificial intelligence researchers among us, who aim to build an automated scientist some day, capable of reasoning like our best investigators. I do not believe though that we need to wait for that day to declare Simpson’s paradox “resolved”. Alternatively, we can declare it resolved modulo the ability of investigators to define their research problems.

4. Comments on Bandyopadhyay, etal (2015)

There are several motivations behind the resistance to characterize Simpson’s paradox as a causal phenomenon. Some resist because causal relationships are not part of their scientific vocabulary, and some because they think they have discovered a more cogent explanation, which is perhaps easier to demonstrate or communicate.

Spanos’s article represents the first group, while Bandyopadhyay etal’s represents the second. They simulated Simpson’s reversal using urns and balls and argued that, since there are no interventions involved in this setting, merely judgment of conditional probabilities, the fact that people tend to make wrong judgments in this setting proves that Simpson’s surprise is rooted in arithmetic illusion, not in causal misinterpretation.

I have countered this argument in and I think it is appropriate to repeat the argument here.

“In explaining the surprise, we must first distinguish between ‘Simpson’s reversal’ and ‘Simpson’s paradox’; the former being an arithmetic phenomenon in the calculus of proportions, the latter a psychological phenomenon that evokes surprise and disbelief. A full understanding of Simpson’s paradox should explain why an innocent arithmetic reversal of an association, albeit uncommon, came to be regarded as `paradoxical,’ and why it has captured the fascination of statisticians, mathematicians and philosophers for over a century (though it was first labeled ‘paradox’ by Blyth (1972)) .

“The arithmetics of proportions has its share of peculiarities, no doubt, but these tend to become objects of curiosity once they have been demonstrated and explained away by examples. For instance, naive students of probability may expect the average of a product to equal the product of the averages but quickly learn to guard against such expectations, given a few counterexamples. Likewise, students expect an association measured in a mixture distribution to equal a weighted average of the individual associations. They are surprised, therefore, when ratios of sums, (a+b)/(c+d), are found to be ordered differently than individual ratios, a/c and b/d.1 Again, such arithmetic peculiarities are quickly accommodated by seasoned students as reminders against simplistic reasoning.

“In contrast, an arithmetic peculiarity becomes ‘paradoxical’ when it clashes with deeply held convictions that the peculiarity is impossible, and this occurs when one takes seriously the causal implications of Simpson’s reversal in decision-making contexts.  Reversals are indeed impossible whenever the third variable, say age or gender, stands for a pre-treatment covariate because, so the reasoning goes, no drug can be harmful to both males and females yet beneficial to the population as a whole. The universality of this intuition reflects a deeply held and valid conviction that such a drug is physically impossible.  Remarkably, such impossibility can be derived mathematically in the calculus of causation in the form of a ‘sure-thing’ theorem (Pearl, 2009, p. 181):

‘An action A that increases the probability of an event in each subpopulation (of C) must also increase the probability of B in the population as a whole, provided that the action does not change the distribution of the subpopulations.’2

“Thus, regardless of whether effect size is measured by the odds ratio or other comparisons, regardless of whether  is a confounder or not, and regardless of whether we have the correct causal structure on hand, our intuition should be offended by any effect reversal that appears to accompany the aggregation of data.

“I am not aware of another condition that rules out effect reversal with comparable assertiveness and generality, requiring only that Z not be affected by our action, a requirement satisfied by all treatment-independent covariates Z. Thus, it is hard, if not impossible, to explain the surprise part of Simpson’s reversal without postulating that human intuition is governed by causal calculus together with a persistent tendency to attribute causal interpretation to statistical associations.”

1. In Simpson’s paradox we witness the simultaneous orderings: (a1+b1)/(c1+d1)(a2+b2)/(c2+d2), (a1/c1)< (a2/c2), and (b1/d1)< (b2/d2)
2. The no-change provision is probabilistic; it permits the action to change the classification of individual units so long as the relative sizes of the subpopulations remain unaltered.

Final Remarks
I used to be extremely impatient with the slow pace in which causal ideas have been penetrating scientific communities that are not used to talk cause-and-effect. Recently, however, I re-read Thomas Kuhn’ classic The Structure of Schientific Revolution and I found there a quote that made me calm, content, even humorous and hopeful. Here it is:

—————- Kuhn —————-

“The transfer of allegiance from paradigm to paradigm is a conversion experience that cannot be forced. Lifelong resistance, particularly from those whose productive careers have committed them to an older tradition of normal science, is not a violation of scientific standards but an index to the nature of scientific research itself.”
p. 151

“Conversions will occur a few at a time until, after the last holdouts have died, the whole profession will again be practicing under a single, but now a different, paradigm.”
p. 152

We are now seeing the last holdouts.



July 24, 2016

External Validity and Extrapolations

Filed under: Generalizability,Selection Bias — bryantc @ 7:52 pm

Author: Judea Pearl

The July issue of the Proceedings of the National Academy of Sciences contains several articles on Causal Analysis in the age of Big Data, among them our (Bareinboim and Pearl’s) paper on data fusion and external validity. Several nuances of this problem were covered earlier on this blog under titles such as transportability, generalizability, extrapolation and selection-bias, see and

The PNAS paper has attracted the attention of the UCLA Newsroom which issued a press release with a very accessible description of the problem and its solution. You can find it here:

A few remarks:
I consider the mathematical solution of the external validity problem to be one of the real gems of modern causal analysis. The problem has its roots in the writings of 18th century demographers and its more recent awareness is usually associated with Campbell (1957) and Cook and Campbell (1979) writings on quasi-experiments. Our formal treatment of the problem using do-calculus has reduced it to a puzzle in logic and graph theory (see Bareinboim has further given this puzzle a complete algorithmic solution.

I said it is a gem because solving any problem instance gives me as much pleasure as solving a puzzle in ancient Greek geometry. It is in fact more fun than solving geometry problems, for two reasons.

First, when you stare at any external validity problem you do not have a clue whether it has or does not have a solution (i.e., whether an externally valid estimate exists or not) yet after a few steps of analysis — Eureka — the answer shines at you with clarity and says: “how could you have missed me?”. It is like communicating secretly with the oracle of Delphi, who whispers in your ears: “trisecting an angle?” forget it; “trisecting a line segment?” I will show you how. A miracle!

Second, while geometrical construction problems reside in the province of recreational mathematics, external validity is a serious matter; it has practical ramifications in every branch of science.

My invitation to readers of this blog: Anyone with intellectual curiosity and a thrill for mathematical discovery, please join us in the excitement over the mathematical solution of the external validity problem. Try it, and please send us your impressions.

It is hard for me to predict when scientists who critically need solutions to real-life extrapolation problems would come to recognize that an elegant and complete solution now exists for them. Most of these scientists (e.g., Campbell’s disciples) do not read graphs and cannot therefore heed my invitation. Locked in a graph-deprived vocabulary, they are left to struggle with meta-analytic techniques or opaque re-calibration routines (see waiting perhaps for a more appealing invitation to discover the availability of a solution to their problems.

It will be interesting to see how long it would take, in the age of internet.

July 9, 2016

The Three Layer Causal Hierarchy

Filed under: Causal Effect,Counterfactual,Discussion,structural equations — bryantc @ 8:57 pm

Recent discussions concerning causal mediation gave me the impression that many researchers in the field are not familiar with the ramifications of the Causal Hierarchy, as articulated in Chapter 1 of Causality (2000, 2009). This note presents the Causal Hierarchy in table form (Fig. 1) and discusses the distinctions between its three layers: 1. Association, 2. Intervention, 3. Counterfactuals.


June 28, 2016

On the Classification and Subsumption of Causal Models

Filed under: Causal Effect,Counterfactual,structural equations — bryantc @ 5:32 pm

From Christos Dimitrakakis:

>> To be honest, there is such a plethora of causal models, that it is not entirely clear what subsumes what, and which one is equivalent to what. Is there a simple taxonomy somewhere? I thought that influence diagrams were sufficient for all causal questions, for example, but one of Pearl’s papers asserts that this is not the case.

Reply from J. Pearl:

Dear Christos,

From my perspective, I do not see a plethora of causal models at all, so it is hard for me to answer your question in specific terms. What I do see is a symbiosis of all causal models in one framework, called Structural Causal Model (SCM) which unifies structural equations, potential outcomes, and graphical models. So, for me, the world appears simple, well organized, and smiling. Perhaps you can tell us what models lured your attention and caused you to see a plethora of models lacking subsumption taxonomy.

The taxonomy that has helped me immensely is the three-level hierarchy described in chapter 1 of my book Causality: 1. association, 2. intervention, and 3 counterfactuals. It is a useful hierarchy because it has an objective criterion for the classification: You cannot answer questions at level i unless you have assumptions from level i or higher.

As to influence diagrams, the relations between them and SCM is discussed in Section 11.6 of my book Causality (2009), Influence diagrams belong to the 2nd layer of the causal hierarchy, together with Causal Bayesian Networks. They lack however two facilities:

1. The ability to process counterfactuals.
2. The ability to handle novel actions.

To elaborate,

1. Counterfactual sentences (e.g., Given what I see, I should have acted differently) require functional models. Influence diagrams are built on conditional and interventional probabilities, that is, p(y|x) or p(y|do(x)). There is no interpretation of E(Y_x| x’) in this framework.

2. The probabilities that annotate links emanating from Action Nodes are interventional type, p(y|do(x)), that must be assessed judgmentally by the user. No facility is provided for deriving these probabilities from data together with the structure of the graph. Such a derivation is developed in chapter 3 of Causality, in the context of Causal Bayes Networks where every node can turn into an action node.

Using the causal hierarchy, the 1st Law of Counterfactuals and the unification provided by SCM, the space of causal models should shine in clarity and simplicity. Try it, and let us know of any questions remaining.


June 21, 2016

Spring Greeting from the UCLA Causality Blog

Filed under: Announcement — bryantc @ 3:13 am

Dear friends in causality research,
This Spring Greeting from UCLA Causality blog contains:
A. News items concerning causality research,
B. New postings, new problems and some solutions.

The American Statistical Association (ASA) has announced recipients of the 2016 “Causality in Statistics Education Award”.
Congratulations go to Onyebuchi Arah and Arvid Sjolander who will receive this Award in July, at the 2016 JSM meeting in Chicago.
For details of purpose and selection criteria, see

I will be giving another tutorial at the 2016 JSM meeting, titled “Causal Inference in Statistics: A Gentle Introduction.”
Details and Abstract can be viewed here:

A3. Causal Inference — A Primer
For the many readers who have inquired, the print version of our new book “Causal Inference in Statistics – A Primer” is now up and running on Amazon and Wiley, and is awaiting your reviews, your questions and suggestions. We have posted a book page for this very purpose, which includes selected excerpts from each chapter, errata and updates, and a sample homework solution manual.

The errata page was updated recently under the diligent eye of Adamo Vincenzo. Thank you Adamo!

The Solution Manual will be available for instructors and will incorporate software solutions based on a DAGitty R package, authored by Johannes Textor.  See

Vol. 4 Issue 2 of the Journal of Causal Inference (JCI) is scheduled to appear in September 2018. The current issue can be viewed here: My own contribution to the current issue discusses Savage’s Sure Thing Principle and its ramifications to causal reasoning.

As always, submissions are welcome on all aspects of causal analysis, especially those deemed foundational. Chances of acceptance are inversely proportional to the time it takes a reviewer to figure out what problem the paper attempts to solve. So, please be transparent.

Recollections from the WCE conference at Stanford.

On May 21, Kosuke Imai and I participated in a panel on Mediation, at the annual meeting of the West Coast Experiment Conference, organized by Stanford Graduate School of Business.

Some of my recollections are summarized on our Causality Blog here:

B2. Generalizing Experimental findings
In light of new results concerning generalizability and selection bias, our team has updated the “external validity” entry of wikipedia. Previously, the entry was all about threats to validity, with no word on how those threats can be circumvented. You may wish to check this entry for accuracy and possible extensions.

B3. Causality celebrates its 10,000 citations
According to Google Scholar,, my book Causality (Cambridge, 2000, 2009) has crossed the symbolic mark of 10,000 citations. To celebrate this numerological event, I wish to invite all readers of this blog to an open online party with the beer entirely on me. I dont exactly know how to choreograph such a huge party, or how to make sure that each of you gets a fair share of the inspiration (or beer). So, please send creative suggestions for posting on this blog.

On a personal note: I am extremely gratified by this sign of receptiveness, and I thank readers of Causality for their comments, questions, corrections and reservations which have helped bring this book to its current shape (see


June 20, 2016

Recollections from the WCE conference at Stanford

Filed under: Counterfactual,General,Mediated Effects,structural equations — bryantc @ 7:45 am

On May 21, Kosuke Imai and I participated in a panel on Mediation, at the annual meeting of the West Coast Experiment Conference, organized by Stanford Graduate School of Business The following are some of my recollections from that panel.

We began the discussion by reviewing causal mediation analysis and summarizing the exchange we had on the pages of Psychological Methods (2014)

My slides for the panel can be viewed here:

We ended with a consensus regarding the importance of causal mediation and the conditions for identifying of Natural Direct and Indirect Effects, from randomized as well as observational studies.

We proceeded to discuss the symbiosis between the structural and the counterfactual languages. Here I focused on slides 4-6 (page 3), and remarked that only those who are willing to solve a toy problem from begining to end, using both potential outcomes and DAGs can understand the tradeoff between the two. Such a toy problem (and its solution) was presented in slide 5 (page 3) titled “Formulating a problem in Three Languages” and the questions that I asked the audience are still ringing in my ears. Please have a good look at these two sets of assumptions and ask yourself:

a. Have we forgotten any assumption?
b. Are these assumptions consistent?
c. Is any of the assumptions redundant (i.e. does it follow logically from the others)?
d. Do they have testable implications?
e. Do these assumptions permit the identification of causal effects?
f. Are these assumptions plausible in the context of the scenario given?

As I was discussing these questions over slide 5, the audience seemed to be in general agreement with the conclusion that, despite their logical equivalence, the graphical language  enables  us to answer these questions immediately while the potential outcome language remains silent on all.

I consider this example to be pivotal to the comparison of the two frameworks. I hope that questions a,b,c,d,e,f will be remembered, and speakers from both camps will be asked to address them squarely and explicitly .

The fact that graduate students made up the majority of the participants gives me the hope that questions a,b,c,d,e,f will finally receive the attention they deserve.

As we discussed the virtues of graphs, I found it necessary to reiterate the observation that DAGs are more than just “natural and convenient way to express assumptions about causal structures” (Imbens and Rubin , 2013, p. 25). Praising their transparency while ignoring their inferential power misses the main role that graphs play in causal analysis. The power of graphs lies in computing complex implications of causal assumptions (i.e., the “science”) no matter in what language they are expressed.  Typical implications are: conditional independencies among variables and counterfactuals, what covariates need be controlled to remove confounding or selection bias, whether effects can be identified, and more. These implications could, in principle, be derived from any equivalent representation of the causal assumption, not necessarily graphical, but not before incurring a prohibitive computational cost. See, for example, what happens when economists try to replace d-separation with graphoid axioms

Following the discussion of representations, we addressed questions posed to us by the audience, in particular, five questions submitted by Professor Jon Krosnick (Political Science, Stanford).

I summarize them in the following slide:

Krosnick’s Questions to Panel
1) Do you think an experiment has any value without mediational analysis?
2) Is a separate study directly manipulating the mediator useful? How is the second study any different from the first one?
3) Imai’s correlated residuals test seems valuable for distinguishing fake from genuine mediation. Is that so? And how it is related to traditional mediational test?
4) Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
5) Why is mediational analysis any “worse” than any other method of investigation?
My answers focused on question 2, 4 and 5, which I summarize below:

Q. Is a separate study directly manipulating the mediator useful?
Answer: Yes, it is useful if physically feasible but, still, it cannot give us an answer to the basic mediation question: “What percentage of the observed response is due to mediation?” The concept of mediation is necessarily counterfactual, i.e. sitting on the top layer of the causal hierarchy (see “Causality” chapter 1). It cannot be defined therefore in terms of population experiments, however clever. Mediation can be evaluated with the help of counterfactual assumptions such as “conditional ignorability” or “no interaction,” but these assumptions cannot be verified in population experiments.

Q. Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
Answer: Translating the question to counterfactual notation the test suggested requires the existence of monotonic function f_m such that, for every individual, we have Y_1 – Y_0 =f_m (M_1 – M_0)

This condition expresses a feature we expect to find in mediation, but it cannot be taken as a DEFINITION of mediation. This condition is essentially the way indirect effects are defined in the Principal Strata framework (Frangakis and Rubin, 2002) the deficiencies of which are well known. See

In particular, imagine a switch S controlling two light bulbs L1 and L2. Positive correlation between L1 and L2 does not mean that L1 mediates between the switch and L2. Many examples of incompatibility are demonstrated in the paper above.

The conventional mediation tests (in the Baron and Kenny tradition) suffer from the same problem; they test features of mediation that are common in linear systems, but not the essence of mediation which is universal to all systems, linear and nonlinear, continuous as well as categorical variables.

Q. Why is mediational analysis any “worse” than any other method of investigation?
Answer: The answer is closely related to the one given to question 3). Mediation is not a “method” but a property of the population which is defined counterfactually, and therefore requires counterfactual assumption for evaluation. Experiments are not sufficient; and in this sense mediation is “worse” than other properties under investigation, eg., causal effects, which can be estimated entirely from experiments.

About the only thing we can ascertain experimentally is whether the (controlled) direct effect differs from the total effect, but we cannot evaluate the extent of mediation.

Another way to appreciate why stronger assumptions are needed for mediation is to note that non-confoundedness is not the same as ignorability. For non-binary variables one can construct examples where X and Y are not confounded ( i.e., P(y|do(x))= P(y|x)) and yet they are not ignorable, (i.e., Y_x is not independent of X.) Mediation requires ignorability in addition to nonconfoundedness.

Overall, the panel was illuminating, primarily due to the active participation of curious students. It gave me good reasons to believe that Political Science is destined to become a bastion of modern causal analysis. I wish economists would follow suit, despite the hurdles they face in getting causal analysis to economics education.


June 10, 2016

Post-doc Causality and Machine Learning

Filed under: Announcement — bryantc @ 7:58 am

We received the following announcement from Isabelle Guyon (UPSud/INRIA):

The Machine Learning and Optimization (TAO) group of the Laboratory of Research in Informatics (LRI) is seeking a postdoctoral researcher for working at the interface of machine learning and causal modeling to support scientific discovery and computer assisted decision making using big data. The researcher will work with an interdisciplinary group including Isabelle Guyon (UPSud/INRIA), Cecile Germain UPSud), Balazs Kegl (CNRS), Antoine Marot (RTE), Patrick Panciatici (RTE), Marc Schoenauer (INRIA), Michele Sebag (CNRS), and Olivier Teytaud (INRIA).

Some research directions we want to pursue include: extending the formulation of causal discovery as a pattern recognition problem (developed through the ChaLearn cause-effect pairs challenge) to times series and spatio-temporal data; combining feature learning using deep learning methods with the creation of cause-effect explanatory models; furthering the unification of structural equation models and reinforcement learning approaches; and developing interventional learning algorithms.

As part of the exciting applications we are working on, we will be leveraging a long term collaboration with the company RTE (French Transmission System Operator for electricity). With the current limitations on adding new transportation lines, the opportunity to use demand response, and the advent of renewable energies interfaced through fast power electronics to the grid, there is an urgent need to adapt the historical way to operate the electricity power grid. The candidate will have the opportunity to use a combination of historical data (several years of data for the entire RTE network sampled every 5 minutes) and very accurate simulations (precise at the MW level), to develop causal models capable of identifying strategies to prevent or to mitigate the impact of incidents on the network as well as inferring what would have happened if we had intervened (i.e., counterfactual).Other applications we are working on with partner laboratories include epidemiology studies about diabetes and happiness in the workplace, modeling embryologic development, modeling high energy particle collision, analyzing human behavior in videos, and game playing.

The candidate will also be part of the Paris-Saclay Center of Data Science and will be expected to participate in the mission of the center through its activities, including organizing challenges on machine learning, and help advising PhD students.

We are accepting candidates with background in machine learning, reinforcement learning, causality, statistics, scientific modeling, physics, and other neighboring disciplines. The candidate should have the ability of working on cross-disciplinary problems, have a strong math background, and the experience or strong desire to work on practical problems.

The TAO group ( conducts interdisciplinary research in theory, algorithms, and applications of machine learning and optimization and it has also strong ties with AppStat the physics machine learning group of the Linear Accelerator Laboratory ( Both laboratories are part of the University Paris-Saclay, located in the outskirts of Paris. The position is available for a period of three years, starting in (the earliest) September, 2016. The monthly salary is around 2500 Euros per month. Interested candidates should send a motivation letter, a CV, and the names and addresses of three referees to Isabelle Guyon.

Contact: Isabelle Guyon (
Deadline: June 30, 2016, then every in 2 weeks until the position is filled.

« Previous PageNext Page »

Powered by WordPress