Causal Analysis in Theory and Practice

December 13, 2018

Winter Greetings from the UCLA Causality Blog

Filed under: Announcement,Book (J Pearl),General — Judea Pearl @ 11:37 pm

Dear friends in causality research,

In the past 5 months, since the publication of The Book of Why http://bayes.cs.ucla.edu/WHY/ I have been involved in conversations with many inquisitive readers on Twitter @yudapearl and have not been able to update our blog as frequently as I should. I am glad to return to this forum and update it with the major developments since July, 2018.

1.
Initial reviews of the Book of Why are posted on its trailer page http://bayes.cs.ucla.edu/WHY/ They vary from technical discussions to philosophical speculations, from relationships to machine learning to debates about the supremacy of randomized controlled trials.

2.
A searchable file of all my 750 tweets is available here: https://ucla.in/2Kz0FoY. It can be used for (1) extracting talking points, adages and arguments in the defense of causal inference, and (2) understanding the thinking of neighboring cultures, e.g., statistics, epidemiology, economics, deep learning and reinforcement learning, primarily on issues of transparency, testability, manipulability, do-expressions and counterfactuals.

3.
The 6th printing of the Book of Why is now available, with corrections to all errors and typos discovered up to Oct. 29, 2018. To check that you have the latest printing, make sure the last line on the copyright page ends with … 8 7 6

4.
Please examine the latest papers and reports from our brewery:

R-484 Pearl, “Causal and Counterfactual Inference,” Forthcoming section in The Handbook of Rationality, MIT press. https://ucla.in/2Iz9myt

R-484 Pearl, “A note on oxygen, matches and fires, On Non-manipulable Causes,” September 2018. https://ucla.in/2Qb1h6v

R-483 Pearl, “Does Obesity Shorten Life? Or is it the Soda? On Non-manipulable Causes,” https://ucla.in/2EpxcNU Journal of Causal Inference, 6(2), online, September 2018.

R-481 Pearl, “The Seven Tools of Causal Inference with Reflections on Machine Learning,” July 2018 https://ucla.in/2umzd65 Forthcoming, Communications of ACM.

R-479 Cinelli and Pearl, “On the utility of causal diagrams in modeling attrition: a practical example,” April 2018. https://ucla.in/2L8KAWw Forthcoming, Journal of Epidemiology.

R-478 Pearl and Bareinboim, “A note on `Generalizability of Study Results’,” April 2018. Forthcoming, Journal of Epidemiology. https://ucla.in/2NIsI6B

Earlier papers can be found here: http://bayes.cs.ucla.edu/csl_papers.html

5.
I wish in particular to call attention to the introduction of R-478, https://ucla.in/2NIsI6B. It provides a “three bullets” recipe for comparing the structural and potential outcome frameworks:

* To determine if there exist sets of covariates $W$ that satisfy “conditional exchangeability”
** To estimate causal parameters at the target population in cases where such sets $W$ do not exist, and
*** To decide if one’s modeling assumptions are compatible with the available data.

I have listed the “three bullets” above in the hope that they serve to facilitate and concretize future conversations with our neighbors from the potential outcome framework.

6. We are informed of a most relevant workshop: AAAI-WHY 2019, March 26-27, Stanford, CA. The 2019 AAAI Spring Symposium will host a new workshop: Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI. See https://why19.causalai.net. Submissions due December 17, 2018

Greetings and Happy Holidays
Judea


January 10, 2018

2018 Winter Update

Filed under: Announcement,General — Judea Pearl @ 10:07 pm

Dear friends in causality research,

Welcome to the 2018 Winter Greeting from the UCLA Causality Blog. This greeting discusses the following topics:

1. A report is posted, on the “What If” workshop at the NIPS conference (see December 19, 2017 post below). It discusses my presentation of: Theoretical Impediments to Machine Learning, a newly revised version of which can be viewed here. [http://ftp.cs.ucla.edu/pub/stat_ser/r475.pdf]

2. New posting: “Facts and Fiction from the Missing Data Framework”. We are inviting discussion of two familiar mantras:
Mantra-1. “The role of missing data analysis in causal inference is well understood (e.g., causal inference theory based on counterfactuals relies on the missing data framework).”
and
Mantra-2. “while missing data methods can form tools for causal inference, the converse cannot be true.”

We explain why we believe both mantras to be false, but we would like to hear your opinion before firming up our minds.

3. A review paper is available here:
http://ftp.cs.ucla.edu/pub/stat_ser/r473-L.pdf
Titled: “Graphical Models for Processing Missing Data.” It explains and demonstrates why missing data is a causal inference problem.

4. A new page is now up, providing information on “The Book of Why”
http://bayes.cs.ucla.edu/WHY/
It contains Table of Contents and excerpts from the book.

5. Nominations are now open for the ASA Causality in Education Award. The nomination deadline is March 1, 2018. For more information, please see
http://www.amstat.org/education/causalityprize/.

6. For those of us who were waiting patiently for the Korean translation of Primer — our long wait is finally over. The book is available now in colorful cover and in optimistic North Korean accent.
http://www.kyowoo.co.kr/02_sub/view.php?p_idx=1640&cate=0014_0017_

Don’t miss the gentlest introduction to causal inference.
http://bayes.cs.ucla.edu/PRIMER/

Enjoy, and have a productive 2018.
JP


December 19, 2017

NIPS 2017: Q&A Follow-up

Filed under: Conferences,General — Judea Pearl @ 6:42 am

Dear friends in causal research,

Last week I spoke at a workshop on machine learning and causality, which followed the NIPS conference in Long Beach. Below please find my response to several questions I was asked after my talk. I hope you will find the questions and answers to be of relevance to issues discussed on this blog.

-Judea

———————————————–

To: Participants at the NIPS “What If” workshop

Dear friends,

Some of you asked me for copies of my slides. I am attaching them with this message, and you can get the accompanying paper by clicking here:
http://ftp.cs.ucla.edu/pub/stat_ser/r475.pdf

NIPS 17 – What If? Workshop Slides (PDF)

NIPS 17 – What If? Workshop Slides (PPT [zipped])

I have also received interesting questions at the end of my talk, which I could not fully answer in the short break we had. I will try to answer them below.

Q.1. What do you mean by the “Causal Revolution”?
Ans.1: “Revolution” is a poetic word to summarize Gary King’s observation: “More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history” (see cover of Morgan and Winship’s book, 2015). It captures the miracle that only three decades ago we could not write a formula for: “Mud does not cause Rain” and, today, we can formulate and estimate every causal or counterfactual statement.

Q2: Are the estimates produced by graphical models the same as those produced by the potential outcome approach?
Ans.2: Yes, provided the two approaches start with the same set of assumptions. The assumptions in the graphical approach are advertised in the graph, while those in the potential outcome approach are articulated separately by the investigator, using counterfactual vocabulary.

Q3: The method of imputing potential outcomes to individual units in a table appears totally different from the methods used in the graphical approach. Why the difference?
Ans.3: Imputation works only when certain assumptions of conditional ignorability hold. The table itself does not show us what the assumptions are, nor what they mean. To see what they mean we need a graph, since no mortal can process such assumptions in his/her head. The apparent difference in procedures reflects the insistence (in the graphical framework) on seeing the assumptions, rather than wishing them away.

Q4: Some say that economists do not use graphs because their problems are different, and they cannot afford to model the entire economy. Do you agree with this explanation?
Ans.4: No way! Mathematically speaking, economic problems are no different from those faced by epidemiologists (or other social scientists) for whom graphical models have become a second language. Moreover, epidemiologists have never complained that graphs force them to model the entirety of the human anatomy. Graph-avoidance among (some) economists is a cultural phenomenon, reminiscent of telescope-avoidance among Church astronomers in 17th century Italy. Bottom line: epidemiologists can judge the plausibility of their assumptions — graph-avoiding economists cannot. (I have offered them many opportunities to demonstrate it in public, and I don’t blame them for remaining silent; it is not a problem that can be managed by an unaided intellect)

Q.5: Isn’t deep-learning more than just glorified curve-fitting? After all, the objective of curve-fitting is to maximize “fit”, while in deep-learning much effort goes into minimizing “over-fit”.
Ans.5: No matter what acrobatics you go through to minimize overfitting or other flaws in your learning strategy, you are still optimizing some property of the observed data while making no reference to the world outside the data. This puts you right back on rung-1 of the Ladder of Causation with all the limitations that rung-1 entails.
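As a small illustration of the formula alluded to in Ans.1, the statement “Mud does not cause Rain” can be sketched in do-notation as follows. The variable names are illustrative only, and the inequality on the first line assumes the usual scenario in which rain produces mud:

```latex
\begin{align*}
P(\text{rain} \mid \text{mud}) &> P(\text{rain})
  && \text{(seeing mud is evidence for rain)} \\
P(\text{rain} \mid do(\text{mud})) &= P(\text{rain})
  && \text{(making mud does not change the chance of rain)}
\end{align*}
```

The first line is a rung-1 (associational) statement; the second is an interventional statement that no expression in standard probability notation alone can capture.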

If you have additional questions on these or other topics, feel free to post them here on our blog causality.cs.ucla.edu/blog, (anonymity will be respected), and I will try my best to answer them.

Enjoy,
Judea
———————————————–


June 20, 2016

Recollections from the WCE conference at Stanford

Filed under: Counterfactual,General,Mediated Effects,structural equations — bryantc @ 7:45 am

On May 21, Kosuke Imai and I participated in a panel on Mediation, at the annual meeting of the West Coast Experiment Conference, organized by Stanford Graduate School of Business http://www.gsb.stanford.edu/facseminars/conferences/west-coast-experiments-conference. The following are some of my recollections from that panel.

1.
We began the discussion by reviewing causal mediation analysis and summarizing the exchange we had on the pages of Psychological Methods (2014)
http://ftp.cs.ucla.edu/pub/stat_ser/r389-imai-etal-commentary-r421-reprint.pdf

My slides for the panel can be viewed here:
http://web.cs.ucla.edu/~kaoru/stanford-may2016-bw.pdf

We ended with a consensus regarding the importance of causal mediation and the conditions for identifying Natural Direct and Indirect Effects, from randomized as well as observational studies.

2.
We proceeded to discuss the symbiosis between the structural and the counterfactual languages. Here I focused on slides 4-6 (page 3), and remarked that only those who are willing to solve a toy problem from beginning to end, using both potential outcomes and DAGs, can understand the tradeoff between the two. Such a toy problem (and its solution) was presented in slide 5 (page 3) titled “Formulating a problem in Three Languages” and the questions that I asked the audience are still ringing in my ears. Please have a good look at these two sets of assumptions and ask yourself:

a. Have we forgotten any assumption?
b. Are these assumptions consistent?
c. Is any of the assumptions redundant (i.e. does it follow logically from the others)?
d. Do they have testable implications?
e. Do these assumptions permit the identification of causal effects?
f. Are these assumptions plausible in the context of the scenario given?

As I was discussing these questions over slide 5, the audience seemed to be in general agreement with the conclusion that, despite their logical equivalence, the graphical language enables us to answer these questions immediately while the potential outcome language remains silent on all.

I consider this example to be pivotal to the comparison of the two frameworks. I hope that questions a,b,c,d,e,f will be remembered, and speakers from both camps will be asked to address them squarely and explicitly.

The fact that graduate students made up the majority of the participants gives me the hope that questions a,b,c,d,e,f will finally receive the attention they deserve.

3.
As we discussed the virtues of graphs, I found it necessary to reiterate the observation that DAGs are more than just “natural and convenient way to express assumptions about causal structures” (Imbens and Rubin, 2013, p. 25). Praising their transparency while ignoring their inferential power misses the main role that graphs play in causal analysis. The power of graphs lies in computing complex implications of causal assumptions (i.e., the “science”) no matter in what language they are expressed. Typical implications are: conditional independencies among variables and counterfactuals, what covariates need be controlled to remove confounding or selection bias, whether effects can be identified, and more. These implications could, in principle, be derived from any equivalent representation of the causal assumptions, not necessarily graphical, but not before incurring a prohibitive computational cost. See, for example, what happens when economists try to replace d-separation with graphoid axioms http://ftp.cs.ucla.edu/pub/stat_ser/r420.pdf.

4.
Following the discussion of representations, we addressed questions posed to us by the audience, in particular, five questions submitted by Professor Jon Krosnick (Political Science, Stanford).

I summarize them in the following slide:

Krosnick’s Questions to Panel
———————————————-
1) Do you think an experiment has any value without mediational analysis?
2) Is a separate study directly manipulating the mediator useful? How is the second study any different from the first one?
3) Imai’s correlated residuals test seems valuable for distinguishing fake from genuine mediation. Is that so? And how is it related to the traditional mediational test?
4) Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
5) Why is mediational analysis any “worse” than any other method of investigation?
———————————————-
My answers focused on questions 2, 4 and 5, which I summarize below:

2)
Q. Is a separate study directly manipulating the mediator useful?
Answer: Yes, it is useful if physically feasible but, still, it cannot give us an answer to the basic mediation question: “What percentage of the observed response is due to mediation?” The concept of mediation is necessarily counterfactual, i.e. sitting on the top layer of the causal hierarchy (see “Causality” chapter 1). It cannot be defined therefore in terms of population experiments, however clever. Mediation can be evaluated with the help of counterfactual assumptions such as “conditional ignorability” or “no interaction,” but these assumptions cannot be verified in population experiments.

4)
Q. Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
Answer: Translating the question to counterfactual notation, the test suggested requires the existence of a monotonic function f_m such that, for every individual, we have Y_1 – Y_0 = f_m(M_1 – M_0).

This condition expresses a feature we expect to find in mediation, but it cannot be taken as a DEFINITION of mediation. This condition is essentially the way indirect effects are defined in the Principal Strata framework (Frangakis and Rubin, 2002) the deficiencies of which are well known. See http://ftp.cs.ucla.edu/pub/stat_ser/r382.pdf.

In particular, imagine a switch S controlling two light bulbs L1 and L2. Positive correlation between L1 and L2 does not mean that L1 mediates between the switch and L2. Many examples of incompatibility are demonstrated in the paper above.

The conventional mediation tests (in the Baron and Kenny tradition) suffer from the same problem; they test features of mediation that are common in linear systems, but not the essence of mediation which is universal to all systems, linear and nonlinear, continuous as well as categorical variables.
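The switch-and-bulbs example above can be checked with a few lines of simulation. This is only a sketch under assumed details that are not in the original discussion: the bulbs copy the switch deterministically, the switch is on half the time, and the function names are hypothetical.

```python
import random

def simulate(n=10_000, seed=0, force_l1=None):
    """Sample (S, L1, L2) for a switch S that drives both bulbs.

    force_l1: if not None, L1 is set by intervention, do(L1 = force_l1),
    which cuts its dependence on the switch. L2 listens only to S.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.random() < 0.5                 # assumed: switch on with probability 0.5
        l1 = s if force_l1 is None else force_l1
        l2 = s                                 # L2 is wired to the switch, not to L1
        rows.append((s, l1, l2))
    return rows

def p_l2_on(rows, given_l1=None):
    """Fraction of samples with L2 on, optionally among those with L1 == given_l1."""
    selected = [r for r in rows if given_l1 is None or r[1] == given_l1]
    return sum(r[2] for r in selected) / len(selected)

# Observation: L1 and L2 move together perfectly.
observed = simulate()
print("P(L2 on | L1 on)  =", p_l2_on(observed, given_l1=True))
print("P(L2 on | L1 off) =", p_l2_on(observed, given_l1=False))

# Intervention: forcing L1 on or off leaves L2 unchanged.
forced_on = simulate(force_l1=True)
forced_off = simulate(force_l1=False)
print("P(L2 on | do(L1 on))  =", p_l2_on(forced_on))
print("P(L2 on | do(L1 off)) =", p_l2_on(forced_off))
```

The two observational quantities differ as much as they possibly can, while the two interventional quantities coincide, so a test that looks only for co-variation between the posited mediator and the outcome would wrongly certify L1 as a mediator.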

5)
Q. Why is mediational analysis any “worse” than any other method of investigation?
Answer: The answer is closely related to the one given to question 2). Mediation is not a “method” but a property of the population which is defined counterfactually, and therefore requires counterfactual assumptions for evaluation. Experiments are not sufficient; and in this sense mediation is “worse” than other properties under investigation, e.g., causal effects, which can be estimated entirely from experiments.

About the only thing we can ascertain experimentally is whether the (controlled) direct effect differs from the total effect, but we cannot evaluate the extent of mediation.

Another way to appreciate why stronger assumptions are needed for mediation is to note that non-confoundedness is not the same as ignorability. For non-binary variables one can construct examples where X and Y are not confounded (i.e., P(y|do(x)) = P(y|x)) and yet they are not ignorable (i.e., Y_x is not independent of X). Mediation requires ignorability in addition to nonconfoundedness.
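For reference, the quantities mentioned in the answers above are commonly written as follows for a binary treatment X with mediator M. This is a sketch in the subscript notation used in this post, not a quotation from the panel slides; Y_{x,m} denotes the outcome when X is set to x and M is set to m, and M_x the value the mediator would take under X = x:

```latex
\begin{align*}
\text{Total effect:}              \quad TE     &= E[Y_1] - E[Y_0] \\
\text{Controlled direct effect:}  \quad CDE(m) &= E[Y_{1,m}] - E[Y_{0,m}] \\
\text{Natural direct effect:}     \quad NDE    &= E[Y_{1,M_0}] - E[Y_{0,M_0}] \\
\text{Natural indirect effect:}   \quad NIE    &= E[Y_{0,M_1}] - E[Y_{0,M_0}]
\end{align*}
```

TE and CDE(m) involve only a single intervention each and so can be estimated from experiments. The natural effects contain the nested terms Y_{1,M_0} and Y_{0,M_1}, in which X is held at one value while M takes the value it would have had under the other; no single experiment realizes both conditions in the same unit, which is why counterfactual assumptions are required.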

Summary
Overall, the panel was illuminating, primarily due to the active participation of curious students. It gave me good reasons to believe that Political Science is destined to become a bastion of modern causal analysis. I wish economists would follow suit, despite the hurdles they face in getting causal analysis into economics education.
http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf

Judea


February 12, 2016

Winter Greeting from the UCLA Causality Blog

Filed under: Announcement,Book (J Pearl),General,structural equations,Uncategorized — bryantc @ 5:04 pm

Friends in causality research,
This greeting from the UCLA Causality blog contains:

A. An introduction to our newly published book, Causal Inference in Statistics – A Primer, Wiley 2016 (with M. Glymour and N. Jewell)
B. Comments on two other books: (1) R. Kline’s Structural Equation Modeling and (2) L. Pereira and A. Saptawijaya’s on Machine Ethics.
C. News, Journals, awards and other frills.

A.
Our publisher (Wiley) has informed us that the book “Causal Inference in Statistics – A Primer” by J. Pearl, M. Glymour and N. Jewell is already available on Kindle, and will be available in print Feb. 26, 2016.
http://www.amazon.com/Causality-A-Primer-Judea-Pearl/dp/1119186846
http://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl-ebook/dp/B01B3P6NJM/ref=mt_kindle?_encoding=UTF8&me=

This book introduces core elements of causal inference into undergraduate and lower-division graduate classes in statistics and data-intensive sciences. The aim is to provide students with the understanding of how data are generated and interpreted at the earliest stage of their statistics education. To that end, the book empowers students with models and tools that answer nontrivial causal questions using vivid examples and simple mathematics. Topics include: causal models, model testing, effects of interventions, mediation and counterfactuals, in both linear and nonparametric systems.

The Table of Contents, Preface and excerpts from the four chapters can be viewed here:
http://bayes.cs.ucla.edu/PRIMER/
A book website providing answers to homework and interactive computer programs for simulation and analysis (using dagitty) is currently under construction.

B1
We are in receipt of the fourth edition of Rex Kline’s book “Principles and Practice of Structural Equation Modeling”, http://psychology.concordia.ca/fac/kline/books/nta.pdf

This book is unique in that it treats structural equation models (SEMs) as carriers of causal assumptions and tools for causal inference. Gone are the inhibitions and trepidation that characterize most SEM texts in their treatments of causation.

To the best of my knowledge, Chapter 8 in Kline’s book is the first SEM text to introduce graphical criteria for parameter identification, a long overdue tool in a field that depends on identifiability for model “fitting”. Overall, the book elevates SEM education to new heights and promises to usher in a renaissance for a field that, five decades ago, pioneered causal analysis in the behavioral sciences.

B2
Much has been written lately on computer ethics, morality, and free will. The new book “Programming Machine Ethics” by Luis Moniz Pereira and Ari Saptawijaya formalizes these concepts in the language of logic programming. See book announcement http://www.springer.com/gp/book/9783319293530. As a novice to the literature on ethics and morality, I was happy to find a comprehensive compilation of the many philosophical works on these topics, articulated in a language that even a layman can comprehend. I was also happy to see the critical role that the logic of counterfactuals plays in moral reasoning. The book is a refreshing reminder that there is more to counterfactual reasoning than “average treatment effects”.

C. News, Journals, awards and other frills.
C1.
Nominations are Invited for the Causality in Statistics Education Award (Deadline is February 15, 2016).

The ASA Causality in Statistics Education Award is aimed at encouraging the teaching of basic causal inference in introductory statistics courses. Co-sponsored by Microsoft Research and Google, the prize is motivated by the growing importance of introducing core elements of causal inference into undergraduate and lower-division graduate classes in statistics. For more information, please see http://www.amstat.org/education/causalityprize/ .

Nominations and questions should be sent to the ASA office at educinfo@amstat.org . The nomination deadline is February 15, 2016.

C.2.
Issue 4.1 of the Journal of Causal Inference is scheduled to appear March 2016, with articles covering all aspects of causal analysis. For mission, policy, and submission information please see: http://degruyter.com/view/j/jci

C.3
Finally, enjoy new results and new insights posted on our technical report page: http://bayes.cs.ucla.edu/csl_papers.html

Judea


August 11, 2015

Mid-Summer Greeting from the UCLA Causality Blog

Filed under: Announcement,Causal Effect,Counterfactual,General — moderator @ 6:09 pm

Friends in causality research,

This mid-summer greeting of UCLA Causality blog contains:
A. News items concerning causality research
B. Discussions and scientific results

1. The next issue of the Journal of Causal Inference is scheduled to appear this month, and the table of contents can be viewed here.

2. A new digital journal “Observational Studies” is out this month (link) and its first issue is dedicated to the legacy of William Cochran (1909-1980).

My contribution to this issue can be viewed here:
http://ftp.cs.ucla.edu/pub/stat_ser/r456.pdf

April 24, 2015

Flowers of the First Law of Causal Inference (3)

Filed under: Counterfactual,do-calculus,General,Generalizability,structural equations — moderator @ 8:50 pm

Flower 3 — Generalizing experimental findings

Continuing our examination of “the flowers of the First Law” (see previous flowers here and here) this posting looks at one of the most crucial questions in causal inference: “How generalizable are our randomized clinical trials?” Readers of this blog would be delighted to learn that one of our flowers provides an elegant and rather general answer to this question. I will describe this answer in the context of transportability theory, and compare it to the way researchers have attempted to tackle the problem using the language of ignorability. We will see that ignorability-type assumptions are fairly limited, both in their ability to define conditions that permit generalizations, and in our ability to justify them in specific applications.

1. Transportability and Selection Bias
The problem of generalizing experimental findings from the trial sample to the population as a whole, also known as the problem of “sample selection-bias” (Heckman, 1979; Bareinboim et al., 2014), has received wide attention lately, as more researchers come to recognize this bias as a major threat to the validity of experimental findings in both the health sciences (Stuart et al., 2015) and social policy making (Manski, 2013).

Since participation in a randomized trial cannot be mandated, we cannot guarantee that the study population would be the same as the population of interest. For example, the study population may consist of volunteers, who respond to financial and medical incentives offered by pharmaceutical firms or experimental teams, so the distribution of outcomes in the study may differ substantially from the distribution of outcomes under the policy of interest.

Another impediment to the validity of experimental findings is that the types of individuals in the target population may change over time. For example, as more individuals become eligible for health insurance, the types of individuals seeking services would no longer match the type of individuals that were sampled for the study. A similar change would occur as more individuals become aware of the efficacy of the treatment. The result is an inherent disparity between the target population and the population under study.

The problem of generalizing across disparate populations has received a formal treatment in (Pearl and Bareinboim, 2014) where it was labeled “transportability,” and where necessary and sufficient conditions for valid generalization were established (see also Bareinboim and Pearl, 2013). The problem of selection bias, though it has some unique features, can also be viewed as a nuance of the transportability problem, thus inheriting all the theoretical results established in (Pearl and Bareinboim, 2014) that guarantee valid generalizations. We will describe the two problems side by side and then return to the distinction between the type of assumptions that are needed for enabling generalizations.

The transportability problem concerns two dissimilar populations, Π and Π^∗, and requires us to estimate the average causal effect P^∗(y_x) (explicitly: P^∗(y_x) ≡ P^∗(Y = y|do(X = x))) in the target population Π^∗, based on experimental studies conducted on the source population Π. Formally, we assume that all differences between Π and Π^∗ can be attributed to a set of factors S that produce disparities between the two, so that P^∗(y_x) = P(y_x|S = 1). The information available to us consists of two parts: first, treatment effects estimated from experimental studies in Π and, second, observational information extracted from both Π and Π^∗. The former can be written P(y|do(x),z), where Z is a set of covariates measured in the experimental study, and the latter are written P^∗(x, y, z) = P(x, y, z|S = 1) and P(x, y, z), respectively. In addition to this information, we are also equipped with a qualitative causal model M, that encodes causal relationships in Π and Π^∗, with the help of which we need to identify the query P^∗(y_x). Mathematically, identification amounts to transforming the query expression

P^∗(y_x) = P(y|do(x),S = 1)

into a form derivable from the available information I_TR, where

I_TR = { P(y|do(x),z), P(x,y,z|S = 1), P(x,y,z) }.

The selection bias problem is slightly different. Here the aim is to estimate the average causal effect P(y_x) in the Π population, while the experimental information available to us, I_SB, comes from a preferentially selected sample, S = 1, and is given by P (y|do(x), z, S = 1). Thus, the selection bias problem calls for transforming the query P(y_x) to a form derivable from the information set:

I_SB = { P(y|do(x),z,S = 1), P(x,y,z|S = 1), P(x,y,z) }.

In the Appendix section, we demonstrate how transportability problems and selection bias problems are solved using the transformations described above.

The analysis reported in (Pearl and Bareinboim, 2014) has resulted in an algorithmic criterion (Bareinboim and Pearl, 2013) for deciding whether transportability is feasible and, when confirmed, the algorithm produces an estimand for the desired effects. The algorithm is complete, in the sense that, when it fails, a consistent estimate of the target effect does not exist (unless one strengthens the assumptions encoded in M).

There are several lessons to be learned from this analysis when considering selection bias problems.

1. The graphical criteria that authorize transportability are applicable to selection bias problems as well, provided that the graph structures for the two problems are identical. This means that whenever a selection bias problem is characterized by a graph for which transportability is feasible, recovery from selection bias is feasible by the same algorithm. (The Appendix demonstrates this correspondence).

2. The graphical criteria for transportability are more involved than the ones usually invoked in testing treatment assignment ignorability (e.g., through the back-door test). They may require several d-separation tests on several sub-graphs. It is utterly unimaginable therefore that such criteria could be managed by unaided human judgment, no matter how ingenious. (See discussions with Guido Imbens regarding computational barriers to graph-free causal inference, click here). Graph avoiders should reckon with this predicament.

3. In general, problems associated with external validity cannot be handled by balancing disparities between distributions. The same disparity between P (x, y, z) and P^∗(x, y, z) may demand different adjustments, depending on the location of S in the causal structure. A simple example of this phenomenon is demonstrated in Fig. 3(b) of (Pearl and Bareinboim, 2014) where a disparity in the average reading ability of two cities requires two different treatments, depending on what causes the disparity. If the disparity emanates from age differences, adjustment is necessary, because age is likely to affect the potential outcomes. If, on the other hand the disparity emanates from differences in educational programs, no adjustment is needed, since education, in itself, does not modify response to treatment. The distinction is made formal and vivid in causal graphs.

4. In many instances, generalizations can be achieved by conditioning on post-treatment variables, an operation that is frowned upon in the potential-outcome framework (Rosenbaum, 2002, pp. 73–74; Rubin, 2004; Sekhon, 2009) but has become extremely useful in graphical analysis. The difference between the conditioning operators used in these two frameworks is echoed in the difference between Q_c and Q_do, the two z-specific effects discussed in a previous posting on this blog (link). The latter defines information that is estimable from experimental studies, whereas the former invokes a retrospective counterfactual that may or may not be estimable empirically.
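The age scenario in lesson 3 above can be made concrete with a toy calculation. All numbers and names below are hypothetical, chosen only for illustration; the sketch assumes that age is a pre-treatment covariate and that the two cities differ only in their age distributions, so that re-weighting the age-specific effects by the target distribution is the appropriate adjustment.

```python
# Hypothetical age-specific effects, E[Y | do(X=1), z] - E[Y | do(X=0), z],
# as they might be estimated in the source-city experiment.
effect_by_age = {"young": 0.10, "old": 0.30}

# Hypothetical age distributions in the source and target cities.
p_age_source = {"young": 0.8, "old": 0.2}
p_age_target = {"young": 0.3, "old": 0.7}

def average_effect(effect_by_z, p_z):
    """Weight the stratum-specific effects by a distribution over strata."""
    assert abs(sum(p_z.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(effect_by_z[z] * p_z[z] for z in effect_by_z)

# Effect as measured in the source city (no adjustment).
source_effect = average_effect(effect_by_age, p_age_source)

# Effect transported to the target city by re-weighting on age.
target_effect = average_effect(effect_by_age, p_age_target)

print(f"source-city effect:      {source_effect:.2f}")
print(f"transported (by age):    {target_effect:.2f}")
```

If the same disparity in reading ability arose instead from differences in educational programs, the text above says that no adjustment is needed and the unadjusted source estimate would carry over. Nothing in the two distributions themselves signals which case applies; that information comes from the location of S in the graph.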

In the next Section we will discuss the benefit of leveraging the do-operator in problems concerning generalization.

2. Ignorability versus Admissibility in the Pursuit of Generalization

A key assumption in almost all conventional analyses of generalization (from sample-to-population) is S-ignorability, written Y_x ⊥ S|Z where Y_x is the potential outcome predicated on the intervention X = x, S is a selection indicator (with S = 1 standing for selection into the sample) and Z a set of observed covariates. This condition, sometimes written as a difference Y₁ − Y₀ ⊥ S|Z, and sometimes as a conjunction {Y₁, Y₀} ⊥ S|Z, appears in Hotz et al. (2005); Cole and Stuart (2010); Tipton et al. (2014); Hartman et al. (2015), and possibly other researchers committed to potential-outcome analysis. This assumption says: If we succeed in finding a set Z of pre-treatment covariates such that cross-population differences disappear in every stratum Z = z, then the problem can be solved by averaging over those strata. (Lacking a procedure for finding Z, this solution avoids the harder part of the problem and, in this sense, it somewhat borders on the circular. It amounts to saying: If we can solve the problem in every stratum Z = z then the problem is solved; hardly an informative statement.)

In graphical analysis, on the other hand, the problem of generalization has been studied using another condition, labeled S-admissibility (Pearl and Bareinboim, 2014), which is defined by:

P(y|do(x), z) = P(y|do(x), z, s)

or, using counterfactual notation,

P(y_x|z_x) = P(y_x|z_x, s_x)

It states that in every treatment regime X = x, the observed outcome Y is conditionally independent of the selection mechanism S, given Z, all evaluated at that same treatment regime.

Clearly, S-admissibility coincides with S-ignorability for pre-treatment S and Z; the two notions differ, however, for treatment-dependent covariates. The Appendix presents scenarios (Fig. 1(a) and (b)) in which post-treatment covariates Z do not satisfy S-ignorability, but satisfy S-admissibility and, thus, enable generalization to take place. We also present scenarios where both S-ignorability and S-admissibility hold and, yet, experimental findings are not generalizable by standard procedures of post-stratification. Rather, the correct procedure is uncovered naturally from the graph structure.

One of the reasons that S-admissibility has received greater attention in the graph-based literature is that it has a very simple graphical representation: Z and X should separate Y from S in a mutilated graph, from which all arrows entering X have been removed. Such a graph depicts conditional independencies among observed variables in the population under experimental conditions, i.e., where X is randomized.
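The graphical test just described is mechanical enough to sketch in a few lines of Python. The sketch below is an illustration only, not code from any of the cited papers: the function names (mutilate, ancestors, d_separated) and the two example graphs are assumptions made for this post, and d-separation is checked by the standard moralized-ancestral-graph method rather than by any particular library.

```python
from itertools import combinations

def mutilate(edges, x):
    """Remove every arrow entering node x (the graph under randomized X)."""
    return [(a, b) for (a, b) in edges if b != x]

def ancestors(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    result = set(nodes)
    frontier = list(nodes)
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            if b == node and a not in result:
                result.add(a)
                frontier.append(a)
    return result

def d_separated(edges, xs, ys, zs):
    """True if the set zs d-separates xs from ys in the DAG given by edges."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(a, b) for (a, b) in edges if a in keep and b in keep]
    # Build the moral graph: undirected edges plus links between co-parents.
    adj = {n: set() for n in keep}
    for a, b in sub:
        adj[a].add(b)
        adj[b].add(a)
    for child in keep:
        parents = [a for (a, b) in sub if b == child]
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Search from xs without ever entering a conditioned node.
    blocked = set(zs)
    seen = set(xs) - blocked
    stack = list(seen)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return True

# Hypothetical graph 1: Z is a pre-treatment covariate affected by S.
pre = [("S", "Z"), ("Z", "X"), ("Z", "Y"), ("X", "Y")]
# Hypothetical graph 2: Z is a post-treatment variable (X -> Z -> Y), S -> Z.
post = [("X", "Z"), ("Z", "Y"), ("S", "Z")]

for name, g in [("pre-treatment Z", pre), ("post-treatment Z", post)]:
    m = mutilate(g, "X")
    print(name, "S-admissible given Z:", d_separated(m, {"S"}, {"Y"}, {"Z", "X"}))
    print(name, "S-admissible given nothing:", d_separated(m, {"S"}, {"Y"}, {"X"}))
```

In both hypothetical graphs, Z together with X separates Y from S in the mutilated graph, while X alone does not. The test says nothing about whether the graph itself is a faithful model of the two populations; that remains a matter of scientific judgment.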

In contrast, S-ignorability has not been given a simple graphical interpretation, but it can be verified from either twin networks (Causality, pp. 213-4) or from counterfactually augmented graphs (Causality, p. 341), as we have demonstrated in an earlier posting on this blog (link). Using either representation, it is easy to see that S-ignorability is rarely satisfied in transportability problems in which Z is a post-treatment variable. This is because, whenever S is a proxy to an ancestor of Z, Z cannot separate Y_x from S.

The simplest result of both PO and graph-based approaches is the re-calibration or post-stratification formula. It states that, if Z is a set of pre-treatment covariates satisfying S-ignorability (or S-admissibility), then the causal effect in the population at large can be recovered from a selection-biased sample by a simple re-calibration process. Specifically, if P(y_x|S = 1,Z = z) is the z-specific probability distribution of Y_x in the sample, then the distribution of Y_x in the population at large is given by

P(y_x) = ∑_z P(y_x|S = 1,z) P(z) (*)

where P(z) is the probability of Z = z in the target population (where S = 0). Equation (*) follows from S-ignorability by conditioning on z and adding S = 1 to the conditioning set, a one-line proof. The proof fails, however, when Z is treatment dependent, because the counterfactual factor P(y_x|S = 1,z) is not normally estimable in the experimental study. (See Q_c vs. Q_do discussion here).
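As a small numerical illustration of Eq. (*), consider the sketch below. All numbers, the stratum labels ("young", "old") and the function name recalibrate are hypothetical, chosen only to show the arithmetic. The code assumes that Z is a pre-treatment covariate satisfying S-ignorability (or S-admissibility); it cannot verify that assumption, which is causal and must come from the model.

```python
def recalibrate(p_y_given_z_sample, p_z_target):
    """Eq. (*): P(y_x) = sum over z of P(y_x | S=1, z) * P(z)."""
    return sum(p_y_given_z_sample[z] * p_z_target[z] for z in p_z_target)

# Hypothetical z-specific effects measured in the (selected) experimental sample.
p_y_given_z_sample = {"young": 0.8, "old": 0.4}

# Hypothetical distributions of Z in the sample and in the target population.
p_z_sample = {"young": 0.7, "old": 0.3}
p_z_target = {"young": 0.3, "old": 0.7}

# Weighting by the sample's own P(z) reproduces the sample-level effect.
sample_effect = recalibrate(p_y_given_z_sample, p_z_sample)
# Weighting by the target's P(z) gives the re-calibrated estimate of Eq. (*).
target_effect = recalibrate(p_y_given_z_sample, p_z_target)

print("effect in the sample:      ", round(sample_effect, 2))
print("re-calibrated for target:  ", round(target_effect, 2))
```

With these made-up numbers the sample-level effect is about 0.68, while the re-calibrated effect for the target population is about 0.52; the gap comes entirely from the different age mix.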

As noted in (Keiding, 1987), this re-calibration formula goes back to 18th century demographers (Dale, 1777; Tetens, 1786) facing the task of predicting overall mortality (across populations) from age-specific data. Their reasoning was probably as follows: If the source and target populations differ in distribution by a set of attributes Z, then to correct for these differences we need to weight samples by a factor that would restore similarity to the two distributions. Some researchers view Eq. (*) as a version of the Horvitz and Thompson (1952) post-stratification method of estimating the mean of a super-population from un-representative stratified samples. The essential difference between survey sampling calibration and the calibration required in Eq. (*) is that the calibrating covariates Z are not just any set by which the distributions differ; they must satisfy the S-ignorability (or admissibility) condition, which is a causal, not a statistical, condition. It is therefore not discernible from distributions over observed variables. In other words, the re-calibration formula should depend on disparities between the causal models of the two populations, not merely on distributional disparities. This is demonstrated explicitly in Fig. 4(c) of (Pearl and Bareinboim, 2014), which is also treated in the Appendix (Fig. 1(a)).

While S-ignorability and S-admissibility are both sufficient for re-calibrating pre-treatment covariates Z, S-admissibility goes further and permits generalizations in cases where Z consists of post-treatment covariates. A simple example is the bio-marker model shown in Fig. 4(c) (Example 3) of (Pearl and Bareinboim, 2014), which is also discussed in the Appendix.

Conclusions

1. Many opportunities for generalization are opened up through the use of post-treatment variables. These opportunities remain inaccessible to ignorability-based analysis, partly because S-ignorability does not always hold for such variables but, mainly, because ignorability analysis requires information in the form of z-specific counterfactuals, which is often not estimable from experimental studies.

2. Most of these opportunities have been charted through the completeness results for transportability (Bareinboim et al., 2014); others can be revealed by simple derivations in do-calculus, as shown in the Appendix.

3. There is still the issue of assisting researchers in judging whether S-ignorability (or S-admissibility) is plausible in any given application. Graphs excel in this dimension because graphs match the format in which people store scientific knowledge. Some researchers prefer to do it by direct appeal to intuition; they do so at their own peril.

For references and appendix, click here.

Comments (2)

December 22, 2014

Flowers of the First Law of Causal Inference

Filed under: Counterfactual,Definition,General,structural equations — judea @ 5:22 am

Flower 1 — Seeing counterfactuals in graphs

Some critics of structural equation models and their associated graphs have complained that those graphs depict only observable variables but: “You can’t see the counterfactuals in the graph.” I will soon show that this is not the case; counterfactuals can in fact be seen in the graph, and I regard it as one of many flowers blooming out of the First Law of Causal Inference (see here). But, first, let us ask why anyone would be interested in locating counterfactuals in the graph.

This is not a rhetorical question. Those who deny the usefulness of graphs will surely not yearn to find counterfactuals there. For example, researchers in the Imbens-Rubin camp who, ostensibly, encode all scientific knowledge in the “Science” = Pr(W,X,Y(0),Y(1)), can, theoretically, answer all questions about counterfactuals straight from the “science”; they do not need graphs.

On the other extreme we have students of SEM, for whom counterfactuals are but byproducts of the structural model (as the First Law dictates); so, they too do not need to see counterfactuals explicitly in their graphs. For these researchers, policy intervention questions do not require counterfactuals, because those can be answered directly from the SEM-graph, in which the nodes are observed variables. The same applies to most counterfactual questions, for example, the effect of treatment on the treated (ETT) and mediation problems; graphical criteria have been developed to determine their identification conditions, as well as their resulting estimands (see here and here).

So, who needs to see counterfactual variables explicitly in the graph?

There are two camps of researchers who may benefit from such representation. First, researchers in the Morgan-Winship camp (link here) who are using, interchangeably, both graphs and potential outcomes. These researchers prefer to do the analysis using probability calculus, treating counterfactuals as ordinary random variables, and use graphs only when the algebra becomes helpless. Helplessness arises, for example, when one needs to verify whether causal assumptions that are required in the algebraic derivations (e.g., ignorability conditions) hold true in one’s model of reality. These researchers understand that “one’s model of reality” means one’s graph, not the “Science” = Pr(W,X,Y(0),Y(1)), which is cognitively inaccessible. So, although most of the needed assumptions can be verified without counterfactuals from the SEM-graph itself (e.g., through the back door condition), the fact that their algebraic expressions already carry counterfactual variables makes it more convenient to see those variables represented explicitly in the graph.

The second camp of researchers are those who do not believe that scientific knowledge is necessarily encoded in an SEM-graph. For them, the “Science” = Pr(W,X,Y(0),Y(1)), is the source of all knowledge and assumptions, and a graph may be constructed, if needed, as an auxiliary tool to represent sets of conditional independencies that hold in Pr(*). [I was surprised to discover sizable camps of such researchers in political science and biostatistics; possibly because they were exposed to potential outcomes prior to studying structural equation models.] These researchers may resort to other graphical representations of independencies, not necessarily SEM-graphs, but occasionally seek the comfort of the meaningful SEM-graph to facilitate counterfactual manipulations. Naturally, they would prefer to see counterfactual variables represented as nodes on the SEM-graph, and use d-separation to verify conditional independencies, when needed.

After this long introduction, let us see where the counterfactuals are in an SEM-graph. They can be located in two ways: first, by augmenting the graph with new nodes that represent the counterfactuals and, second, by mutilating the graph slightly and using existing nodes to represent the counterfactuals.

The first method is illustrated in chapter 11 of Causality (2nd Ed.) and can be accessed directly here. The idea is simple: According to the structural definition of counterfactuals, Y(0) (similarly Y(1)) represents the value of Y under a condition where X is held constant at X=0. Statistical variations of Y(0) would therefore be governed by all exogenous variables capable of influencing Y when X is held constant, i.e. when the arrows entering X are removed. We are done, because connecting these variables to a new node labeled Y(0), Y(1) creates the desired representation of the counterfactual. The book-section linked above illustrates this construction in visual detail.

The second method mutilates the graph and uses the outcome node, Y, as a temporary surrogate for Y(x), with the understanding that the substitution is valid only under the mutilation. The mutilation required for this substitution is dictated by the First Law, and calls for removing all arrows entering the treatment variable X, as illustrated in the following graph (taken from here).

This method has some disadvantages compared with the first; the removal of X’s parents prevents us from seeing connections that might exist between Y_x and the pre-intervention treatment node X (as well as its descendants). To remedy this weakness, Shpitser and Pearl (2009) (link here) retained a copy of the pre-intervention X node, and kept it distinct from the manipulated X node.

Equivalently, Richardson and Robins (2013) split the X node into two parts, one to represent the pre-intervention variable X and the other to represent the constant X=x.

All in all, regardless of which variant you choose, the counterfactuals of interest can be represented as nodes in the structural graph, and inter-connections among these nodes can be used either to verify identification conditions or to facilitate algebraic operations in counterfactual logic.

Note, however, that all these variants stem from the First Law, Y(x) = Y[M_x], which DEFINES counterfactuals in terms of an operation on a structural equation model M.
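For readers who like to see definitions executed, here is a minimal sketch of the First Law in Python. The two structural equations, the exogenous variables ux and uy, and the helper names (make_model, mutilate, solve, counterfactual) are all hypothetical; the only thing taken from the text is the definition itself: Y(x) for a given unit is the value of Y in the mutilated model M_x, where the equation for X is replaced by the constant X = x.

```python
def make_model():
    """A toy structural model M: each equation maps (values so far, unit u) to a value."""
    return {
        "X": lambda v, u: 1 if u["ux"] > 0.5 else 0,  # X is determined by U_x
        "Y": lambda v, u: 2 * v["X"] + u["uy"],       # Y depends on X and U_y
    }

def mutilate(model, var, value):
    """Return M_x: a copy of the model in which var's equation is the constant value."""
    new_model = dict(model)
    new_model[var] = lambda v, u: value
    return new_model

def solve(model, u, order=("X", "Y")):
    """Evaluate the equations in causal order for one unit u."""
    v = {}
    for name in order:
        v[name] = model[name](v, u)
    return v

def counterfactual(model, u, var, value, outcome):
    """First Law: outcome(value) for unit u equals outcome solved in M_value."""
    return solve(mutilate(model, var, value), u)[outcome]

M = make_model()
unit = {"ux": 0.9, "uy": 0.25}  # one hypothetical unit (a setting of the exogenous variables)

observed = solve(M, unit)
y0 = counterfactual(M, unit, "X", 0, "Y")
y1 = counterfactual(M, unit, "X", 1, "Y")

print("observed:", observed)
print("Y(0) =", y0, " Y(1) =", y1)
```

For this unit the observed treatment is X = 1, so Y(1) coincides with the observed Y (consistency), while Y(0) is the value Y would have taken had X been held at 0 with the same uy. Note that Y(0) and Y(1) are functions of uy alone, which is exactly why, in the augmented graph, the counterfactual node hangs off the exogenous variables that influence Y once the arrows entering X are removed.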

Finally, to celebrate this “Flower of the First Law” and, thereby, the unification of the structural and potential outcome frameworks, I am posting a flowery photo of Don Rubin and myself, taken during Don’s recent visit to UCLA.

Comments (6)

December 20, 2014

A new book out, Morgan and Winship, 2nd Edition

Filed under: Announcement,Book (J Pearl),General,Opinion — judea @ 2:49 pm

Here is my book recommendation for the month:
Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research) Paperback – November 17, 2014
by Stephen L. Morgan (Author), Christopher Winship (Author)
ISBN-13: 978-1107694163 ISBN-10: 1107694167 Edition: 2nd

My book-cover blurb reads:
“This improved edition of Morgan and Winship’s book elevates traditional social sciences, including economics, education and political science, from a hopeless flirtation with regression to a solid science of causal interpretation, based on two foundational pillars: counterfactuals and causal graphs. A must for anyone seeking an understanding of the modern tools of causal analysis, and a must for anyone expecting science to secure explanations, not merely descriptions.”

But Gary King puts it in a more compelling historical perspective:
“More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history. The first comprehensive survey of the modern causal inference literature was the first edition of Morgan and Winship. Now with the second edition of this successful book comes the most up-to-date treatment.” Gary King, Harvard University

King’s statement is worth repeating here to remind us that we are indeed participating in an unprecedented historical revolution:

“More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history.”

It is the same revolution that Miquel Porta noted to be transforming the discourse in Epidemiology (link).

Social science and Epidemiology have been spear-heading this revolution, but I don’t think other disciplines will sit idle for too long.

In a recent survey (here), I attributed the revolution to “a fruitful symbiosis between graphs and counterfactuals that has unified the potential outcome framework of Neyman, Rubin, and Robins with the econometric tradition of Haavelmo, Marschak, and Heckman. In this symbiosis, counterfactuals emerge as natural byproducts of structural equations and serve to formally articulate research questions of interest. Graphical models, on the other hand, are used to encode scientific assumptions in a qualitative (i.e. nonparametric) and transparent language and to identify the logical ramifications of these assumptions, in particular their testable implications.”

Other researchers may wish to explain the revolution in other ways; still, Morgan and Winship’s book is a perfect example of how the symbiosis can work when taken seriously.

Comments (3)

A new review of Causality

Filed under: Book (J Pearl),General,Opinion — eb @ 2:46 pm

A new review of Causality (2nd Edition, 2013 printing) has appeared in Acta Sociologica 2014, Vol. 57(4) 369-375.
http://bayes.cs.ucla.edu/BOOK-2K/elwert-review2014.pdf
Reviewed by Felix Elwert, University of Wisconsin-Madison, USA.

Elwert highlights specific sections of Causality that can empower social scientists with new insights or new tools for applying modern methods of causal inference in their research. Coming from a practical social science perspective, this review is a welcome addition to the list of 33 other reviews of Causality, which tend to be more philosophical. See http://bayes.cs.ucla.edu/BOOK-2K/book_review.html

I am particularly gratified by Elwert’s final remarks:
“Pearl’s language empowers social scientists to communicate causal models with each other across sub-disciplines…and enables social scientists to communicate more effectively with statistical methodologists.”

Comments (1)