Causal Analysis in Theory and Practice

June 20, 2016

Recollections from the WCE conference at Stanford

Filed under: Counterfactual,General,Mediated Effects,structural equations — bryantc @ 7:45 am

On May 21, Kosuke Imai and I participated in a panel on Mediation at the annual meeting of the West Coast Experiments Conference, organized by the Stanford Graduate School of Business http://www.gsb.stanford.edu/facseminars/conferences/west-coast-experiments-conference. The following are some of my recollections from that panel.

1.
We began the discussion by reviewing causal mediation analysis and summarizing the exchange we had on the pages of Psychological Methods (2014)
http://ftp.cs.ucla.edu/pub/stat_ser/r389-imai-etal-commentary-r421-reprint.pdf

My slides for the panel can be viewed here:
http://web.cs.ucla.edu/~kaoru/stanford-may2016-bw.pdf

We ended with a consensus regarding the importance of causal mediation and the conditions for identification of Natural Direct and Indirect Effects from randomized as well as observational studies.

2.
We proceeded to discuss the symbiosis between the structural and the counterfactual languages. Here I focused on slides 4-6 (page 3), and remarked that only those who are willing to solve a toy problem from beginning to end, using both potential outcomes and DAGs, can understand the tradeoff between the two. Such a toy problem (and its solution) was presented in slide 5 (page 3), titled “Formulating a problem in Three Languages,” and the questions that I asked the audience are still ringing in my ears. Please have a good look at these two sets of assumptions and ask yourself:

a. Have we forgotten any assumption?
b. Are these assumptions consistent?
c. Is any of the assumptions redundant (i.e. does it follow logically from the others)?
d. Do they have testable implications?
e. Do these assumptions permit the identification of causal effects?
f. Are these assumptions plausible in the context of the scenario given?

As I was discussing these questions over slide 5, the audience seemed to be in general agreement with the conclusion that, despite their logical equivalence, the graphical language enables us to answer these questions immediately, while the potential outcome language remains silent on all of them.

I consider this example to be pivotal to the comparison of the two frameworks. I hope that questions a,b,c,d,e,f will be remembered, and that speakers from both camps will be asked to address them squarely and explicitly.

The fact that graduate students made up the majority of the participants gives me the hope that questions a,b,c,d,e,f will finally receive the attention they deserve.

3.
As we discussed the virtues of graphs, I found it necessary to reiterate the observation that DAGs are more than just a “natural and convenient way to express assumptions about causal structures” (Imbens and Rubin, 2013, p. 25). Praising their transparency while ignoring their inferential power misses the main role that graphs play in causal analysis. The power of graphs lies in computing complex implications of causal assumptions (i.e., the “science”), no matter in what language they are expressed. Typical implications are: conditional independencies among variables and counterfactuals, what covariates need be controlled to remove confounding or selection bias, whether effects can be identified, and more. These implications could, in principle, be derived from any equivalent representation of the causal assumptions, not necessarily graphical, but not before incurring a prohibitive computational cost. See, for example, what happens when economists try to replace d-separation with graphoid axioms http://ftp.cs.ucla.edu/pub/stat_ser/r420.pdf.

4.
Following the discussion of representations, we addressed questions posed to us by the audience, in particular, five questions submitted by Professor Jon Krosnick (Political Science, Stanford).

I summarize them in the following slide:

Krosnick’s Questions to Panel
———————————————-
1) Do you think an experiment has any value without mediational analysis?
2) Is a separate study directly manipulating the mediator useful? How is the second study any different from the first one?
3) Imai’s correlated residuals test seems valuable for distinguishing fake from genuine mediation. Is that so? And how is it related to traditional mediational tests?
4) Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
5) Why is mediational analysis any “worse” than any other method of investigation?
———————————————-
My answers focused on questions 2, 4, and 5, which I summarize below:

2)
Q. Is a separate study directly manipulating the mediator useful?
Answer: Yes, it is useful if physically feasible but, still, it cannot give us an answer to the basic mediation question: “What percentage of the observed response is due to mediation?” The concept of mediation is necessarily counterfactual, i.e., sitting at the top layer of the causal hierarchy (see Causality, Chapter 1). It therefore cannot be defined in terms of population experiments, however clever. Mediation can be evaluated with the help of counterfactual assumptions such as “conditional ignorability” or “no interaction,” but these assumptions cannot be verified in population experiments.
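
To see why, recall the counterfactual definitions of the Natural Direct and Indirect Effects, with x_0 and x_1 denoting two levels of the treatment and M_x the value the mediator would attain under X = x:

NDE = E[Y_{x_1, M_{x_0}}] − E[Y_{x_0, M_{x_0}}]
NIE = E[Y_{x_0, M_{x_1}}] − E[Y_{x_0, M_{x_0}}]

The nested counterfactual Y_{x_1, M_{x_0}} asks that, for the same unit, X be set to x_1 while M is held at the value it would have attained under x_0; no population experiment, however clever, can realize these two conditions simultaneously.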

4)
Q. Why isn’t it easy to test whether participants who show the largest increases in the posited mediator show the largest changes in the outcome?
Answer: Translating the question into counterfactual notation, the suggested test requires the existence of a monotonic function f_m such that, for every individual, we have Y_1 − Y_0 = f_m(M_1 − M_0).

This condition expresses a feature we expect to find in mediation, but it cannot be taken as a DEFINITION of mediation. This condition is essentially the way indirect effects are defined in the Principal Strata framework (Frangakis and Rubin, 2002), the deficiencies of which are well known. See http://ftp.cs.ucla.edu/pub/stat_ser/r382.pdf.

In particular, imagine a switch S controlling two light bulbs L1 and L2. Positive correlation between L1 and L2 does not mean that L1 mediates between the switch and L2. Many examples of incompatibility are demonstrated in the paper above.
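
A minimal simulation sketch makes this vivid (the structural equations L1 = S and L2 = S are my stand-ins for the two bulbs):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Both bulbs listen only to the switch; L1 is NOT a mediator for L2.
S = rng.integers(0, 2, n)
L1 = S.copy()     # structural equation: L1 = S
L2 = S.copy()     # structural equation: L2 = S

print(np.corrcoef(L1, L2)[0, 1])    # 1.0: perfect correlation

# For every unit, L2_{S=1} - L2_{S=0} = 1 = L1_{S=1} - L1_{S=0}, so a
# monotone-difference criterion would declare mediation. Yet setting L1
# by intervention, with S held fixed, leaves L2 untouched, because L2's
# equation does not mention L1: the indirect effect is exactly zero.
L2_do_L1_1 = S.copy()               # L2 after do(L1 = 1): unchanged
L2_do_L1_0 = S.copy()               # L2 after do(L1 = 0): unchanged
print((L2_do_L1_1 - L2_do_L1_0).mean())   # 0.0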

The conventional mediation tests (in the Baron and Kenny tradition) suffer from the same problem; they test features of mediation that are common in linear systems, but not the essence of mediation, which is universal to all systems: linear and nonlinear, with continuous as well as categorical variables.

5)
Q. Why is mediational analysis any “worse” than any other method of investigation?
Answer: The answer is closely related to the one given to question 3). Mediation is not a “method” but a property of the population which is defined counterfactually, and therefore requires counterfactual assumptions for evaluation. Experiments are not sufficient; and in this sense mediation is “worse” than other properties under investigation, e.g., causal effects, which can be estimated entirely from experiments.

About the only thing we can ascertain experimentally is whether the (controlled) direct effect differs from the total effect, but we cannot evaluate the extent of mediation.

Another way to appreciate why stronger assumptions are needed for mediation is to note that non-confoundedness is not the same as ignorability. For non-binary variables one can construct examples where X and Y are not confounded (i.e., P(y|do(x)) = P(y|x)) and yet they are not ignorable (i.e., Y_x is not independent of X). Mediation requires ignorability in addition to non-confoundedness.
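
Here is one such construction, offered as a simulation sketch (the particular equations are mine, chosen only to make the point): take independent fair coins U and V, let X = U + V, so that X is three-valued, and let Y = U when X = 1 and Y = 0 otherwise.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

U = rng.integers(0, 2, n)            # unobserved common cause of X and Y
V = rng.integers(0, 2, n)
X = U + V                            # three-valued treatment: X in {0, 1, 2}

def f(x, u):
    # Structural equation for Y: U matters only on the stratum X = 1.
    return np.where(x == 1, u, 0)

Y = f(X, U)

# Non-confoundedness: P(Y=1|do(x)) agrees with P(Y=1|X=x) for every x.
for x in (0, 1, 2):
    p_do = f(np.full(n, x), U).mean()    # force X = x for everyone
    p_obs = Y[X == x].mean()             # merely observe X = x
    print(x, round(p_do, 3), round(p_obs, 3))

# Ignorability fails: the potential outcome Y_1 = f(1, U) = U depends on X.
Y1 = f(np.full(n, 1), U)
print(Y1[X == 0].mean(), Y1[X == 2].mean())   # 0.0 vs. 1.0

Conditioning on X = 1 leaves U at its prior value of 1/2, and U is irrelevant at X = 0 and X = 2, so the observational and interventional distributions coincide; yet X = 2 forces U = 1, so Y_1 is fully determined at the extreme values of X.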

Summary
Overall, the panel was illuminating, primarily due to the active participation of curious students. It gave me good reasons to believe that Political Science is destined to become a bastion of modern causal analysis. I wish economists would follow suit, despite the hurdles they face in bringing causal analysis into economics education.
http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf

Judea

February 12, 2016

Winter Greeting from the UCLA Causality Blog

Friends in causality research,
This greeting from the UCLA Causality blog contains:

A. An introduction to our newly published book, Causal Inference in Statistics – A Primer, Wiley 2016 (with M. Glymour and N. Jewell)
B. Comments on two other books: (1) R. Kline’s Structural Equation Modeling and (2) L. Pereira and A. Saptawijaya’s book on machine ethics.
C. News, Journals, awards and other frills.

A.
Our publisher (Wiley) has informed us that the book “Causal Inference in Statistics – A Primer” by J. Pearl, M. Glymour and N. Jewell is already available on Kindle, and will be available in print Feb. 26, 2016.
http://www.amazon.com/Causality-A-Primer-Judea-Pearl/dp/1119186846
http://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl-ebook/dp/B01B3P6NJM/ref=mt_kindle?_encoding=UTF8&me=

This book introduces core elements of causal inference into undergraduate and lower-division graduate classes in statistics and data-intensive sciences. The aim is to provide students with the understanding of how data are generated and interpreted at the earliest stage of their statistics education. To that end, the book empowers students with models and tools that answer nontrivial causal questions using vivid examples and simple mathematics. Topics include: causal models, model testing, effects of interventions, mediation and counterfactuals, in both linear and nonparametric systems.

The Table of Contents, Preface and excerpts from the four chapters can be viewed here:
http://bayes.cs.ucla.edu/PRIMER/
A book website providing answers to homework problems and interactive computer programs for simulation and analysis (using dagitty) is currently under construction.

B1
We are in receipt of the fourth edition of Rex Kline’s book “Principles and Practice of Structural Equation Modeling”, http://psychology.concordia.ca/fac/kline/books/nta.pdf

This book is unique in that it treats structural equation models (SEMs) as carriers of causal assumptions and tools for causal inference. Gone are the inhibitions and trepidation that characterize most SEM texts in their treatments of causation.

To the best of my knowledge, Chapter 8 in Kline’s book is the first SEM text to introduce graphical criteria for parameter identification — a long overdue tool in a field that depends on identifiability for model “fitting”. Overall, the book elevates SEM education to new heights and promises to usher in a renaissance for a field that, five decades ago, pioneered causal analysis in the behavioral sciences.

B2
Much has been written lately on computer ethics, morality, and free will. The new book “Programming Machine Ethics” by Luis Moniz Pereira and Ari Saptawijaya formalizes these concepts in the language of logic programming. See book announcement http://www.springer.com/gp/book/9783319293530. As a novice to the literature on ethics and morality, I was happy to find a comprehensive compilation of the many philosophical works on these topics, articulated in a language that even a layman can comprehend. I was also happy to see the critical role that the logic of counterfactuals plays in moral reasoning. The book is a refreshing reminder that there is more to counterfactual reasoning than “average treatment effects”.

C. News, Journals, awards and other frills.
C1.
Nominations are invited for the Causality in Statistics Education Award (deadline: February 15, 2016).

The ASA Causality in Statistics Education Award is aimed at encouraging the teaching of basic causal inference in introductory statistics courses. Co-sponsored by Microsoft Research and Google, the prize is motivated by the growing importance of introducing core elements of causal inference into undergraduate and lower-division graduate classes in statistics. For more information, please see http://www.amstat.org/education/causalityprize/ .

Nominations and questions should be sent to the ASA office at educinfo@amstat.org . The nomination deadline is February 15, 2016.

C.2.
Issue 4.1 of the Journal of Causal Inference is scheduled to appear March 2016, with articles covering all aspects of causal analysis. For mission, policy, and submission information please see: http://degruyter.com/view/j/jci

C.3
Finally, enjoy new results and new insights posted on our technical report page: http://bayes.cs.ucla.edu/csl_papers.html

Judea

August 11, 2015

Mid-Summer Greeting from the UCLA Causality Blog

Filed under: Announcement,Causal Effect,Counterfactual,General — moderator @ 6:09 pm

Friends in causality research,

This mid-summer greeting of UCLA Causality blog contains:
A. News items concerning causality research
B. Discussions and scientific results

1. The next issue of the Journal of Causal Inference is scheduled to appear this month, and the table of contents can be viewed here.

2. A new digital journal “Observational Studies” is out this month (link) and its first issue is dedicated to the legacy of William Cochran (1909-1980).

My contribution to this issue can be viewed here:
http://ftp.cs.ucla.edu/pub/stat_ser/r456.pdf

See also comment 1 below.

3. A video recording of my Cassel Lecture at the SER conference, June 2015, Denver, CO, can be viewed here:
https://epiresearch.org/about-us/archives/video-archives-2/the-scientific-approach-to-causal-inference/

4. A video of a conversation with Robert Gould concerning the teaching of causality can be viewed on Wiley’s Statistics Views, link (2 parts, scroll down).

5. We are informed of the upcoming publication of a new book, Rex Kline “Principles and Practice of Structural Equation Modeling, Fourth Edition (link). Judging by the chapters I read, this book promises to be unique; it treats structural equation models for what they are: carriers of causal assumptions and tools for causal inference. Kudos, Rex.

6. We are informed of another book on causal inference: Imbens, Guido W.; Rubin, Donald B. “Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction” Cambridge University Press (2015). Readers will quickly realize that the ideas, methods, and tools discussed on this blog were kept out of this book. Omissions include: Control of confounding, testable implications of causal assumptions, visualization of causal assumptions, generalized instrumental variables, mediation analysis, moderation, interaction, attribution, external validity, explanation, representation of scientific knowledge and, most importantly, the unification of potential outcomes and structural models.

Given that the book is advertised as describing “the leading analysis methods” of causal inference, unsuspecting readers will get the impression that the field as a whole is facing fundamental obstacles, and that we are still lacking the tools to cope with basic causal tasks such as confounding control and model testing. I do not believe mainstream methods of causal inference are in such a state of helplessness.

The authors’ motivation and rationale for this exclusion were discussed at length on this blog. See
“Are economists smarter than epidemiologists”
http://causality.cs.ucla.edu/blog/?p=1241

and “On the First Law of Causal Inference”
http://causality.cs.ucla.edu/blog/?m=201411

As most of you know, I have spent many hours trying to explain to leaders of the potential outcome school what insights and tools their students would be missing if not given exposure to a broader intellectual environment, one that embraces model-based inferences side by side with potential outcomes.

This book confirms my concerns, and its insularity-based impediments are likely to evoke interesting public discussions on the subject. For example, educators will undoubtedly wish to ask:

(1) Is there any guidance we can give students on how to select covariates for matching or adjustment?

(2) Are there any tools available to help students judge the plausibility of ignorability-type assumptions?

(3) Aren’t there any methods for deciding whether identifying assumptions have testable implications?

I believe that if such questions are asked often enough, they will eventually evoke non-ignorable answers.

7. The ASA issued a press release yesterday, recognizing Tyler VanderWeele’s new book “Explanation in Causal Inference,” winner of the 2015 Causality in Statistics Education Award
http://www.amstat.org/newsroom/pressreleases/JSM2015-CausalityinStatisticsEducationAward.pdf

Congratulations, Tyler.

Information on nominations for the 2016 Award will soon be announced.

8. Since our last Greetings (Spring, 2015) we have had a few lively discussions posted on this blog. I summarize them below:

8.1. Indirect Confounding and Causal Calculus
(How getting too anxious to criticize do-calculus may cause you to miss an easy solution to a problem you thought was hard).
July 23, 2015
http://causality.cs.ucla.edu/blog/?p=1545

8.2. Does Obesity Shorten Life? Or is it the Soda?
(Discusses whether it was the earth that caused the apple to fall, or the gravitational field created by the earth.)
May 27, 2015
http://causality.cs.ucla.edu/blog/?p=1534

8.3. Causation without Manipulation
(Asks whether anyone takes this mantra seriously nowadays, and whether we need manipulations to store scientific knowledge)
May 14, 2015
http://causality.cs.ucla.edu/blog/?p=1518

8.4. David Freedman, Statistics, and Structural Equation Models
(On why Freedman invented the “response schedule.”)
May 6, 2015
http://causality.cs.ucla.edu/blog/?p=1502

8.5. We also had a few breakthroughs posted on our technical report page
http://bayes.cs.ucla.edu/csl_papers.html

My favorites this summer are these two:
http://ftp.cs.ucla.edu/pub/stat_ser/r452.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r450.pdf
because they deal with the tough and long-standing problem:
“How generalizable are empirical studies?”

Enjoy the rest of the summer
Judea

April 24, 2015

Flowers of the First Law of Causal Inference (3)

Flower 3 — Generalizing experimental findings

Continuing our examination of “the flowers of the First Law” (see previous flowers here and here), this posting looks at one of the most crucial questions in causal inference: “How generalizable are our randomized clinical trials?” Readers of this blog will be delighted to learn that one of our flowers provides an elegant and rather general answer to this question. I will describe this answer in the context of transportability theory, and compare it to the way researchers have attempted to tackle the problem using the language of ignorability. We will see that ignorability-type assumptions are fairly limited, both in their ability to define conditions that permit generalizations, and in our ability to justify them in specific applications.

1. Transportability and Selection Bias
The problem of generalizing experimental findings from the trial sample to the population as a whole, also known as the problem of “sample selection-bias” (Heckman, 1979; Bareinboim et al., 2014), has received wide attention lately, as more researchers come to recognize this bias as a major threat to the validity of experimental findings in both the health sciences (Stuart et al., 2015) and social policy making (Manski, 2013).

Since participation in a randomized trial cannot be mandated, we cannot guarantee that the study population would be the same as the population of interest. For example, the study population may consist of volunteers, who respond to financial and medical incentives offered by pharmaceutical firms or experimental teams, so the distribution of outcomes in the study may differ substantially from the distribution of outcomes under the policy of interest.

Another impediment to the validity of experimental findings is that the types of individuals in the target population may change over time. For example, as more individuals become eligible for health insurance, the types of individuals seeking services would no longer match the types of individuals that were sampled for the study. A similar change would occur as more individuals become aware of the efficacy of the treatment. The result is an inherent disparity between the target population and the population under study.

The problem of generalizing across disparate populations has received a formal treatment in (Pearl and Bareinboim, 2014) where it was labeled “transportability,” and where necessary and sufficient conditions for valid generalization were established (see also Bareinboim and Pearl, 2013). The problem of selection bias, though it has some unique features, can also be viewed as a nuance of the transportability problem, thus inheriting all the theoretical results established in (Pearl and Bareinboim, 2014) that guarantee valid generalizations. We will describe the two problems side by side and then return to the distinction between the type of assumptions that are needed for enabling generalizations.

The transportability problem concerns two dissimilar populations, Π and Π*, and requires us to estimate the average causal effect P*(y_x) (explicitly: P*(y_x) ≡ P*(Y = y|do(X = x))) in the target population Π*, based on experimental studies conducted on the source population Π. Formally, we assume that all differences between Π and Π* can be attributed to a set of factors S that produce disparities between the two, so that P*(y_x) = P(y_x|S = 1). The information available to us consists of two parts: first, treatment effects estimated from experimental studies in Π and, second, observational information extracted from both Π and Π*. The former can be written P(y|do(x), z), where Z is a set of covariates measured in the experimental study, and the latter are written P*(x, y, z) = P(x, y, z|S = 1) and P(x, y, z), respectively. In addition to this information, we are also equipped with a qualitative causal model M that encodes causal relationships in Π and Π*, with the help of which we need to identify the query P*(y_x). Mathematically, identification amounts to transforming the query expression

P*(y_x) = P(y|do(x), S = 1)

into a form derivable from the available information I_TR, where

I_TR = { P(y|do(x), z),  P(x, y, z|S = 1),  P(x, y, z) }.

The selection bias problem is slightly different. Here the aim is to estimate the average causal effect P(y_x) in the Π population, while the experimental information available to us, I_SB, comes from a preferentially selected sample, S = 1, and is given by P(y|do(x), z, S = 1). Thus, the selection bias problem calls for transforming the query P(y_x) into a form derivable from the information set:

I_SB = { P(y|do(x), z, S = 1), P(x, y, z|S = 1), P(x, y, z) }.

In the Appendix section, we demonstrate how transportability problems and selection bias problems are solved using the transformations described above.
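
To convey the flavor of such a transformation, suppose (as a hypothetical special case) that Z is a pre-treatment set satisfying the S-admissibility condition discussed in Section 2 below. Then two steps suffice:

P*(y_x) = P(y|do(x), S = 1)
        = ∑_z P(y|do(x), z, S = 1) P(z|do(x), S = 1)
        = ∑_z P(y|do(x), z) P*(z)

The second line conditions on Z; the third drops S = 1 from the first factor (S-admissibility) and reduces the second factor to P*(z), since a pre-treatment Z is unaffected by do(x). Every term in the result is a member of I_TR.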

The analysis reported in (Pearl and Bareinboim, 2014) has resulted in an algorithmic criterion (Bareinboim and Pearl, 2013) for deciding whether transportability is feasible and, when confirmed, the algorithm produces an estimand for the desired effects. The algorithm is complete, in the sense that, when it fails, a consistent estimate of the target effect does not exist (unless one strengthens the assumptions encoded in M).

There are several lessons to be learned from this analysis when considering selection bias problems.

1. The graphical criteria that authorize transportability are applicable to selection bias problems as well, provided that the graph structures for the two problems are identical. This means that whenever a selection bias problem is characterized by a graph for which transportability is feasible, recovery from selection bias is feasible by the same algorithm. (The Appendix demonstrates this correspondence.)

2. The graphical criteria for transportability are more involved than the ones usually invoked in testing treatment assignment ignorability (e.g., through the back-door test). They may require several d-separation tests on several sub-graphs. It is utterly unimaginable, therefore, that such criteria could be managed by unaided human judgment, no matter how ingenious. (See discussions with Guido Imbens regarding computational barriers to graph-free causal inference, click here.) Graph-avoiders should reckon with this predicament.

3. In general, problems associated with external validity cannot be handled by balancing disparities between distributions. The same disparity between P(x, y, z) and P*(x, y, z) may demand different adjustments, depending on the location of S in the causal structure. A simple example of this phenomenon is demonstrated in Fig. 3(b) of (Pearl and Bareinboim, 2014), where a disparity in the average reading ability of two cities requires two different treatments, depending on what causes the disparity. If the disparity emanates from age differences, adjustment is necessary, because age is likely to affect the potential outcomes. If, on the other hand, the disparity emanates from differences in educational programs, no adjustment is needed, since education, in itself, does not modify response to treatment. The distinction is made formal and vivid in causal graphs.

4. In many instances, generalizations can be achieved by conditioning on post-treatment variables, an operation that is frowned upon in the potential-outcome framework (Rosenbaum, 2002, pp. 73–74; Rubin, 2004; Sekhon, 2009) but has become extremely useful in graphical analysis. The difference between the conditioning operators used in the two frameworks is echoed in the difference between Qc and Qdo, the two z-specific effects discussed in a previous posting on this blog (link). The latter defines information that is estimable from experimental studies, whereas the former invokes a retrospective counterfactual that may or may not be estimable empirically.

In the next section, we will discuss the benefit of leveraging the do-operator in problems concerning generalization.

2. Ignorability versus Admissibility in the Pursuit of Generalization

A key assumption in almost all conventional analyses of generalization (from sample to population) is S-ignorability, written Y_x ⊥ S|Z, where Y_x is the potential outcome predicated on the intervention X = x, S is a selection indicator (with S = 1 standing for selection into the sample), and Z is a set of observed covariates. This condition, sometimes written as a difference Y_1 − Y_0 ⊥ S|Z, and sometimes as a conjunction {Y_1, Y_0} ⊥ S|Z, appears in Hotz et al. (2005); Cole and Stuart (2010); Tipton et al. (2014); Hartman et al. (2015), and possibly other researchers committed to potential-outcome analysis. This assumption says: if we succeed in finding a set Z of pre-treatment covariates such that cross-population differences disappear in every stratum Z = z, then the problem can be solved by averaging over those strata. (Lacking a procedure for finding Z, this solution avoids the harder part of the problem and, in this sense, it somewhat borders on the circular. It amounts to saying: if we can solve the problem in every stratum Z = z, then the problem is solved; hardly an informative statement.)

In graphical analysis, on the other hand, the problem of generalization has been studied using another condition, labeled S-admissibility (Pearl and Bareinboim, 2014), which is defined by:

P(y|do(x), z) = P(y|do(x), z, s)

or, using counterfactual notation,

P(y_x|z_x) = P(y_x|z_x, s_x)

It states that in every treatment regime X = x, the observed outcome Y is conditionally independent of the selection mechanism S, given Z, all evaluated at that same treatment regime.

Clearly, S-admissibility coincides with S-ignorability for pre-treatment S and Z; the two notions differ, however, for treatment-dependent covariates. The Appendix presents scenarios (Fig. 1(a) and (b)) in which post-treatment covariates Z do not satisfy S-ignorability, but satisfy S-admissibility and, thus, enable generalization to take place. We also present scenarios where both S-ignorability and S-admissibility hold and, yet, experimental findings are not generalizable by standard procedures of post-stratification. Rather, the correct procedure is uncovered naturally from the graph structure.

One of the reasons that S-admissibility has received greater attention in the graph-based literature is that it has a very simple graphical representation: Z and X should separate Y from S in a mutilated graph, from which all arrows entering X have been removed. Such a graph depicts conditional independencies among observed variables in the population under experimental conditions, i.e., where X is randomized.
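
For readers who wish to experiment, here is a minimal Python sketch of this test (networkx is used only for graph bookkeeping; the d-separation routine implements the standard moralized-ancestral-graph criterion, and the example DAG is a hypothetical bio-marker-style model, not a verbatim copy of Fig. 4(c)):

import networkx as nx

def d_separated(G, xs, ys, zs):
    # Lauritzen's criterion: xs and ys (disjoint from zs) are d-separated
    # by zs in DAG G iff they are disconnected in the moral graph of the
    # ancestral subgraph, after deleting zs.
    xs, ys, zs = set(xs), set(ys), set(zs)
    anc = xs | ys | zs
    for v in list(anc):
        anc |= nx.ancestors(G, v)
    H = G.subgraph(anc)
    M = nx.Graph(list(H.edges()))         # forget edge directions
    M.add_nodes_from(H.nodes())
    for child in H:
        ps = list(H.predecessors(child))  # "marry" parents of a common child
        M.add_edges_from((p, q) for i, p in enumerate(ps) for q in ps[i + 1:])
    M.remove_nodes_from(zs)
    return not any(nx.has_path(M, x, y) for x in xs for y in ys)

def s_admissible(G, X, Y, S, Z=()):
    # S-admissibility: {X} together with Z must separate Y from S in the
    # mutilated graph from which all arrows entering X have been removed.
    Gx = G.copy()
    Gx.remove_edges_from(list(Gx.in_edges(X)))
    return d_separated(Gx, {Y}, {S}, {X, *Z})

# Hypothetical model: a post-treatment Z on the path X -> Z -> Y, with the
# selection/disparity node S pointing at Z.
G = nx.DiGraph([("X", "Z"), ("Z", "Y"), ("S", "Z")])
print(s_admissible(G, "X", "Y", "S", Z=["Z"]))    # True: Z enables transport

Note that Z in this example is a post-treatment variable, so ignorability-based reasoning would not license the generalization, while the graphical test does.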

In contrast, S-ignorability has not been given a simple graphical interpretation, but it can be verified from either twin networks (Causality, pp. 213-214) or from counterfactually augmented graphs (Causality, p. 341), as we have demonstrated in an earlier posting on this blog (link). Using either representation, it is easy to see that S-ignorability is rarely satisfied in transportability problems in which Z is a post-treatment variable. This is because, whenever S is a proxy to an ancestor of Z, Z cannot separate Y_x from S.

The simplest result of both PO and graph-based approaches is the re-calibration, or post-stratification, formula. It states that, if Z is a set of pre-treatment covariates satisfying S-ignorability (or S-admissibility), then the causal effect in the population at large can be recovered from a selection-biased sample by a simple re-calibration process. Specifically, if P(y_x|S = 1, Z = z) is the z-specific probability distribution of Y_x in the sample, then the distribution of Y_x in the population at large is given by

P(y_x) = ∑_z P(y_x|S = 1, z) P(z)    (*)

where P(z) is the probability of Z = z in the target population (where S = 0). Equation (*) follows from S-ignorability by conditioning on z and adding S = 1 to the conditioning set – a one-line proof. The proof fails, however, when Z is treatment dependent, because the counterfactual factor P(y_x|S = 1, z) is not normally estimable in the experimental study. (See the Qc vs. Qdo discussion here.)
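
Spelled out, that one-line proof (for pre-treatment Z) reads:

P(y_x) = ∑_z P(y_x|z) P(z) = ∑_z P(y_x|S = 1, z) P(z)

where the first equality conditions on z and the second invokes S-ignorability, Y_x ⊥ S|Z, to add S = 1 to the conditioning set.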

As noted in (Keiding, 1987), this re-calibration formula goes back to 18th-century demographers (Dale, 1777; Tetens, 1786) facing the task of predicting overall mortality (across populations) from age-specific data. Their reasoning was probably as follows: if the source and target populations differ in distribution by a set of attributes Z, then to correct for these differences we need to weight samples by a factor that would restore similarity to the two distributions. Some researchers view Eq. (*) as a version of the Horvitz and Thompson (1952) post-stratification method of estimating the mean of a super-population from un-representative stratified samples. The essential difference between survey-sampling calibration and the calibration required in Eq. (*) is that the calibrating covariates Z are not just any set by which the distributions differ; they must satisfy the S-ignorability (or admissibility) condition, which is a causal, not a statistical, condition. It is not discernible, therefore, from distributions over observed variables. In other words, the re-calibration formula should depend on disparities between the causal models of the two populations, not merely on distributional disparities. This is demonstrated explicitly in Fig. 4(c) of (Pearl and Bareinboim, 2014), which is also treated in the Appendix (Fig. 1(a)).

While S-ignorability and S-admissibility are both sufficient for re-calibrating pre-treatment covariates Z, S-admissibility goes further and permits generalizations in cases where Z consists of post-treatment covariates. A simple example is the bio-marker model shown in Fig. 4(c) (Example 3) of (Pearl and Bareinboim, 2014), which is also discussed in the Appendix.

Conclusions

1. Many opportunities for generalization are opened up through the use of post-treatment variables. These opportunities remain inaccessible to ignorability-based analysis, partly because S-ignorability does not always hold for such variables but, mainly, because ignorability analysis requires information in the form of z-specific counterfactuals, which is often not estimable from experimental studies.

2. Most of these opportunities have been charted through the completeness results for transportability (Bareinboim et al., 2014); others can be revealed by simple derivations in do-calculus, as shown in the Appendix.

3. There is still the issue of assisting researchers in judging whether S-ignorability (or S-admissibility) is plausible in any given application. Graphs excel in this dimension because graphs match the format in which people store scientific knowledge. Some researchers prefer to do it by direct appeal to intuition; they do so at their own peril.

For references and appendix, click here.

December 22, 2014

Flowers of the First Law of Causal Inference

Filed under: Counterfactual,Definition,General,structural equations — judea @ 5:22 am

Flower 1 — Seeing counterfactuals in graphs

Some critics of structural equation models and their associated graphs have complained that those graphs depict only observable variables but: “You can’t see the counterfactuals in the graph.” I will soon show that this is not the case; counterfactuals can in fact be seen in the graph, and I regard it as one of many flowers blooming out of the First Law of Causal Inference (see here). But, first, let us ask why anyone would be interested in locating counterfactuals in the graph.

This is not a rhetorical question. Those who deny the usefulness of graphs will surely not yearn to find counterfactuals there. For example, researchers in the Imbens-Rubin camp who, ostensibly, encode all scientific knowledge in the “Science” = Pr(W,X,Y(0),Y(1)), can, theoretically, answer all questions about counterfactuals straight from the “science”; they do not need graphs.

At the other extreme, we have students of SEM, for whom counterfactuals are but byproducts of the structural model (as the First Law dictates); so, they too do not need to see counterfactuals explicitly in their graphs. For these researchers, policy intervention questions do not require counterfactuals, because those can be answered directly from the SEM-graph, in which the nodes are observed variables. The same applies to most counterfactual questions, for example, the effect of treatment on the treated (ETT) and mediation problems; graphical criteria have been developed to determine their identification conditions, as well as their resulting estimands (see here and here).

So, who needs to see counterfactual variables explicitly in the graph?

There are two camps of researchers who may benefit from such representation. First, researchers in the Morgan-Winship camp (link here) who use, interchangeably, both graphs and potential outcomes. These researchers prefer to do the analysis using probability calculus, treating counterfactuals as ordinary random variables, and use graphs only when the algebra becomes helpless. Helplessness arises, for example, when one needs to verify whether causal assumptions that are required in the algebraic derivations (e.g., ignorability conditions) hold true in one’s model of reality. These researchers understand that “one’s model of reality” means one’s graph, not the “Science” = Pr(W,X,Y(0),Y(1)), which is cognitively inaccessible. So, although most of the needed assumptions can be verified without counterfactuals from the SEM-graph itself (e.g., through the back-door condition), the fact that their algebraic expressions already carry counterfactual variables makes it more convenient to see those variables represented explicitly in the graph.

The second camp of researchers are those who do not believe that scientific knowledge is necessarily encoded in an SEM-graph. For them, the “Science” = Pr(W,X,Y(0),Y(1)), is the source of all knowledge and assumptions, and a graph may be constructed, if needed, as an auxiliary tool to represent sets of conditional independencies that hold in Pr(*). [I was surprised to discover sizable camps of such researchers in political science and biostatistics; possibly because they were exposed to potential outcomes prior to studying structural equation models.] These researchers may resort to other graphical representations of independencies, not necessarily SEM-graphs, but occasionally seek the comfort of the meaningful SEM-graph to facilitate counterfactual manipulations. Naturally, they would prefer to see counterfactual variables represented as nodes on the SEM-graph, and use d-separation to verify conditional independencies, when needed.

After this long introduction, let us see where the counterfactuals are in an SEM-graph. They can be located in two ways: first, by augmenting the graph with new nodes that represent the counterfactuals and, second, by mutilating the graph slightly and using existing nodes to represent the counterfactuals.

The first method is illustrated in chapter 11 of Causality (2nd Ed.) and can be accessed directly here. The idea is simple: According to the structural definition of counterfactuals, Y(0) (similarly Y(1)) represents the value of Y under a condition where X is held constant at X=0. Statistical variations of Y(0) would therefore be governed by all exogenous variables capable of influencing Y when X is held constant, i.e. when the arrows entering X are removed. We are done, because connecting these variables to a new node labeled Y(0), Y(1) creates the desired representation of the counterfactual. The book-section linked above illustrates this construction in visual details.

The second method mutilates the graph and uses the outcome node, Y, as a temporary surrogate for Y(x), with the understanding that the substitution is valid only under the mutilation. The mutilation required for this substitution is dictated by the First Law, and calls for removing all arrows entering the treatment variable X, as illustrated in the following graph (taken from here).

This method has some disadvantages compared with the first; the removal of X’s parents prevents us from seeing connections that might exist between Y_x and the pre-intervention treatment node X (as well as its descendants). To remedy this weakness, Shpitser and Pearl (2009) (link here) retained a copy of the pre-intervention X node, and kept it distinct from the manipulated X node.

Equivalently, Richardson and Robins (2013) spliced the X node into two parts, one to represent the pre-intervention variable X and the other to represent the constant X=x.

All in all, regardless of which variant you choose, the counterfactuals of interest can be represented as nodes in the structural graph, and inter-connections among these nodes can be used either to verify identification conditions or to facilitate algebraic operations in counterfactual logic.

Note, however, that all these variants stem from the First Law, Y(x) = Y[M_x], which DEFINES counterfactuals in terms of an operation on a structural equation model M.

Finally, to celebrate this “Flower of the First Law” and, thereby, the unification of the structural and potential outcome frameworks, I am posting a flowery photo of Don Rubin and myself, taken during Don’s recent visit to UCLA.

December 20, 2014

A new book out, Morgan and Winship, 2nd Edition

Filed under: Announcement,Book (J Pearl),General,Opinion — judea @ 2:49 pm

Here is my book recommendation for the month:
Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research) Paperback – November 17, 2014
by Stephen L. Morgan (Author), Christopher Winship (Author)
ISBN-13: 978-1107694163 ISBN-10: 1107694167 Edition: 2nd

My book-cover blurb reads:
“This improved edition of Morgan and Winship’s book elevates traditional social sciences, including economics, education and political science, from a hopeless flirtation with regression to a solid science of causal interpretation, based on two foundational pillars: counterfactuals and causal graphs. A must for anyone seeking an understanding of the modern tools of causal analysis, and a must for anyone expecting science to secure explanations, not merely descriptions.”

But Gary King puts it in a more compelling historical perspective:
“More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history. The first comprehensive survey of the modern causal inference literature was the first edition of Morgan and Winship. Now with the second edition of this successful book comes the most up-to-date treatment.” Gary King, Harvard University

King’s statement is worth repeating here to remind us that we are indeed participating in an unprecedented historical revolution:

“More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history.”

It is the same revolution that Miquel Porta noted to be transforming the discourse in Epidemiology (link).

Social science and Epidemiology have been spearheading this revolution, but I don’t think other disciplines will sit idle for too long.

In a recent survey (here), I attributed the revolution to “a fruitful symbiosis between graphs and counterfactuals that has unified the potential outcome framework of Neyman, Rubin, and Robins with the econometric tradition of Haavelmo, Marschak, and Heckman. In this symbiosis, counterfactuals emerge as natural byproducts of structural equations and serve to formally articulate research questions of interest. Graphical models, on the other hand, are used to encode scientific assumptions in a qualitative (i.e. nonparametric) and transparent language and to identify the logical ramifications of these assumptions, in particular their testable implications.”

Other researchers may wish to explain the revolution in other ways; still, Morgan and Winship’s book is a perfect example of how the symbiosis can work when taken seriously.

A new review of Causality

Filed under: Book (J Pearl),General,Opinion — eb @ 2:46 pm

A new review of Causality (2nd Edition, 2013 printing) has appeared in Acta Sociologica 2014, Vol. 57(4) 369-375.
http://bayes.cs.ucla.edu/BOOK-2K/elwert-review2014.pdf
Reviewed by Felix Elwert, University of Wisconsin-Madison, USA.

Elwert highlights specific sections of Causality that can empower social scientists with new insights or new tools for applying modern methods of causal inference in their research. Coming from a practical social science perspective, this review is a welcome addition to the list of 33 other reviews of Causality, which tend to be more philosophical. See http://bayes.cs.ucla.edu/BOOK-2K/book_review.html

I am particularly gratified by Elwert’s final remarks:
“Pearl’s language empowers social scientists to communicate causal models with each other across sub-disciplines…and enables social scientists to communicate more effectively with statistical methodologists.”

November 29, 2014

On the First Law of Causal Inference

Filed under: Counterfactual,Definition,Discussion,General — judea @ 3:53 am

In several papers and lectures I have used the rhetorical title “The First Law of Causal Inference” when referring to the structural definition of counterfactuals:

Y_x(u) = Y_{M_x}(u)    (1)

The more I talk with colleagues and students, the more I am convinced that the equation deserves the title. In this post, I will explain why.

As many readers of Causality (Ch. 7) would recognize, Eq. (1) defines the potential outcome, or counterfactual, Y_x(u) in terms of a structural equation model M and a submodel, M_x, in which the equation determining X is replaced by a constant X=x. Computationally, the definition is straightforward. It says that, if you want to compute the counterfactual Y_x(u), namely, to predict the value that Y would take, had X been x (in unit U=u), all you need to do is, first, mutilate the model by replacing the equation for X with X=x and, second, solve for Y. What you get IS the counterfactual Y_x(u). Nothing could be simpler.
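
As a minimal sketch (the toy model and all names are made up for illustration), the whole procedure of mutilate-then-solve takes only a few lines of Python:

def solve(model, u):
    """Solve a recursive structural model, given exogenous setting u."""
    vals = dict(u)
    for var, eq in model.items():      # equations listed in causal order
        vals[var] = eq(vals)
    return vals

# M: a chain Z -> X -> Y, with exogenous uZ, uX, uY entering each equation.
M = {
    "Z": lambda v: v["uZ"],
    "X": lambda v: v["Z"] + v["uX"],
    "Y": lambda v: 2 * v["X"] + v["uY"],
}

u = {"uZ": 1, "uX": 0, "uY": 3}
print(solve(M, u)["Y"])                # observed Y(u) = 2*(1+0) + 3 = 5

# Submodel M_x: replace the equation for X with the constant X = x.
x = 7
M_x = dict(M, X=lambda v: x)
print(solve(M_x, u)["Y"])              # counterfactual Y_x(u) = 2*7 + 3 = 17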

So, why is it so “fundamental”? Because from this definition we can also get probabilities on counterfactuals (once we assign probabilities, P(U=u), to the units), joint probabilities of counterfactuals and observables, conditional independencies over counterfactuals, graphical visualization of potential outcomes, and many more. [Including, of course, Rubin’s “science”, Pr(X,Y(0),Y(1)).] In short, we get everything that an astute causal analyst would ever wish to define or estimate, given that he/she is into solving serious problems in causal analysis, say policy analysis, or attribution, or mediation. Eq. (1) is “fundamental” because everything that can be said about counterfactuals can also be derived from this definition.
[See the following papers for illustration and operationalization of this definition:
http://ftp.cs.ucla.edu/pub/stat_ser/r431.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r370.pdf
also, Causality chapter 7.]

However, it recently occurred to me that the conceptual significance of this definition is not fully understood among causal analysts, not only among “potential outcome” enthusiasts, but also among structural equations researchers who practice causal analysis in the tradition of Sewall Wright, O.D. Duncan, and Trygve Haavelmo. Commenting on the flood of methods and results that emerge from this simple definition, some writers view it as a mathematical gimmick that, while worthy of attention, needs to be regarded with suspicion. Others have labeled it “an approach” that need be considered together with “other approaches” to causal reasoning, but not as a definition that justifies and unifies those other approaches.

Even authors who advocate a symbiotic approach to causal inference — graphical and counterfactuals — occasionally fail to realize that the definition above provides the logic for any such symbiosis, and that it constitutes in fact the semantical basis for the potential-outcome framework.

I will start by addressing the non-statisticians among us; i.e., economists, social scientists, psychometricians, epidemiologists, geneticists, meteorologists, environmental scientists, and more, namely, empirical scientists who have been trained to build models of reality to assist in analyzing the data that reality generates. I want to assure these readers that, in talking about model M, I am not talking about a newly invented mathematical object, but about your favorite and familiar model that has served as your faithful oracle and guiding light since college days, the one that has kept you cozy and comfortable whenever data misbehaved. Yes, I am talking about the equation

y = βx + ε

that you put down when your professor asked: how would household spending vary with income, or how would earnings increase with education, or how would cholesterol level change with diet, or how would the length of a spring vary with the weight that loads it. In short, I am talking about innocent equations that describe what we assume about the world. They now call them “structural equations,” or SEM, in order not to confuse them with regression equations, but that does not make them more of a mystery than apple pie or pickled herring. Admittedly, they are a bit mysterious to statisticians, because statistics textbooks rarely acknowledge their existence [Historians of statistics, take notes!] but, otherwise, they are the most common way of expressing our perception of how nature operates: a society of equations, each describing what nature listens to before determining the value it assigns to each variable in the domain.

Why am I elaborating on this perception of nature? To allay any fears that what is put into M is some magical super-smart algorithm that computes counterfactuals to impress the novice, or to spitefully prove that potential outcomes need no SUTVA, nor manipulation, nor missing-data imputation; M is none other than your favorite model of nature and, yet, please bear with me, this tiny model is capable of generating, on demand, all conceivable counterfactuals: Y(0), Y(1), Y_x, Y_{127}, X_z, Z(X(y)), and so on. Moreover, every time you compute these potential outcomes using Eq. (1), they will obey the consistency rule, and their probabilities will obey the laws of probability calculus and the graphoid axioms. And, if your model justifies “ignorability” or “conditional ignorability,” these too will be respected in the generated counterfactuals. In other words, ignorability conditions need not be postulated as auxiliary constraints to justify the use of available statistical methods; no, they are derivable from your own understanding of how nature operates.

In short, it is a miracle.

Not really! It should be self-evident. Counterfactuals must be built on the familiar if we wish to explain why people communicate with counterfactuals starting at age 4 (“Why is it broken?” “Let’s pretend we can fly”). The same applies to science; scientists have communicated with counterfactuals for hundreds of years, even though the notation and mathematical machinery needed for handling counterfactuals were made available to them only in the 20th century. This means that the conceptual basis for a logic of counterfactuals resides already within the scientific view of the world, and need not be crafted from scratch. It surely should not divorce itself from scientific knowledge, which is the source of all valid assumptions, or from the format in which scientific knowledge is stored, namely, SEM.

Here I am referring to people who claim that potential outcomes are not explicitly represented in SEM, and that explicitness is important. First, this is not entirely true. I can see (Y(0), Y(1)) in the SEM-graph as explicitly as I can see whether ignorability holds there or not. [See, for example, Fig. 11.7, page 343 in Causality.] Second, once we accept SEM as the origin of potential outcomes, as defined by Eq. (1), counterfactual expressions can enter our mathematics proudly and explicitly, with all the inferential machinery that the First Law dictates. Third, consider by analogy the teaching of calculus. It is feasible to teach calculus as a stand-alone symbolic discipline without ever mentioning the fact that y'(x) is the slope of the function y=f(x) at point x. It is feasible, but not desirable, because it is helpful to remember that f(x) comes first, and all other symbols of calculus, e.g., f'(x), f''(x), [f(x)/x]', etc., are derivable from one object, f(x). Likewise, all the rules of differentiation are derived from interpreting y'(x) as the slope of y=f(x).

Where am I heading?
First, I would have liked to convince potential outcome enthusiasts that they are doing harm to their students by banning structural equations from their discourse, thus denying them awareness of the scientific basis of potential outcomes. But this attempted persuasion has been going on for the past two decades and, judging by the recent exchange with Guido Imbens (link), we are no closer to an understanding than we were in 1995. Even an explicit demonstration of how a toy problem would be solved in the two languages (link) did not yield any result.

Second, I would like to call the attention of SEM practitioners, including of course econometricians, quantitative psychologists, and political scientists, and explain the significance of Eq. (1) in their fields. To them, I wish to say: if you are familiar with SEM, then you have all the mathematical machinery necessary to join the ranks of modern causal analysis; your SEM equations (hopefully in nonparametric form) are the engine for generating and understanding counterfactuals. True, your teachers did not alert you to this capability; it is not their fault, they did not know of it either. But you can now take advantage of what the First Law of causal inference tells you. You are sitting on a gold mine; use it.

Finally, I would like to reach out to authors of traditional textbooks who wish to introduce a chapter or two on modern methods of causal analysis. I have seen several books that devote 10 chapters to the SEM framework: identification, structural parameters, confounding, instrumental variables, selection models, exogeneity, model misspecification, etc., and then add a chapter to introduce potential outcomes and cause-effect analyses as useful newcomers, yet alien to the rest of the book. This leaves students to wonder whether the first 10 chapters were worth the labor. Eq. (1) tells us that modern tools of causal analysis are not newcomers, but follow organically from the SEM framework. Consequently, one can leverage the study of SEM to make causal analysis more palatable and meaningful.

Please note that I have not mentioned graphs in this discussion; the reason is simple, graphical modeling constitutes The Second Law of Causal Inference.

Enjoy both,
Judea

November 16, 2014

On DAGs, Instruments and Social Networks

Filed under: General — eb @ 4:26 pm

Apropos the lively discussion we have had here on graphs and IV models, Felix Elwert writes:

“Dear Judea, here’s an IV paper using DAGs that we recently published. We use DAGs to evaluate genes as instrumental variables for causal peer effects in social networks. The substantive question is whether obesity is contagious among friends. We evaluated both single-IV and IV-set candidates. We found DAGs especially helpful for evaluating variations within a large class of qualitative data-generating processes by playing through lots of increasingly realistic variations of our main DGPs (Table 1). Being able to discuss identification purely in terms of qualitative causal statements (e.g., “fat genes may affect latent causes of friendship formation”) was very helpful. One of our goals was to check how far one can relax the DGP before identification would break down. After permitting a host of potential hurdles (i.e., eliminating exclusions left and right by drawing lots of additional arrows), we concluded that genes alone won’t realistically work, but that time-varying gene expression might give valid IVs for peer effects. Under linearity, we found suggestive evidence for the transmission of obesity in social networks.”

All the best,

Felix

November 9, 2014

Causal inference without graphs

Filed under: Counterfactual,Discussion,Economics,General — moderator @ 3:45 am

In a recent posting on this blog, Elias and Bryant described how graphical methods can help decide if a pseudo-randomized variable, Z, qualifies as an instrumental variable, namely, if it satisfies the exogeneity and exclusion requirements associated with the definition of an instrument. In this note, I aim to describe how inferences of this type can be performed without graphs, using the language of potential outcomes. This description should give students of causality an objective comparison of graph-less vs. graph-based inferences. See my exchange with Guido Imbens [here].

Every problem of causal inference must commence with a set of untestable, theoretical assumptions that the modeler is prepared to defend on scientific grounds. In structural modeling, these assumptions are encoded in a causal graph through missing arrows and missing latent variables. Graphless methods encode these same assumptions symbolically, using two types of statements:

1. Exclusion restrictions, and
2. Conditional independencies among observable and potential outcomes.

For example, consider the causal Markov chain X → Y → Z, which represents the structural equations:

y = f(x, ε_1),    z = g(y, ε_2)

with ε_1 and ε_2 being omitted factors such that X, ε_1, ε_2 are mutually independent.

These same assumptions can also be encoded in the language of counterfactuals, as follows:

Z_xy = Z_y    for all x and y    (3)
X ⊥ Y_x    (4)
X ⊥ Z_y    (5)
Y_x ⊥ Z_y    (6)

(3) represents the missing arrow from X to Z, and (4)-(6) convey the mutual independence of X, ε_1, and ε_2.
[Remark: General rules for translating graphical models to counterfactual notation are given in Pearl (2009, pp. 232-234).]

Assume now that we are given the four counterfactual statements (3)-(6) as a specification of a model; what machinery can we use to answer questions that typically come up in causal inference tasks? One such question is, for example: is the model testable? In other words, is there an empirical test, conducted on the observed variables X, Y, and Z, that could prove (3)-(6) wrong? We note that none of the four defining conditions (3)-(6) is testable in isolation, because each invokes an unmeasured counterfactual entity. On the other hand, the fact that the equivalent graphical model advertises the conditional independence of X and Z given Y, X _||_ Z | Y, implies that the combination of all four counterfactual statements should yield this testable implication.
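
As a quick numerical sanity check, here is a simulation sketch of a binary version of the chain (the 0.2 noise rates are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Binary chain X -> Y -> Z: y = f(x, e1), z = g(y, e2), each noise term
# flipping its parent with probability 0.2, all three sources independent.
X = rng.integers(0, 2, n)
e1 = (rng.random(n) < 0.2).astype(int)
e2 = (rng.random(n) < 0.2).astype(int)
Y = X ^ e1
Z = Y ^ e2

# Testable implication X _||_ Z | Y: within each stratum of Y, the
# frequency of Z = 1 should not depend on X.
for y in (0, 1):
    for x in (0, 1):
        sel = (X == x) & (Y == y)
        print(f"P(Z=1 | X={x}, Y={y}) ~ {Z[sel].mean():.3f}")
# The two X-columns agree within each Y-stratum (~0.2 for Y=0, ~0.8 for Y=1).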

Another question often posed to causal inference is that of identifiability; for example, whether the causal effect of X on Z is estimable from observational studies.

Whereas graphical models enjoy inferential tools such as d-separation and do-calculus, potential-outcome specifications can use the axioms of counterfactual logic (Galles and Pearl, 1998; Halpern, 1998) to determine identification and testable implications. In a recent paper, I have combined the graphoid and counterfactual axioms to provide such symbolic machinery (link).

However, the aim of this note is not to teach potential outcome researchers how to derive the logical consequences of their assumptions but, rather, to give researchers the flavor of what these derivations entail, and the kind of problems the potential-outcome specification presents vis-à-vis the graphical representation.

As most of us would agree, the chain X → Y → Z appears friendlier than the four equations in (3)-(6), and the reasons are both representational and inferential. On the representational side, we note that it would take a person (even an expert in potential outcomes) a pause or two to affirm that (3)-(6) indeed represent the chain process he/she has in mind. More specifically, it would take a pause or two to check if some condition is missing from the list, or whether one of the conditions listed is redundant (i.e., follows logically from the other three), or whether the set is consistent (i.e., no statement’s negation follows from the other three). These mental checks are immediate in the graphical representation; the first, because each link in the graph corresponds to a physical process in nature, and the last two because the graph is inherently consistent and non-redundant. As to the inferential part, using the graphoid + counterfactual axioms as inference rules is computationally intractable. These axioms are good for confirming a derivation if one is proposed, but not for finding a derivation when one is needed.

I believe that even a cursory attempt to answer research questions using (3)-(6) would convince the reader of the merits of the graphical representation. However, the reader of this blog is already biased, having been told that (3)-(6) is the potential-outcome equivalent of the chain X → Y → Z. A deeper appreciation can be reached by examining a new problem, specified in potential-outcome vocabulary, but without its graphical mirror.

Assume you are given the following statements as a specification.

It represents a familiar model in causal analysis that has been thoroughly analyzed. To appreciate the power of graphs, the reader is invited to examine the representation above and to answer a few questions:

a) Is the process described familiar to you?
b) Which assumptions are you willing to defend in your interpretation of the story?
c) Is the causal effect of X on Y identifiable?
d) Is the model testable?

I would be eager to hear from readers
1. whether my comparison is fair, and
2. which argument they find most convincing.
