Causal Analysis in Theory and Practice

January 29, 2020

On Imbens’s Comparison of Two Approaches to Empirical Economics

Filed under: Counterfactual,d-separation,DAGs,do-calculus,Imbens — judea @ 11:00 pm

Many readers have asked for my reaction to Guido Imbens’s recent paper, titled “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics,” arXiv:1907.07271v1 [stat.ME], 16 Jul 2019.

The note below offers brief comments on Imbens’s five major claims regarding the superiority of potential outcomes [PO] vis-à-vis directed acyclic graphs [DAGs].

These five claims are articulated in Imbens’s introduction (pages 1-3). [Quoting]:

” … there are five features of the PO framework that may be behind its current popularity in economics.”

I will address them sequentially, first quoting Imbens’s claims, then offering my counterclaims.

I will end with a comment on Imbens’s final observation, concerning the absence of empirical evidence in a “realistic setting” to demonstrate the merits of the DAG approach.

Before we start, however, let me clarify that there is no such thing as a “DAG approach.” Researchers using DAGs follow an approach called the Structural Causal Model (SCM), which consists of functional relationships among variables of interest, and of which DAGs are merely a qualitative abstraction, spelling out the arguments in each function. The resulting graph can then be used to support inference tools such as d-separation and do-calculus. Potential outcomes are relationships derived from the structural model, and several of their properties can be elucidated using DAGs. These interesting relationships are summarized in chapter 7 of (Pearl, 2009a) and in a Statistics Surveys overview (Pearl, 2009c).


Imbens’s Claim # 1
“First, there are some assumptions that are easily captured in the PO framework relative to the DAG approach, and these assumptions are critical in many identification strategies in economics. Such assumptions include
monotonicity ([Imbens and Angrist, 1994]) and other shape restrictions such as convexity or concavity ([Matzkin et al.,1991, Chetverikov, Santos, and Shaikh, 2018, Chen, Chernozhukov, Fernández-Val, Kostyshak, and Luo, 2018]). The instrumental variables setting is a prominent example, and I will discuss it in detail in Section 4.2.”

Pearl’s Counterclaim # 1
It is logically impossible for an assumption to be “easily captured in the PO framework” and not simultaneously be “easily captured” in the “DAG approach.” The reason is simply that the latter embraces the former and merely enriches it with graph-based tools. Specifically, SCM embraces the counterfactual notation Yx that PO deploys, and does not exclude any concept or relationship definable in the PO approach.

Take monotonicity, for example. In PO, monotonicity is expressed as

Yx(u) ≥ Yx′(u) for all u and all x > x′

In the DAG approach it is expressed as:

Yx(u) ≥ Yx′(u) for all u and all x > x′

(Taken from Causality pages 291, 294, 398.)

The two are identical, of course, which may seem surprising to PO folks, but not to DAG folks who know how to derive the counterfactuals Yx from structural models. In fact, the derivation of counterfactuals in
terms of structural equations (Balke and Pearl, 1994) is considered one of the fundamental laws of causation in the SCM framework; see (Bareinboim and Pearl, 2016) and (Pearl, 2015).
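For readers who have not seen this derivation, it rests on a single defining equation, the one celebrated in the “Flowers of the First Law” postings further down this blog:

Yx(u) := YMx(u)

where YMx(u) is the value that Y attains in the submodel Mx, the model obtained from the structural model M by replacing the equation for X with the constant X = x, and u stands for the values of all exogenous variables (the “unit”). Every counterfactual, hence every assumption expressible in PO, monotonicity included, can be derived from, and given semantics by, this one equation.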

Imbens’s Claim # 2
“Second, the potential outcomes in the PO framework connect easily to traditional approaches to economic models such as supply and demand settings where potential outcome functions are the natural primitives. Related to this, the insistence of the PO approach on manipulability of the causes, and its attendant distinction between non-causal attributes and causal variables has resonated well with the focus in empirical work on policy relevance ([Angrist and Pischke, 2008, Manski, 2013]).”

Pearl’s Counterclaim #2
Not so. The term “potential outcome” is a latecomer to the economics literature of the 20th century, whose native vocabulary and natural primitives were functional relationships among variables, not potential outcomes. The latter are defined in terms of a “treatment assignment” and a hypothetical outcome, while the former invoke only observable variables like “supply” and “demand”. Don Rubin cited this fundamental difference as sufficient reason for shunning structural equation models, which he labeled “bad science.”

While it is possible to give a PO interpretation to structural equations, the interpretation is both artificial and convoluted, especially in view of PO’s insistence on manipulability of causes. Haavelmo, Koopmans and Marschak would not hesitate for a moment to write the structural equation:

Damage = f (earthquake intensity, other factors).

PO researchers, on the other hand, would spend weeks debating whether earthquakes have “treatment assignments” and whether we can legitimately estimate the “causal effects” of earthquakes. Thus, what Imbens perceives as a helpful distinction is, in fact, an unnecessary restriction that suppresses natural scientific discourse. See also (Pearl, 2018; 2019).

Imbens’s Claim #3
“Third, many of the currently popular identification strategies focus on models with relatively few (sets of) variables, where identification questions have been worked out once and for all.”

Pearl’s Counterclaim #3

First, I would argue that this claim is actually false. Most IV strategies that economists use are valid “conditional on controls” (see examples listed in Imbens (2014)), and the criterion that distinguishes “good controls” from “bad controls” is not trivial to articulate without the help of graphs. (See A Crash Course in Good and Bad Control.) It certainly cannot be discerned “once and for all”.

Second, even if economists are lucky enough to guess “good controls,” it is still unclear whether they focus on relatively few variables because, lacking graphs, they cannot handle more variables, or whether they refrain from using graphs to hide the opportunities missed by focusing on a few pre-fabricated, “once and for all” identification strategies.

I believe both apprehensions play a role in perpetuating the graph-avoiding subculture among economists. I have elaborated on this question here: (Pearl, 2014).

Imbens’s Claim # 4
“Fourth, the PO framework lends itself well to accounting for treatment effect heterogeneity in estimands ([Imbens and Angrist, 1994, Sekhon and Shem-Tov, 2017]) and incorporating such heterogeneity in estimation and the design of optimal policy functions ([Athey and Wager, 2017, Athey, Tibshirani, Wager, et al., 2019, Kitagawa and Tetenov, 2015]).”

Pearl’s Counterclaim #4
Indeed, in the early 1990s, economists were ecstatic to liberate themselves from the linear tradition of structural equation models and to find a framework (PO) that allowed them to model treatment-effect heterogeneity.

However, whatever role treatment heterogeneity played in this excitement should have been amplified ten-fold in 1995, when completely nonparametric structural equation models came into being, in which non-linear interactions and heterogeneity were assumed a priori. Indeed, the tools developed in the econometric literature cover only a fraction of the treatment-heterogeneity tasks that are currently managed by SCM. In particular, the latter handles such problems as “necessary and sufficient” causation, mediation, external validity, selection bias and more.

Speaking more generally, I find it odd for a discipline to prefer an “approach” that rejects tools over one that invites and embraces tools.

Imbens’s claim #5
“Fifth, the PO approach has traditionally connected well with design, estimation, and inference questions. From the outset Rubin and his coauthors provided much guidance to researchers and policy makers for practical implementation including inference, with the work on the propensity score ([Rosenbaum and Rubin, 1983b]) an influential example.”

Pearl’s Counterclaim #5
The initial work of Rubin and his co-authors has indeed provided much needed guidance to researchers and policy makers who were in a state of desperation, having no other mathematical notation to express causal questions of interest. That happened because economists were not aware of the counterfactual content of structural equation models, and of the non-parametric extension of those models.

Unfortunately, the clumsy and opaque notation introduced in this initial work has become ritualized in the PO framework that has prevailed, and the refusal to commence the analysis with meaningful assumptions has led to several blunders and misconceptions. One such misconception concerns propensity-score analysis, which researchers have taken to be a tool for reducing confounding bias. I have elaborated on this misguidance in Causality, Section 11.3.5, “Understanding Propensity Scores” (Pearl, 2009a).

Imbens’s final observation: Empirical Evidence
“Separate from the theoretical merits of the two approaches, another reason for the lack of adoption in economics is that the DAG literature has not shown much evidence of the benefits for empirical practice in settings that are important in economics. The potential outcome studies in MACE, and the chapters in [Rosenbaum, 2017], CISSB and MHE have detailed empirical examples of the various identification strategies proposed. In realistic settings they demonstrate the merits of the proposed methods and describe in detail the corresponding estimation and inference methods. In contrast in the DAG literature, TBOW, [Pearl, 2000], and [Peters, Janzing, and Schölkopf, 2017] have no substantive empirical examples, focusing largely on identification questions in what TBOW refers to as “toy” models. Compare the lack of impact of the DAG literature in economics with the recent embrace of regression discontinuity designs imported from the psychology literature, or with the current rapid spread of the machine learning methods from computer science, or the recent quick adoption of synthetic control methods [Abadie, Diamond, and Hainmueller, 2010]. All came with multiple concrete examples that highlighted their benefits over traditional methods. In the absence of such concrete examples the toy models in the DAG literature sometimes appear to be a set of solutions in search of problems, rather than a set of solutions for substantive problems previously posed in social sciences.”

Pearl’s comments on: Empirical Evidence
There is much truth to Imbens’s observation. The PO excitement that swept natural experimentalists in the 1990s came with an outright rejection of graphical models. The hundreds, if not thousands, of empirical economists who plunged into empirical work were warned repeatedly that graphical models may be “ill-defined,” “deceptive,” and “confusing,” and that structural models have no scientific underpinning (see Pearl, 1995; 2009b). Not a single paper in the econometric literature has acknowledged the existence of SCM as an alternative or complementary approach to PO.

The result has been the exact opposite of what has taken place in epidemiology, where DAGs have become a second language to both scholars and field workers (due in part to the influential 1999 paper by Greenland, Pearl, and Robins). In contrast, PO-led economists have launched a massive array of experimental programs that lack graphical tools for guidance. I would liken it to a Phoenician armada exploring the Atlantic coast in leaky boats, with no compass to guide its way.

This depiction might seem pretentious and overly critical, considering the pride that natural experimentalists take in the results of their studies (though no objective verification of their validity can be undertaken). Yet looking back at the substantive empirical examples listed by Imbens, one cannot but wonder how much more credible those studies could have been with graphical tools to guide the way. Such tools include a friendly language to communicate assumptions, powerful means to test their implications, and ample opportunities to uncover new natural experiments (Brito and Pearl, 2002).

Summary and Recommendation 

The thrust of my reaction to Imbens’s article is simple:

It is unreasonable to prefer an “approach” that rejects tools over one that invites and embraces tools.

Technical comparisons of the PO and SCM approaches, using concrete examples, have been published since 1993 in dozens of articles and books in computer science, statistics, epidemiology, and social science, yet none has appeared in the econometric literature. Economics students are systematically deprived of even the most elementary graphical tools available to other researchers, for example, tools to determine whether one variable is independent of another given a third, or whether a variable is a valid IV given a set S of observed variables.

This avoidance can no longer be justified by appealing to “We have not found this [graphical] approach to aid the drawing of causal inferences” (Imbens and Rubin, 2015, page 25).

To open an effective dialogue and a genuine comparison between the two approaches, I call on Professor Imbens to assume leadership in his capacity as Editor in Chief of Econometrica and invite a comprehensive survey paper on graphical methods for the front page of his Journal. This is how creative editors move their fields forward.

References
Balke, A. and Pearl, J. “Probabilistic Evaluation of Counterfactual Queries,” In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, Volume I, 230-237, July 31 – August 4, 1994.

Brito, C. and Pearl, J. “General instrumental variables,” In A. Darwiche and N. Friedman (Eds.), Uncertainty in Artificial Intelligence, Proceedings of the Eighteenth Conference, Morgan Kaufmann: San Francisco, CA, 85-93, August 2002.

Bareinboim, E. and Pearl, J. “Causal inference and the data-fusion problem,” Proceedings of the National Academy of Sciences, 113(27): 7345-7352, 2016.

Greenland, S., Pearl, J., and Robins, J. “Causal diagrams for epidemiologic research,” Epidemiology, Vol. 10, No. 1, pp. 37-48, January 1999.

Imbens, G. “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics,” arXiv:1907.07271v1 [stat.ME], 16 Jul 2019.

Imbens, G. and Rubin, D. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press; 2015.

Imbens, G. “Instrumental Variables: An Econometrician’s Perspective,” Statistical Science, 29(3): 323-358, 2014. doi:10.1214/14-STS480. https://projecteuclid.org/euclid.ss/1411437513

Pearl, J. “Causal diagrams for empirical research,” (With Discussions), Biometrika, 82(4): 669-710, 1995.

Pearl, J. “Understanding Propensity Scores” in J. Pearl’s Causality: Models, Reasoning, and Inference, Section 11.3.5, Second edition, NY: Cambridge University Press, pp. 348-352, 2009a.

Pearl, J. “Myth, confusion, and science in causal analysis,” University of California, Los Angeles, Computer Science Department, Technical Report R-348, May 2009b.

Pearl, J. “Causal inference in statistics: An overview,” Statistics Surveys, 3: 96-146, 2009c.


Pearl, J. “Are economists smarter than epidemiologists? (Comments on Imbens’s recent paper),” Causal Analysis in Theory and Practice Blog, October 27, 2014.

Pearl, J. “Trygve Haavelmo and the Emergence of Causal Calculus,” Econometric Theory, 31: 152-179, 2015.

Pearl, J. “Does obesity shorten life? Or is it the Soda? On non-manipulable causes,” Journal of Causal Inference, Causal, Casual, and Curious Section, 6(2), online, September 2018.

Pearl, J. “On the interpretation of do(x),” Journal of Causal Inference, Causal, Casual, and Curious Section, 7(1), online, March 2019.

July 23, 2015

Indirect Confounding and Causal Calculus (On three papers by Cox and Wermuth)

Filed under: Causal Effect,Definition,Discussion,do-calculus — eb @ 4:52 pm

1. Introduction

This note concerns three papers by Cox and Wermuth (2008; 2014; 2015; henceforth WC‘08, WC‘14 and CW‘15) in which they call attention to a class of problems they named “indirect confounding,” where “a much stronger distortion may be introduced than by an unmeasured confounder alone or by a selection bias alone.” We will show that problems classified as “indirect confounding” can be resolved in just a few steps of derivation in do-calculus.

This in itself would not have led me to post a note on this blog, for we have witnessed many difficult problems resolved by formal causal analysis. However, in their three papers, Cox and Wermuth also raise questions regarding the capability and/or adequacy of the do-operator and do-calculus to accurately predict effects of interventions. Thus, a second purpose of this note is to reassure students and users of do-calculus that they can continue to apply these tools with confidence, comfort, and scientifically grounded guarantees.

Finally, I would like to invite the skeptics among my colleagues to re-examine their hesitations and accept causal calculus for what it is: a formal representation of interventions in real world situations, and a worthwhile tool to acquire, use and teach. Among those skeptics I must include colleagues from the potential-outcome camp, whose graph-evading theology is becoming increasingly anachronistic (see discussions on this blog, for example, here).

2. Indirect Confounding – An Example

To illustrate indirect confounding, Fig. 1 below depicts the example used in WC‘08, which involves two treatments, one randomized (X), and the other (Z) taken in response to an observation (W) which depends on X. The task is to estimate the direct effect of X on the primary outcome (Y), discarding the effect transmitted through Z.

As we know from the elementary theory of mediation (e.g., Causality, p. 127), we cannot block the effect transmitted through Z by simply conditioning on Z, for that would open the spurious path X → W ← U → Y, since W is a collider whose descendant (Z) is instantiated. Instead, we need to hold Z constant by external means, through the do-operator do(Z = z). Accordingly, the problem of estimating the direct effect of X on Y amounts to finding P(y|do(x, z)), since Z is the only other parent of Y (see Pearl (2009, p. 127, Def. 4.5.1)).


Figure 1: An example of “indirect confounding” from WC‘08. Z stands for a treatment taken in response to a test W, whose outcome depends on a previous treatment X. U is unobserved. [WC‘08 attribute this example to Robins and Wasserman (1997); an identical structure is treated in Causality, p. 119, Fig. 4.4, as well as in Pearl and Robins (1995).]

Solution:
     P(y|do(x,z))
   = P(y|x, do(z))                          (since X is randomized)
   = ∑w P(y|x, w, do(z)) P(w|x, do(z))      (conditioning on W)
   = ∑w P(y|x, w, z) P(w|x)                 (by Rules 2 and 3 of do-calculus)

We are done, because the last expression consists of estimable factors. What makes this problem appear difficult in the linear model treated by WC‘08 is that the direct effect of X on Y (say α) cannot be identified using a simple adjustment. As we can see from the graph, there is no set S that separates X from Y in Gα. This means that α cannot be estimated as a coefficient in a regression of Y on X and S. Readers of Causality, Chapter 5, would not panic at such a revelation, knowing that there are dozens of ways to identify a parameter, going way beyond adjustment (surveyed in Chen and Pearl (2014)). WC‘08 identify α using one of these methods, and their solution coincides, of course, with the general derivation given above.

The example above demonstrates that the direct effect of X on Y (as well as Z on Y ) can be identified nonparametrically, which extends the linear analysis of WC‘08. It also demonstrates that the effect is identifiable even if we add a direct effect from X to Z, and even if there is an unobserved confounder between X and W – the derivation is almost the same (see Pearl (2009, p. 122)).

Most importantly, readers of Causality also know that, once we write the problem as “Find P(y|do(x, z))” it is essentially solved, because the completeness of the do-calculus together with the algorithmic results of Tian and Shpitser can deliver the answer in polynomial time, and, if terminated with failure, we are assured that the effect is not estimable by any method whatsoever.
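Readers who wish to see the derivation above corroborated numerically can do so with a few lines of simulation, in the spirit of the exercise proposed in the Conclusions below. The sketch that follows (Python with numpy; the structural equations are hypothetical, invented only to be compatible with the graph of Fig. 1) compares P(y|do(x,z)), obtained by actually intervening in the data-generating model, with the estimand ∑w P(y|x,w,z) P(w|x) computed from observational data alone; the two agree up to sampling error.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000

    def simulate(do_x=None, do_z=None):
        # A hypothetical structural model compatible with Fig. 1:
        # X -> W, U -> W, W -> Z, and X -> Y, Z -> Y, U -> Y (U unobserved).
        # do_x / do_z, when given, override the equations for X and Z (interventions).
        u = rng.binomial(1, 0.5, N)
        x = rng.binomial(1, 0.5, N) if do_x is None else np.full(N, do_x)  # X randomized
        w = rng.binomial(1, 0.2 + 0.4 * x + 0.3 * u)                       # W responds to X and U
        z = rng.binomial(1, 0.3 + 0.5 * w) if do_z is None else np.full(N, do_z)
        y = rng.binomial(1, 0.1 + 0.3 * x + 0.2 * z + 0.3 * u)             # Y responds to X, Z and U
        return x, w, z, y

    # "Ground truth": P(y=1 | do(x=1, z=1)), obtained by intervening on X and Z.
    _, _, _, y_int = simulate(do_x=1, do_z=1)
    truth = y_int.mean()

    # The estimand derived above, computed from purely observational data:
    # sum over w of P(y=1 | x=1, w, z=1) * P(w | x=1)
    x, w, z, y = simulate()
    est = sum(
        y[(x == 1) & (w == wv) & (z == 1)].mean() * (w[x == 1] == wv).mean()
        for wv in (0, 1)
    )

    print(f"interventional: {truth:.3f}   estimand: {est:.3f}")  # both approx. 0.75 here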

3. Conclusions

It is hard to explain why tools of causal inference encounter slower acceptance than tools in any other scientific endeavor. Some say that the difference comes from the fact that humans are born with strong causal intuitions and, so, any formal tool is perceived as a threatening intrusion into one’s private thoughts. Still, the reluctance shown by Cox and Wermuth seems to be of a different kind. Here are a few examples:

Cox and Wermuth (CW’15) write:
“…some of our colleagues have derived a ‘causal calculus’ for the challenging process of inferring causality; see Pearl (2015). In our view, it is unlikely that a virtual intervention on a probability distribution, as specified in this calculus, is an accurate representation of a proper intervention in a given real world situation.” (p. 3)

These comments are puzzling because the do-operator and its associated “causal calculus” operate not “on a probability distribution,” but on a data generating model (i.e., the DAG). Likewise, the calculus is used, not for “inferring causality” (God forbid!!) but for predicting the effects of interventions from causal assumptions that are already encoded in the DAG.

In WC‘14 we find an even more puzzling description of “virtual intervention”:
“These recorded changes in virtual interventions, even though they are often called ‘causal effects,’ may tell next to nothing about actual effects in real interventions with, for instance, completely randomized allocation of patients to treatments. In such studies, independence result by design and they lead to missing arrows in well-fitting graphs; see for example Figure 9 below, in the last subsection.” [our Fig. 1]

“Familiarity is the mother of acceptance,” say the sages (or should have said). I therefore invite my colleagues David Cox and Nanny Wermuth to familiarize themselves with the miracles of do-calculus. Take any causal problem for which you know the answer in advance, submit it for analysis through the do-calculus and marvel with us at the power of the calculus to deliver the correct result in just 3–4 lines of derivation. Alternatively, if we cannot agree on the correct answer, let us simulate it on a computer, using a well specified data-generating model, then marvel at the way do-calculus, given only the graph, is able to predict the effects of (simulated) interventions. I am confident that after such experience all hesitations will turn into endorsements.

BTW, I have offered this exercise repeatedly to colleagues from the potential outcome camp, and the response was uniform: “we do not work on toy problems, we work on real-life problems.” Perhaps this note would entice them to join us, mortals, and try a small problem once, just for sport.

Let’s hope,

Judea

References

Chen, B. and Pearl, J. (2014). Graphical tools for linear structural equation modeling. Tech. Rep. R-432, Department of Computer Science, University of California, Los Angeles, CA. Forthcoming, Psychometrika.
Cox, D. and Wermuth, N. (2015). Design and interpretation of studies: Relevant concepts from the past and some extensions. Observational Studies, this issue.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, New York.
Pearl, J. (2015). Trygve Haavelmo and the emergence of causal calculus. Econometric Theory 31 152–179. Special issue on Haavelmo Centennial.
Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Uncertainty in Artificial Intelligence 11 (P. Besnard and S. Hanks, eds.). Morgan Kaufmann, San Francisco, 444–453.
Robins, J. M. and Wasserman, L. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI ‘97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 409–420.
Wermuth, N. and Cox, D. (2008). Distortion of effects caused by indirect confounding. Biometrika 95 17–33.
Wermuth, N. and Cox, D. (2014). Graphical Markov models: Overview. ArXiv: 1407.7783.

April 24, 2015

Flowers of the First Law of Causal Inference (3)

Flower 3 — Generalizing experimental findings

Continuing our examination of “the flowers of the First Law” (see previous flowers here and here) this posting looks at one of the most crucial questions in causal inference: “How generalizable are our randomized clinical trials?” Readers of this blog would be delighted to learn that one of our flowers provides an elegant and rather general answer to this question. I will describe this answer in the context of transportability theory, and compare it to the way researchers have attempted to tackle the problem using the language of ignorability. We will see that ignorability-type assumptions are fairly limited, both in their ability to define conditions that permit generalizations, and in our ability to justify them in specific applications.

1. Transportability and Selection Bias
The problem of generalizing experimental findings from the trial sample to the population as a whole, also known as the problem of “sample selection-bias” (Heckman, 1979; Bareinboim et al., 2014), has received wide attention lately, as more researchers come to recognize this bias as a major threat to the validity of experimental findings in both the health sciences (Stuart et al., 2015) and social policy making (Manski, 2013).

Since participation in a randomized trial cannot be mandated, we cannot guarantee that the study population would be the same as the population of interest. For example, the study population may consist of volunteers, who respond to financial and medical incentives offered by pharmaceutical firms or experimental teams, so, the distribution of outcomes in the study may differ substantially from the distribution of outcomes under the policy of interest.

Another impediment to the validity of experimental findings is that the types of individuals in the target population may change over time. For example, as more individuals become eligible for health insurance, the types of individuals seeking services would no longer match the types of individuals that were sampled for the study. A similar change would occur as more individuals become aware of the efficacy of the treatment. The result is an inherent disparity between the target population and the population under study.

The problem of generalizing across disparate populations has received a formal treatment in (Pearl and Bareinboim, 2014), where it was labeled “transportability,” and where necessary and sufficient conditions for valid generalization were established (see also Bareinboim and Pearl, 2013). The problem of selection bias, though it has some unique features, can also be viewed as a nuance of the transportability problem, thus inheriting all the theoretical results established in (Pearl and Bareinboim, 2014) that guarantee valid generalizations. We will describe the two problems side by side and then return to the distinction between the types of assumptions that are needed for enabling generalizations.

The transportability problem concerns two dissimilar populations, Π and Π*, and requires us to estimate the average causal effect P*(yx) (explicitly: P*(yx) ≡ P*(Y = y|do(X = x))) in the target population Π*, based on experimental studies conducted on the source population Π. Formally, we assume that all differences between Π and Π* can be attributed to a set of factors S that produce disparities between the two, so that P*(yx) = P(yx|S = 1). The information available to us consists of two parts: first, treatment effects estimated from experimental studies in Π and, second, observational information extracted from both Π and Π*. The former can be written P(y|do(x), z), where Z is a set of covariates measured in the experimental study; the latter are written P*(x, y, z) = P(x, y, z|S = 1) for the target population and P(x, y, z) for the source. In addition to this information, we are also equipped with a qualitative causal model M that encodes causal relationships in Π and Π*, with the help of which we need to identify the query P*(yx). Mathematically, identification amounts to transforming the query expression

P*(yx) = P(y|do(x), S = 1)

into a form derivable from the available information ITR, where

ITR = { P(y|do(x),z),  P(x,y,z|S = 1),   P(x,y,z) }.

The selection bias problem is slightly different. Here the aim is to estimate the average causal effect P(yx) in the Π population, while the experimental information available to us, ISB, comes from a preferentially selected sample, S = 1, and is given by P(y|do(x), z, S = 1). Thus, the selection bias problem calls for transforming the query P(yx) to a form derivable from the information set:

ISB = { P(y|do(x),z,S = 1), P(x,y,z|S = 1), P(x,y,z) }.

In the Appendix section, we demonstrate how transportability problems and selection bias problems are solved using the transformations described above.

The analysis reported in (Pearl and Bareinboim, 2014) has resulted in an algorithmic criterion (Bareinboim and Pearl, 2013) for deciding whether transportability is feasible and, when confirmed, the algorithm produces an estimand for the desired effects. The algorithm is complete, in the sense that, when it fails, a consistent estimate of the target effect does not exist (unless one strengthens the assumptions encoded in M).

There are several lessons to be learned from this analysis when considering selection bias problems.

1. The graphical criteria that authorize transportability are applicable to selection bias problems as well, provided that the graph structures for the two problems are identical. This means that whenever a selection bias problem is characterized by a graph for which transportability is feasible, recovery from selection bias is feasible by the same algorithm. (The Appendix demonstrates this correspondence.)

2. The graphical criteria for transportability are more involved than the ones usually invoked in testing treatment assignment ignorability (e.g., through the back-door test). They may require several d-separation tests on several sub-graphs. It is utterly unimaginable, therefore, that such criteria could be managed by unaided human judgment, no matter how ingenious. (See discussions with Guido Imbens regarding computational barriers to graph-free causal inference, click here.) Graph avoiders should reckon with this predicament.

3. In general, problems associated with external validity cannot be handled by balancing disparities between distributions. The same disparity between P(x, y, z) and P*(x, y, z) may demand different adjustments, depending on the location of S in the causal structure. A simple example of this phenomenon is demonstrated in Fig. 3(b) of (Pearl and Bareinboim, 2014), where a disparity in the average reading ability of two cities requires two different treatments, depending on what causes the disparity. If the disparity emanates from age differences, adjustment is necessary, because age is likely to affect the potential outcomes. If, on the other hand, the disparity emanates from differences in educational programs, no adjustment is needed, since education, in itself, does not modify response to treatment. The distinction is made formal and vivid in causal graphs.

4. In many instances, generalizations can be achieved by conditioning on post-treatment variables, an operation that is frowned upon in the potential-outcome framework (Rosenbaum, 2002, pp. 73–74; Rubin, 2004; Sekhon, 2009) but has become extremely useful in graphical analysis. The difference between the conditioning operators used in these two frameworks is echoed in the difference between Qc and Qdo, the two z-specific effects discussed in a previous posting on this blog (link). The latter defines information that is estimable from experimental studies, whereas the former invokes a retrospective counterfactual that may or may not be estimable empirically.

In the next Section we will discuss the benefit of leveraging the do-operator in problems concerning generalization.

2. Ignorability versus Admissibility in the Pursuit of Generalization

A key assumption in almost all conventional analyses of generalization (from sample to population) is S-ignorability, written Yx ⊥ S|Z, where Yx is the potential outcome predicated on the intervention X = x, S is a selection indicator (with S = 1 standing for selection into the sample) and Z is a set of observed covariates. This condition, sometimes written as a difference Y1 − Y0 ⊥ S|Z, and sometimes as a conjunction {Y1, Y0} ⊥ S|Z, appears in Hotz et al. (2005); Cole and Stuart (2010); Tipton et al. (2014); Hartman et al. (2015), and possibly other works committed to potential-outcome analysis. This assumption says: if we succeed in finding a set Z of pre-treatment covariates such that cross-population differences disappear in every stratum Z = z, then the problem can be solved by averaging over those strata. (Lacking a procedure for finding Z, this solution avoids the harder part of the problem and, in this sense, it somewhat borders on the circular. It amounts to saying: if we can solve the problem in every stratum Z = z then the problem is solved; hardly an informative statement.)

In graphical analysis, on the other hand, the problem of generalization has been studied using another condition, labeled S-admissibility (Pearl and Bareinboim, 2014), which is defined by:

P(y|do(x), z) = P(y|do(x), z, s)

or, using counterfactual notation,

P(yx|zx) = P(yx|zx, sx)

It states that in every treatment regime X = x, the observed outcome Y is conditionally independent of the selection mechanism S, given Z, all evaluated at that same treatment regime.

Clearly, S-admissibility coincides with S-ignorability for pre-treatment S and Z; the two notions differ, however, for treatment-dependent covariates. The Appendix presents scenarios (Fig. 1(a) and (b)) in which post-treatment covariates Z do not satisfy S-ignorability, but satisfy S-admissibility and, thus, enable generalization to take place. We also present scenarios where both S-ignorability and S-admissibility hold and, yet, experimental findings are not generalizable by standard procedures of post-stratification. Rather, the correct procedure is uncovered naturally from the graph structure.

One of the reasons that S-admissibility has received greater attention in the graph-based literature is that it has a very simple graphical representation: Z and X should separate Y from S in a mutilated graph, from which all arrows entering X have been removed. Such a graph depicts conditional independencies among observed variables in the population under experimental conditions, i.e., where X is randomized.
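For readers who prefer to check such separation conditions mechanically rather than by inspection, standard graph libraries will do it. Here is a minimal sketch in Python (assuming the networkx package, version 2.8 or later, which provides a d-separation test); the graph encoded below is hypothetical, a bio-marker-like model in the spirit of Fig. 4(c) of (Pearl and Bareinboim, 2014), and serves only to illustrate the mutilate-then-separate recipe just described.

    import networkx as nx

    # Hypothetical model, for illustration only: treatment X, post-treatment
    # bio-marker Z, outcome Y, and a selection/disparity node S acting on Z.
    G = nx.DiGraph([("X", "Z"), ("Z", "Y"), ("S", "Z")])

    # Step 1: mutilate the graph by deleting all arrows entering X,
    # mimicking the experimental condition in which X is randomized.
    G_mut = G.copy()
    G_mut.remove_edges_from(list(G_mut.in_edges("X")))

    # Step 2: Z is S-admissible iff {X, Z} d-separates Y from S in the
    # mutilated graph. (nx.d_separated in networkx 2.8-3.2; the same test
    # is named nx.is_d_separator in networkx 3.3 and later.)
    print(nx.d_separated(G_mut, {"Y"}, {"S"}, {"X", "Z"}))  # True for this graph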

In contrast, S-ignorability has not been given a simple graphical interpretation, but it can be verified from either twin networks (Causality, pp. 213-4) or from counterfactually augmented graphs (Causality, p. 341), as we have demonstrated in an earlier posting on this blog (link). Using either representation, it is easy to see that S-ignorability is rarely satisfied in transportability problems in which Z is a post-treatment variable. This is because, whenever S is a proxy to an ancestor of Z, Z cannot separate Yx from S.

The simplest result of both PO and graph-based approaches is the re-calibration or post-stratification formula. It states that, if Z is a set of pre-treatment covariates satisfying S-ignorability (or S-admissibility), then the causal effect in the population at large can be recovered from a selection-biased sample by a simple re-calibration process. Specifically, if P(yx|S = 1,Z = z) is the z-specific probability distribution of Yx in the sample, then the distribution of Yx in the population at large is given by

P(yx) = ∑z P(yx|S = 1, z) P(z)        (*)

where P(z) is the probability of Z = z in the target population (where S = 0). Equation (*) follows from S-ignorability by conditioning on z and adding S = 1 to the conditioning set – a one-line proof. The proof fails, however, when Z is treatment-dependent, because the counterfactual factor P(yx|S = 1, z) is not normally estimable in the experimental study. (See the Qc vs. Qdo discussion here.)
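To make Eq. (*) concrete, here is a minimal numerical sketch (Python; all numbers are invented for illustration, and Z is assumed to be a pre-treatment covariate satisfying S-ignorability or S-admissibility):

    # p_yx_sample[z] stands for P(yx | S=1, Z=z), the z-specific effect estimated
    # in the selection-biased sample; the p_z_* dictionaries are distributions of Z.
    p_yx_sample = {"young": 0.30, "old": 0.70}   # hypothetical z-specific effects
    p_z_sample  = {"young": 0.80, "old": 0.20}   # Z among the volunteers (S = 1)
    p_z_target  = {"young": 0.40, "old": 0.60}   # Z in the target population

    naive = sum(p_yx_sample[z] * p_z_sample[z] for z in p_yx_sample)   # ignores the disparity
    recal = sum(p_yx_sample[z] * p_z_target[z] for z in p_yx_sample)   # Eq. (*)

    print(naive, recal)   # 0.38 vs. 0.54; the un-calibrated sample average misses the target

The two answers coincide only when the sample and target distributions of Z agree, which is precisely the disparity that Eq. (*) corrects.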

As noted in (Keiding, 1987), this re-calibration formula goes back to 18th-century demographers (Dale, 1777; Tetens, 1786) facing the task of predicting overall mortality (across populations) from age-specific data. Their reasoning was probably as follows: if the source and target populations differ in distribution by a set of attributes Z, then to correct for these differences we need to weight samples by a factor that would restore similarity to the two distributions. Some researchers view Eq. (*) as a version of the Horvitz and Thompson (1952) post-stratification method of estimating the mean of a super-population from un-representative stratified samples. The essential difference between survey sampling calibration and the calibration required in Eq. (*) is that the calibrating covariates Z are not just any set by which the distributions differ; they must satisfy the S-ignorability (or admissibility) condition, which is a causal, not a statistical, condition. It is therefore not discernible from distributions over observed variables. In other words, the re-calibration formula should depend on disparities between the causal models of the two populations, not merely on distributional disparities. This is demonstrated explicitly in Fig. 4(c) of (Pearl and Bareinboim, 2014), which is also treated in the Appendix (Fig. 1(a)).

While S-ignorability and S-admissibility are both sufficient for re-calibrating pre-treatment covariates Z, S-admissibility goes further and permits generalizations in cases where Z consists of post-treatment covariates. A simple example is the bio-marker model shown in Fig. 4(c) (Example 3) of (Pearl and Bareinboim, 2014), which is also discussed in the Appendix.

Conclusions

1. Many opportunities for generalization are opened up through the use of post-treatment variables. These opportunities remain inaccessible to ignorability-based analysis, partly because S-ignorability does not always hold for such variables but, mainly, because ignorability analysis requires information in the form of z-specific counterfactuals, which is often not estimable from experimental studies.

2. Most of these opportunities have been charted through the completeness results for transportability (Bareinboim et al., 2014); others can be revealed by simple derivations in do-calculus, as shown in the Appendix.

3. There is still the issue of assisting researchers in judging whether S-ignorability (or S-admissibility) is plausible in any given application. Graphs excel in this dimension because graphs match the format in which people store scientific knowledge. Some researchers prefer to do it by direct appeal to intuition; they do so at their own peril.

For references and appendix, click here.

January 22, 2015

Flowers of the First Law of Causal Inference (2)

Flower 2 — Conditioning on post-treatment variables

In this 2nd flower of the First Law, I share with readers interesting relationships among various ways of extracting information from post-treatment variables. These relationships came up in conversations with readers, students and curious colleagues, so I will present them in a question-and-answer format.

Question-1
Rule 2 of do-calculus does not distinguish post-treatment from pre-treatment variables. Thus, regardless of the nature of Z, it permits us to replace P(y|do(x), z) with P(y|x, z) whenever Z separates X from Y in the mutilated graph GX (i.e., the causal graph from which all arrows emanating from X are removed). How can this rule be correct, when we know that one should be careful about conditioning on a post-treatment variable Z?

Example 1 Consider the simple causal chain X → Y → Z. We know that if we condition on Z (as in case-control studies), selected units cease to be representative of the population, and we cannot identify the causal effect of X on Y even when X is randomized. Applying Rule 2, however, we get P(y|do(x), z) = P(y|x, z) (since X and Y are separated in the mutilated graph X  Y → Z, in which the arrow X → Y has been removed). This tells us that the causal effect of X on Y IS identifiable conditioned on Z. Something must be wrong here.
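As a numerical illustration of the puzzle (not its resolution, for which read on), here is a small simulation sketch in Python with numpy; the structural equations are hypothetical, chosen only to realize the chain X → Y → Z. Since X has no parents, do(x) coincides with conditioning on x, so both sides of the Rule-2 equality can be read directly off the samples.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 1_000_000

    # Hypothetical structural equations realizing the chain X -> Y -> Z (all binary).
    x = rng.binomial(1, 0.5, N)          # X randomized, no parents
    y = rng.binomial(1, 0.2 + 0.5 * x)   # Y responds to X
    z = rng.binomial(1, 0.1 + 0.7 * y)   # Z responds to Y only

    # Because X has no parents, P(y | do(x), z) = P(y | x, z), as Rule 2 asserts.
    p_y_given_xz  = y[(x == 1) & (z == 1)].mean()   # approx. 0.95 here
    p_y_given_dox = y[x == 1].mean()                # P(y=1 | do(x=1)), approx. 0.70

    print(p_y_given_xz, p_y_given_dox)   # conditioning on Z changes the answer markedly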

To read more, click here.

November 10, 2013

Reflections on Heckman and Pinto’s “Causal Analysis After Haavelmo”

Filed under: Announcement,Counterfactual,Definition,do-calculus,General — moderator @ 4:50 am

A recent article by Heckman and Pinto (HP) (link: http://www.nber.org/papers/w19453.pdf) discusses the do-calculus as a formal operationalization of Haavelmo’s approach to policy intervention. HP replace the do-operator with an equivalent operator, called “fix,” which simulates a Fisherian experiment with randomized “do”. They advocate the use of “fix,” discover limitations in “do,” and inform readers that those limitations disappear in “the Haavelmo approach.”

I examine the logic of HP’s paper, its factual basis, and its impact on econometric research and education (link: http://ftp.cs.ucla.edu/pub/stat_ser/r420.pdf).

August 4, 2012

Causation in Psychological Research

Filed under: Discussion,do-calculus,General — eb @ 3:30 pm

The European Journal of Personality just published an article by James Lee, titled
“Correlation and Causation in the Study of Personality”
European Journal of Personality, 26: 372-390 (2012), DOI: 10.1002/per.1863.
Link: http://onlinelibrary.wiley.com/doi/10.1002/per.1863/pdf,
or here.

Lee’s article is followed by Open Peer Commentaries
http://onlinelibrary.wiley.com/doi/10.1002/per.1865/full,
or here.

(Strikingly, the commentary by Rolf Steyer declares the do-operator to be self-contradictory. I trust readers of this blog to spot Steyer’s error right away. If not, I will post.)

Another recent paper on causation in psychological research is the one by Shadish and Sullivan,
“Theories of Causation in Psychological Science”
In Harris Cooper (Ed-in-Chief), APA Handbook of Research Methods in Psychology, Volume 1, pp. 23-52, 2012.
http://www.cs.ucla.edu/~kaoru/shadish-sullivan12.pdf

While these papers indicate a healthy awakening of psychological researchers to recent advances in causal inference, the field is still dominated by authors who have not heard about model-based covariate selection, testable implications, nonparametric identification, bias amplification, mediation formulas and more.

Much to do, much to discuss,
Judea

February 16, 2004

Submodel for subgraphs of direct effect removal

Filed under: do-calculus — moderator @ 12:00 am

From Susan Scott, Australia

In the do-calculus inference rules, I understand how the subgraph is generated from the submodel do(X = x), Gx, by the removal of direct causes, and why d-separation is therefore a valid test for conditional independence. However, I don't understand the submodel for subgraphs representing the removal of direct effects. Would you please explain the submodel I could use to explain this subgraph and what distribution it represents?

December 26, 2000

Has causality been defined?

Filed under: do-calculus,General — moderator @ 12:00 am

From Professor Nozer Singpurwalla, from The George Washington University:

My basic point is that since causality has not been defined, the causal calculus is a technology which could use a foundation. However, the calculus does give useful insights and is thus valuable. Finally, according to my understanding of the causal calculus, I am inclined to state that the calculus of probability is the calculus of causality, notwithstanding Dennis' [Lindley] concerns about Suppes probabilistic causality.

December 20, 2000

Is the do(x) operator universal?

Filed under: do-calculus — moderator @ 12:00 am

From Bill Shipley, Universite de Sherbrooke, (Quebec) CANADA:

In most experiments, the external manipulation consists of adding (or subtracting) some amount from X without removing pre-existing causes of X. For example, adding 5 kg/h of fertilizer to a field, adding 5 mg/l of insulin to subjects etc. Here, the pre-existing causes of the manipulated variable still exert effects but a new variable (M) is added.
… The problem that I see with the do(x) operator as a general operator of external manipulation is that it requires two things: (1) removing any pre-existing causes of x and (2) setting x to some value. This corresponds to some types of external manipulations, but not all (or even most) external manipulations. I would introduce an add(x=n) operator, meaning "add, external to the pre-existing causal process, an amount 'n' of x''. Graphically, this consists of augmenting the pre-existing causal graph with a new edge, namely M-n–>X. Algebraically, this would consist of adding a new term –n– as a cause of X.

November 25, 2000

Can do(x) represent practical experiments?

Filed under: do-calculus — moderator @ 12:00 am

L.B.S., from University of Arizona, questioned whether the do(x) operator can represent realistic actions or experiments: "Even an otherwise perfectly executed randomized experiment may yield perfectly misleading conclusions if, for example, the construct validity of the treatment is zero, e.g., there is a serious confounding. A good example is a study involving injected vitamin E as a treatment for incubated children at risk for retrolental fibroplasia. The randomized experiment indicated efficacy for the injections, but it was soon discovered that the actual effective treatment was opening the pressurized, oxygen-saturated incubators several times per day to give the injections, thus lowering the barometric pressure and oxygen levels in the blood of the infants (Leonard, Major Medical Mistakes). Any statistical analysis would have been misleading in that case."

S.M., from Georgia Institute of Technology, adds:
"Your example of the misleading causal effect, shows the kind of thing that troubles me about the do(x) concept. You do(x) or don't do(x) but something else, and this seems correlated with an effect. But it may be something else that is correlated with do(x) that is the cause and not the do(x) per se."
