Causal Analysis in Theory and Practice

November 29, 2014

On the First Law of Causal Inference

Filed under: Counterfactual,Definition,Discussion,General — judea @ 3:53 am

In several papers and lectures I have used the rhetorical title “The First Law of Causal Inference” when referring to the structural definition of counterfactuals:

The more I talk with colleagues and students, the more I am convinced that the equation deserves the title. In this post, I will explain why.

As many readers of Causality (Ch. 7) would recognize, Eq. (1) defines the potential-outcome, or counterfactual, Y_x(u) in terms of a structural equation model M and a submodel, M_x, in which the equations determining X is replaced by a constant X=x. Computationally, the definition is straightforward. It says that, if you want to compute the counterfactual Y_x(u), namely, to predict the value that Y would take, had X been x (in unit U=u), all you need to do is, first, mutilate the model, replace the equation for X with X=x and, second, solve for Y. What you get IS the counterfactual Y_x(u). Nothing could be simpler.

So, why is it so “fundamental”? Because from this definition we can also get probabilities on counterfactuals (once we assign probabilities, P(U=u), to the units), joint probabilities of counterfactuals and observables, conditional independencies over counterfactuals, graphical visualization of potential outcomes, and many more. [Including, of course, Rubin’s “science”, Pr(X,Y(0),(Y1))]. In short, we get everything that an astute causal analyst would ever wish to define or estimate, given that he/she is into solving serious problems in causal analysis, say policy analysis, or attribution, or mediation. Eq. (1) is “fundamental” because everything that can be said about counterfactuals can also be derived from this definition.
[See the following papers for illustration and operationalization of this definition:
also, Causality chapter 7.]

However, it recently occurred on me that the conceptual significance of this definition is not fully understood among causal analysts, not only among “potential outcome” enthusiasts, but also among structural equations researchers who practice causal analysis in the tradition of Sewall Wright, O.D. Duncan, and Trygve Haavelmo. Commenting on the flood of methods and results that emerge from this simple definition, some writers view it as a mathematical gimmick that, while worthy of attention, need to be guarded with suspicion. Others labeled it “an approach” that need be considered together with “other approaches” to causal reasoning, but not as a definition that justifies and unifies those other approaches.

Even authors who advocate a symbiotic approach to causal inference — graphical and counterfactuals — occasionally fail to realize that the definition above provides the logic for any such symbiosis, and that it constitutes in fact the semantical basis for the potential-outcome framework.

I will start by addressing the non-statisticians among us; i.e., economists, social scientists, psychometricians, epidemiologists, geneticists, metereologists, environmental scientists and more, namely, empirical scientists who have been trained to build models of reality to assist in analyzing data that reality generates. To these readers I want to assure that, in talking about model M, I am not talking about a newly invented mathematical object, but about your favorite and familiar model that has served as your faithful oracle and guiding light since college days, the one that has kept you cozy and comfortable whenever data misbehaved. Yes, I am talking about the equation

that you put down when your professor asked: How would household spending vary with income, or, how would earning increase with education, or how would cholesterol level change with diet, or how would the length of the spring vary with the weight that loads it. In short, I am talking about innocent equations that describe what we assume about the world. They now call them “structural equations” or SEM in order not to confuse them with regression equations, but that does not make them more of a mystery than apple pie or pickled herring. Admittedly, they are a bit mysterious to statisticians, because statistics textbooks rarely acknowledge their existence [Historians of statistics, take notes!] but, otherwise, they are the most common way of expressing our perception of how nature operates: A society of equations, each describing what nature listens to before determining the value it assigns to each variable in the domain.

Why am I elaborating on this perception of nature? To allay any fears that what is put into M is some magical super-smart algorithm that computes counterfactuals to impress the novice, or to spitefully prove that potential outcomes need no SUTVA, nor manipulation, nor missing data imputation; M is none other but your favorite model of nature and, yet, please bear with me, this tiny model is capable of generating, on demand, all conceivable counterfactuals: Y(0),Y(1), Y_x, Y_{127}, X_z, Z(X(y)) etc. on and on. Moreover, every time you compute these potential outcomes using Eq. (1) they will obey the consistency rule, and their probabilities will obey the laws of probability calculus and the graphoid axioms. And, if your model justifies “ignorability” or “conditional ignorability,” these too will be respected in the generated counterfactuals. In other words, ignorability conditions need not be postulated as auxiliary constraints to justify the use of available statistical methods; no, they are derivable from your own understanding of how nature operates.

In short, it is a miracle.

Not really! It should be self evident. Couterfactuals must be built on the familiar if we wish to explain why people communicate with counterfactuals starting at age 4 (“Why is it broken?” “Lets pretend we can fly”). The same applies to science; scientists have communicated with counterfactuals for hundreds of years, even though the notation and mathematical machinery needed for handling counterfactuals were made available to them only in the 20th century. This means that the conceptual basis for a logic of counterfactuals resides already within the scientific view of the world, and need not be crafted from scratch; it need not divorce itself from the scientific view of the world. It surely should not divorce itself from scientific knowledge, which is the source of all valid assumptions, or from the format in which scientific knowledge is stored, namely, SEM.

Here I am referring to people who claim that potential outcomes are not explicitly represented in SEM, and explicitness is important. First, this is not entirely true. I can see (Y(0), Y(1)) in the SEM graph as explicitly as I see whether ignorability holds there or not. [See, for example, Fig. 11.7, page 343 in Causality]. Second, once we accept SEM as the origin of potential outcomes, as defined by Eq. (1), counterfactual expressions can enter our mathematics proudly and explicitly, with all the inferential machinery that the First Law dictates. Third, consider by analogy the teaching of calculus. It is feasible to teach calculus as a stand-alone symbolic discipline without ever mentioning the fact that y'(x) is the slope of the function y=f(x) at point x. It is feasible, but not desirable, because it is helpful to remember that f(x) comes first, and all other symbols of calculus, e.g., f'(x), f”(x), [f(x)/x]’, etc. are derivable from one object, f(x). Likewise, all the rules of differentiation are derived from interpreting y'(x) as the slope of y=f(x).

Where am I heading?
First, I would have liked to convince potential outcome enthusiasts that they are doing harm to their students by banning structural equations from their discourse, thus denying them awareness of the scientific basis of potential outcomes. But this attempted persuasion has been going on for the past two decades and, judging by the recent exchange with Guido Imbens (link), we are not closer to an understanding than we were in 1995. Even an explicit demonstration of how a toy problem would be solved in the two languages (link) did not yield any result.

Second, I would like to call the attention of SEM practitioners, including of course econometricians, quantitative psychologists and political scientists, and explain the significance of Eq. (1) in their fields. To them, I wish to say: If you are familiar with SEM, then you have all the mathematical machinery necessary to join the ranks of modern causal analysis; your SEM equations (hopefully in nonparametric form) are the engine for generating and understanding counterfactuals.; True, your teachers did not alert you to this capability; it is not their fault, they did not know of it either. But you can now take advantage of what the First Law of causal inference tells you. You are sitting on a gold mine, use it.

Finally, I would like to reach out to authors of traditional textbooks who wish to introduce a chapter or two on modern methods of causal analysis. I have seen several books that devote 10 chapters on SEM framework: identification, structural parameters, confounding, instrumental variables, selection models, exogeneity, model misspecification, etc., and then add a chapter to introduce potential outcomes and cause-effect analyses as useful new comers, yet alien to the rest of the book. This leaves students to wonder whether the first 10 chapters were worth the labor. Eq. (1) tells us that modern tools of causal analysis are not new comers, but follow organically from the SEM framework. Consequently, one can leverage the study of SEM to make causal analysis more palatable and meaningful.

Please note that I have not mentioned graphs in this discussion; the reason is simple, graphical modeling constitutes The Second Law of Causal Inference.

Enjoy both,

November 16, 2014

On DAGs, Instruments and Social Networks

Filed under: General — eb @ 4:26 pm

Apropos the lively discussion we have had here on graphs and IV models, Felix Elwert writes:

“Dear Judea, here’s an IV paper using DAGs that we recently published. We use DAGs to evaluate genes as instrumental variables for causal peer effects in social networks. The substantive question is whether obesity is contagious among friends. We evaluated both single-IV and IV-set candidates. We found DAGs especially helpful for evaluating variations within a large class of qualitative data-generating processes by playing through a lots of increasingly realistic variations of our main DGPs (Table 1). Being able to discuss identification purely in terms of qualitative causal statements (e.g., “fat genes may affect latent causes of friendship formation”) was very helpful. One of our goals was to check far one can relax the DGP before identification would break down. After permitting a host of potential hurdles (i.e., eliminating exclusions left and right by drawing lots of additional arrows), we concluded that genes alone won’t realistically work, but that time-varying gene expression might give valid IVs for peer effects. Under linearity, we found suggestive evidence for the transmission of obesity in social networks.”

All the best,


November 9, 2014

Causal inference without graphs

Filed under: Counterfactual,Discussion,Economics,General — moderator @ 3:45 am

In a recent posting on this blog, Elias and Bryant described how graphical methods can help decide if a pseudo-randomized variable, Z, qualifies as an instrumental variable, namely, if it satisfies the exogeneity and exclusion requirements associated with the definition of an instrument. In this note, I aim to describe how inferences of this type can be performed without graphs, using the language of potential outcome. This description should give students of causality an objective comparison of graph-less vs. graph-based inferences. See my exchange with Guido Imbens [here].

Every problem of causal inference must commence with a set of untestable, theoretical assumptions that the modeler is prepared to defend on scientific grounds. In structural modeling, these assumptions are encoded in a causal graph through missing arrows and missing latent variables. Graphless methods encode these same assumptions symbolically, using two types of statements:

1. Exclusion restrictions, and
2. Conditional independencies among observable and potential outcomes.

For example, consider the causal Markov chain which represents the structural equations:

with and being omitted factors such that X, , are mutually independent.

These same assumptions can also be encoded in the language of counterfactuals, as follows:

(3) represents the missing arrow from X to Z, and (4)-(6) convey the mutual independence of X, , and .
[Remark: General rules for translating graphical models to counterfactual notation are given in Pearl (2009, pp. 232-234).]

Assume now that we are given the four counterfactual statements (3)-(6) as a specification of a model; What machinery can we use to answer questions that typically come up in causal inference tasks? One such question is, for example, is the model testable? In other words, is there an empirical test conducted on the observed variables X, Y, and Z that could prove (3)-(6) wrong? We note that none of the four defining conditions (3)-(6) is testable in isolation, because each invokes an unmeasured counterfactual entity. On the other hand, the fact the equivalent graphical model advertises the conditional independence of X and Z given Y, X _||_ Z | Y, implies that the combination of all four counterfactual statements should yield this testable implication.

Another question often posed to causal inference is that of identifiability, for example, whether the
causal effect of X on Z is estimable from observational studies.

Whereas graphical models enjoy inferential tools such as d-separation and do-calculus, potential-outcome specifications can use the axioms of counterfactual logic (Galles and Pearl 1998, Halpern, 1998) to determine identification and testable implication. In a recent paper, I have combined the graphoid and counterfactual axioms to provide such symbolic machinery (link).

However, the aim of this note is not to teach potential outcome researchers how to derive the logical consequences of their assumptions but, rather, to give researchers the flavor of what these derivation entail, and the kind of problems the potential outcome specification presents vis a vis the graphical representation.

As most of us would agree, the chain appears more friendly than the 4 equations in (3)-(6), and the reasons are both representational and inferential. On the representational side we note that it would take a person (even an expert in potential outcome) a pause or two to affirm that (3)-(6) indeed represent the chain process he/she has in mind. More specifically, it would take a pause or two to check if some condition is missing from the list, or whether one of the conditions listed is redundant (i.e., follows logically from the other three) or whether the set is consistent (i.e., no statement has its negation follows from the other three). These mental checks are immediate in the graphical representation; the first, because each link in the graph corresponds to a physical process in nature, and the last two because the graph is inherently consistent and non-redundant. As to the inferential part, using the graphoid+counterfactual axioms as inference rule is computationally intractable. These axioms are good for confirming a derivation if one is proposed, but not for finding a derivation when one is needed.

I believe that even a cursory attempt to answer research questions using (3)-(5) would convince the reader of the merits of the graphical representation. However, the reader of this blog is already biased, having been told that (3)-(5) is the potential-outcome equivalent of the chain X—>Y—>Z. A deeper appreciation can be reached by examining a new problem, specified in potential- outcome vocabulary, but without its graphical mirror.

Assume you are given the following statements as a specification.

It represents a familiar model in causal analysis that has been throughly analyzed. To appreciate the power of graphs, the reader is invited to examine this representation above and to answer a few questions:

a) Is the process described familiar to you?
b) Which assumption are you willing to defend in your interpretation of the story.
c) Is the causal effect of X on Y identifiable?
d) Is the model testable?

I would be eager to hear from readers
1. if my comparison is fair.
2. which argument they find most convincing.

Powered by WordPress