On Imbens’s Comparison of Two Approaches to Empirical Economics
Many readers have asked for my reaction to Guido Imbens’s recent paper, “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics,” arXiv:1907.07271v1 [stat.ME], 16 Jul 2019.
The note below offers brief comments on Imbens’s five major claims regarding the superiority of potential outcomes (PO) vis-à-vis directed acyclic graphs (DAGs).
These five claims are articulated in Imbens’s introduction (pages 1-3). [Quoting]:
“… there are five features of the PO framework that may be behind its current popularity in economics.”
I will address them sequentially, first quoting Imbens’s claims, then offering my counterclaims.
I will end with a comment on Imbens’s final observation, concerning the absence of empirical evidence in a “realistic setting” to demonstrate the merits of the DAG approach.
Before we start, however, let me clarify that there is no such thing as a “DAG approach.” Researchers using DAGs follow an approach called the Structural Causal Model (SCM), which consists of functional relationships among variables of interest, and of which DAGs are merely a qualitative abstraction, spelling out the arguments in each function. The resulting graph can then be used to support inference tools such as d-separation and do-calculus. Potential outcomes are relationships derived from the structural model, and several of their properties can be elucidated using DAGs. These interesting relationships are summarized in Chapter 7 of (Pearl, 2009a) and in a Statistics Surveys overview (Pearl, 2009c).
Imbens’s Claim # 1
“First, there are some assumptions that are easily captured in the PO framework relative to the DAG approach, and these assumptions are critical in many identification strategies in economics. Such assumptions include
monotonicity ([Imbens and Angrist, 1994]) and other shape restrictions such as convexity or concavity ([Matzkin et al.,1991, Chetverikov, Santos, and Shaikh, 2018, Chen, Chernozhukov, Fernández-Val, Kostyshak, and Luo, 2018]). The instrumental variables setting is a prominent example, and I will discuss it in detail in Section 4.2.”
Pearl’s Counterclaim # 1
It is logically impossible for an assumption to be “easily captured in the PO framework” and not simultaneously be “easily captured” in the “DAG approach.” The reason is simply that the latter embraces the former and merely enriches it with graph-based tools. Specifically, SCM embraces the counterfactual notation Yx that PO deploys, and does not exclude any concept or relationship definable in the PO approach.
Take monotonicity, for example. In PO, monotonicity is expressed as
Yx (u) ≥ Yx’ (u) for all u and all x > x’
In the DAG approach it is expressed as:
Yx (u) ≥ Yx’ (u) for all u and all x > x’
(Taken from Causality pages 291, 294, 398.)
The two are identical, of course, which may seem surprising to PO folks, but not to DAG folks who know how to derive the counterfactuals Yx from structural models. In fact, the derivation of counterfactuals in
terms of structural equations (Balke and Pearl, 1994) is considered one of the fundamental laws of causation in the SCM framework; see (Bareinboim and Pearl, 2016) and (Pearl, 2015).
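To make the derivation concrete, here is a minimal sketch (with a hypothetical structural model of my own devising, not one from either paper) of how a counterfactual Yx(u) is read off a structural equation, and how the monotonicity condition above can then be checked directly:

```python
def f_Y(x, u):
    """Structural equation for Y; u carries the unit's background factors."""
    return u["slope"] * x + u["base"]

def counterfactual_Y(x, u):
    """First Law: Y_x(u) is simply f_Y evaluated at x, holding the unit u fixed."""
    return f_Y(x, u)

def is_monotone(units, xs):
    """Check Y_x(u) >= Y_x'(u) for all units u and all x > x'."""
    return all(
        counterfactual_Y(x_hi, u) >= counterfactual_Y(x_lo, u)
        for u in units
        for x_hi in xs
        for x_lo in xs
        if x_hi > x_lo
    )

# Two units whose response slopes are nonnegative, so monotonicity holds.
units = [{"slope": 0.5, "base": 1.0}, {"slope": 2.0, "base": -3.0}]
print(is_monotone(units, xs=[0, 1, 2]))  # True
```

The point of the sketch is that the counterfactual is not a new primitive: it is computed from the same function f_Y that defines the structural model, which is why the PO and SCM expressions of monotonicity coincide.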
Imbens’s Claim # 2
“Second, the potential outcomes in the PO framework connect easily to traditional approaches to economic models such as supply and demand settings where potential outcome functions are the natural primitives. Related to this, the insistence of the PO approach on manipulability of the causes, and its attendant distinction between non-causal attributes and causal variables has resonated well with the focus in empirical work on policy relevance ([Angrist and Pischke, 2008, Manski, 2013]).”
Pearl’s Counterclaim #2
Not so. The term “potential outcome” is a latecomer to the economics literature of the 20th century, whose native vocabulary and natural primitives were functional relationships among variables, not potential outcomes. The latter are defined in terms of a “treatment assignment” and a hypothetical outcome, while the former invoke only observable variables like “supply” and “demand.” Don Rubin cited this fundamental difference as sufficient reason for shunning structural equation models, which he labeled “bad science.”
While it is possible to give a PO interpretation to structural equations, the interpretation is both artificial and convoluted, especially in view of PO’s insistence on manipulability of causes. Haavelmo, Koopmans, and Marschak would not hesitate for a moment to write the structural equation:
Damage = f (earthquake intensity, other factors).
PO researchers, on the other hand, would spend weeks debating whether earthquakes have “treatment assignments” and whether we can legitimately estimate the “causal effects” of earthquakes. Thus, what Imbens perceives as a helpful distinction is, in fact, an unnecessary restriction that suppresses natural scientific discourse. See also (Pearl, 2018; 2019).
Imbens’s Claim #3
“Third, many of the currently popular identification strategies focus on models with relatively few (sets of) variables, where identification questions have been worked out once and for all.”
Pearl’s Counterclaim #3
First, I would argue that this claim is actually false. Most IV strategies that economists use are valid only “conditional on controls” (see examples listed in Imbens (2014)), and the criterion that distinguishes “good controls” from “bad controls” is not trivial to articulate without the help of graphs (see A Crash Course in Good and Bad Control). It certainly cannot be discerned “once and for all.”
Second, even if economists are lucky enough to guess “good controls,” it is still unclear whether they focus on relatively few variables because, lacking graphs, they cannot handle more, or whether they refrain from using graphs so as to hide the opportunities missed by focusing on a few pre-fabricated, “once and for all” identification strategies.
I believe both apprehensions play a role in perpetuating the graph-avoiding subculture among economists. I have elaborated on this question here: (Pearl, 2014).
Imbens’s Claim # 4
“Fourth, the PO framework lends itself well to accounting for treatment effect heterogeneity in estimands ([Imbens and Angrist, 1994, Sekhon and Shem-Tov, 2017]) and incorporating such heterogeneity in estimation and the design of optimal policy functions ([Athey and Wager, 2017, Athey, Tibshirani, Wager, et al., 2019, Kitagawa and Tetenov, 2015]).”
Pearl’s Counterclaim #4
Indeed, in the early 1990s, economists felt ecstatic liberating themselves from the linear tradition of structural equation models and finding a framework (PO) that allowed them to model treatment-effect heterogeneity.
However, whatever role treatment heterogeneity played in this excitement should have been amplified ten-fold in 1995, when completely nonparametric structural equation models came into being, in which non-linear interactions and heterogeneity were assumed a priori. Indeed, the tools developed in the econometric literature cover only a fraction of the treatment-heterogeneity tasks that are currently managed by SCM. In particular, the latter include such problems as “necessary and sufficient” causation, mediation, external validity, selection bias, and more.
Speaking more generally, I find it odd for a discipline to prefer an “approach” that rejects tools over one that invites and embraces tools.
Imbens’s claim #5
“Fifth, the PO approach has traditionally connected well with design, estimation, and inference questions. From the outset Rubin and his coauthors provided much guidance to researchers and policy makers for practical implementation including inference, with the work on the propensity score ([Rosenbaum and Rubin, 1983b]) an influential example.”
Pearl’s Counterclaim #5
The initial work of Rubin and his co-authors has indeed provided much-needed guidance to researchers and policy makers who were in a state of desperation, having no other mathematical notation to express causal questions of interest. That happened because economists were not aware of the counterfactual content of structural equation models, nor of the nonparametric extension of those models.
Unfortunately, the clumsy and opaque notation introduced in this initial work has become a ritual in the prevailing PO framework, and the refusal to commence the analysis with meaningful assumptions has led to several blunders and misconceptions. One such misconception concerns propensity-score analysis, which researchers have taken as a tool for reducing confounding bias. I have elaborated on this misguidance in Causality, Section 11.3.5, “Understanding Propensity Scores” (Pearl, 2009a).
Imbens’s final observation: Empirical Evidence
“Separate from the theoretical merits of the two approaches, another reason for the lack of adoption in economics is that the DAG literature has not shown much evidence of the benefits for empirical practice in settings that are important in economics. The potential outcome studies in MACE, and the chapters in [Rosenbaum, 2017], CISSB and MHE have detailed empirical examples of the various identification strategies proposed. In realistic settings they demonstrate the merits of the proposed methods and describe in detail the corresponding estimation and inference methods. In contrast in the DAG literature, TBOW, [Pearl, 2000], and [Peters, Janzing, and Schölkopf, 2017] have no substantive empirical examples, focusing largely on identification questions in what TBOW refers to as “toy” models. Compare the lack of impact of the DAG literature in economics with the recent embrace of regression discontinuity designs imported from the psychology literature, or with the current rapid spread of the machine learning methods from computer science, or the recent quick adoption of synthetic control methods [Abadie, Diamond, and Hainmueller, 2010]. All came with multiple concrete examples that highlighted their benefits over traditional methods. In the absence of such concrete examples the toy models in the DAG literature sometimes appear to be a set of solutions in search of problems, rather than a set of solutions for substantive problems previously posed in social sciences.”
Pearl’s comments on: Empirical Evidence
There is much truth to Imbens’s observation. The PO excitement that swept natural experimentalists in the 1990s came with outright rejection of graphical models. The hundreds, if not thousands, of empirical economists who plunged into empirical work were warned repeatedly that graphical models may be “ill-defined,” “deceptive,” and “confusing,” and that structural models have no scientific underpinning (see (Pearl, 1995; 2009b)). Not a single paper in the econometric literature has acknowledged the existence of SCM as an alternative or complementary approach to PO.
The result has been the exact opposite of what has taken place in epidemiology, where DAGs became a second language to both scholars and field workers (due in part to the influential 1999 paper by Greenland, Pearl, and Robins). In contrast, PO-led economists have launched a massive array of experimental programs lacking graphical tools for guidance. I would liken it to a Phoenician armada exploring the Atlantic coast in leaky boats with no compass to guide its way.
This depiction might seem pretentious and overly critical, considering the pride natural experimentalists take in the results of their studies (though no objective verification of validity can be undertaken). Yet, looking back at the substantive empirical examples listed by Imbens, one cannot but wonder how much more credible those studies could have been with graphical tools to guide the way. These tools include a friendly language to communicate assumptions, powerful means to test their implications, and ample opportunities to uncover new natural experiments (Brito and Pearl, 2002).
Summary and Recommendation
The thrust of my reaction to Imbens’s article is simple:
It is unreasonable to prefer an “approach” that rejects tools over one that invites and embraces tools.
Technical comparisons of the PO and SCM approaches, using concrete examples, have been published since 1993 in dozens of articles and books in computer science, statistics, epidemiology, and social science, yet none in the econometric literature. Economics students are systematically deprived of even the most elementary graphical tools available to other researchers, for example, to determine if one variable is independent of another given a third, or if a variable is a valid IV given a set S of observed variables.
This avoidance can no longer be justified by appealing to “We have not found this [graphical] approach to aid the drawing of causal inferences” (Imbens and Rubin, 2015, page 25).
To open an effective dialogue and a genuine comparison between the two approaches, I call on Professor Imbens to assume leadership in his capacity as Editor in Chief of Econometrica and invite a comprehensive survey paper on graphical methods for the front page of his Journal. This is how creative editors move their fields forward.
Bareinboim, E. and Pearl, J. “Causal inference and the data-fusion problem,” Proceedings of the National Academy of Sciences, 113(27): 7345-7352, 2016.
Brito, C. and Pearl, J. “General instrumental variables,” In A. Darwiche and N. Friedman (Eds.), Uncertainty in Artificial Intelligence, Proceedings of the Eighteenth Conference, Morgan Kaufmann: San Francisco, CA, 85-93, August 2002.
Greenland, S., Pearl, J., and Robins, J. “Causal diagrams for epidemiologic research,” Epidemiology, Vol. 10, No. 1, pp. 37-48, January 1999.
Pearl, J. “Causal diagrams for empirical research,” (With Discussions), Biometrika, 82(4): 669-710, 1995.
Pearl, J. “Understanding Propensity Scores” in J. Pearl’s Causality: Models, Reasoning, and Inference, Section 11.3.5, Second edition, NY: Cambridge University Press, pp. 348-352, 2009a.
Pearl, J. “Myth, confusion, and science in causal analysis,” University of California, Los Angeles, Computer Science Department, Technical Report R-348, May 2009b.
Pearl, J. “Causal inference in statistics: An overview” Statistics Surveys, Vol. 3, 96–146, 2009c.
Pearl, J. “Are economists smarter than epidemiologists? (Comments on Imbens’s recent paper),” Causal Analysis in Theory and Practice Blog, October 27, 2014.
Pearl, J. “Trygve Haavelmo and the Emergence of Causal Calculus,” Econometric Theory, 31: 152-179, 2015.
Pearl, J. “Does obesity shorten life? Or is it the Soda? On non-manipulable causes,” Journal of Causal Inference, Causal, Casual, and Curious Section, 6(2), online, September 2018.
Pearl, J. “On the interpretation of do(x),” Journal of Causal Inference, Causal, Casual, and Curious Section, 7(1), online, March 2019.
Hi Judea,
Many interesting comments here, thanks for this post. If you will humour me, I want to take up just one of your points, since it is one that comes up often and we touched on it recently on twitter.
You write:
“The term “potential outcome” is a latecomer to the economics literature of the 20th century, whose native vocabulary and natural primitives were functional relationships among variables, not potential outcomes. The latter are defined in terms of a “treatment assignment” and a hypothetical outcome, while the former invoke only observable variables like “supply” and “demand.””
That’s not correct. Supply and demand are not observable: they are causal concepts that encode potential outcomes, not merely functional relations among observed variables.
Here is a quick primer on the basic model:
There are many firms and many consumers, all of whom take the price of some good as parametric. At any price P, each consumer decides how much to purchase. The sum of those purchases is D(P). This function has a causal interpretation: if the price is changed to P’, then the causal effect on units demanded is [ D(P’) – D(P) ]. Put another way, this schedule represents a continuum of potential outcomes, one for each potential price P. Of course, all but one of these potential outcomes are not observed.
Similarly, at every potential price P, each firm decides how much it wants to produce, and the sum of production is supply, S(P). Supply is likewise a causal object representing a continuum of potential outcomes.
Finally, an equilibrium condition is imposed: through some unmodeled emergent process, the realized price, P*, satisfies D(P*) = S(P*). A linear version is then,
D = a + bP
S = c + dP
D = S.
Note this is not a cyclical relation, it’s not something like `consumers choose Q (which affects P) and firms choose P (which affects Q),’ despite the impression some econometrics textbooks give.
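The linear version above can be solved in closed form: equating a + bP* = c + dP* gives P* = (a - c)/(d - b). A small sketch of that arithmetic, with illustrative parameter values of my own choosing (not from any cited paper), assuming the usual slopes b < 0 < d:

```python
a, b = 100.0, -2.0   # demand: D(P) = a + b*P, downward-sloping (b < 0)
c, d = 10.0, 1.0     # supply: S(P) = c + d*P, upward-sloping (d > 0)

def D(P):
    """Quantity demanded at any potential price P (a continuum of potential outcomes)."""
    return a + b * P

def S(P):
    """Quantity supplied at any potential price P."""
    return c + d * P

# Equilibrium condition D(P*) = S(P*):  a + b*P* = c + d*P*  =>  P* = (a - c)/(d - b)
P_star = (a - c) / (d - b)
assert abs(D(P_star) - S(P_star)) < 1e-9  # markets clear at P*

# Causal effect on quantity demanded of moving the price from P* to P* + 1:
effect = D(P_star + 1) - D(P_star)  # equals the demand slope b
print(P_star, effect)  # 30.0 -2.0
```

The `effect` line is the causal statement in the primer: D(P') - D(P) for a hypothetical price change, all but one of whose arguments are never observed.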
A well-known (relatively) modern treatment of this sort of model, cast explicitly in terms of potential outcomes, can be found in Angrist, Graddy, and Imbens (2000):
https://academic.oup.com/restud/article/67/3/499/1547484
but it is important to emphasize that adding the jargon “potential outcomes” doesn’t change the core (causal) concepts, which go back to the 19th century.
On Twitter you suggested that this model can be represented graphically using an “=” operator, which I haven’t seen before and would be interested in hearing more about.
Thanks, Chris.
Comment by Chris Auld — December 17, 2020 @ 6:21 pm
Hi Chris,
We agree conceptually on the “schedule” interpretation of structural equations, but we differ on notation. The modern notation is:
D := a + bP
S := c + dP
Where D, S and P are observed variables and := is an assignment operator, to be distinguished from algebraic equality. It means: an agent (possibly Nature) observes P and, accordingly, assigns D the value a + bP.
[In your example, each consumer observes the realized price P and decides how much to purchase. At any given time, the consumer does not observe the “potential price” P(D), nor can he determine the “potential demand” D(P).]
If we wish to translate this assignment process to potential outcome notation, we get (according to the First Law):
D(p) = a + bp for every P=p
But there is no real need to translate.
We can leave it the way economists wrote it (e.g., Haavelmo 1943) and the way we think about it naturally, namely, using observed variables in the equations and interpreting the equality sign as an assignment operator. This is precisely the role that arrows play in Sewall Wright’s path diagrams.
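The “:=” reading can be sketched directly as a program, since assignment in code is exactly the intended semantics. Coefficient values below are illustrative, not from the exchange:

```python
a, b = 100.0, -2.0

def assign_D(P):
    # "D := a + bP": the agent observes P and, accordingly, assigns D its value.
    return a + b * P

# Observational regime: Nature realizes some price, say P = 30.
P_obs = 30.0
D_obs = assign_D(P_obs)

# Interventional regime (per the First Law): D(p) = a + b*p for every P = p.
def potential_D(p):
    return assign_D(p)  # the same assignment, executed with P forced to p

print(D_obs, potential_D(25.0))  # 40.0 50.0
```

Note that the potential outcome D(p) is obtained by re-running the very assignment that governs the observed regime; no separate primitive is introduced, which is the sense in which “there is no real need to translate.”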
The notation := (for assignment) is standard in computer science and, as far as I know, was first used in economics by Nancy Cartwright (I can’t find the first reference).
Comment by Judea Pearl — January 20, 2021 @ 12:48 am
Thanks for your response, Judea.
I don’t think I understand the distinction you’re drawing, however. I said that this equation:
D = a + bP
represents potential outcomes. You disagree, if I read you correctly. You say instead that this equation represents potential outcomes:
D(p) = a + bp for every P=p
which I don’t understand. How are these not equivalent?
Perhaps you mean the equation I wrote only applies to observables, hence “the consumer does not observe “potential price P(D), nor can he determine the “potential demand” D(P)” and “economists wrote it… using observed variables in the equations.”
Suppose that price could take only one of two values, P_0 or P_1. Then we read off quantity demanded from the demand equation at these two prices: it is D_0 = a + bP_0 or D_1 = a + bP_1. The people in the model, as it were, can decide how much they’d like to purchase in each possible world. D_0 and D_1 are potential outcomes, and the causal effect of changing price from P_0 to P_1 on quantity demanded is their difference. This is exactly the Neyman-Rubin notion of potential outcomes.
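For concreteness, the two-price case works out as follows (parameter and price values are illustrative, chosen only to make the arithmetic visible):

```python
a, b = 100.0, -2.0       # demand equation D = a + b*P
P_0, P_1 = 20.0, 25.0    # the only two values price can take

D_0 = a + b * P_0        # quantity demanded in the world where P = P_0
D_1 = a + b * P_1        # quantity demanded in the world where P = P_1

# Neyman-Rubin causal effect of moving price from P_0 to P_1:
causal_effect = D_1 - D_0   # equals b * (P_1 - P_0)
print(D_0, D_1, causal_effect)  # 60.0 50.0 -10.0
```

Only one of D_0 and D_1 is realized in any actual market; the other remains counterfactual, exactly as in the two-treatment setting.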
That is, I would say that the demand equation tells us the quantity demanded at any price, not just at observed prices. Is this where we disagree?
Thanks, Chris.
Comment by Chris Auld — January 21, 2021 @ 6:08 pm