Causal Analysis in Theory and Practice

October 26, 2013

Comments on Kenny’s Summary of Causal Mediation

Filed under: Counterfactual, Indirect effects, Mediated Effects — moderator @ 12:00 am

David Kenny’s website <> has recently been revised to include a section on the Causal Inference Approach to Mediation. As many readers know, Kenny has pioneered mediation analysis in the social sciences through his seminal papers with Judd (1981) and Baron (1986) and has been an active leader in this field. His original approach, often referred to as the “Baron and Kenny (BK) approach,” is grounded in conservative Structural Equation Modeling (SEM) analysis, in which causal relationships are asserted with extreme caution and the boundaries between statistical and causal notions vary appreciably among researchers.

It is very significant therefore that Kenny has decided to introduce causal mediation analysis to the community of SEM researchers which, until very recently, felt alienated from recent advances in causal mediation analysis, primarily due to the counterfactual vocabulary in which it was developed and introduced. With Kenny’s kind permission, I am posting his description below, because it is one of the few attempts to explain causal inference in the language of traditional SEM mediation analysis and, thus, it may serve to bridge the barriers between the two communities.

Next you can find Kenny’s new posting, annotated with my comments. In these comments, I have attempted to further clarify the bridges between the two cultures; the “traditional” and the “causal.” I will refer to the former as “BK” (for Baron and Kenny) and to the latter as “causal” (for lack of a better word) although, conceptually, both BK and SEM are fundamentally causal.

Causal Inference Approach to Mediation
(a section from Kenny’s website)

A group of researchers have developed an approach whose emphases differ from those of the traditional SEM approach, the approach emphasized on this page. It is commonly called the Causal Inference approach, and I provide here a brief and relatively non-technical summary in which I attempt to explain the approach to those more familiar with Structural Equation Modeling. Robins and Greenland (1992) conceptualized the approach, and more recent papers within this tradition are Pearl (2001; 2011) and Imai et al. (2010). Somewhat more accessible is the paper by Valeri and VanderWeele (2013). Unfortunately, SEMers know relatively little about this approach, and I believe that Causal Inference researchers, in turn, fail to appreciate the insights of SEM.

The Causal Inference Approach uses the same basic causal structure (see diagram) as the SEM approach, albeit usually with different symbols for variables and paths. The two key differences are that the relationships between variables need not be linear and the variables need not be interval-scaled. In fact, the variables X, Y, and M are typically presumed to be binary, and X and M are presumed to interact in causing Y.

Similar to SEM, the Causal Inference approach attempts to develop a formal basis for causal inference in general and mediation in particular.

The term “attempts” is a relic of the days when SEM researchers disavowed any connection to causation and, to protect themselves from criticism, had to qualify claims as “attempts”. Today we know that SEM in fact provides a formal basis for causal inference; no other formalism can compete with SEM’s clarity, coherence, and precision (Pearl, 2009, chapter 7; pp. 368-374; Bollen and Pearl, 2013).

Typically, counterfactuals or potential outcomes are used. The potential outcome on Y for a person i for whom X = 1 would be denoted as Yi(1). The potential outcome Yi(0) can be defined even though person i did not score 0 on X; thus, it is a potential outcome or a counterfactual. The averages of these potential outcomes across persons are denoted E[Y(0)] and E[Y(1)]. To an SEM modeler, potential outcomes can be viewed as predicted values of a structural equation. Consider
the “Step 1” structural equation:
Yi = d + c Xi + ei
For an individual i for whom Xi equals 1, Yi(1) = d + c + ei equals his or her score on Y. We can determine what person i's score would have been had Xi been equal to 0, i.e., the potential outcome Yi(0), by taking the structural equation and setting Xi to zero, yielding d + ei. Although the term is new, potential outcomes are not really new to SEMers: they simply equal the predicted value of an endogenous variable once we fix the values of its causal variables.

The Causal Inference approach also employs directed acyclic graphs, or DAGs, which are similar to, though not identical to, path diagrams. DAGs typically do not include disturbances, but they are implicit. The curved lines that path diagrams draw between exogenous variables are likewise not drawn but are implicit.

DAGs and path diagrams are essentially the same. DAGs do include disturbances, but only when they are correlated with other disturbances; otherwise they are redundant and omitted from the DAG. The curved lines that path diagrams draw between exogenous variables are shown in DAGs (as in my book) whenever those variables are correlated. Curved lines are also used between endogenous variables whenever the disturbances of those variables are correlated. See Bollen and Pearl (2013) for a comparison.


Earlier, the assumptions necessary for mediation were stated using structural equation modeling terms. Within the Causal Inference approach, essentially the same assumptions are made, but they are stated somewhat differently. Note that the term confounder is used where earlier the term omitted variable was used.

Condition 1: No unmeasured confounding of the XY relationship; that is, any variable that causes both X and Y must be included in the model.

Condition 2: No unmeasured confounding of the MY relationship.

Condition 3: No unmeasured confounding of the XM relationship.

Condition 4: Variable X must not cause any confounder of the MY relationship.

This fourth condition is added because certain effects can be estimated without this assumption, while other effects require it. Note also that these assumptions are sufficient but not necessary. That is, if these conditions are met, the mediational paths are identified, but there are special cases in which mediational paths are identified even when the assumptions are violated (Pearl, 2013). For instance, consider the case in which M ← Z1 ← Z2 → Y but Z1, and not Z2, is measured and included in the model. Note that Z2 is an MY confounder and thus violates Condition 2, yet it is sufficient to control for Z1 alone.

It is for this reason that I prefer the phrase “There is a set of measured variables that deconfounds the XY relationship” to “No unmeasured confounding of the XY relationship.” But this is only one of the reasons why Conditions 1-4 above are much too stringent, and not only in special cases. In particular, Conditions 1 and 3 are not necessary, because identification can be achieved even in cases where the XY and XM relationships remain confounded (Pearl, 2013).
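A quick simulation can make the M ← Z1 ← Z2 → Y example above concrete. The sketch below is a hypothetical instance (all structural coefficients set to 1, binary exogenous noise, names chosen for illustration): the M–Y contrast is biased when Z1 is ignored but recovers the true effect of 1 once we compare units within a single Z1 stratum, because conditioning on Z1 blocks the path M ← Z1 ← Z2 → Y.

```python
import random

random.seed(1)
n = 200_000

# Hypothetical model: Z2 confounds M and Y, but only Z1 is measured.
#   Z1 = Z2 + noise,  M = X + Z1 + noise,  Y = M + Z2
# The true structural effect of M on Y is 1.
rows = []
for _ in range(n):
    z2 = random.randint(0, 1)
    z1 = z2 + random.randint(0, 1)
    x = random.randint(0, 1)
    m = x + z1 + random.randint(0, 1)
    y = m + z2
    rows.append((x, z1, m, y))

def mean_y(pred):
    ys = [y for (x, z1, m, y) in rows if pred(x, z1, m)]
    return sum(ys) / len(ys)

# Naive contrast (X held at 0, Z1 ignored): biased by the open back-door path.
naive = (mean_y(lambda x, z1, m: x == 0 and m == 2)
         - mean_y(lambda x, z1, m: x == 0 and m == 1))

# Same contrast inside the Z1 = 1 stratum: the back-door path is blocked,
# so the contrast recovers the true effect of 1.
adjusted = (mean_y(lambda x, z1, m: x == 0 and z1 == 1 and m == 2)
            - mean_y(lambda x, z1, m: x == 0 and z1 == 1 and m == 1))
```

In this particular model the naive contrast converges to 4/3 rather than 1, since higher values of M signal higher values of Z2, which in turn raises Y.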

The Causal Inference approach emphasizes sensitivity analyses: analyses that ask questions such as, “What would happen to the results if there were an MY confounder that had a moderate effect on both M and Y?” SEMers would benefit from considering such analyses more often.

Definitions of the Direct, Indirect, and Total Effects

Because effects involve variables not necessarily at the interval level and because interactions are allowed, the direct, indirect, and total effects need to be redefined. These effects are defined using counterfactuals, not using structural equations.

This difference is purely notational; the definitions can easily be converted to structural equations. As Kenny clearly shows, counterfactuals (or potential outcomes) are merely shorthand notation for structural equations; instead of carrying the entire equations Yi(Xi = 1) = d + c + ei and Yi(Xi = 0) = d + ei, we abbreviate them as Yi(1) and Yi(0) and keep in mind where they came from. There is nothing else to it: potential outcomes are simply structural equations abbreviated. In my 2013 paper,<> I show explicitly the SEM formulation of mediation side by side with the counterfactual formulation.
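The abbreviation can be replayed in a few lines of Python. The coefficient values and the disturbance below are hypothetical, chosen only for illustration of the structural equation Yi = d + c Xi + ei:

```python
# Hypothetical linear structural equation: Y_i = d + c*X_i + e_i
d, c = 2.0, 1.5        # illustrative structural coefficients
e_i = 0.3              # person i's disturbance term

def Y_i(x):
    """Potential outcome Y_i(x): the structural equation evaluated at X_i = x."""
    return d + c * x + e_i

y1, y0 = Y_i(1), Y_i(0)   # the pair abbreviated as Y_i(1) and Y_i(0)
effect_i = y1 - y0        # unit-level effect; the disturbance e_i cancels
```

Whatever value e_i takes, the difference Y_i(1) − Y_i(0) equals the structural coefficient c, which is why the abbreviated notation loses nothing in the linear case.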

Recall from above that for person i, it can be asked: What would i’s score on Y be if i had scored 0 on X? That value, called the potential outcome, is denoted Yi(0). The population average of these potential outcomes across persons is denoted as E[Y(0)]. We can then define the effect of X on Y as
E[Y(1)] – E[Y(0)]
This looks strange to an SEMer,

Again, the strangeness disappears when we realize that the two potential outcomes, Y(0) and Y(1), are but structural equations abbreviated.

but it is useful to remember that an effect can be viewed as the difference between the outcomes that would obtain when the causal variable differs by one unit. Consider path c in mediation. We can view c as the difference between the value Y would be expected to take when X equals 1 and when X equals 0, that is, the difference between the two potential outcomes, E[Y(1)] – E[Y(0)].

In the Causal Inference approach, there is the Controlled Direct Effect or CDE for the mediator equal to a particular value, denoted as M (not to be confused with the variable M):
CDE(M) = E[Y(1,M)] – E[Y(0,M)]
where M is a particular value of the mediator. (Note that it is E[Y(1,M)] and not E[Y(1)|M], the expected value of Y given that X equals 1 “controlling for M.” The variable M is “fixed,” not “conditioned on,” in this approach.)

This is the key, and most profound difference between the BK and the causal approach; it requires a longer comment to explain (below), and can be skipped by readers familiar with the difference between “fixing” and “conditioning.”

Examine the basic mediation model (Fig. 1) with M mediating between X and Y. Why are we tempted to “control” for M when we wish to estimate the direct effect of X on Y? The reason is that, if we succeed in preventing M from changing, then whatever changes we measure in Y are attributable solely to variations in X, and we are then justified in proclaiming the observed effect a “direct effect of X on Y.”

Unfortunately, the language of probability theory does not possess the notation to express the idea of “preventing M from changing” or “physically holding M constant.” The only operator probability theory allows us to use is “conditioning,” which is what we do when we “control for M” in the conventional way. In other words, instead of physically holding M constant (say, at M = m) and comparing Y for units under X = 1 to those under X = 0, we allow M to vary but ignore all units except those in which M achieves the value M = m. These two operations are totally different, and give totally different results, except in the case of no omitted variables.

To illustrate, assume that there is a latent variable L causing both M and Y and, to simplify the discussion, assume that the structural equations are Y = 0 * X + 0 * M + L and M = X + L. Obviously, the direct effect of X on Y in this case is zero, but this is not what we would get if we “control for M” and compare subjects under X = 1 and M = 0 to those under X = 0 and M = 0. In the former group we would find Y = L = M − X = 0 − 1 = −1, whereas in the latter group we would find Y = L = M − X = 0 − 0 = 0. In other words, we are comparing apples and oranges (i.e., subjects for which L = −1 to those with L = 0) and, not surprisingly, we obtain an erroneous estimate of −1 for a direct effect that, in reality, is zero.

Now let us examine what we obtain from the counterfactual expression

CDE(M) = E[Y(1,M)] – E[Y(0,M)]

for M = 0 (same for M = 1). Substituting the structural equation for the counterfactuals, we get

CDE(M = 0) = E[Y(1,0)] – E[Y(0,0)]
= E[0 * 1 + 0 * 0 + L] – E[0 * 0 + 0 * 0 + L]
= E[L – L] = 0

as expected. The reason we obtained the correct result is that we simulated correctly what we set out to do, namely, to physically hold M constant rather than to “condition on M.” In the former case L is kept constant, because the physical operation of holding M constant does not affect L (L is a cause of M). In the latter, when we “condition” on a constant M, L must vary to satisfy the equation M = X + L. In short, counterfactual conditioning reflects a physical intervention, while statistical conditioning reflects passive observation. To avoid confusion between the two, I use the notation E[Y|do(X = x)], as distinguished from the ordinary conditional expectation E[Y|X = x] (Pearl, 2009, chapter 3).
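The contrast between the two operations can be replayed numerically. Below is a small Python simulation of the hypothetical model above (M = X + L, Y = 0·X + 0·M + L), with L drawn uniformly from {−1, 0, 1} purely for illustration:

```python
import random

random.seed(0)
n = 50_000
# Each unit carries (X, L); L is the latent common cause of M and Y.
units = [(random.randint(0, 1), random.choice([-1, 0, 1])) for _ in range(n)]

def m_of(x, l):               # structural equation for M
    return x + l

def y_of(x, m, l):            # structural equation for Y; direct effect of X is 0
    return 0 * x + 0 * m + l

# "Conditioning" on M = 0: keep only units that happen to satisfy M = 0.
def mean_y_given(x_val):
    ys = [y_of(x, m_of(x, l), l) for (x, l) in units
          if x == x_val and m_of(x, l) == 0]
    return sum(ys) / len(ys)

# Comparing X=1, M=0 units (all with L = -1) to X=0, M=0 units (all with L = 0):
conditioned = mean_y_given(1) - mean_y_given(0)   # the erroneous -1

# "Fixing" M = 0, i.e., do(M = 0): overwrite M for every unit, leave L alone.
cde = (sum(y_of(1, 0, l) for _, l in units) / n
       - sum(y_of(0, 0, l) for _, l in units) / n)  # the true direct effect, 0
```

The conditioned contrast reproduces the apples-and-oranges estimate of −1, while overwriting the M equation (the do-operation) leaves L undisturbed and returns the true direct effect of zero.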

The habit of translating “hold M constant” into “condition on M” is deeply entrenched in the statistical culture (see Lindley, 2002) and is responsible for the lingering confusion between regression and structural equations (Chen and Pearl, 2013). This habit is a consequence not of deliberate negligence on the part of statisticians but of the coarseness of their language (probability theory), which fails to provide an appropriate operator for “holding M constant.” Absent such an operator, statisticians were pressed to use the only operator available to them, conditioning, and a century of confusion came into being.

Traditional mediation analysts of the BK school were not unaware of the dangers lurking in conditioning (Judd and Kenny 1981; 2010). However, lacking an appropriate operator for “fixing M,” they settled on “restricted conditioning.” They defined the direct effect as

c’ = E[Y|X = 1, M = 0] − E[Y|X = 0, M = 0]

and accompanied this definition with a warning that it is valid only under the assumption of no omitted variables.

What causal analysts have discovered is that the operator needed for “fixing M,” do(M = m) or Y(1,M), while undefinable in probability theory, is well defined in SEM (Pearl 1993; Balke and Pearl 1995). Thus, what they have been telling SEM traditionalists is the following: “Fear not the ‘fixing’ operator that you have in mind when you say ‘control’; it is a mathematical operation that by now is well defined and well explored, and it permits researchers to express what they really mean using CDE(M) = Y(1,M) − Y(0,M). In other words, if you want to ‘fix M,’ either plug M into the antecedent of the counterfactual, or write E[Y|do(X = x), do(M = m)].” I use both notations interchangeably, the former for individual effects and the latter for population effects. The formal counterfactual treatment of direct and indirect effects owes its development to this notational provision and its SEM semantics (Pearl 2001; 2013).

If X and M interact, CDE(M) changes for different values of M. To obtain a single measure of the direct effect, several different suggestions have been made.

Although the suggestions are different, all of these measures are called “Natural.”

As the one who coined the term “Natural,” I wish to clarify the motivation, and explain why the term was not chosen merely for the convenience of obtaining a single measure. (We can easily obtain a single number by averaging over M.) Natural effects define something more meaningful than fixing M uniformly over the entire population, which is artificial and, in many cases, does not represent policy options. The word “natural” came from my reading of how lawmakers define “discrimination” (e.g., in salary or hiring). They compare the salary of an individual to what it “would have been” had he or she not been a male, or a member of a minority group, with everything else remaining the same. This is where the word “Natural” came from; the need is to compare the expected outcome under treatment to the expected outcome under no treatment while “freezing” M at the level each individual had under the “natural” condition, e.g., under no treatment. In other words, we allow M to vary from individual to individual in a “natural” way, i.e., as it is distributed naturally in the population prior to treatment. This need was first recognized by Robins and Greenland (1992) and then formulated mathematically in Pearl (2001).

I recommend that readers look into the meaning of NIE and TE − NDE as representing two aspects of mediation, sufficient and necessary. The former represents the portion of cases whose response can be “explained” by mediation alone, while the latter represents the portion of cases whose response cannot be explained “but for” mediation. The two are generally not equal.

One idea is to define the Natural Direct Effect as follows:

NDE = E[Y(1,M0)] − E[Y(0, M0)]

where M0 is M(0), the value the mediator would take if X equals 0. Thus, within this approach, there needs to be a meaningful “baseline” value of X, which becomes the zero value. For instance, if X is experimental versus control, the control group would have a score of 0. However, if X is level of self-esteem, defining the zero value may be more arbitrary. The parallel Natural Indirect Effect is defined as

NIE = E[Y(1,M1)] − E[Y(1,M0)]

where M1 is M(1), or the potential outcome for M when X equals 1. The Total Effect becomes the sum of the two:

TE = NIE + NDE = E[Y(1,M1)] − E[Y(0,M0)] = E[Y(1)] − E[Y(0)]
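Under the assumption of no unmeasured confounding, the counterfactual means E[Y(x, m)] and the mediator distributions P(M(x) = m) can be read off the observed conditionals, and the three effects can be computed directly. The sketch below uses a hypothetical binary-mediator model with an X–M interaction; all coefficient values are illustrative:

```python
def Ey(x, m):
    # Hypothetical E[Y | X=x, M=m] with an X-M interaction term (1.5*x*m)
    return 1.0 * x + 2.0 * m + 1.5 * x * m

def pM1(x):
    # Hypothetical P(M = 1 | X = x)
    return 0.2 + 0.5 * x

def E_Y(x, x_m):
    """E[Y(x, M(x_m))]: X set to x while M is distributed as it is under X = x_m."""
    p = pM1(x_m)
    return p * Ey(x, 1) + (1 - p) * Ey(x, 0)

nde = E_Y(1, 0) - E_Y(0, 0)   # E[Y(1, M0)] - E[Y(0, M0)]
nie = E_Y(1, 1) - E_Y(1, 0)   # E[Y(1, M1)] - E[Y(1, M0)]
te  = E_Y(1, 1) - E_Y(0, 0)   # E[Y(1)] - E[Y(0)]
```

With these numbers the decomposition TE = NDE + NIE holds exactly, even though the interaction term makes CDE(M) differ across the two values of M.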

Some might benefit from Muthén’s discussion of these measures of mediation effects in his paper “Applications of Causally Defined Direct and Indirect Effects in Mediation Analysis using SEM in Mplus.”

Note that both CDE and NDE would equal the regression slope or what was earlier called path c’ if the model is linear, assumptions are met, and there is no XM interaction affecting Y,

The beauty of CDE and NDE is that they coincide with path c’ even when some of the assumptions are not met. Specifically, linearity is all we need; the equality c’ = CDE = NDE holds even when omitted variables are present and even when there is an XM interaction affecting Y. (See Pearl, 2012.)

the NIE would equal ab, and the TE would equal ab + c’. When the specifications made by the traditional mediation approach hold (e.g., linearity, no omitted variables, no XM interaction), the estimates would be the same. Thus, the definitions of effects within the Causal Inference approach are more general.

The generality is twofold. First, in linear systems, the counterfactual definitions apply to models for which no definition exists in the traditional approach, namely, models in which the assumptions of no omitted variables and no XM interaction do not hold. Second, the counterfactual definitions apply to models in which the functional form is unknown and may include arbitrary nonlinear functions, with arbitrary interactions, involving discrete and continuous variables.
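The first fold can be checked in a few lines. In a hypothetical linear model Y = c′X + bM + U, the counterfactual definition CDE(m) = E[Y(1, m)] − E[Y(0, m)] returns the structural coefficient c′ regardless of the value taken by the omitted variable U and regardless of where M is fixed (the coefficient values below are illustrative):

```python
cprime, b = 0.7, 1.2   # illustrative structural coefficients

def y_of(x, m, u):
    # Linear structural equation: Y = c'*X + b*M + U
    return cprime * x + b * m + u

def cde(m, u):
    # Fixing M at m in both arms: the b*m term and the omitted variable u cancel
    return y_of(1, m, u) - y_of(0, m, u)

# Constant in both m and u: omitted variables do not disturb the definition.
deltas = [cde(m, u) for m in (0.0, 1.0, 5.0) for u in (-3.1, 0.0, 2.2)]
```

Every entry of `deltas` equals c′, illustrating why, in the linear case, the counterfactual definition remains meaningful even when U confounds M and Y.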


J. Pearl


Balke, A. and Pearl, J. (1995). Counterfactuals and policy analysis in structural models. In Uncertainty in Artificial Intelligence 11 (P. Besnard and S. Hanks, eds.). Morgan Kaufmann, San Francisco, 11-18.

Baron, R. and Kenny, D. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51 1173-1182.

Bollen, K. and Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of Causal Analysis for Social Research (S. Morgan, ed.). Springer, Dordrecht, Netherlands, 301-328.

Chen, B. and Pearl, J. (2013). Regression and Causation: A Critical Examination of Six Econometrics Textbooks. Real-World Economics Review, Issue No. 65, 2-20.

Judd, C. and Kenny, D. (1981). Estimating the Effects of Social Interactions. Cambridge University Press, Cambridge, England.

Judd, C. and Kenny, D. (2010). Data analysis in social psychology: Recent and recurring issues. In The handbook of social psychology (E. Gilbert, S.T. Fiske and G. Lindzey, eds.), 5th ed. McGraw-Hill, Boston, MA 115-139.

Lindley, D.V. (2002). Seeing and Doing: The Concept of Causation. International Statistical Review 70 191-214.

Pearl, J. (1993). Comment: Graphical models, causality and intervention. Statistical Science 8 266-269.

Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, 411-420.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, New York.

Pearl, J. (2012). The mediation formula: A guide to the assessment of causal pathways in nonlinear models. In Causality: Statistical Perspectives and Applications (C. Berzuini, P. Dawid and L. Bernardinelli, eds.). John Wiley and Sons, Ltd, Chichester, UK, 151-179.

Pearl, J. (2013). Interpretation and Identification of Causal Mediation. UCLA Cognitive Systems Laboratory, Technical Report (R-389), September 2013. Forthcoming, Psychological Methods.

Robins, J. and Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology 3 143-155.

Valeri, L. and VanderWeele, T. (2013). Mediation analysis allowing for exposure-mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods 18 137-150.

The text of this page (as in 06/19/2014) is available for modification and reuse under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License and the GNU Free Documentation License (unversioned, with no invariant sections, front-cover texts, or back-cover texts).

Excuse the legal language, but this is what the Wikipedia needs to permit us to post portions of this page. Rolling with the punches.


  1. Extremely helpful Judea. Much appreciated! – Jim Grace

    Comment by Jim Grace — October 30, 2013 @ 2:01 pm

