Counterfactuals in linear systems
What do we know about counterfactuals in linear models?
Here is a neat result concerning the testability of counterfactuals in linear systems.
We know that counterfactual queries of the form P(Yx=y|e) may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation, P(Yx=y|x',y'), is in general not identifiable from experimental data (Causality, p. 290, Corollary 9.2.12) when X and Y are binary. (Footnote 1: A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming).)
This note shows that things are much friendlier in linear analysis:
Claim A. Any counterfactual query of the form E(Yx |e) is empirically identifiable in linear causal models, where e is arbitrary evidence.
Claim B. E(Yx|e) is given by
E(Yx|e) = E(Y|e) + T [x – E(X|e)] (1)
where T is the total effect coefficient of X on Y, i.e.,
T = d E[Yx]/dx = E(Y|do(x+1)) – E(Y|do(x)) (2)
Thus, whenever the causal effect T is identified, E(Yx|e) is identified as well.
Claim A is not surprising. It has been established in generality by Balke and Pearl (1994b) where expressions involving the covariance matrix were used for the various terms in (1).
Claim B offers an intuitively compelling interpretation of (1) that reads as follows: Given evidence e, to calculate E(Yx |e) (i.e., the expectation of Y under the hypothetical assumption that X were x, rather than its current value), first calculate the best estimate of Y conditioned on the evidence e, E(Y|e), then add to it whatever change is expected in Y when X undergoes a forced change from its current best estimate, E(X|e), to its hypothetical value X=x. That last addition is none other than the effect coefficient T times the expected change in X, i.e., T[x – E(X|e)].
Note: Eq. (1) can also be written in do(x) notation as
E(Yx|e) = E(Y|e) + E(Y|do(x)) – E[Y|do(X=E(X|e))] (1')
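As a small illustration of how Eq. (1) is used once its three ingredients are available, here is a minimal Python sketch. The function name and the numbers are hypothetical, chosen only to make the arithmetic easy to follow; in practice E(Y|e), E(X|e) and T would each have to be estimated.

```python
def counterfactual_mean(ey_given_e, ex_given_e, total_effect, x):
    """Eq. (1): E(Y_x | e) = E(Y | e) + T * (x - E(X | e))."""
    return ey_given_e + total_effect * (x - ex_given_e)


# Hypothetical numbers, chosen only for illustration:
# E(Y|e) = 2.0, E(X|e) = 1.0, T = 0.5, hypothetical value x = 3.0.
result = counterfactual_mean(ey_given_e=2.0, ex_given_e=1.0, total_effect=0.5, x=3.0)
print(result)  # 2.0 + 0.5 * (3.0 - 1.0) = 3.0
```

Setting x equal to E(X|e) returns E(Y|e) unchanged, which matches the reading of (1) as "best estimate plus the effect of the forced change".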
Proof:
(with help from Ilya Shpitser)
Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between X and Y as:
Y = TX + I + U (3)
where T is the total effect of X on Y, given in (2), I represents terms containing other variables in the model, nondescendants of X, and U represents exogenous variables.
It is always possible to bring the function determining Y into the form (3) by recursively substituting the functions for each rhs variable that has X as an ancestor, and grouping all the X terms together to form TX. Clearly, T is the Wright-rule sum of the path costs originating from X and ending in Y (Wright, 1921).
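The recursive substitution can be sketched numerically for a recursive (acyclic) model. The three-variable model below, its coefficients a, b, c, and the matrix layout are assumptions made for this illustration, not part of the note; the matrix inverse is simply one convenient way to carry out all the substitutions at once.

```python
import numpy as np

# Hypothetical three-variable recursive model, ordered (X, Z, Y):
#   Z = a*X + u_z
#   Y = b*Z + c*X + u_y
a, b, c = 2.0, 3.0, 0.5

# B[i, j] holds the direct coefficient of variable j in the equation for variable i.
B = np.zeros((3, 3))
B[1, 0] = a  # X -> Z
B[2, 1] = b  # Z -> Y
B[2, 0] = c  # X -> Y

# For an acyclic model, (I - B)^{-1} = I + B + B^2 + ..., so entry [i, j]
# sums the products of coefficients along every directed path from j to i.
total_effects = np.linalg.inv(np.eye(3) - B)

T = total_effects[2, 0]  # total effect of X on Y
print(T)  # direct path c plus indirect path a*b
```

Here the two directed paths from X to Y contribute c and a*b, so T agrees with what substituting the equation for Z into the equation for Y gives by hand.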
From (3) we can write:
Yx = Tx + I + U (4)
since I and U are not affected by the hypothetical change of X to the value x and, moreover,
E(Yx|e) = Tx + E(I+U|e) (5)
since x is a constant.
The last term in (5) can be evaluated by taking expectations, conditional on e, on both sides of (3), giving:
E(I+U|e) = E(Y|e) – T E(X|e) (6)
and, substituted into (5), yields E(Yx|e) = Tx + E(Y|e) – T E(X|e) (7)
and proves our target formula (1).
——————– QED
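The argument can also be checked by simulation. The sketch below assumes a particular linear model (a confounder W, a mediator Z, and arbitrary coefficients, all hypothetical) and verifies the unit-level identity behind (3) and (4), namely Yx = Y + T(x – X). Taking expectations given any evidence e then gives (1). Since the identity is algebraic, this is a sanity check rather than independent evidence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical structural coefficients (assumptions for this sketch).
a, c, b, d, g = 0.7, 1.5, 0.4, -0.3, 0.9
T = d + b * c  # total effect of X on Y: direct path plus path through Z

# Exogenous terms; W is a nondescendant of X that confounds X and Y.
W = rng.normal(size=n)
u_x, u_z, u_y = rng.normal(size=(3, n))

# Factual world.
X = a * W + u_x
Z = c * X + u_z
Y = b * Z + d * X + g * W + u_y

# Counterfactual world: X is held at x for every unit, same W and u's.
x = 2.0
Z_x = c * x + u_z
Y_x = b * Z_x + d * x + g * W + u_y

# Unit-level version of Eq. (1): Y_x = Y + T * (x - X).
print(np.allclose(Y_x, Y + T * (x - X)))
```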
Some Familiar Problems Cast in Linear Outfits
Three special cases of e are worth noting:
Example-1. e: X =x', Y = y'
(The linear equivalent of the probability of causation) From (1) we obtain directly
E(Yx|Y=y', X=x') = y' + T (x – x')
This is intuitively compelling. The hypothetical expectation of Y is simply the observed value of Y, y', plus the anticipated change in Y due to the change x-x' in X.
Example-2. e: X = x' (effect of treatment on treated)
E(Yx|X=x') = E(Y|x') + T (x – x')
= rx' + T (x – x')
= rx' + E(Y|do(x)) – E(Y|do(x')) where r is the regression coefficient of Y on X.
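To see how Example 2 differs from the plain interventional mean E(Y|do(x)) = Tx, here is a hedged simulation sketch. The confounded model, its coefficients and the sample size are assumptions for illustration; T is taken as known, whereas in practice it would have to be identified and estimated separately.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical confounded model (zero-mean): W -> X, W -> Y, X -> Y.
T, g = 0.5, 1.0
W = rng.normal(size=n)
X = W + rng.normal(size=n)
u_y = rng.normal(size=n)
Y = T * X + g * W + u_y

x_obs, x_new = 1.0, 2.0        # x' (observed) and x (hypothetical)
Y_x = T * x_new + g * W + u_y  # unit-level counterfactual outcome

# Regression of Y on X: slope r and intercept (near zero here).
r, intercept = np.polyfit(X, Y, 1)

# Example 2: E(Y_x | X = x') = E(Y | x') + T * (x - x').
ett_formula = intercept + r * x_obs + T * (x_new - x_obs)

# Direct check: regress the simulated counterfactual outcome on X, evaluate at x'.
slope_cf, intercept_cf = np.polyfit(X, Y_x, 1)
ett_direct = intercept_cf + slope_cf * x_obs

print(ett_formula, ett_direct)
```

Under these assumed coefficients the population value of r is 1 and of E(Yx|X=x') is 1.5, while E(Y|do(x)) = Tx = 1.0; the gap reflects what observing X = x' reveals about the confounder W.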
Example-3. e: Y = y'
(Gee, my temperature is Y=y'; what if I had taken x tablets of aspirin? How many did you take? Don't remember.)
E(Yx |Y=y') = y' + T [x – E(X|y')]
= y' + E(Y|do(x)) – E[Y|do(X=r'y')]
where r' is the regression coefficient of X on Y.
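The aspirin calculation can be sketched as a short function. The function name and the input numbers are hypothetical; the sketch assumes a zero-mean model, so y' and x are deviations from their means, and it takes T, cov(X,Y) and var(Y) as given.

```python
def expected_y_had_x(y_obs, x, total_effect, cov_xy, var_y):
    """Example 3: E(Y_x | Y = y') = y' + T * (x - r' * y') in a zero-mean model."""
    r_prime = cov_xy / var_y      # regression coefficient of X on Y
    expected_x = r_prime * y_obs  # E(X | Y = y'): best guess of the dose taken
    return y_obs + total_effect * (x - expected_x)


# Hypothetical inputs (deviations from the mean, since the model is zero-mean).
value = expected_y_had_x(y_obs=3.0, x=2.0, total_effect=0.5, cov_xy=0.8, var_y=2.0)
print(value)
```

With these inputs r' = 0.4, so the unremembered dose is estimated as 1.2 and the answer is 3 + 0.5 × (2 – 1.2), approximately 3.4.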
Example-4. Let us consider the non-recursive, supply-demand model of page 215 in Causality (2000). Eqs. (7.9)-(7.10) read:
q = b1p + d1i +u1
p = b2q + d2w +u2
Our counterfactual problem (page 216) reads: Given that the current price is P=p0, what would be the expected value of the demand Q if we were to control the price at P = p1? Making the correspondence P = X, Q = Y, e = {P=p0, i, w}, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on i and w. Hence, since T = b1, we can immediately write
E(Qp1 | p0, i, w) = E(Q|p0,i,w) + b1(p1 – p0)
= rp p0 + ri i + rw w + b1(p1 – p0) (8)
where rp, ri and rw are the coefficients of P, i and w, respectively, in the regression of Q on P, i and w.
Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation
p = b2q + d2w +u2
only enter (8) via the regression coefficients. Thus, they need not be calculated explicitly if the regression coefficients are estimated directly by least squares.
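A simulation sketch of Eq. (8) follows. The parameter values, the evidence point (p0, i0, w0) and the sample size are hypothetical, and b1 is taken as known. The code solves the two simultaneous equations for each unit, fits the regression of Q on P, i and w by least squares, and compares Eq. (8) with a regression fitted to the simulated counterfactual demand. The two agree by construction, so this only confirms that (8) is wired up consistently.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical structural parameters for Eqs. (7.9)-(7.10).
b1, d1 = -0.5, 1.0  # demand equation: q = b1*p + d1*i + u1
b2, d2 = 0.8, 1.0   # price equation:  p = b2*q + d2*w + u2

i, w, u1, u2 = rng.normal(size=(4, n))

# Solve the two simultaneous equations for the equilibrium (q, p).
det = 1.0 - b1 * b2
q = (b1 * (d2 * w + u2) + d1 * i + u1) / det
p = (b2 * (d1 * i + u1) + d2 * w + u2) / det

# Counterfactual demand with the price held at p1 (price equation overridden).
p1 = 1.0
q_p1 = b1 * p1 + d1 * i + u1

# Least-squares regressions on (1, P, i, w).
design = np.column_stack([np.ones(n), p, i, w])
coef_q, *_ = np.linalg.lstsq(design, q, rcond=None)
coef_cf, *_ = np.linalg.lstsq(design, q_p1, rcond=None)

# Evaluate Eq. (8) at hypothetical evidence (p0, i0, w0).
p0, i0, w0 = 0.5, 0.2, -0.3
evidence = np.array([1.0, p0, i0, w0])
eq8 = evidence @ coef_q + b1 * (p1 - p0)
direct = evidence @ coef_cf

print(eq8, direct)
```

Note that b2 and d2 are used only to generate the data; the evaluation of (8) touches them only through the fitted regression coefficients.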
Remark 1:
Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (Causality, p. 293). But examples 2 and 3 trigger the following conjecture:
Conjecture
Any counterfactual query of the form P(Yx |e) is empirically identifiable when Y is monotonic relative to X.
It is good to end on a challenging note.
Best wishes,
Judea Pearl
Dear Professor Pearl,
If I have understood correctly, the evidence “e” represents an observed “context” that we also want to hold in the counterfactual hypothesis.
In the definition of the output Y (Y = TX + I + U), as well as in the definition of the counterfactual Y [x] (Y [x] = Tx + I + U), the total causal effect of X on Y (T) does not depend on the evidence “e”.
Question: why does the total causal effect of X on Y (T) used in the formula not depend on the evidence “e”?
VA
PS: I apologize for my English.
Comment by Vincenzo Adamo — March 11, 2018 @ 10:28 am