Causal Analysis in Theory and Practice

February 27, 2007

Counterfactuals in linear systems

Filed under: Counterfactual,Linear Systems — judea @ 4:08 pm

What do we know about counterfactuals in linear models?

Here is a neat result concerning the testability of counterfactuals in linear systems.
We know that counterfactual queries of the form P(Yx=y|e) may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation, P(Yx=y|x',y'), is in general not identifiable from experimental data (Causality, p. 290, Corollary 9.2.12) when X and Y are binary. (Footnote 1: A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming).)

This note shows that things are much friendlier in linear analysis:

Claim A. Any counterfactual query of the form E(Yx|e) is empirically identifiable in linear causal models, with e being arbitrary evidence.

Claim B. E(Yx|e) is given by

E(Yx|e) = E(Y|e) + T [x – E(X|e)]      (1)

where T is the total effect coefficient of X on Y, i.e.,

T = d E[Yx]/dx = E(Y|do(x+1)) – E(Y|do(x))      (2)

Thus, whenever the causal effect T is identified, E(Yx|e) is identified as well.


Claim A is not surprising. It has been established in generality by Balke and Pearl (1994b) where expressions involving the covariance matrix were used for the various terms in (1).

Claim B offers an intuitively compelling interpretation of (1) that reads as follows: Given evidence e, to calculate E(Yx|e) (i.e., the expectation of Y under the hypothetical assumption that X were x, rather than its current value), first calculate the best estimate of Y conditioned on the evidence e, E(Y|e), then add to it whatever change is expected in Y when X undergoes a forced increase from its current best estimate, E(X|e), to its hypothetical value X=x. That last addition is none other than the effect coefficient T times the expected change in X, i.e., T [x – E(X|e)].

Note: Eq. (1) can also be written in do(x) notation as

E(Yx|e) = E(Y|e) + E(Y|do(x)) – E[Y|do(X=E(X|e))]      (1')

Proof:
(with help from Ilya Shpitser)

Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between X and Y as:

Y = TX + I + U      (3)

where T is the total effect of X on Y, given in (2), I represents terms containing other variables in the model that are nondescendants of X, and U represents exogenous variables.

It is always possible to bring the function determining Y into the form (3) by recursively substituting the functions for each rhs variable that has X as an ancestor, and grouping all the X terms together to form TX. Clearly, T is the Wright-rule sum of the path costs originating from X and ending in Y (Wright, 1921).
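As a minimal sketch of the Wright-rule computation, assume a hypothetical four-variable DAG (Z, X, M, Y) whose coefficients are made up for illustration and do not come from this note. The Python below sums the products of coefficients along every directed path from X to Y and compares the result with the interventional difference in (2).

```python
# Hypothetical linear model: coef[(parent, child)] is the structural coefficient.
coef = {
    ("Z", "X"): 0.8,
    ("Z", "Y"): 0.5,
    ("X", "M"): 2.0,
    ("M", "Y"): 0.3,
    ("X", "Y"): -0.4,
}
ORDER = ["Z", "X", "M", "Y"]  # a topological order of the assumed DAG


def total_effect(src, dst):
    """Sum over all directed paths src -> ... -> dst of the product of coefficients."""
    if src == dst:
        return 1.0
    return sum(c * total_effect(child, dst)
               for (parent, child), c in coef.items() if parent == src)


def solve(noise, do=None):
    """Evaluate the structural equations in topological order; `do` overrides variables."""
    do = do or {}
    val = {}
    for v in ORDER:
        if v in do:
            val[v] = do[v]
        else:
            val[v] = noise[v] + sum(c * val[p]
                                    for (p, child), c in coef.items() if child == v)
    return val


T = total_effect("X", "Y")  # direct path X->Y plus indirect path X->M->Y
noise = {"Z": 0.7, "X": -1.2, "M": 0.4, "Y": 0.1}  # arbitrary exogenous values
# Eq. (2): Y under do(x+1) minus Y under do(x), here with x = 2.
diff = solve(noise, do={"X": 3.0})["Y"] - solve(noise, do={"X": 2.0})["Y"]
print(T, diff)
```

In this sketch the path through Z does not contribute to T, because Z is a nondescendant of X; its influence belongs to the term I in (3).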

From (3) we can write:

Yx = Tx + I + U      (4)

since I and U are not affected by the hypothetical change of X to x and, moreover,

E(Yx|e) = Tx + E(I+U|e)      (5)

since x is a constant.

The last term in (5) can be evaluated by taking expectations, conditional on e, on both sides of (3), giving:

E(I+U|e) = E(Y|e) – T E(X|e)      (6)

which, substituted into (5), yields

E(Yx|e) = Tx + E(Y|e) – T E(X|e)      (7)

and proves our target formula (1).
——————– QED
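The steps of the proof can be mirrored numerically. The sketch below assumes a hypothetical model Z -> X -> Y with Z -> Y (coefficients, sample size and evidence value are invented for illustration), takes the evidence to be e: Z = z, and estimates the conditional expectations by simple least squares. The unit-level counterfactual follows (4): keep each unit's I + U = Y – TX and set X to x.

```python
import random

random.seed(0)
T, a, c, n = 1.5, 0.8, -0.6, 20000  # hypothetical coefficients and sample size
x_new, z_obs = 2.0, 0.5             # hypothetical value x and evidence e: Z = z_obs


def ols(u, v):
    """Least-squares (intercept, slope) of v regressed on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    slope = sxy / sxx
    return mv - slope * mu, slope


def predict(fit, u):
    """Fitted value at u for a fit returned by ols()."""
    return fit[0] + fit[1] * u


Z = [random.gauss(0, 1) for _ in range(n)]
X = [a * z + random.gauss(0, 1) for z in Z]
Y = [T * x + c * z + random.gauss(0, 1) for x, z in zip(X, Z)]
# Eq. (4): hold each unit's I + U fixed at Y - T*X and set X to x_new.
Y_cf = [T * x_new + (y - T * x) for x, y in zip(X, Y)]

lhs = predict(ols(Z, Y_cf), z_obs)  # estimate of E(Yx | Z = z_obs)
rhs = predict(ols(Z, Y), z_obs) + T * (x_new - predict(ols(Z, X), z_obs))  # Eq. (1)
print(lhs, rhs)
```

The two numbers agree up to floating-point error rather than merely up to sampling error, because Yx = Y + T(x – X) holds unit by unit and least squares is linear in the response; this is the same linearity the proof exploits.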

Some Familiar Problems Cast in Linear Outfits
Three special cases of e are worth noting:
Example-1. e: X =x', Y = y'
(The linear equivalent of the probability of causation) From (1) we obtain directly

E(Yx|Y=y', X=x') = y' + T (x – x')

This is intuitively compelling. The hypothetical expectation of Y is simply the observed value of Y, y', plus the anticipated change in Y due to the change x-x' in X.

Example-2. e: X = x' (effect of treatment on treated)

E(Yx|X=x') = E(Y|x') + T (x – x')
           = rx' + T (x – x')
           = rx' + E(Y|do(x)) – E(Y|do(x'))

where r is the regression coefficient of Y on X.
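To see how r and T play different roles, here is a small population-level sketch. It assumes a hypothetical confounded Gaussian model (Z = Uz, X = aZ + Ux, Y = TX + cZ + Uy, all noises independent with unit variance); the coefficients are invented and are not part of this note. It compares the formula above with a direct evaluation of (5).

```python
# Hypothetical confounded model (not from the note), all noises with unit variance:
#   Z = Uz,  X = a*Z + Ux,  Y = T*X + c*Z + Uy
T, a, c = 1.5, 0.8, -0.6

var_x = a ** 2 + 1.0             # Var(X)
cov_zx = a                       # Cov(Z, X)
cov_xy = T * var_x + c * cov_zx  # Cov(X, Y)
r = cov_xy / var_x               # regression coefficient of Y on X


def ett_eq1(x, x_prime):
    """E(Yx | X = x') from the formula above: r*x' + T*(x - x')."""
    return r * x_prime + T * (x - x_prime)


def ett_direct(x, x_prime):
    """Eq. (5): T*x + E(I + U | X = x'), where here I + U = c*Z + Uy."""
    e_z_given_x = (cov_zx / var_x) * x_prime  # E(Z | X = x') under joint normality
    return T * x + c * e_z_given_x


print(r, T, ett_eq1(2.0, 0.5), ett_direct(2.0, 0.5))
```

Under these assumptions r differs from T because of the confounder Z, yet both routes give the same counterfactual expectation; r summarizes what the observation X = x' reveals, while T carries the hypothetical change.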

Example-3. e: Y = y'
(Gee, my temperature is Y=y'; what if I had taken x tablets of aspirin? How many did you take? Don't remember.)

E(Yx|Y=y') = y' + T [x – E(X|y')]
           = y' + E(Y|do(x)) – E[Y|do(X=r'y')]

where r' is the regression coefficient of X on Y.
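A sketch of the aspirin example, assuming a hypothetical unconfounded Gaussian model Y = TX + U with X and U independent and zero-mean (so Y and X are deviations from their averages). The effect size and variances are invented for illustration.

```python
# Hypothetical unconfounded model (not from the note): Y = T*X + U, X and U independent.
T = -0.3                 # assumed effect of one tablet on temperature
var_x, var_u = 4.0, 1.0  # assumed variances of X and U

var_y = T ** 2 * var_x + var_u
r_prime = T * var_x / var_y  # regression coefficient of X on Y


def cf_eq1(x, y_obs):
    """E(Yx | Y = y') from the formula above: y' + T*(x - r'*y')."""
    return y_obs + T * (x - r_prime * y_obs)


def cf_direct(x, y_obs):
    """Eq. (5): T*x + E(U | Y = y'), with E(U | Y = y') = (var_u / var_y) * y'."""
    return T * x + (var_u / var_y) * y_obs


print(cf_eq1(2.0, 1.5), cf_direct(2.0, 1.5))
```

The unknown number of tablets is replaced by its best estimate r'y', which is why the formula needs only the observed temperature.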

Example-4. Let us consider the non-recursive, supply-demand model of page 215 in Causality (2000). Eqs. (7.9)-(7.10) read:

q = b1p + d1i +u1
p = b2q + d2w +u2

Our counterfactual problem (page 216) reads: Given that the current price is P=p0, what would be the expected value of the demand Q if we were to control the price at P = p1? Making the correspondence P = X, Q = Y, e = {P=p0, i, w}, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on i and w. Hence, since T = b1, we can immediately write

E(Qp1 | p0, i, w) = E(Q|p0,i,w) + b1(p1 – p0)
                  = rp p0 + ri i + rw w + b1(p1 – p0)     (8)

where rp, ri and rw are the coefficients of P, i and w, respectively, in the regression of Q on P, i and w.

Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation

p = b2q + d2w +u2

only enter (8) via the regression coefficients. Thus, they need not be calculated explicitly in case the regression coefficients are estimated directly by least squares.
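A population-level sketch of Eq. (8), under assumptions that go beyond the text: u1 and u2 are taken to be uncorrelated zero-mean Gaussians, independent of i and w, and all numerical values are invented. Substituting the demand equation into the price equation gives (1 – b1 b2)p – b2 d1 i – d2 w = b2 u1 + u2, so under these assumptions E(u1 | p, i, w) is proportional to that quantity, and rp, ri and rw follow in closed form.

```python
b1, d1, b2, d2 = -0.5, 0.7, 0.4, 0.6  # hypothetical structural coefficients
s1, s2 = 1.0, 2.0                     # hypothetical variances of u1 and u2

# v = (1 - b1*b2)*p - b2*d1*i - d2*w equals b2*u1 + u2, so E(u1 | p, i, w) = k * v.
k = b2 * s1 / (b2 ** 2 * s1 + s2)

# Coefficients of the regression of Q on P, i and w implied by these assumptions.
rp = b1 + k * (1 - b1 * b2)
ri = d1 - k * b2 * d1
rw = -k * d2


def eq8(p0, i, w, p1):
    """Right-hand side of Eq. (8)."""
    return rp * p0 + ri * i + rw * w + b1 * (p1 - p0)


def direct(p0, i, w, p1):
    """Structural route: set P = p1 in the demand equation, keep E(u1 | p0, i, w)."""
    v = (1 - b1 * b2) * p0 - b2 * d1 * i - d2 * w
    return b1 * p1 + d1 * i + k * v


print(eq8(1.0, 0.3, -0.2, 1.5), direct(1.0, 0.3, -0.2, 1.5))
```

Here b2 and d2 appear only inside rp, ri and rw, which matches the remark that the price-equation parameters enter (8) solely through the regression coefficients.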

Remark 1:
Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (Causality, p. 293). But examples 2 and 3 trigger the following conjecture:

Conjecture
Any counterfactual query of the form P(Yx |e) is empirically identifiable when Y is monotonic relative to X.

It is good to end on a challenging note.

Best wishes,
========Judea Pearl

1 Comment »

  1. Dear Professor Pearl,
    If I have well understood, the evidence “e” represents an observed “context” that we want to be true also in the counterfactual hypothesis.
    In the definition of the output Y (Y = TX + I + U), as well as in the definition of the counterfactual Y [x] (Y [x] = Tx + I + U ), the total causal effect of X on Y (T) does not depend on the evidence “e”.

    Question: why does the total causal effect of X on Y (T) used in the formula not depend on the evidence “e”?

    VA
    PS: I apologize for my English.

    Comment by Vincenzo Adamo — March 11, 2018 @ 10:28 am
