Causal Analysis in Theory and Practice

February 27, 2007

Counterfactuals in linear systems

Filed under: Counterfactual,Linear Systems — judea @ 4:08 pm

What do we know about counterfactuals in linear models?

Here is a neat result concerning the testability of counterfactuals in linear systems.
We know that counterfactual queries of the form P(Y_x=y|e) may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation, P(Y_x=y|x',y') is in general not identifiable from experimental data (Causality, p. 290, Corollary 9.2.12) when X and Y are binary.¹ (Footnote-1: A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming)).

This note shows that things are much friendlier in linear analysis:

Claim A. Any counterfactual query of the form E(Y_x|e) is empirically identifiable in linear causal models, where e is arbitrary evidence.

Claim B. E(Y_x|e) is given by

E(Y_x|e) = E(Y|e) + T [x – E(X|e)] (1)

where T is the total effect coefficient of X on Y, i.e.,

T = d E[Y_x]/dx = E(Y|do(x+1)) – E(Y|do(x)) (2)

Thus, whenever the causal effect T is identified, E(Y_x|e) is identified as well.

Claim A is not surprising. It has been established in generality by Balke and Pearl (1994b) where expressions involving the covariance matrix were used for the various terms in (1).

Claim B offers an intuitively compelling interpretation of (1) that reads as follows: Given evidence e, to calculate E(Y_x|e) (i.e., the expectation of Y under the hypothetical assumption that X were x, rather than its current value), first calculate the best estimate of Y conditioned on the evidence e, E(Y|e), then add to it whatever change is expected in Y when X undergoes a forced change from its current best estimate, E(X|e), to its hypothetical value X=x. That last addition is none other than the effect coefficient T times the expected change in X, i.e., T[x – E(X|e)].

Note: Eq. (1) can also be written in do(x) notation as

E(Y_x|e) = E(Y|e) + E(Y|do(x)) – E[Y|do(X=E(X|e))] (1')
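As a quick illustration, here is a minimal Python sketch of (1) and (1'). The function names, the numbers, and the linear interventional mean `do_mean` are all made up for the example; only the two formulas come from the note.

```python
def counterfactual_mean(e_y, e_x, T, x):
    """Eq. (1): E(Y_x | e) from E(Y | e), E(X | e), the total effect T and the hypothetical x."""
    return e_y + T * (x - e_x)


def counterfactual_mean_do(e_y, e_x, do_mean, x):
    """Eq. (1'): the same quantity written with an interventional mean x -> E(Y | do(x))."""
    return e_y + do_mean(x) - do_mean(e_x)


T = 2.0                                  # hypothetical total effect of X on Y


def do_mean(x):
    # Hypothetical linear E(Y | do(x)); the intercept 0.5 cancels in (1')
    return 0.5 + T * x


# Evidence gives E(Y | e) = 5 and E(X | e) = 1; we ask about X = 3
print(counterfactual_mean(e_y=5.0, e_x=1.0, T=T, x=3.0))   # 9.0
print(counterfactual_mean_do(5.0, 1.0, do_mean, 3.0))
```

The two calls agree because, in a linear model, E(Y|do(x)) – E(Y|do(x'')) = T(x – x'') for any pair of values.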

Proof:
(with help from Ilya Shpitser)

Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between X and Y as:

Y = TX + I + U (3)

where T is the total effect of X on Y, given in (2), I represents terms containing other variables in the model that are nondescendants of X, and U represents exogenous variables.

It is always possible to bring the function determining Y into the form (3) by recursively substituting the functions for each rhs variable that has X as an ancestor, and grouping all the X terms together to form TX. Clearly, T is the Wright-rule sum of the path costs originating from X and ending in Y (Wright, 1921).
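To make the substitution argument concrete, here is a small sketch for an acyclic model. The three-variable model and its coefficients are hypothetical. The code computes T once by summing products of coefficients along directed paths and once by solving the equations in matrix form, which is the recursive substitution done all at once.

```python
import numpy as np

# Hypothetical model, variables ordered (X, M, Y):
#   M = 0.5*X + u_M,   Y = 2.0*M + 0.3*X + u_Y
# B[child, parent] holds the structural coefficient on the edge parent -> child.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 2.0, 0.0]])
X, M, Y = 0, 1, 2


def total_effect_by_paths(B, src, dst):
    """Sum over directed paths src -> ... -> dst of the product of edge coefficients.

    Assumes the graph is acyclic, otherwise the recursion does not terminate.
    """
    if src == dst:
        return 1.0
    return sum(B[child, src] * total_effect_by_paths(B, child, dst)
               for child in range(B.shape[0]) if B[child, src] != 0.0)


def total_effect_by_substitution(B, src, dst):
    """Entry (dst, src) of (I - B)^-1, the coefficient of u_src in the solved equation for dst."""
    return np.linalg.inv(np.eye(B.shape[0]) - B)[dst, src]


print(total_effect_by_paths(B, X, Y))          # direct path 0.3 plus 0.5 * 2.0 via M
print(total_effect_by_substitution(B, X, Y))
```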

From (3) we can write:

Y_x = Tx + I + U (4)

since I and U are not affected by the hypothetical change of X to x and, moreover,

E(Y_x|e) = Tx + E(I+U|e) (5)

since x is a constant.

The last term in (5) can be evaluated by taking expectations, conditional on e, on both sides of (3), giving:

E(I+U|e) = E(Y|e) – T E(X|e) (6)

which, substituted into (5), yields

E(Y_x|e) = Tx + E(Y|e) – T E(X|e) (7)

and proves our target formula (1).
——————– QED
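The result can also be checked numerically. The sketch below assumes a hypothetical confounded model (Z affects both X and Y) with zero-mean Gaussian exogenous terms, so that E(U|e) is linear in the evidence. It computes E(Y_x|e) twice: once by the three-step procedure of abduction, action and prediction on the mutilated model, and once by formula (1). All names and numbers are illustrative.

```python
import numpy as np

# Hypothetical model, variables ordered (Z, X, Y), exogenous terms independent:
#   Z = u_Z,   X = a*Z + u_X,   Y = b*X + c*Z + u_Y
a, b, c = 0.8, 1.5, -0.6
B = np.array([[0.0, 0.0, 0.0],
              [a,   0.0, 0.0],
              [c,   b,   0.0]])          # B[child, parent] = structural coefficient
Sigma_u = np.diag([1.0, 0.5, 2.0])       # assumed covariance of (u_Z, u_X, u_Y)
Z, X, Y = 0, 1, 2
A = np.linalg.inv(np.eye(3) - B)         # reduced form: V = A U


def posterior_mean_u(observed, values):
    """E(U | e) when e fixes V[observed] = values (zero-mean Gaussian U)."""
    A_obs = A[observed, :]
    gain = Sigma_u @ A_obs.T @ np.linalg.inv(A_obs @ Sigma_u @ A_obs.T)
    return gain @ np.asarray(values, dtype=float)


def by_surgery(observed, values, x):
    """Abduction, action, prediction: replace X's equation by X = x, keep E(U | e)."""
    u = posterior_mean_u(observed, values)
    B_x = B.copy()
    B_x[X, :] = 0.0                      # cut the arrows into X
    u[X] = x                             # X's equation becomes the constant x
    return (np.linalg.inv(np.eye(3) - B_x) @ u)[Y]


def by_eq1(observed, values, x):
    """E(Y | e) + T [x - E(X | e)], with T = b in this model."""
    v = A @ posterior_mean_u(observed, values)   # E(V | e)
    return v[Y] + b * (x - v[X])


evidence_sets = [([X], [1.0]), ([Y], [2.0]), ([X, Y], [1.0, 2.0]), ([Z, X], [0.3, 1.0])]
for observed, values in evidence_sets:
    print(observed, by_surgery(observed, values, x=3.0), by_eq1(observed, values, x=3.0))
```

The two columns agree for every evidence set tried, including evidence on the confounder Z.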

Some Familiar Problems Cast in Linear Outfits
Three special cases of e are worth noting:
Example-1. e: X = x', Y = y'
(The linear equivalent of the probability of causation.) From (1) we obtain directly

E(Y_x|Y=y', X=x') = y' + T (x – x')

This is intuitively compelling. The hypothetical expectation of Y is simply the observed value of Y, y', plus the anticipated change in Y due to the change x-x' in X.

Example-2. e: X = x' (effect of treatment on treated)

E(Y_x|X=x') = E(Y|x') + T (x – x')
= rx' + T (x – x')
= rx' + E(Y|do(x)) – E(Y|do(x'))

where r is the regression coefficient of Y on X.
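The distinction between r and T matters as soon as X and Y are confounded. Here is a minimal sketch, assuming a hypothetical model Z → X, Z → Y, X → Y with made-up coefficients and variances, in which r and T are computed in closed form and plugged into the expression above.

```python
# Hypothetical zero-mean model with a confounder Z:
#   Z = u_Z,   X = a*Z + u_X,   Y = b*X + c*Z + u_Y
a, b, c = 0.8, 1.5, -0.6
var_z, var_ux = 1.0, 0.5                 # assumed variances of u_Z and u_X

var_x = a**2 * var_z + var_ux            # Var(X)
cov_xy = b * var_x + c * a * var_z       # Cov(X, Y) = b Var(X) + c Cov(X, Z)
r = cov_xy / var_x                       # regression coefficient of Y on X
T = b                                    # total effect of X on Y


def effect_on_treated(x, x_prime):
    """E(Y_x | X = x') = r x' + T (x - x')."""
    return r * x_prime + T * (x - x_prime)


print(r, T)                              # r differs from T because of Z
print(effect_on_treated(x=2.0, x_prime=1.0))
```

Setting x = x' returns r x', the ordinary regression prediction, while any departure from x' is priced at T rather than r.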

Example-3. e: Y = y'
(Gee, my temperature is Y=y'; what if I had taken x tablets of aspirin? How many did you take? Don't remember.)

E(Y_x |Y=y') = y' + T [x – E(X|y')]
= y' + E(Y|do(x)) – E[Y|do(X=r'y')]

where r' is the regression coefficient of X on Y.
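A simulation can make Example 3 tangible. The sketch below assumes a hypothetical unconfounded model Y = TX + U with Gaussian noise and invented numbers. Because the data are simulated, each unit's true counterfactual Y_x is available, so the formula can be compared with a direct regression of Y_x on the observed Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
T = -0.4                                  # hypothetical effect of one tablet on temperature
X = rng.normal(size=n)                    # tablets taken (zero-mean, as in the proof)
U = rng.normal(scale=0.5, size=n)         # everything else that affects temperature
Y = T * X + U                             # observed temperature

x, y_prime = 2.0, 1.0
Y_x = T * x + U                           # each unit's true outcome had X been x

# Formula: y' + T [x - r' y'], with r' the regression coefficient of X on Y
r_prime = np.cov(X, Y)[0, 1] / np.var(Y, ddof=1)
by_formula = y_prime + T * (x - r_prime * y_prime)

# Direct estimate: regress the simulated Y_x on Y and evaluate the fit at Y = y'
slope, intercept = np.polyfit(Y, Y_x, 1)
by_simulation = intercept + slope * y_prime

print(by_formula, by_simulation)
```

The two estimates differ only by sampling noise, since the formula uses no more than the observed regression of X on Y and the effect T.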

Example-4. Let us consider the non-recursive, supply-demand model of page 215 in Causality (2000). Eqs. (7.9)-(7.10) read:

q = b₁p + d₁i +u₁
p = b₂q + d₂w +u₂

Our counterfactual problem (page 216) reads: Given that the current price is P=p₀, what would be the expected value of the demand Q if we were to control the price at P = p₁? Making the correspondence P = X, Q = Y, e = {P=p₀, i, w}, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on i and w. Hence, since T = b₁, we can immediately write

E(Q_p₁ | p₀, i, w) = E(Q | p₀, i, w) + b₁(p₁ – p₀)
= r_p p₀ + r_i i + r_w w + b₁(p₁ – p₀) (8)

where r_p, r_i and r_w are the coefficients of P, i and w, respectively, in the regression of Q on P, i and w.

Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation

p = b₂q + d₂w +u₂

only enter (8) via the regression coefficients. Thus, they need not be calculated explicitly if the regression coefficients are estimated directly by least squares.
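The following sketch simulates the supply-demand model and evaluates (8) by least squares. The parameter values, the independence of u₁ and u₂, and the evidence values are assumptions made for the illustration. The fit includes an intercept, which is close to zero here because the simulated model is zero-mean. As a check, the true unit-level counterfactual demand is regressed on the same evidence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
b1, d1, b2, d2 = -0.8, 1.0, 0.5, 0.7      # hypothetical structural parameters
i, w = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)   # assumed independent here

# Solve the two simultaneous equations for the equilibrium (q, p)
det = 1.0 - b1 * b2
q = (d1 * i + u1 + b1 * (d2 * w + u2)) / det
p = (d2 * w + u2 + b2 * (d1 * i + u1)) / det


def regress(target):
    """Least-squares coefficients of target on (1, p, i, w)."""
    design = np.column_stack([np.ones(n), p, i, w])
    return np.linalg.lstsq(design, target, rcond=None)[0]


p0, i0, w0, p1 = 1.0, 0.5, -0.5, 2.0      # evidence (p0, i0, w0) and hypothetical price p1

# Eq. (8): regression prediction of Q at the evidence, plus b1 (p1 - p0)
r0, r_p, r_i, r_w = regress(q)
eq8 = r0 + r_p * p0 + r_i * i0 + r_w * w0 + b1 * (p1 - p0)

# Check: regress each unit's true counterfactual demand on the same evidence
q_p1 = b1 * p1 + d1 * i + u1
c0, c_p, c_i, c_w = regress(q_p1)
direct = c0 + c_p * p0 + c_i * i0 + c_w * w0

print(eq8, direct)
```

Note that b₂ and d₂ are used only to generate the data; the evaluation of (8) touches them only through the fitted regression coefficients.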

Remark 1:
Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (Causality, p. 293). But examples 2 and 3 trigger the following conjecture:

Conjecture
Any counterfactual query of the form P(Y_x |e) is empirically identifiable when Y is monotonic relative to X.

It is good to end on a challenging note.

Best wishes,
========Judea Pearl

1 Comment »

Dear Professor Pearl,
If I have understood correctly, the evidence “e” represents an observed “context” that we also want to hold in the counterfactual hypothesis.
In the definition of the output Y (Y = TX + I + U), as well as in the definition of the counterfactual Y [x] (Y [x] = Tx + I + U), the total causal effect of X on Y (T) does not depend on the evidence “e”.

Question: why does the total causal effect of X on Y (T) used in the formula not depend on the evidence “e”?

VA
PS: I apologize for my English.

Comment by Vincenzo Adamo — March 11, 2018 @ 10:28 am
