### Counterfactuals in linear systems

**What do we know about counterfactuals in linear models?**

**Here is a neat result concerning the testability of counterfactuals in linear systems.**

We know that counterfactual queries of the form $P(Y_x = y \mid e)$ may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation, $P(Y_x = y \mid x', y')$, is in general not identifiable from experimental data (*Causality*, p. 290, Corollary 9.2.12) when $X$ and $Y$ are binary.¹

¹ A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming).

This note shows that things are much friendlier in linear analysis:

Claim A. Any counterfactual query of the form $E(Y_x \mid e)$ is empirically identifiable in linear causal models, where $e$ is arbitrary evidence.

Claim B. $E(Y_x \mid e)$ is given by

$$E(Y_x \mid e) = E(Y \mid e) + T\,[x - E(X \mid e)] \qquad (1)$$

where $T$ is the total effect coefficient of $X$ on $Y$, i.e.,

$$T = \frac{d\,E[Y_x]}{dx} = E(Y \mid do(x+1)) - E(Y \mid do(x)) \qquad (2)$$

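Thus, whenever the causal effect $T$ is identified, $E(Y_x \mid e)$ is identified as well, since $E(Y \mid e)$ and $E(X \mid e)$ are ordinary conditional expectations that can be estimated from observational data.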
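As a quick numerical check of Claim B, here is a minimal sketch. The four-variable model (Z, X, W, Y), its coefficients and the Gaussian disturbances are hypothetical, chosen only so that Z confounds X and Y and W mediates the effect of X. The helper names `abduct`, `counterfactual_mean` and `formula_1` are made up for illustration: the second computes $E(Y_x \mid e)$ the long way (abduction of the exogenous terms by exact Gaussian conditioning, then the action $do(X = x)$, then prediction), while the third simply evaluates (1) with $T$ read off $(I - B)^{-1}$.

```python
import numpy as np

# Hypothetical linear SCM, variables ordered Z, X, W, Y.
# Structural equations V = B V + U with independent U ~ N(0, diag(sigma2)).
names = ["Z", "X", "W", "Y"]
idx = {name: k for k, name in enumerate(names)}
B = np.zeros((4, 4))
B[idx["X"], idx["Z"]] = 0.8    # Z -> X
B[idx["W"], idx["X"]] = 1.5    # X -> W
B[idx["Y"], idx["W"]] = 0.7    # W -> Y   (total effect of X on Y: 1.5 * 0.7)
B[idx["Y"], idx["Z"]] = -0.6   # Z -> Y   (confounding path)
sigma2 = np.array([1.0, 0.5, 0.3, 0.4])


def abduct(evidence):
    """Posterior mean E(U | e) for evidence {name: value}, by Gaussian conditioning."""
    M = np.linalg.inv(np.eye(4) - B)               # V = M U
    rows = [idx[name] for name in evidence]
    v_e = np.array([evidence[name] for name in evidence], dtype=float)
    S = np.diag(sigma2)
    M_e = M[rows]
    return S @ M_e.T @ np.linalg.solve(M_e @ S @ M_e.T, v_e)


def counterfactual_mean(evidence, x):
    """E(Y_x | e) by abduction, action do(X = x), and prediction."""
    u_hat = abduct(evidence)
    B_do = B.copy()
    B_do[idx["X"], :] = 0.0        # remove the arrows into X
    u_do = u_hat.copy()
    u_do[idx["X"]] = x             # X's equation becomes X = x
    v_do = np.linalg.solve(np.eye(4) - B_do, u_do)
    return v_do[idx["Y"]]


def formula_1(evidence, x):
    """E(Y | e) + T [x - E(X | e)], with T = [(I - B)^{-1}]_{Y,X}."""
    M = np.linalg.inv(np.eye(4) - B)
    v_hat = M @ abduct(evidence)                   # E(V | e)
    T = M[idx["Y"], idx["X"]]
    return v_hat[idx["Y"]] + T * (x - v_hat[idx["X"]])


for ev in ({"W": 1.2}, {"Y": 0.5, "Z": -1.0}, {"X": 0.3, "Y": 2.0}):
    print(ev, counterfactual_mean(ev, 2.0), formula_1(ev, 2.0))
```

The two columns should agree up to floating-point rounding for every evidence set, including evidence on the mediator W. The Gaussian assumption is needed only to make the abduction step a closed-form linear computation; formula (1) itself does not use it.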

Claim A is not surprising. It has been established in full generality by Balke and Pearl (1994b), where expressions involving the covariance matrix were used for the various terms in (1).

Claim B offers an intuitively compelling interpretation of (1) that reads as follows: Given evidence $e$, to calculate $E(Y_x \mid e)$ (i.e., the expectation of $Y$ under the hypothetical assumption that $X$ were $x$, rather than its current value), first calculate the best estimate of $Y$ conditioned on the evidence $e$, $E(Y \mid e)$, then add to it whatever change is expected in $Y$ when $X$ undergoes a forced increase from its current best estimate, $E(X \mid e)$, to its hypothetical value $X = x$. That last addition is none other than the effect coefficient $T$ times the expected change in $X$, i.e., $T[x - E(X \mid e)]$.

Note: Eq. (1) can also be written in $do(x)$ notation as

$$E(Y_x \mid e) = E(Y \mid e) + E(Y \mid do(x)) - E[Y \mid do(X = E(X \mid e))] \qquad (1')$$

**Proof:**

(with help from Ilya Shpitser)

Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between *X* and *Y* as:

$$Y = TX + I + U \qquad (3)$$

where $T$ is the total effect of $X$ on $Y$, given in (2), $I$ represents terms containing other variables in the model that are nondescendants of $X$, and $U$ represents exogenous variables.

It is always possible to bring the function determining $Y$ into the form (3) by recursively substituting the functions for each right-hand-side variable that has $X$ as an ancestor, and grouping all the $X$ terms together to form $TX$. Clearly, $T$ is the Wright-rule sum, over all directed paths from $X$ to $Y$, of the products of the path coefficients along each path (Wright, 1921).

From (3) we can write:

$$Y_x = Tx + I + U \qquad (4)$$

since $I$ and $U$ are not affected by the hypothetical change that sets $X = x$ ($I$ involves only nondescendants of $X$, and $U$ is exogenous), and, moreover,

$$E(Y_x \mid e) = Tx + E(I + U \mid e) \qquad (5)$$

since $x$ is a constant.

The last term in (5) can be evaluated by taking expectations of both sides of (3) conditional on $e$, which gives $E(Y \mid e) = T\,E(X \mid e) + E(I + U \mid e)$, that is,

$$E(I + U \mid e) = E(Y \mid e) - T\,E(X \mid e) \qquad (6)$$

and, substituted into (5), this yields

$$E(Y_x \mid e) = Tx + E(Y \mid e) - T\,E(X \mid e) \qquad (7)$$

which proves our target formula (1).

——————– QED
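The proof relies on $T$ being the Wright-rule sum over directed paths from $X$ to $Y$. A minimal sketch of that step, on a hypothetical DAG whose edge coefficients are invented for illustration: `wright_total_effect` (a made-up helper name) enumerates directed paths and sums the products of coefficients, while `matrix_total_effect` reads the same number off $(I - B)^{-1}$, which for an acyclic $B$ equals $I + B + B^2 + \dots$ and therefore also sums over all directed paths. The back-door path $X \leftarrow Z \rightarrow Y$ is not a directed path out of $X$, so it contributes nothing to $T$.

```python
import numpy as np

# Hypothetical DAG with path coefficients: edges[(parent, child)] = coefficient.
edges = {("X", "M1"): 0.5, ("M1", "Y"): 2.0,
         ("X", "M2"): -1.0, ("M2", "Y"): 0.3,
         ("M1", "M2"): 0.4, ("X", "Y"): 0.2,
         ("Z", "X"): 0.9, ("Z", "Y"): 1.1}   # Z -> X, Z -> Y is a back-door path


def wright_total_effect(edges, src, dst):
    """Sum, over directed paths src -> ... -> dst, of the product of path coefficients."""
    children = {}
    for (parent, child), coef in edges.items():
        children.setdefault(parent, []).append((child, coef))

    def walk(node):
        if node == dst:
            return 1.0
        # Nodes with no children contribute an empty sum, i.e. no path.
        return sum(coef * walk(nxt) for nxt, coef in children.get(node, []))

    return walk(src)


def matrix_total_effect(edges, src, dst):
    """The (dst, src) entry of (I - B)^{-1}, where B[child, parent] = coefficient."""
    names = sorted({n for edge in edges for n in edge})
    idx = {n: k for k, n in enumerate(names)}
    B = np.zeros((len(names), len(names)))
    for (parent, child), coef in edges.items():
        B[idx[child], idx[parent]] = coef
    return np.linalg.inv(np.eye(len(names)) - B)[idx[dst], idx[src]]


print(wright_total_effect(edges, "X", "Y"), matrix_total_effect(edges, "X", "Y"))
```

For these edges both functions give 0.96 up to rounding, i.e. $0.2 + 0.5 \cdot 2.0 - 1.0 \cdot 0.3 + 0.5 \cdot 0.4 \cdot 0.3$.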

**Some Familiar Problems Cast in Linear Outfits**

Four special cases of $e$ are worth noting:

Example-1. *e*: *X =x', Y = y'*

(The linear equivalent of the probability of causation.) From (1) we obtain directly

$$E(Y_x \mid Y = y', X = x') = y' + T(x - x')$$

This is intuitively compelling. The hypothetical expectation of $Y$ is simply the observed value of $Y$, $y'$, plus the anticipated change in $Y$ due to the change $x - x'$ in $X$.

Example-2. *e*: *X = x'* (effect of treatment on treated)

$$\begin{aligned} E(Y_x \mid X = x') &= E(Y \mid x') + T(x - x') \\ &= r x' + T(x - x') \\ &= r x' + E(Y \mid do(x)) - E(Y \mid do(x')) \end{aligned}$$

where $r$ is the regression coefficient of $Y$ on $X$.
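Example 2 can be estimated from data by combining an observational regression (for $r$) with an experimental one (for $T$). The sketch below simulates a hypothetical confounded, zero-mean model; the coefficients 0.8, 1.5 and 1.0 and the confounder Z are assumptions, not part of the text. Because it is a simulation, it can also compute each unit's counterfactual $Y_x$ from the mutilated equation and regress it on $X$, a check that is impossible with real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
T_true, g = 1.5, 1.0                      # hypothetical structural coefficients

# Observational regime (zero-mean): Z confounds X and Y.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
U_Y = rng.normal(size=n)
Y = T_true * X + g * Z + U_Y

# Experimental regime: X randomized, so the regression of Y on X estimates T.
X_exp = rng.normal(size=n)
Y_exp = T_true * X_exp + g * rng.normal(size=n) + rng.normal(size=n)

r = np.dot(X, Y) / np.dot(X, X)                   # regression of Y on X
T_hat = np.dot(X_exp, Y_exp) / np.dot(X_exp, X_exp)


def effect_on_treated(x, x_prime):
    """E(Y_x | X = x') = r x' + T (x - x'), as in Example 2."""
    return r * x_prime + T_hat * (x - x_prime)


# Simulation-only check: Y_x = T x + g Z + U_Y for every unit, regressed on X.
x, x_prime = 2.0, 0.5
Y_x = T_true * x + g * Z + U_Y
slope, intercept = np.polyfit(X, Y_x, 1)
print(effect_on_treated(x, x_prime), slope * x_prime + intercept)
```

Both numbers should come out close to 3.24, whereas $E(Y \mid do(x)) = Tx = 3.0$ in this model: the units that chose $x'$ carry extra confounder-driven $Y$ that survives the hypothetical change.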

Example-3. $e$: $Y = y'$

(Gee, my temperature is $Y = y'$; what if I had taken $x$ tablets of aspirin? How many did you take? I don't remember.)

$$\begin{aligned} E(Y_x \mid Y = y') &= y' + T[x - E(X \mid y')] \\ &= y' + E(Y \mid do(x)) - E[Y \mid do(X = r'y')] \end{aligned}$$

where $r'$ is the regression coefficient of $X$ on $Y$.
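The aspirin story of Example 3 can be simulated in the same spirit. This sketch assumes a hypothetical fever-severity confounder F (sicker people take more tablets and run hotter) and treats the effect $T = -0.4$ per tablet as already identified, e.g. from a trial; all numbers are illustrative. $r'$ is estimated by regressing $X$ on $Y$, and the simulation-only check regresses each unit's $Y_x$ on the observed $Y$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
T = -0.4                               # hypothetical effect of one extra tablet

F = rng.normal(size=n)                 # fever severity (confounder), zero-mean
U_Y = rng.normal(scale=0.5, size=n)
X = 0.9 * F + rng.normal(size=n)       # tablets taken (deviation from mean)
Y = T * X + 1.2 * F + U_Y              # temperature (deviation from mean)

r_prime = np.dot(Y, X) / np.dot(Y, Y)  # regression coefficient of X on Y


def what_if(x, y_obs):
    """E(Y_x | Y = y') = y' + T [x - r' y'], as in Example 3."""
    return y_obs + T * (x - r_prime * y_obs)


# Simulation-only check using the mutilated model Y_x = T x + 1.2 F + U_Y.
x, y_obs = 1.0, 0.8
Y_x = T * x + 1.2 * F + U_Y
slope, intercept = np.polyfit(Y, Y_x, 1)
print(what_if(x, y_obs), slope * y_obs + intercept)
```

Both values should be close to 0.50: the observed temperature is corrected only for the difference between the hypothetical dose $x$ and the dose $r'y'$ one would infer from the temperature.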

Example-4. Let us consider the non-recursive, supply-demand model of page 215 in *Causality* (2000). Eqs. (7.9)-(7.10) read:

$$q = b_1 p + d_1 i + u_1$$

$$p = b_2 q + d_2 w + u_2$$

Our counterfactual problem (page 216) reads: Given that the current price is $P = p_0$, what would be the expected value of the demand $Q$ if we were to control the price at $P = p_1$? Making the correspondence $P = X$, $Q = Y$, $e = \{P = p_0, i, w\}$, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on $i$ and $w$. Hence, since $T = b_1$, we can immediately write

$$\begin{aligned} E(Q_{p_1} \mid p_0, i, w) &= E(Q \mid p_0, i, w) + b_1(p_1 - p_0) \\ &= r_p p_0 + r_i i + r_w w + b_1(p_1 - p_0) \end{aligned} \qquad (8)$$

where $r_p$, $r_i$ and $r_w$ are the coefficients of $P$, $i$ and $w$, respectively, in the regression of $Q$ on $P$, $i$ and $w$.

Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation

$$p = b_2 q + d_2 w + u_2$$

enter (8) only via the regression coefficients. Thus, they need not be calculated explicitly if the regression coefficients are estimated directly by least squares.

**Remark 1:**

Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (*Causality*, p. 293). But examples 2 and 3 trigger the following conjecture:

**Conjecture**

Any counterfactual query of the form $P(Y_x \mid e)$ is empirically identifiable when $Y$ is monotonic relative to $X$.

It is good to end on a challenging note.

Best wishes,

Judea Pearl

Dear Professor Pearl,

If I have understood correctly, the evidence “e” represents an observed “context” that we want to hold also in the counterfactual hypothesis.

In the definition of the output Y (Y = TX + I + U), as well as in the definition of the counterfactual Y[x] (Y[x] = Tx + I + U), the total causal effect T of X on Y does not depend on the evidence “e”.

Question: why does the total causal effect T of X on Y used in the formula not depend on the evidence “e”?

VA

PS: I apologize for my English.

Comment by Vincenzo Adamo — March 11, 2018 @ 10:28 am