{"id":34,"date":"2007-02-27T16:08:01","date_gmt":"2007-02-28T00:08:01","guid":{"rendered":"http:\/\/www.mii.ucla.edu\/causality\/?p=43"},"modified":"2007-02-27T16:08:01","modified_gmt":"2007-02-28T00:08:01","slug":"counterfactuals-in-linear-systems-2","status":"publish","type":"post","link":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/2007\/02\/27\/counterfactuals-in-linear-systems-2\/","title":{"rendered":"Counterfactuals in linear systems"},"content":{"rendered":"<p><font><strong>What do we know about counterfactuals in linear models?<\/strong><\/font><\/p>\n<p><strong>Here is a neat result concerning the testability of counterfactuals in linear systems.<\/strong><br \/> We know that counterfactual queries of the form <em>P<\/em>(<em>Y<sub>x<\/sub><\/em>=<em>y<\/em>|<em>e<\/em>) may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation,   <em>P<\/em>(<em>Y<sub>x<\/sub><\/em>=<em>y<\/em>|<em>x&#39;,y&#39;<\/em>) is in  general not identifiable from experimental data (<em>Causality<\/em>, p. 290, Corollary 9.2.12) when <em>X<\/em> and <em>Y<\/em>  are binary.<sup>1<\/sup>  (Footnote-1: A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming)).<\/p>\n<p>This note shows that things are  much friendlier in linear analysis:<\/p>\n<p>Claim A. Any counterfactual query of the form <em>E<\/em>(<em>Y<sub>x<\/sub><\/em> |<em>e<\/em>) is empirically identifiable  in linear causal models, with <em>e<\/em> an arbitrary evidence.<\/p>\n<p>Claim B. 
<em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) is given by <\/p>\n<p><em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) = <em>E<\/em>(<em>Y<\/em>|<em>e<\/em>) + <em>T<\/em> [<em>x<\/em> &#8211; <em>E<\/em>(<em>X<\/em>|<em>e<\/em>)]&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(1)<\/p>\n<p>where <em>T<\/em> is the total effect coefficient of <em>X<\/em> on <em>Y<\/em>, i.e.,<\/p>\n<p><em>T<\/em> = <em>d E<\/em>[<em>Y<sub>x<\/sub><\/em>]\/<em>dx<\/em> = <em>E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x+1<\/em>)) &#8211; <em>E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x<\/em>))&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(2)<\/p>\n<p>Thus, whenever the causal effect <em>T<\/em> is identified, <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) is identified as well.<\/p>\n<p><!--more--><br \/> Claim A is not surprising. It has been established in full generality by Balke and Pearl (1994b), where expressions involving the covariance matrix were used for the various terms in (1).<\/p>\n<p>Claim B offers an intuitively compelling interpretation of (1), which reads as follows: Given evidence <em>e<\/em>, to calculate <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) (i.e., the expectation of <em>Y<\/em> under the hypothetical assumption that <em>X<\/em> were <em>x<\/em>, rather than its current value), first calculate the best estimate of <em>Y<\/em> conditioned on the evidence <em>e<\/em>, <em>E<\/em>(<em>Y<\/em>|<em>e<\/em>), then add to it whatever change is expected in <em>Y<\/em> when <em>X<\/em> undergoes a forced transition from its current best estimate, <em>E<\/em>(<em>X<\/em>|<em>e<\/em>), to its hypothetical value <em>X=x<\/em>. That last addition is none other than the effect coefficient <em>T<\/em> times the expected change in <em>X<\/em>, i.e., <em>T<\/em>[<em>x<\/em> &#8211; <em>E<\/em>(<em>X<\/em>|<em>e<\/em>)].<\/p>\n<p>Note: Eq. 
(1) can also be written in <em>do<\/em>(<em>x<\/em>) notation as<\/p>\n<p> <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) = <em>E<\/em>(<em>Y<\/em>|<em>e<\/em>) + <em>E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x<\/em>)) &#8211; <em>E<\/em>[<em>Y<\/em>|<em>do<\/em>(<em>X<\/em>=<em>E<\/em>(<em>X<\/em>|<em>e<\/em>))]&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(1&#39;) <\/p>\n<p><strong>Proof:<\/strong><br \/> (with help from Ilya Shpitser)<\/p>\n<p>Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between <em>X<\/em> and <em>Y<\/em> as:<\/p>\n<p><em>Y = TX + I + U<\/em>&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(3)<\/p>\n<p>where <em>T<\/em> is the total effect of <em>X<\/em> on <em>Y<\/em>, given in (2), <em>I<\/em> represents terms containing other variables in the model that are nondescendants of <em>X<\/em>, and <em>U<\/em> represents exogenous variables.<\/p>\n<p>It is always possible to bring the function determining <em>Y<\/em> into the form (3) by recursively substituting the functions for each rhs variable that has <em>X<\/em> as an ancestor, and grouping all the <em>X<\/em> terms together to form <em>TX<\/em>. 
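Before completing the proof, Eq. (1) can be given a quick numerical sanity check. The sketch below is purely illustrative: the three-variable linear model and all its coefficients are made up, and the evidence <em>e<\/em>: <em>X = x&#39;<\/em> is approximated by a narrow bin around <em>x&#39;<\/em>.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear model (made-up coefficients):
#   Z = Uz;   X = a*Z + Ux;   Y = b*X + c*Z + Uy
# The only directed path from X to Y is X -> Y, so the total effect is T = b.
a, b, c = 0.8, 1.5, -0.7
n = 200_000
Uz, Ux, Uy = rng.normal(size=(3, n))
Z = Uz
X = a * Z + Ux
Y = b * X + c * Z + Uy

# Evidence e: X = x' (approximated by a narrow bin around x').
x_prime, x = 1.0, 2.0
sel = np.abs(X - x_prime) < 0.02

# Counterfactual E(Y_x | e): setting X = x leaves I + U = c*Z + Uy untouched,
# so Y_x = b*x + c*Z + Uy, averaged over units consistent with the evidence.
lhs = np.mean(b * x + c * Z[sel] + Uy[sel])

# Eq. (1): E(Y|e) + T*[x - E(X|e)], with E(X|e) estimated from the same bin.
rhs = np.mean(Y[sel]) + b * (x - np.mean(X[sel]))
```

The two sides agree up to floating-point error, since within the conditioning bin both reduce to <em>bx<\/em> plus the same average of the unaffected terms.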
Clearly, <em>T<\/em> is the sum, per Wright&#39;s path rules, of the products of path coefficients along the directed paths from <em>X<\/em> to <em>Y<\/em> (Wright, 1921).<\/p>\n<p>From (3) we can write:<\/p>\n<p><em>Y<sub>x<\/sub> = Tx + I + U<\/em>&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(4)<\/p>\n<p>since <em>I<\/em> and <em>U<\/em> are not affected by the hypothetical change to <em>X=x<\/em> and, moreover,<\/p>\n<p><em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) = <em>Tx + E<\/em>(<em>I+U<\/em>|<em>e<\/em>)&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(5)<\/p>\n<p>since <em>x<\/em> is a constant.<\/p>\n<p>The last term in (5) can be evaluated by taking expectations on both sides of (3), giving:<\/p>\n<p><em>E<\/em>(<em>I+U<\/em>|<em>e<\/em>) = <em>E<\/em>(<em>Y<\/em>|<em>e<\/em>) &#8211; <em>TE<\/em>(<em>X<\/em>|<em>e<\/em>)&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(6)<\/p>\n<p>which, substituted into (5), yields <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) = <em>Tx + E<\/em>(<em>Y<\/em>|<em>e<\/em>) &#8211; <em>TE<\/em>(<em>X<\/em>|<em>e<\/em>)&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;(7)<br \/> and proves our target formula (1).<br \/> &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211; QED<\/p>\n<p><strong>Some Familiar Problems Cast in Linear Outfits<\/strong><br \/> Three special cases of <em>e<\/em> are worth noting:<br \/> Example-1. <em>e<\/em>: <em>X = x&#39;, Y = y&#39;<\/em><br \/> (The linear equivalent of the probability of causation.) From (1) we obtain directly<\/p>\n<p> <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>Y=y&#39;, X=x&#39;<\/em>) = <em>y&#39; + T<\/em> (<em>x &#8211; x&#39;<\/em>) <\/p>\n<p>This is intuitively compelling. The hypothetical expectation of <em>Y<\/em> is simply the observed value of <em>Y<\/em>, <em>y&#39;<\/em>, plus the anticipated change in <em>Y<\/em> due to the change <em>x &#8211; x&#39;<\/em> in <em>X<\/em>.<\/p>\n<p>Example-2. 
<em>e<\/em>: <em>X = x&#39;<\/em> (effect of treatment on the treated)<\/p>\n<p> <em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>X=x&#39;<\/em>) = <em>E<\/em>(<em>Y<\/em>|<em>x&#39;<\/em>) + <em>T<\/em> (<em>x<\/em> &#8211; <em>x&#39;<\/em>)<br \/> &nbsp;&nbsp;&nbsp;&nbsp;= <em>rx&#39; + T<\/em> (<em>x &#8211; x&#39;<\/em>)<br \/> &nbsp;&nbsp;&nbsp;&nbsp;= <em>rx&#39; + E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x<\/em>)) &#8211; <em>E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x&#39;<\/em>))<\/p>\n<p>where <em>r<\/em> is the regression coefficient of <em>Y<\/em> on <em>X<\/em>.<\/p>\n<p>Example-3. <em>e<\/em>: <em>Y = y&#39;<\/em><br \/> (Gee, my temperature is <em>Y=y&#39;<\/em>; what would it be had I taken <em>x<\/em> tablets of aspirin? How many did you take? I don&#39;t remember.)<\/p>\n<p><em>E<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>Y=y&#39;<\/em>) = <em>y&#39; + T<\/em> [<em>x &#8211; E<\/em>(<em>X<\/em>|<em>y&#39;<\/em>)]<br \/> &nbsp;&nbsp;&nbsp;&nbsp;= <em>y&#39; + E<\/em>(<em>Y<\/em>|<em>do<\/em>(<em>x<\/em>)) &#8211; <em>E<\/em>[<em>Y<\/em>|<em>do<\/em>(<em>X=r&#39;y&#39;<\/em>)]<\/p>\n<p>where <em>r&#39;<\/em> is the regression coefficient of <em>X<\/em> on <em>Y<\/em>.<\/p>\n<p>Example-4. Let us consider the non-recursive, supply-demand model of page 215 in <em>Causality<\/em> (2000). Eqs. 
(7.9)-(7.10) read:<\/p>\n<p><em>q = b<sub>1<\/sub>p + d<sub>1<\/sub>i + u<sub>1<\/sub><\/em><br \/> <em>p = b<sub>2<\/sub>q + d<sub>2<\/sub>w + u<sub>2<\/sub><\/em><\/p>\n<p>Our counterfactual problem (page 216) reads: Given that the current price is <em>P=p<sub>0<\/sub><\/em>, what would be the expected value of the demand <em>Q<\/em> if we were to control the price at <em>P = p<sub>1<\/sub><\/em>? Making the correspondence <em>P = X, Q = Y, e =<\/em> {<em>P=p<sub>0<\/sub>, i, w<\/em>}, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on <em>i<\/em> and <em>w<\/em>. Hence, since <em>T = b<sub>1<\/sub><\/em>, we can immediately write<\/p>\n<p><em>E<\/em>(<em>Q<sub>p<sub>1<\/sub><\/sub><\/em> | <em>p<sub>0<\/sub>, i, w<\/em>) = <em>E<\/em>(<em>Q<\/em>|<em>p<sub>0<\/sub>,i,w<\/em>) + <em>b<sub>1<\/sub><\/em>(<em>p<sub>1<\/sub> &#8211; p<sub>0<\/sub><\/em>)<br \/> &nbsp;&nbsp;&nbsp;&nbsp;= <em>r<sub>p<\/sub> p<sub>0<\/sub> + r<sub>i<\/sub> i + r<sub>w<\/sub> w + b<sub>1<\/sub><\/em>(<em>p<sub>1<\/sub> &#8211; p<sub>0<\/sub><\/em>)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(8)<\/p>\n<p>where <em>r<sub>p<\/sub>, r<sub>i<\/sub><\/em> and <em>r<sub>w<\/sub><\/em> are the coefficients of <em>P, i<\/em> and <em>w<\/em>, respectively, in the regression of <em>Q<\/em> on <em>P, i<\/em> and <em>w<\/em>.<\/p>\n<p>Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation<\/p>\n<p><em>p = b<sub>2<\/sub>q + d<sub>2<\/sub>w + u<sub>2<\/sub><\/em><\/p>\n<p>only enter (8) via the regression coefficients. 
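The reduction of Example 4 to Example 2 can also be checked numerically. In this sketch the coefficient values are made up, the evidence {<em>P=p<sub>0<\/sub>, i, w<\/em>} is approximated by fixing <em>i<\/em> and <em>w<\/em> and binning on <em>p<\/em>, and the equilibrium is obtained by solving the two simultaneous equations directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up coefficients for the supply-demand equations (7.9)-(7.10):
#   q = b1*p + d1*i + u1        (demand)
#   p = b2*q + d2*w + u2        (price)
b1, d1, b2, d2 = -0.5, 0.3, 0.4, 0.6
i0, w0 = 1.0, 2.0              # the evidence fixes i and w at these values
n = 500_000
u1, u2 = rng.normal(size=(2, n))

# Equilibrium solution of the two simultaneous equations:
p = (b2 * (d1 * i0 + u1) + d2 * w0 + u2) / (1 - b1 * b2)
q = b1 * p + d1 * i0 + u1

# Evidence P = p0 (narrow bin); counterfactual price control at P = p1.
p0, p1 = 0.5, 1.5
sel = np.abs(p - p0) < 0.01

# Controlling the price at p1 severs the price equation; the demand
# equation then gives Q_{p1} = b1*p1 + d1*i0 + u1 for each unit
# consistent with the evidence.
lhs = np.mean(b1 * p1 + d1 * i0 + u1[sel])

# Eq. (8) with T = b1: E(Q | p0, i, w) + b1*(p1 - p0),
# where E(P | e) = p0 is estimated from the same bin.
rhs = np.mean(q[sel]) + b1 * (p1 - np.mean(p[sel]))
```

As in Example 2, the two sides coincide up to floating-point error, and only the demand coefficient <em>b<sub>1<\/sub><\/em> enters the counterfactual correction term.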
Thus, they need not be calculated explicitly in case they are estimated directly by least squares.<\/p>\n<p><strong>Remark 1:<\/strong><br \/> Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (<em>Causality<\/em>, p. 293). But Examples 2 and 3 trigger the following conjecture:<\/p>\n<p><strong>Conjecture<\/strong><br \/> Any counterfactual query of the form <em>P<\/em>(<em>Y<sub>x<\/sub><\/em>|<em>e<\/em>) is empirically identifiable when <em>Y<\/em> is monotonic relative to <em>X<\/em>.<\/p>\n<p>It is good to end on a challenging note.<\/p>\n<p><font>Best wishes,<br \/> ========Judea Pearl <\/font><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What do we know about counterfactuals in linear models? Here is a neat result concerning the testability of counterfactuals in linear systems. We know that counterfactual queries of the form P(Yx=y|e) may or may not be empirically identifiable, even in experimental studies. 
For example, the probability of causation, P(Yx=y|x&#39;,y&#39;) is in general not identifiable from [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,22],"tags":[],"class_list":["post-34","post","type-post","status-publish","format-standard","hentry","category-counterfactual","category-linear-systems"],"_links":{"self":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/34","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=34"}],"version-history":[{"count":0,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/34\/revisions"}],"wp:attachment":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=34"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=34"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=34"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}