### The validity of G-estimation

**From a previous correspondence with Eliezer S. Yudkowsky, Research Fellow, Singularity Institute for Artificial Intelligence, Santa Clara, CA**

The following paragraph appears on p. 103, shortly after eq. 3.63 in my copy of *Causality*:

"To place this result in the context of our analysis in this chapter, we note that the class of semi-Markovian models satisfying assumption (3.62) corresponds to complete DAGs in which all arrowheads pointing to $X_k$ originate from observed variables."

It looks to me like this is a sufficient, but not necessary, condition to satisfy 3.62. It appears to me that the necessary condition is that no confounder exist between any $X_i$ and $L_j$ with $i < j$ and that no confounder exist between any $X_i$ and the outcome variable $Y$. However, a confounding arc between any $X_i$ and $X_j$, or a confounding arc between $L_i$ and $X_j$ with $i \le j$, should not render the causal effect non-identifiable. For example, even if a confounding arc exists between $X_2$ and $X_3$ (but no other confounding arcs exist in the model), the causal effect on $Y$ of setting $X_2 = x_2$ and $X_3 = x_3$ should be the same as the distribution on $Y$ if we observe $x_2$ and $x_3$.

It is also not necessary that the DAG be complete.
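The $X_2$, $X_3$ example in the question can be checked by exact enumeration on a toy model. The sketch below is only an illustration under stated assumptions: all variables are binary, a single unobserved `U` plays the role of the confounding arc between $X_2$ and $X_3$, and every numeric parameter is hypothetical. The flag `u_into_y` adds an arrow from `U` into $Y$ (a confounding arc between the $X$'s and $Y$) to show the contrast.

```python
from itertools import product

P_U = {0: 0.6, 1: 0.4}  # hypothetical prior on the unobserved confounder U


def bern(p1, v):
    """Probability that a binary variable with P(1) = p1 takes the value v."""
    return p1 if v == 1 else 1.0 - p1


def weight(u, x2, x3):
    """P(u) * P(x2 | u) * P(x3 | x2, u): U confounds X2 and X3."""
    return P_U[u] * bern(0.2 + 0.6 * u, x2) * bern(0.1 + 0.3 * x2 + 0.5 * u, x3)


def p_y(y, x2, x3, u, u_into_y):
    """P(y | x2, x3, u); U enters Y only when u_into_y is True."""
    extra = 0.2 * u if u_into_y else 0.0
    return bern(0.05 + 0.4 * x2 + 0.3 * x3 + extra, y)


def observational(y, x2, x3, u_into_y=False):
    """P(Y=y | X2=x2, X3=x3), with U summed out of the joint."""
    num = sum(weight(u, x2, x3) * p_y(y, x2, x3, u, u_into_y) for u in P_U)
    den = sum(weight(u, x2, x3) for u in P_U)
    return num / den


def interventional(y, x2, x3, u_into_y=False):
    """P(Y=y | do(X2=x2, X3=x3)): the factors for X2 and X3 are dropped."""
    return sum(P_U[u] * p_y(y, x2, x3, u, u_into_y) for u in P_U)


if __name__ == "__main__":
    for x2, x3 in product((0, 1), repeat=2):
        for flag in (False, True):
            obs = observational(1, x2, x3, flag)
            do = interventional(1, x2, x3, flag)
            print(f"x2={x2} x3={x3} U->Y={flag}: observe={obs:.4f} do={do:.4f}")
```

With `u_into_y=False` the two quantities agree for every setting of $x_2$ and $x_3$, as the question claims; with `u_into_y=True` they differ, which is the kind of confounding the condition must exclude.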

You are right that the DAG need not be complete, and that the condition cited on p. 103 is sufficient but not necessary for either (3.62) or the *g*-estimation formula (3.63) to hold. Corrections to the wording of page 103 were posted on this website.
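For reference, the $g$-formula under discussion is commonly written in the form below. This is a sketch in assumed notation ($l_k$ for the covariates used at stage $k$, $x_k$ for the action values, $K$ for the number of stages); the exact typography of eq. (3.63) in *Causality* may differ and should be checked against the book.

```latex
P(y \mid g = x) \;=\; \sum_{l_1,\dots,l_K}
  P(y \mid l_1,\dots,l_K,\; x_1,\dots,x_K)
  \prod_{k=1}^{K} P(l_k \mid l_1,\dots,l_{k-1},\; x_1,\dots,x_{k-1})
```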

Your suggestion to allow confounding arcs between $X_i$ and $X_j$ is valid. However, allowing a confounding arc between $L_i$ and $X_j$ (with $i < j$) is too permissive, as can be seen from the non-identified models of Figure 3.9 (b), (c), (d) and (g) in *Causality*.

In general, condition (3.62) is both over-restrictive and lacks intuitive basis. A more general and intuitive condition leading to (3.63) is formulated in (4.5) (*Causality*, p. 122), which reads as follows:

**(3.62\*) General condition for *g*-estimation.** $P(y \mid g = x)$ is identifiable and is given by (3.63) if every action-avoiding back-door path from $X_k$ to $Y$ is blocked by some subset $L_k$ of non-descendants of $X_k$. (By "action-avoiding" we mean a path containing no arrows entering an $X$ variable later than $X_k$.)

**Comment 1:** The new definition leads to improvements over (3.62); namely, there are cases where the *g*-formula (3.63) is valid with a subset $L_k$ of the past but not with the entire past.

**Example 1:** Assuming $U_1$ and $U_2$ are unobserved, and the temporal order $U_1, Z, X_1, U_2, Y$, we see that (3.62\*), and hence (3.63), is satisfied with $L_1 = \emptyset$, while taking the whole past $L_1 = Z$ would violate both. Condition (3.62) is also satisfied with the choice $L_1 = \emptyset$, but not with $L_1 = Z$.

**Comment 2:** Defining $L_k$ as the set of "nondescendants" of $X_k$ (as opposed to temporal predecessors of $X_k$) also broadens (3.62).

**Example 2:** With temporal order $U_1, X_1, S, Y$, both (3.62) and (3.62\*) are satisfied with $L_1 = S$, but not with $L_1 = \emptyset$.

**Comment 3:** There are cases where (3.62) will not be satisfied even with the new interpretation of $L_k$, but the graphical condition (3.62\*) is.

**Example 3:** (constructed by Ilya Shpitser) It is easy to see that (3.62\*) is satisfied; all back-door action-avoiding paths from $X_1$ to $Y$ are blocked by $X_0, Z, Z'$. At the same time, it is possible to show, though by a rather intricate method (see the Twin Network Method, page 213), that $Y_{x_1,x_2}$ is not independent of $X_1$, given $Z$, $Z'$ and $X_0$. (In the twin network model there is a *d*-connected path from $X_1$ to $Y_x$, as follows: $X_1 \leftrightarrow Z \leftrightarrow Z^* \rightarrow Z'^* \rightarrow Y^*$. Therefore, (3.62) is not satisfied for $Y_{x_1,x_2}$ and $X_1$.)

This example demonstrates one weakness of the Potential Response approach initially taken by Robins in deriving (3.63). The counterfactual condition (3.62) that legitimizes the use of the *g*-estimation formula is void of intuitive support; hence, epidemiologists who apply this formula are doing so under no guidance of substantive medical knowledge. Fortunately, graphical methods are slowly making their way into epidemiological practice, and more and more people are beginning to understand the assumptions behind *g*-estimation.

(Warning: Those who currently reign over causal analysis in statistics are incurably graph-o-phobic and ruthlessly resist attempts to enlighten their students, readers and co-workers with graphical methods. This slows down progress in statistical research, but will eventually be overrun by commonsense.)
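To make the role of the graphical condition concrete, the following sketch evaluates a two-stage $g$-formula by exact enumeration. Everything in it is an assumption made for illustration and none of it is taken from *Causality*: the graph is $X_1 \to L \to X_2 \to Y$ with $X_1 \to X_2$, $X_1 \to Y$, and an unobserved `U` pointing into both $L$ and $Y$; all variables are binary and all parameters are hypothetical. In this graph the only back-door paths from $X_2$ to $Y$ pass through $L$ or $X_1$, so the covariate set $\{L\}$ (with $X_1$ held fixed) blocks them, and $X_1$ has no back-door path at all.

```python
from itertools import product

NAMES = ("x1", "l", "x2", "y")  # observed variables; U is never observed


def bern(p1, v):
    """Probability that a binary variable with P(1) = p1 takes the value v."""
    return p1 if v == 1 else 1.0 - p1


def p_u(u):
    return bern(0.3, u)


def p_y(y, x1, x2, u):
    return bern(0.1 + 0.2 * x1 + 0.3 * x2 + 0.3 * u, y)


# Observed joint P(x1, l, x2, y), obtained by summing the full joint over U.
OBS = {}
for u, x1, l, x2, y in product((0, 1), repeat=5):
    p = (p_u(u) * bern(0.5, x1)
         * bern(0.2 + 0.3 * x1 + 0.4 * u, l)    # P(l | x1, u)
         * bern(0.1 + 0.2 * x1 + 0.6 * l, x2)   # P(x2 | x1, l)
         * p_y(y, x1, x2, u))
    OBS[(x1, l, x2, y)] = OBS.get((x1, l, x2, y), 0.0) + p


def prob(**fixed):
    """Marginal probability that the named observed variables take the given values."""
    return sum(p for key, p in OBS.items()
               if all(key[NAMES.index(n)] == v for n, v in fixed.items()))


def g_formula(y, x1, x2):
    """sum_l P(y | x1, l, x2) * P(l | x1), using observed quantities only."""
    total = 0.0
    for l in (0, 1):
        p_y_given = prob(x1=x1, l=l, x2=x2, y=y) / prob(x1=x1, l=l, x2=x2)
        p_l_given = prob(x1=x1, l=l) / prob(x1=x1)
        total += p_y_given * p_l_given
    return total


def truth(y, x1, x2):
    """P(y | do(x1, x2)) computed from the full model, U included."""
    return sum(p_u(u) * p_y(y, x1, x2, u) for u in (0, 1))


def naive(y, x1, x2):
    """Plain conditioning P(y | x1, x2), with no adjustment for L."""
    return prob(x1=x1, x2=x2, y=y) / prob(x1=x1, x2=x2)


if __name__ == "__main__":
    for x1, x2 in product((0, 1), repeat=2):
        print(f"x1={x1} x2={x2}: g-formula={g_formula(1, x1, x2):.4f} "
              f"truth={truth(1, x1, x2):.4f} naive={naive(1, x1, x2):.4f}")
```

In this toy model the $g$-formula reproduces the interventional distribution even though `U` is never observed, while plain conditioning on $x_1, x_2$ does not, because $X_2$ carries information about `U` through $L$.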

Best wishes,

Judea Pearl

Comment by judea — February 23, 2007 @ 1:35 am