Causal Analysis in Theory and Practice

September 2, 2000

Indirect effects in nonlinear models

Filed under: Indirect effects — moderator @ 12:00 am

(Quoted from Jacques A. Hagenaars' comments on my SMR paper (Pearl, 1998a), dated February 24, 2000. Full text accessible through http://www.knaw.nl/09public/rm/koster1.pdf.)

In general, researchers are interested in the nature and sizes of direct, total and indirect effects. In a way (but see below), Pearl shows how to compute direct and total effects in the general (nonparametric) model, but is silent about indirect effects. … indirect effects do occupy an important place in substantive theories. Many social science theories "agree" on the input (background characteristics) and output (behavioral) variables, but differ exactly with regard to the intervening mechanisms. To take a simple example, we know that the influence of Education on Political Preferences is mediated through "economic status" (higher educated people get the better jobs and earn more money) and through a "cultural mechanism" (having to do with the contents of the education and the accompanying socialization processes at school). It is important what the causal directions (signs) of these two processes are and which one is the dominant one (at least in The Netherlands they did tend to go in different directions, one leading to a right-wing preference, the other to a left-wing one). We need to know and separate the nature and consequences of these two different processes; that is, we want to know the signs and the magnitudes of the indirect effects. In the parametric linear version of structural equation models, there exists a "calculus of path coefficients" in which we can write total effects in terms of direct and several indirect effects. But this is not possible in the general nonparametric case, nor, e.g., in the loglinear parametric version. For systems of logit models there does not exist a comparable "calculus of path coefficients," as was remarked long ago. However, given its overriding theoretical importance, the issue of indirect effects cannot simply be neglected.

In line with my own proposals (Hagenaars, 1993, see below), maybe something might be derived from collapsing tables over one but not the other intervening variable; formulated in terms of the "do-operator," maybe some assessment of indirect effects might be obtained by setting not only the "causal factor" (here: Education) to a particular value, but also one of the two intervening variables. Or maybe we must simply conclude that only under particular definitions (parameterizations) of causal effects does it make sense to talk about indirect effects, e.g., only if we take differences between distributions as our causal measure, but not when we use ratios.
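Hagenaars's do-operator proposal — setting Education and, in addition, one of the two mediators — can be sketched in a toy linear model. All functions and coefficients below are hypothetical, chosen only so the two pathways pull in opposite directions as in the Dutch example; in this linear special case the two pathway-specific effects add up to the total effect, which is precisely the "calculus of path coefficients" that fails to generalize:

```python
# Toy linear sketch of Hagenaars's proposal (all coefficients made up):
# Educ -> Econ -> Pref and Educ -> Cult -> Pref, with the two pathways
# signed in opposite directions.

def econ(educ):            # "economic status" mechanism
    return 0.8 * educ

def cult(educ):            # "cultural" mechanism
    return 0.6 * educ

def pref(econ_v, cult_v):  # preference: economic path rightward (+),
    return 1.0 * econ_v - 1.5 * cult_v   # cultural path leftward (-)

def total_effect(e1, e0):
    """Effect on Pref of do(Educ=e1) versus do(Educ=e0)."""
    return pref(econ(e1), cult(e1)) - pref(econ(e0), cult(e0))

def effect_holding_cult(e1, e0, c):
    """do(Educ) while also do(Cult=c): isolates the economic pathway."""
    return pref(econ(e1), c) - pref(econ(e0), c)

def effect_holding_econ(e1, e0, m):
    """do(Educ) while also do(Econ=m): isolates the cultural pathway."""
    return pref(m, cult(e1)) - pref(m, cult(e0))

te = total_effect(1, 0)                 # 0.8 - 0.9 = -0.1
ie_econ = effect_holding_cult(1, 0, 0)  # +0.8 (rightward pathway)
ie_cult = effect_holding_econ(1, 0, 0)  # -0.9 (leftward pathway)
# In the linear case the pathway effects sum to the total effect:
assert abs(te - (ie_econ + ie_cult)) < 1e-9
```

The additive decomposition in the last line is special to linear models; with nonlinear functions for `pref` the two "held-fixed" effects need not sum to the total, which is Hagenaars's point.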

Reference: Hagenaars, Jacques A., Loglinear Models with Latent Variables, Sage University Papers Series, Newbury Park, CA: Sage, 49–50, 1993.

July 25, 2000

General criterion for parameter identification

Filed under: Identification — moderator @ 12:00 am

The parameter identification method described in Section 5.3.1 rests on two criteria: (1) the single-door criterion of Theorem 5.3.1, and (2) the back-door criterion of Theorem 5.3.2. This method may require appreciable bookkeeping in combining results from various segments of the graph. Is there a single graphical criterion of identification that unifies the two theorems and thus avoids much of the bookkeeping involved?
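For orientation, the back-door criterion of Theorem 5.3.2 licenses the familiar adjustment formula P(y | do(x)) = Σz P(y | x, z) P(z). A minimal sketch with made-up numbers, for a single binary confounder Z of X → Y:

```python
# Hypothetical example of back-door adjustment: Z -> X, Z -> Y, X -> Y,
# and {Z} satisfies the back-door criterion relative to (X, Y).
# All probabilities below are invented for illustration.

pZ = {0: 0.6, 1: 0.4}                   # P(Z = z)
pX_given_Z = {0: 0.2, 1: 0.7}           # P(X = 1 | Z = z)
pY_given_XZ = {(0, 0): 0.1, (0, 1): 0.5,
               (1, 0): 0.4, (1, 1): 0.8}  # P(Y = 1 | X = x, Z = z)

def p_joint(z, x, y):
    px = pX_given_Z[z] if x == 1 else 1 - pX_given_Z[z]
    py = pY_given_XZ[(x, z)] if y == 1 else 1 - pY_given_XZ[(x, z)]
    return pZ[z] * px * py

def p_y1_do_x(x):
    # Back-door adjustment: average P(Y=1 | x, z) over the MARGINAL of Z.
    return sum(pY_given_XZ[(x, z)] * pZ[z] for z in (0, 1))

def p_y1_given_x(x):
    # Ordinary conditioning averages over P(Z | x) instead -- confounded.
    num = sum(p_joint(z, x, 1) for z in (0, 1))
    den = sum(p_joint(z, x, y) for z in (0, 1) for y in (0, 1))
    return num / den

print(p_y1_do_x(1))     # ~ 0.56: interventional probability
print(p_y1_given_x(1))  # ~ 0.68: observational, inflated by confounding
```

The gap between the two numbers is the confounding that adjustment removes; the bookkeeping question in the post is how to obtain such licenses graph-wide without applying the two theorems segment by segment.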

June 28, 2000

On causality and decision trees

Filed under: Decision Trees,General — moderator @ 12:00 am

From Dennis Lindley:

If your assumption, that controlling X at x is equivalent to removing the function for X and putting X=x elsewhere, is applicable, then it makes sense because, from my last paragraph, we need past information to select the correct function. What I do not understand at the moment is the relevance of this to decision trees. At a decision node, one conditions on the quantities known at the time of the decision. At a random node, one includes all relevant uncertain quantities under known conditions. Nothing more than the joint distributions (and utility considerations) are needed. For example, in the medical case, the confounding factor may either be known or not at the time the decision about treatment is made, and this determines the structure of the tree. Where causation may enter is when the data are used to assess the probabilities needed in the tree, and it is here that Novick and I used exchangeability. The Bayesian paradigm makes a sharp distinction between probability as belief and probability as frequency, calling the latter, chance. If I understand causation, it would be reasonable that our concept could conveniently be replaced by yours in this context.

June 10, 2000

On functional models for predicting the effect of actions

Filed under: Book (J Pearl),General — moderator @ 12:00 am

From Dennis Lindley:

In the part of Chapter 1 that you kindly sent me, a functional, causal model is clearly defined by a set of equations in (1.40). The set provides a joint probability distribution of the variables using a specific order. That distribution may be manipulated to obtain an equivalent probability specification in any other order. I showed in my note that this probability structure could be described by a set of equations in an order different from that of (1.40). (That proof may be wrong, though on p. 31 you suggest the result was known in '93.) Consequently (1.40) can be replaced by a different set of equations. You tell us now to see what happens were a variable to be controlled; this in terms of the set, and I showed that different consequences flowed if different sets were used. How do I decide which set is correct?
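Lindley's question can be made concrete with a toy pair of equation sets (binary variables, hypothetical noise probabilities). Both sets induce the same joint distribution of (X, Y), yet the surgery do(X=1) has different consequences in the two systems, so the observational distribution alone cannot say which set is "correct":

```python
# Two hypothetical equation sets over binary (X, Y) that agree on the
# joint distribution but disagree under do(X=1).
from itertools import product

def model_A(u1, u2, do_x=None):      # equation order: X first, then Y
    x = u1 if do_x is None else do_x
    y = x ^ u2                        # Y listens to X
    return x, y

def model_B(v1, v2, do_x=None):      # reversed order: Y first, then X
    y = v1                            # Y listens to no one
    x = (y ^ v2) if do_x is None else do_x
    return x, y

def dist(model, p_noise2, do_x=None):
    """Exact P(X, Y): noise1 ~ Bern(0.5), noise2 ~ Bern(p_noise2)."""
    d = {}
    for n1, n2 in product((0, 1), repeat=2):
        w = 0.5 * (p_noise2 if n2 else 1 - p_noise2)
        xy = model(n1, n2, do_x)
        d[xy] = d.get(xy, 0.0) + w
    return d

# Observationally indistinguishable ...
assert dist(model_A, 0.1) == dist(model_B, 0.1)
# ... but the surgery removes X's equation, and only model_A has Y
# responding to X afterwards:
print(dist(model_A, 0.1, do_x=1))   # P(Y=1 | do(X=1)) = 0.9
print(dist(model_B, 0.1, do_x=1))   # P(Y=1 | do(X=1)) = 0.5
```

Which set is correct is therefore an extra-statistical question: it depends on which mechanisms in the world actually stay invariant under the contemplated control, not on the joint distribution.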

May 18, 2000

Counterfactual notation

Filed under: Book (J Pearl),Counterfactual — moderator @ 12:00 am

From Jos Lehmann (University of Amsterdam):

Jos Lehmann noticed potential ambiguity in the notation used for counterfactual propositions. Capital letters, like "A" or "B," are sometimes used to denote propositional variables, and sometimes to denote propositions. For example, in the function A = C (Model M, page 209), "A" stands for the variable "whether Rifleman-A shoots" and takes on values in {true, false}, while in statements S1-S5 (page 208), A stands for a proposition (e.g., "Rifleman-A shot").

May 11, 2000

Reversing Statistical Time

Filed under: Statistical Time — moderator @ 12:00 am

From Keith A. Markus, John Jay College of Criminal Justice, CUNY 

Can you provide a general method for solving for the parameters a, b, c, and d, to achieve the exact reversal of the alignment of physical and statistical time? In other words, what is the general principle behind the example on page 59 for selecting the alternative coordinate system that will have the intended effect?

April 24, 2000

Simpson’s paradox and decision trees

Filed under: Decision Trees,Simpson's Paradox — moderator @ 12:14 am

From Nimrod Megiddo (IBM Almaden)

I do not agree that "causality" is the key to resolving the paradox (but this is also a matter of definition) and that tools for looking at it did not exist twenty years ago. Coming from game theory, I think the issue is not difficult for people who like to draw decision trees with "decision" nodes distinguished from "chance" nodes.

I drew two such trees on the attached Word document which I think clarify the correct decision in different circumstances.
Click here to view the trees.

Causality and the mystical error terms

Filed under: General,structural equations — moderator @ 12:00 am

From David Kenny (University of Connecticut) 

Let me just say that it is very gratifying to see a philosopher give the problem of causality some serious attention. Moreover, you discuss the concept as it is used in the contemporary social sciences. I have been bothered by the fact that all too many social scientists try to avoid saying "cause" when that is clearly what they mean to say. Thank you!

I have not finished your book, but I cannot resist making one point to you. In Section 5.4, you discuss the meaning of structural coefficients, but you spend a good deal of time discussing the meaning of epsilon or e. It seems to me that e has a very straightforward meaning in SEM. If the true equation for y is

y = Bx + Cz + Dq + etc. + r, where r is meant to allow for some truly random component, then e = Cz + Dq + etc. + r, or the sum of the omitted variables. The difficulty in SEM is that usually, though not always, for identification purposes it must be assumed that e and x have a zero correlation. Perhaps this is the standard "omitted variables" explanation of e that you allude to, but it does not seem at all mysterious, at least to me.
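Kenny's reading of e can be checked numerically. In the sketch below (coefficients and distributions are made up), e = Cz + Dq + r is uncorrelated with x when the omitted variables are, and the regression of y on x alone recovers B; correlating an omitted variable with x breaks both, which is the identifying assumption he flags:

```python
# Simulation of the disturbance e as a sum of omitted variables
# (all coefficients and distributions invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
B, C, D = 2.0, 1.0, -0.5

x = rng.normal(size=n)
z = rng.normal(size=n)            # omitted, independent of x
q = rng.normal(size=n)            # omitted, independent of x
r = rng.normal(size=n)            # the "truly random component"
y = B*x + C*z + D*q + r

e = C*z + D*q + r                 # the disturbance, as omitted variables
print(np.corrcoef(x, e)[0, 1])    # ~ 0: the zero-correlation assumption

# OLS of y on x alone then recovers B:
b_hat = np.cov(x, y)[0, 1] / np.var(x)
print(b_hat)                      # ~ 2.0

# If an omitted variable is correlated with x, the same regression
# is biased -- corr(x, e) is no longer zero:
z2 = 0.8*x + rng.normal(size=n)
y2 = B*x + C*z2 + r
b_bad = np.cov(x, y2)[0, 1] / np.var(x)
print(b_bad)                      # ~ B + C*0.8 = 2.8
```

The simulation illustrates why the "no mystery" reading still leans on an assumption: e is innocuous only so long as the omitted variables it bundles are uncorrelated with the regressor.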

March 22, 2000

Bertrand Russell on Causality

Filed under: General — moderator @ 12:12 am

From David Bessler (Texas A&M University)

David Bessler pointed out that Bertrand Russell changed his views on causality relative to those he expressed in 1913 (see Epilogue, page 337). In his book Human Knowledge: Its Scope and Limits (Simon and Schuster, 1948), Russell states: "The power of science is its discovery of causal laws" (page 308).

January 1, 2000

d-Separation Without Tears

Filed under: d-separation — moderator @ 12:10 am

Introduction

d-separation is a criterion for deciding, from a given causal graph, whether a set X of variables is independent of another set Y, given a third set Z. The idea is to associate "dependence" with "connectedness" (i.e., the existence of a connecting path) and "independence" with "unconnectedness" or "separation". The only twist on this simple idea is to define what we mean by "connecting path", given that we are dealing with a system of directed arrows in which some vertices (those residing in Z) correspond to measured variables, whose values are known precisely. To account for the orientations of the arrows we use the terms "d-separated" and "d-connected" (d connotes "directional").

We start by considering separation between two singleton variables, x and y; the extension to sets of variables is straightforward (i.e., two sets are separated if and only if each element in one set is separated from every element in the other).

1. Unconditional separation

Rule 1: x and y are d-connected if there is an unblocked path between them.

By a "path" we mean any consecutive sequence of edges, disregarding their directionalities. By "unblocked path" we mean a path that can be traced without traversing a pair of arrows that collide "head-to-head". In other words, arrows that meet head-to-head do not constitute a connection for the purpose of passing information; such a meeting will be called a "collider".

Example 1

This graph contains one collider, at t. The path x-r-s-t is unblocked, hence x and t are d-connected. So is the path t-u-v-y, hence t and y are d-connected, as are the pairs u and y, t and v, t and u, x and s, etc. However, x and y are not d-connected; there is no way of tracing a path from x to y without traversing the collider at t. Therefore, we conclude that x and y are d-separated, as are x and v, s and u, r and u, etc. (The ramification is that the covariance terms corresponding to these pairs of variables will be zero, for every choice of model parameters.)

2. Blocking by conditioning

Motivation: When we measure a set Z of variables, and take their values as given, the conditional distribution of the remaining variables changes character; some dependent variables become independent, and some independent variables become dependent. To represent these dynamics in the graph, we need the notion of "conditional d-connectedness" or, more concretely, "d-connectedness, conditioned on a set Z of measurements".

Rule 2: x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z. If no such path exists, we say that x and y are d-separated by Z; we also say then that every path between x and y is "blocked" by Z.

Example 2

Let Z be the set {r, v} (marked by circles in the figure). Rule 2 tells us that x and y are d-separated by Z, and so are x and s, u and y, s and u, etc. The path x-r-s is blocked by Z, and so are the paths u-v-y and s-t-u. The only pairs of unmeasured nodes that remain d-connected in this example, conditioned on Z, are s and t, and u and t. Note that, although t is not in Z, the path s-t-u is nevertheless blocked by Z, since t is a collider and is blocked by Rule 1.

3. Conditioning on colliders

Motivation: When we measure a common effect of two independent causes, the causes become dependent, because finding the truth of one makes the other less likely (it is "explained away"), and refuting one implies the truth of the other. This phenomenon (known as Berkson's paradox, or "explaining away") requires slightly special treatment when we condition on colliders (representing common effects) or their descendants (representing effects of common effects).

Rule 3: If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that traverses it.

Example 3

Let Z be the set {r, p} (again, marked with circles). Rule 3 tells us that s and y are d-connected by Z, because the collider at t has a descendant (p) in Z, which unblocks the path s-t-u-v-y. However, x and u are still d-separated by Z, because although the linkage at t is unblocked, the one at r is blocked by Rule 2 (since r is in Z).

This completes the definition of d-separation, and the reader is invited to try it on some more intricate graphs, such as those shown in Figure 1.3.
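The three rules can also be collected into a small program. The sketch below uses plain Python and exhaustive path enumeration (fine for graphs this small), encodes the example graph x → r → s → t ← u ← v ← y together with t → p, and reproduces the conclusions of Examples 1-3:

```python
# A path-tracing implementation of Rules 1-3 on the example graph:
# x -> r -> s -> t <- u <- v <- y, plus t -> p.

edges = {('x', 'r'), ('r', 's'), ('s', 't'), ('u', 't'),
         ('v', 'u'), ('y', 'v'), ('t', 'p')}

def descendants(n):
    """n together with everything reachable from n along the arrows."""
    out, frontier = {n}, [n]
    while frontier:
        cur = frontier.pop()
        for a, b in edges:
            if a == cur and b not in out:
                out.add(b)
                frontier.append(b)
    return out

def paths(a, b, visited=()):
    """All undirected paths from a to b without repeated nodes."""
    if a == b:
        yield visited + (a,)
        return
    for u, v in edges:
        for nxt in ((v,) if u == a else (u,) if v == a else ()):
            if nxt not in visited and nxt != a:
                yield from paths(nxt, b, visited + (a,))

def blocked(path, Z):
    """Is this path blocked by Z under Rules 1-3?"""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if (prev, mid) in edges and (nxt, mid) in edges:
            # Collider: blocks unless it or a descendant is in Z (Rule 3).
            if not (descendants(mid) & set(Z)):
                return True
        elif mid in Z:
            # Non-collider in Z blocks the path (Rule 2).
            return True
    return False

def d_separated(a, b, Z=()):
    return all(blocked(p, Z) for p in paths(a, b))

print(d_separated('x', 'y'))               # True  (Example 1)
print(d_separated('s', 'u', ('r', 'v')))   # True  (Example 2)
print(d_separated('s', 'y', ('r', 'p')))   # False (Example 3)
print(d_separated('x', 'u', ('r', 'p')))   # True  (Example 3)
```

Enumerating all paths is exponential in general; for large graphs one would use a linear-time reachability scheme instead, but the logic of the three rules is exactly what the `blocked` function encodes.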

Typical application:
Suppose we consider the regression of y on p, r and x,

y = c1 p + c2 r + c3 x, and suppose we wish to predict which coefficient in this regression is zero. From the discussion above we can conclude immediately that c3 is zero, because y and x are d-separated given p and r, hence the partial correlation between y and x, conditioned on p and r, must vanish. c1 and c2, on the other hand, will in general not be zero, as can be seen from the graph: Z = {r, x} does not d-separate y from p, and Z = {p, x} does not d-separate y from r.
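This prediction can be checked by simulation on a linear-Gaussian parameterization of the example graph; the structural coefficients below are arbitrary, and the prediction c3 = 0 holds for any choice of them:

```python
# Regression y = c1*p + c2*r + c3*x on a linear-Gaussian version of
# x -> r -> s -> t <- u <- v <- y, t -> p (arbitrary coefficients).
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.normal(size=n)
r = 0.7 * x + rng.normal(size=n)
s = 0.9 * r + rng.normal(size=n)
y = rng.normal(size=n)
v = 0.8 * y + rng.normal(size=n)
u = 0.6 * v + rng.normal(size=n)
t = 0.5 * s + 0.5 * u + rng.normal(size=n)
p = 0.9 * t + rng.normal(size=n)

X = np.column_stack([p, r, x])
c1, c2, c3 = np.linalg.lstsq(X, y, rcond=None)[0]
print(c1, c2, c3)   # c3 ~ 0; c1 and c2 away from zero
```

Only c3 vanishes, matching the d-separation reading of the graph: y is separated from x given {p, r}, but not from p given {r, x}, nor from r given {p, x}.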

Remark on correlated errors:
Correlated exogenous variables (or error terms) need no special treatment. These are represented by bi-directed arcs (double-arrowed) and their arrowheads are treated as any other arrowhead for the purpose of path tracing. For example, if we add to the graph above a bi-directed arc between x and t, then y and x will no longer be d-separated (by Z={r, p}), because the path x-t-u-v-y is d-connected — the collider at t is unblocked by virtue of having a descendant, p, in Z.
