On causality and decision trees
From Dennis Lindley:
If your assumption, that controlling X at x is equivalent to removing the function for X and putting X=x elsewhere, is applicable, then it makes sense because, from my last paragraph, we need past information to select the correct function. What I do not understand at the moment is the relevance of this to decision trees. At a decision node, one conditions on the quantities known at the time of the decision. At a random node, one includes all relevant uncertain quantities under known conditions. Nothing more than the joint distributions (and utility considerations) are needed. For example, in the medical case, the confounding factor may either be known or not at the time the decision about treatment is made, and this determines the structure of the tree. Where causation may enter is when the data are used to assess the probabilities needed in the tree, and it is here that Novick and I used exchangeability. The Bayesian paradigm makes a sharp distinction between probability as belief and probability as frequency, calling the latter, chance. If I understand causation, it would be reasonable that our concept could conveniently be replaced by yours in this context.
Many decision analysts take the position that causality is not needed because: "Nothing more than the joint distributions (and utility considerations) are needed." (see the discussion with Nimrod Megiddo posted on this page). I certainly agree that joint distributions are all that is needed, because P(y|do(x)) is indeed a well-defined distribution function, and this distribution is the target of causal analysis. What is special about this distribution, however, is that it is not derivable from the joint distribution P(y,x) unless we add causal knowledge, such as that provided by a causal graph.
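To make the gap concrete, here is a sketch of the two quantities side by side. It assumes a set of covariates Z (a symbol not used in the letter) that satisfies the back-door criterion relative to (X, Y); that assumption is itself causal and cannot be read off the joint distribution.

```latex
% Observational conditional: computable from the joint distribution alone.
P(y \mid x) = \sum_{z} P(y \mid x, z)\, P(z \mid x)

% Interventional distribution: valid only under the causal assumption
% that Z satisfies the back-door criterion relative to (X, Y).
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

The two expressions differ only in the weight given to each stratum, P(z|x) versus P(z), and the choice of weight is licensed by the graph, not by the data.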
Your next sentence says it all:
"Where causation may enter is when the data are used to assess the probabilities needed in the tree,…"
This is precisely the way I think about causation, though perhaps more daringly: I would not restrict myself to "data" but would also include "beliefs", the results of various experiments, and even plain scientific knowledge (Ohm's law, Newtonian mechanics, etc.).
I take the frequency/belief distinction to be tangential to discussions of causality. Let us assume that the tables in Simpson's story were not frequency tables but summaries of one's subjective beliefs about the occurrence of various joint events, (C,E,F), (C,E,-F), etc. My assertion remains that this summary of beliefs is not sufficient for constructing our decision tree. We also need to assess our belief in the hypothetical event "E would occur if a decision do(C) is taken" and, as I have emphasized (and demonstrated), temporal information alone is insufficient for deriving this assessment from the tabulated belief summaries. Hence, we cannot construct the decision tree from this belief summary; we need an extra ingredient, which I call "causal" information and you choose to call "exchangeability". I would not quarrel about nomenclature.
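A minimal numerical sketch of this point follows. The counts are illustrative numbers chosen to exhibit a Simpson-type reversal, not figures quoted in this letter, and the helper names (`prob`, `cond`, `do_if_f_is_confounder`, `do_if_f_is_consequence`) are hypothetical. The same table of beliefs over (C, E, F) is evaluated under two causal stories: one in which F is a common cause of C and E, and one in which F is a consequence of C with no common cause of C and E.

```python
from fractions import Fraction

# Illustrative (assumed) counts for joint events (c, e, f); 1 = event occurs.
counts = {
    (1, 1, 1): 18, (1, 0, 1): 12,
    (0, 1, 1): 7,  (0, 0, 1): 3,
    (1, 1, 0): 2,  (1, 0, 0): 8,
    (0, 1, 0): 9,  (0, 0, 0): 21,
}
total = sum(counts.values())
names = ("c", "e", "f")

def prob(**fixed):
    """Probability that the named variables take the given values."""
    n = sum(
        v for k, v in counts.items()
        if all(k[names.index(name)] == val for name, val in fixed.items())
    )
    return Fraction(n, total)

def cond(e, **given):
    """P(E = e | given), computed from the joint table."""
    return prob(e=e, **given) / prob(**given)

def do_if_f_is_confounder(c):
    # Assumed graph: F -> C, F -> E, C -> E. Adjust for F.
    return sum(cond(1, c=c, f=f) * prob(f=f) for f in (0, 1))

def do_if_f_is_consequence(c):
    # Assumed graph: C -> F -> E, C -> E, no common cause of C and E.
    # Here P(E | do(C)) coincides with P(E | C).
    return cond(1, c=c)

for c in (1, 0):
    print(c, do_if_f_is_confounder(c), do_if_f_is_consequence(c))
```

Under the first story the table favors withholding C, and under the second it favors taking C, although the joint beliefs are identical in both cases.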
Best wishes,
Judea Pearl
Comment by judea — February 21, 2007 @ 11:39 pm