Simpson’s paradox and decision trees
From Nimrod Megiddo (IBM Almaden)
I do not agree that "causality" is the key to resolving the paradox (though this is also a matter of definition), or that tools for looking at it did not exist twenty years ago. Coming from game theory, I think the issue is not difficult for people who like to draw decision trees with "decision" nodes distinguished from "chance" nodes.
I drew two such trees on the attached Word document which I think clarify the correct decision in different circumstances.
Click here for viewing the trees.
The fact that you have constructed two different decision trees for the same input tables implies that the key to the construction was not in the data, but in some information you obtained from the story behind the data. What is that information?
The literature of decision tree analysis has indeed been in existence for at least fifty years but, to the best of my knowledge, it has not dealt seriously with the problem posed above: "what information do we use to guide us in setting up the correct decision tree?"
We agree that giving a robot the frequency tables ALONE would not be sufficient for the job. But what else would Mr. Robot (or a statistician) need? Changing the story from F = "female" to F = "blood pressure" seems to be enough for people, because people understand informally the distinct roles that gender and blood pressure play in the scheme of things. Can we characterize these roles formally, so that our robot would be able to construct the correct decision tree?
My proposal: give the robot (or a statistician or a decision-tree expert) a pair (T, G), where T is the set of frequency tables and G is a causal graph and, lo and behold, the robot would be able to set up the correct decision tree automatically. This is what I meant by saying that the resolution of the paradox lies in causal considerations. Moreover, one can go further and argue: "if the information in (T, G) is sufficient, why not skip the construction of a decision tree altogether, and get the right answer directly from (T, G)?" This is the gist of chapters 3-4 in the book, which can be a topic for a separate discussion: Would the rich literature on decision tree analysis benefit from conversion to the more economical encoding of decision problems in the syntax of (T, G)? The introduction of influence diagrams (in 1981) was a step in this direction and, as Section 4.1.2 indicates, the second step might not be too far off.
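The (T, G) proposal can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the counts in T are invented (they are not the tables from the attached document), the edge-set encoding of G is hypothetical, and the rule in `effect` covers only this three-variable case (treatment C, outcome E, third factor F). If G says F influences the choice of treatment, the sketch averages the F-specific recovery rates over the distribution of F; if G says the treatment influences F, it uses the combined table.

```python
from fractions import Fraction

# T: illustrative frequency table (assumed numbers, not from the post).
# Key (f, c) -> (recovered, total); f is the value of F, c = 1 if treated.
T = {
    (1, 1): (18, 30),
    (1, 0): (7, 10),
    (0, 1): (2, 10),
    (0, 0): (9, 30),
}

# G: causal graph as a set of directed edges (hypothetical encoding).
G_GENDER = {("F", "C"), ("F", "E"), ("C", "E")}          # F affects treatment choice
G_BLOOD_PRESSURE = {("C", "F"), ("F", "E"), ("C", "E")}  # treatment affects F


def recovery_rate(T, c, f=None):
    """Observed recovery rate for treatment c, within stratum f or combined."""
    cells = [T[(ff, c)] for ff in (0, 1) if f is None or ff == f]
    recovered = sum(r for r, _ in cells)
    total = sum(n for _, n in cells)
    return Fraction(recovered, total)


def effect(T, G, c):
    """Recovery rate to expect if treatment is set to c, as read from (T, G)."""
    if ("F", "C") in G:
        # F precedes the treatment: weight each stratum by how common it is.
        n_all = sum(n for _, n in T.values())
        result = Fraction(0)
        for f in (0, 1):
            n_f = T[(f, 0)][1] + T[(f, 1)][1]
            result += recovery_rate(T, c, f) * Fraction(n_f, n_all)
        return result
    if ("C", "F") in G:
        # F is affected by the treatment: consult the combined table.
        return recovery_rate(T, c)
    raise ValueError("graph not covered by this three-variable sketch")


def should_treat(T, G):
    """Decision read directly from (T, G), with no decision tree built."""
    return effect(T, G, 1) > effect(T, G, 0)


print(should_treat(T, G_GENDER))          # same T, graph 1
print(should_treat(T, G_BLOOD_PRESSURE))  # same T, graph 2
```

With these assumed counts the same table T yields opposite decisions under the two graphs, which is the sense in which the deciding information lies in G rather than in the data.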
Best wishes,
Judea Pearl
Comment by judea — April 24, 2000 @ 12:00 pm