### External Validity and Extrapolations

Author: Judea Pearl

The July issue of the Proceedings of the National Academy of Sciences contains several articles on Causal Analysis in the age of Big Data, among them our (Bareinboim and Pearl’s) paper on data fusion and external validity (http://ftp.cs.ucla.edu/pub/stat_ser/r450-reprint.pdf). Several nuances of this problem were covered earlier on this blog under titles such as transportability, generalizability, extrapolation, and selection bias; see http://ftp.cs.ucla.edu/pub/stat_ser/r400-reprint.pdf and http://ftp.cs.ucla.edu/pub/stat_ser/r425.pdf.

The PNAS paper has attracted the attention of the UCLA Newsroom, which issued a press release with a very accessible description of the problem and its solution. You can find it here: http://newsroom.ucla.edu/releases/solving-big-datas-fusion-problem

A few remarks:

I consider the mathematical solution of the external validity problem to be one of the real gems of modern causal analysis. The problem has its roots in the writings of 18th-century demographers, and its modern awareness is usually associated with the writings of Campbell (1957) and Cook and Campbell (1979) on quasi-experiments. Our formal treatment of the problem using do-calculus has reduced it to a puzzle in logic and graph theory (see http://ftp.cs.ucla.edu/pub/stat_ser/r402.pdf). Bareinboim has further given this puzzle a complete algorithmic solution.
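To give a taste of what such a solution looks like, here is a minimal sketch of the simplest textbook case in this line of work, under stated assumptions: the source and target populations differ only in the distribution of a pre-treatment covariate Z (say, age) that is not affected by the treatment X. In that case the target effect is obtained by re-weighting the source's Z-specific experimental results with the target's observed Z distribution, P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z). The function name `transport` and all numbers below are hypothetical, chosen only for illustration.

```python
def transport(p_y_do_x_given_z, p_star_z):
    """Re-weight Z-specific causal effects from a source study by the
    target population's distribution of Z (hypothetical helper).

    Assumes the populations differ only in P(z), with Z a pre-treatment
    covariate, so that P*(y | do(x), z) = P(y | do(x), z).
    """
    if abs(sum(p_star_z.values()) - 1.0) > 1e-9:
        raise ValueError("P*(z) must sum to 1")
    # P*(y | do(x)) = sum over z of P(y | do(x), z) * P*(z)
    return sum(p_y_do_x_given_z[z] * p for z, p in p_star_z.items())


# Hypothetical source experiment: P(y | do(x), z) for each age stratum.
source_effect = {"young": 0.30, "old": 0.60}
# Hypothetical age distributions in the source and the target population.
source_age = {"young": 0.50, "old": 0.50}
target_age = {"young": 0.80, "old": 0.20}

print(round(transport(source_effect, source_age), 2))  # effect in the source
print(round(transport(source_effect, target_age), 2))  # transported effect
```

With these made-up numbers the source effect is 0.45 while the transported estimate for the target is 0.36: naively copying the source result would be off, even though no experiment was ever run on the target. Harder problem instances require the graphical analysis described above to decide whether any such formula exists.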

I call it a gem because solving any problem instance gives me as much pleasure as solving a puzzle in ancient Greek geometry. It is in fact more fun than solving geometry problems, for two reasons.

First, when you stare at an external validity problem, you have no clue whether it has a solution (i.e., whether an externally valid estimate exists), yet after a few steps of analysis (Eureka!) the answer shines at you with clarity and says: “How could you have missed me?” It is like communicating secretly with the oracle of Delphi, who whispers in your ear: “Trisecting an angle? Forget it. Trisecting a line segment? I will show you how.” A miracle!

Second, while geometrical construction problems reside in the province of recreational mathematics, external validity is a serious matter; it has practical ramifications in every branch of science.

My invitation to readers of this blog: Anyone with intellectual curiosity and a thrill for mathematical discovery, please join us in the excitement over the mathematical solution of the external validity problem. Try it, and please send us your impressions.

It is hard for me to predict when scientists who critically need solutions to real-life extrapolation problems will come to recognize that an elegant and complete solution now exists for them. Most of these scientists (e.g., Campbell’s disciples) do not read graphs and therefore cannot heed my invitation. Locked in a graph-deprived vocabulary, they are left to struggle with meta-analytic techniques or opaque re-calibration routines (see http://ftp.cs.ucla.edu/pub/stat_ser/r452-reprint.pdf), waiting, perhaps, for a more appealing invitation to discover that a solution to their problems is available.

It will be interesting to see how long it takes, in the age of the Internet.