https://arxiv.org/abs/1712.08946

I posted two comments on Twitter:

5.25.2020 2:10pm – (Replying to @scmbradley @fitelson and @olehjortland) I haven’t seen it, thanks for the pointer. But considering that it was authored at Harvard-Stat, my gut tells me to expect the same consistent denial of the C-word as we have seen before (see https://ucla.in/2QYxKyY), namely, a roundabout denial of the logic of human thought.

5.25.2020 10:22pm – (Replying to @yudapearl @scmbradley and 2 others) I was right! Gong and Meng totally miss the paradoxical element of Simpson’s paradox. They attribute the surprise to incomplete specification of a probability function, forgetting that even with complete specification the surprise comes and goes depending on the causal structure of background STORY.

I am including these comments on this blog for two reasons. First, the paper by Gong and Meng belongs in this thread, to inform those who wonder whether Xiao-Li Meng has changed his mind since his causality-free paper of 2014 (see above). Second, the paper provides a clear demonstration of the mental barriers that statisticians can’t cross, and of why it will take another generation for statistics to snap out of its data-centric mindcuffs.

The same concerns apply to “Data Science” as a discipline, though some data-science centers embrace one or two causal-inference researchers.

To follow up, very belatedly, on my comments/questions about notation above, I have sketched a little elaboration here (as well as in the preceding post):

If you did happen to read this I’d be very interested in any comments you had.

Dear all,

My discussion of the four Simpson’s papers would be incomplete without mentioning another paper, which represents the thinking within the potential-outcome camp. The paper in question is “A Fruitful Resolution to Simpson’s Paradox via Multiresolution Inference,” by Keli Liu and Xiao-Li Meng (2014), http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2014.876842, which appeared in the same issue of Statistical Science as my “Understanding Simpson’s Paradox,” http://ftp.cs.ucla.edu/pub/stat_ser/r414-reprint.pdf.

The intriguing feature of Liu and Meng’s paper is that they, too, do not see any connection to causality. In their words:

“Peeling away the [Simpson’s] paradox is as easy (or hard) as avoiding a comparison of apples and oranges, a concept requiring no mention of causality.” (p. 17)

And again: “The central issues of Simpson’s paradox can be addressed adequately without necessarily invoking causality.” (p. 18)

Two comments:

1. Liu and Meng fail to see that the distinction between apples and oranges must be made with causal considerations in mind; statistical criteria alone cannot help us avoid a comparison of apples and oranges. This has been shown again and again, even by Simpson himself.

2. Liu and Meng do not endorse the resolution offered by causal modeling and, as a result, they end up with the wrong conclusion. Quoting:

“Simpson’s Warning: less conditioning is most likely to lead to serious bias when Simpson’s Paradox appears.” (p. 17)

Again, Simpson himself gives an example where conditioning leads to more bias, not less.
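The reversal itself is plain arithmetic; the surprise, and the decision of whether to condition, is not. Here is a minimal numerical sketch (my own hypothetical counts, not taken from either paper, in the spirit of the classic kidney-stone illustration):

```python
# Hypothetical counts (successes, trials), chosen to produce a reversal.
strata = {
    "small": {"treated": (81, 87),  "control": (234, 270)},
    "large": {"treated": (192, 263), "control": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within each stratum, treatment beats control...
for name, arms in strata.items():
    t, c = rate(*arms["treated"]), rate(*arms["control"])
    assert t > c
    print(f"{name}: treated {t:.2f} vs control {c:.2f}")

# ...yet pooling the strata reverses the inequality.
def pooled(arm):
    return tuple(sum(xs) for xs in zip(*(s[arm] for s in strata.values())))

t_all = rate(*pooled("treated"))   # 273/350 = 0.78
c_all = rate(*pooled("control"))   # 289/350 ~ 0.83
assert t_all < c_all
print(f"pooled: treated {t_all:.2f} vs control {c_all:.2f}")
```

Whether the stratified or the pooled comparison is the right one cannot be read off these counts; it depends on the causal story behind the strata, which is precisely the point at issue.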

Thus, in contrast to the data-only economists (Spanos), the potential-outcome camp does not object to causal reasoning per se; it is their specialty. What they object to are attempts to resolve Simpson’s paradox formally and completely, namely, to explicate formally what the differences are between “apples and oranges” and to deal squarely with the decision problem: “What to do in case of reversal.”

Why are they resisting the complete solution? Because (and this is speculation) the complete solution requires graphical tools, and we all know the attitude of potential-outcome enthusiasts toward graphs. We dealt with this cultural peculiarity before, so, at this point, we should just add Simpson’s paradox to their list of challenges and resign ourselves humbly to the slow pace with which Kuhn’s paradigms shift.

Judea

E.g., historically some mathematicians (Euler, I think, for example; I can’t remember) effectively used ‘function’ synonymously with ‘curve that can be drawn without lifting the pen’, which is obviously inadequate in the modern sense, but he was still one of the greatest mathematicians.

And the classic example of the Dirac delta (not a) function. And infinitesimals (now back by popular demand with increased rigour).

Coming from mathematics and engineering I have often been confused by the many apparently contradictory senses in which statisticians of different types use terms like ‘model’, ‘parameter’ etc etc. I now have to first ask exactly how they are using these terms in order to understand their views better.

So of course there is much value to be gained from formalising notions properly and making careful distinctions. There is also much value, I think, in understanding which informal concepts are currently useful and how they might be better formalised.

Anyway, thanks for answering my questions.

True, some statisticians call every model “statistical” or, more accurately, call everything they do “statistical”. But I think there is great merit in distinguishing formally between the models that statisticians have actually been using since Karl Pearson (1890), as reflected in the homework assigned in 20th-century statistics textbooks, and other models, which were meticulously excluded from those textbooks and which people wish to label “statistical” nowadays, perhaps because it is becoming fashionable. This tendency to become inclusive is very healthy, but not before they fix their textbooks; else even astrology would become a “statistical model” by assigning priors to unicorns.

Judea

Thanks for the response. I agree that the ‘hardcore’ statisticians and (especially it seems, for some reason) econometricians use ‘statistical model’ in your sense. And this is the standard/traditional definition of course.

It seems, however, that others use the term in a looser sense. For example, my impression is that Andrew Gelman uses the term in a looser sense which includes many notions you would place outside the definition of ‘statistical model’, for example in specifying logical or functional dependencies on, or between, unobserved/unobservable quantities.

– What do you mean by ‘statistical model’?

— What do you mean by ‘parameter’ in a statistical model?

By ‘statistical model’ I mean a mathematical object that statisticians defined as “a specification of the probability distribution for a set of observations.” The syntactic representation of the object does not matter; only the interpretation counts. The parameters may represent unobservables or unobserved variables, or they can be just indices used to distinguish one distribution from another, etc. What matters is what the object means and, according to all the gurus of statistical philosophy, what it means is “a specification of the probability distribution for a set of observations.” Namely, two statistical models are deemed “equivalent” if they specify the same distribution for a set of observations.

A typical example: y = ax + eps is equivalent to x = by + eps* when X and Y are the observed variables.
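To make the equivalence concrete, here is a small sketch of my own (with arbitrary, illustrative parameter values) showing that the forward model y = ax + eps and a suitably chosen reverse model x = by + eps* induce the identical bivariate normal distribution over the observed pair (X, Y):

```python
import numpy as np

# Forward model: y = a*x + eps, with x ~ N(0, sx2) and eps ~ N(0, se2).
# The parameter values below are arbitrary, for illustration only.
a, sx2, se2 = 2.0, 1.0, 0.5

# Joint covariance of (x, y) implied by the forward model.
cov_xy = a * sx2
var_y = a**2 * sx2 + se2
Sigma_forward = np.array([[sx2, cov_xy],
                          [cov_xy, var_y]])

# Reverse model: x = b*y + eps*, with b and Var(eps*) derived from that joint.
b = cov_xy / var_y
sstar2 = sx2 - b * cov_xy  # residual variance of x given y
Sigma_reverse = np.array([[b**2 * var_y + sstar2, b * var_y],
                          [b * var_y, var_y]])

# The two "different" structural equations specify the same distribution,
# which is why they count as equivalent *statistical* models.
assert np.allclose(Sigma_forward, Sigma_reverse)
print(Sigma_forward)
```

The two equations tell very different causal stories, but as specifications of a distribution over (X, Y) they are indistinguishable.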

[Note: most statisticians distinguish between specifying ONE distribution vs. a SET of distributions, as Sargan did. But all of them see the distribution(s) as the final judge(s) of the meaning of the model.]

[BTW, the parameters are functionals of a probability distribution only when they are identifiable, so this is not a general requirement.]

— As to the distinction between ‘|’ and ‘;’, this takes us to the old Bayes vs. frequentist debate, which is orthogonal to the statistical-vs.-causal distinction that is at the heart of Simpson’s paradox.

I’m wondering if you could clarify a couple of things related to this post for me.

Specifically

– What do you mean by ‘statistical model’?

For example, must a statistical model, for you, refer only to *observable/observed* quantities, or can it refer to unobserved or unobservable quantities?

– What do you mean by ‘parameter’ in a statistical model?

Do you distinguish these from observable variables? Do they just label models or model components? Or are they e.g. functionals of a probability distribution (over observables?)?

– Do you see a relation between your approach and the Likelihoodist/sometimes Frequentist distinction captured in notation like p(y|x;theta) where ‘x’ is a random variable and ‘theta’ is a parameter and where ‘|’ represents probability conditioning of y on x while ‘;’ captures logical or functional dependence of the distribution p(y|x) on a parameter theta? (This notation also goes back a long way but is seldom made explicit).
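The ‘|’ vs. ‘;’ distinction in p(y|x;theta) can be sketched in code: theta is an index that picks out one member of a family of conditional densities, not a quantity being conditioned on. A toy Gaussian illustration of my own (the model y|x ~ N(a*x, sigma^2) is an assumed example, not from the post):

```python
import math

def p_y_given_x(y, x, theta):
    """Density of y given x under the family member indexed by
    theta = (a, sigma): here y | x ~ N(a*x, sigma^2)."""
    a, sigma = theta
    z = (y - a * x) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same conditioning event (y = 1.0 given x = 0.5), two different values of
# theta, i.e., two different members of the family of distributions.
print(p_y_given_x(1.0, 0.5, (2.0, 1.0)))
print(p_y_given_x(1.0, 0.5, (0.0, 1.0)))
```

Changing x moves us within one conditional distribution; changing theta swaps the distribution itself, which is exactly what the ‘;’ is meant to flag.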
