https://arxiv.org/abs/1712.08946

I posted two comments on Twitter:

5.25.2020 2:10pm – (Replying to @scmbradley @fitelson and @olehjortland) I haven’t seen it, thanks for the pointer. But considering that it was authored at Harvard-Stat, my gut tells me to expect the same consistent denial of the C-word as we have seen before (see https://ucla.in/2QYxKyY), namely, a roundabout denial of the logic of human thought.

5.25.2020 10:22pm – (Replying to @yudapearl @scmbradley and 2 others) I was right! Gong and Meng totally miss the paradoxical element of Simpson’s paradox. They attribute the surprise to incomplete specification of a probability function, forgetting that even with complete specification the surprise comes and goes depending on the causal structure of background STORY.

I am including these comments on this blog for two reasons. First, the paper by Gong and Meng belongs in this thread, to inform those who wonder whether Xiao-Li Meng has changed his mind since his causality-free paper of 2014 (see above). Second, the paper provides a clear demonstration of the mental barriers that statisticians can’t cross, and of why it will take another generation for statistics to snap out of its data-centric mindcuffs.

The same concerns apply to “Data Science” as a discipline, though some data-science centers embrace one or two causal-inference researchers.

To follow up, very belatedly, on my comments/questions about notation above, I have sketched a little elaboration here (as well as in the preceding post):

If you did happen to read this I’d be very interested in any comments you had.

Dear all,

My discussion of the four Simpson’s papers would be incomplete without mentioning another paper, which represents the thinking within the potential-outcome camp. The paper in question is “A Fruitful Resolution to Simpson’s Paradox via Multiresolution Inference,” by Keli Liu and Xiao-Li Meng (2014), http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2014.876842, which appeared in the same issue of Statistical Science as my “Understanding Simpson’s Paradox,” http://ftp.cs.ucla.edu/pub/stat_ser/r414-reprint.pdf.

The intriguing feature of Liu and Meng’s paper is that they, too, do not see any connection to causality. In their words:

“Peeling away the [Simpson’s] paradox is as easy (or hard) as avoiding a comparison of apples and oranges, a concept requiring no mention of causality.” (p. 17)

And again: “The central issues of Simpson’s paradox can be addressed adequately without necessarily invoking causality.” (p. 18)

Two comments:

1. Liu and Meng fail to see that the distinction between apples and oranges must be made with causal considerations in mind; statistical criteria alone cannot help us avoid a comparison of apples and oranges. This has been shown again and again, even by Simpson himself.

2. Liu and Meng do not endorse the resolution offered by causal modeling and, as a result, they end up with the wrong conclusion. Quoting:

“Simpson’s Warning: less conditioning is most likely to lead to serious bias when Simpson’s Paradox appears.” (p. 17)

Again, Simpson himself gives an example where conditioning leads to more bias, not less.
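The reversal itself is plain arithmetic; the surprise, and the decision of whether to condition, is not. Here is a minimal numerical sketch (my own hypothetical counts, not taken from either paper, in the spirit of the classic kidney-stone illustration):

```python
# Hypothetical counts (successes, trials), chosen to produce a reversal.
strata = {
    "small": {"treated": (81, 87),  "control": (234, 270)},
    "large": {"treated": (192, 263), "control": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within each stratum, treatment beats control...
for name, arms in strata.items():
    t, c = rate(*arms["treated"]), rate(*arms["control"])
    assert t > c
    print(f"{name}: treated {t:.2f} vs control {c:.2f}")

# ...yet pooling the strata reverses the inequality.
def pooled(arm):
    return tuple(sum(xs) for xs in zip(*(s[arm] for s in strata.values())))

t_all = rate(*pooled("treated"))   # 273/350 = 0.78
c_all = rate(*pooled("control"))   # 289/350 ~ 0.83
assert t_all < c_all
print(f"pooled: treated {t_all:.2f} vs control {c_all:.2f}")
```

Whether the stratified or the pooled comparison is the right one cannot be read off these counts; it depends on the causal story behind the strata, which is precisely the point at issue.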

Thus, in contrast to the data-only economists (Spanos), the potential-outcome camp does not object to causal reasoning per se; it is their specialty. What they object to are attempts to resolve Simpson’s paradox formally and completely, namely, to explicate formally what the differences are between “apples and oranges” and to deal squarely with the decision problem: “What to do in case of reversal.”

Why are they resisting the complete solution? Because (and this is speculation) the complete solution requires graphical tools, and we all know the attitude of potential-outcome enthusiasts toward graphs. We dealt with this cultural peculiarity before, so, at this point, we should just add Simpson’s paradox to their list of challenges and resign ourselves humbly to the slow pace with which Kuhn’s paradigms shift.

Judea

E.g., historically some mathematicians (Euler, I think, for example; I can’t remember) effectively used ‘function’ synonymously with ‘curve that can be drawn without lifting the pen’, which is obviously inadequate in the modern sense, but he was still one of the greatest mathematicians.

And the classic example of the Dirac delta (not a) function. And infinitesimals (now back by popular demand with increased rigour).

Coming from mathematics and engineering I have often been confused by the many apparently contradictory senses in which statisticians of different types use terms like ‘model’, ‘parameter’ etc etc. I now have to first ask exactly how they are using these terms in order to understand their views better.

So of course there is much value to be gained from formalising notions properly and making careful distinctions. There is also much value, I think, in understanding which informal concepts are currently useful and how they might be better formalised.

Anyway, thanks for answering my questions.

True, some statisticians call every model “statistical” or, more accurately, call everything they do “statistical”. But I think there is great merit in distinguishing formally between the models that statisticians have actually been using since Karl Pearson (1890), as reflected in the homework assigned in 20th-century statistics textbooks, and other models, which were meticulously excluded from those textbooks and which people wish to label “statistical” nowadays, perhaps because it is becoming fashionable. This tendency to become inclusive is very healthy, but not before they fix their textbooks; else even astrology would become a “statistical model” by assigning priors to unicorns.

Judea

Thanks for the response. I agree that the ‘hardcore’ statisticians and (especially it seems, for some reason) econometricians use ‘statistical model’ in your sense. And this is the standard/traditional definition of course.

It seems, however, that others use the term in a looser sense. For example, my impression is that Andrew Gelman uses the term in a looser sense which includes many notions you would place outside the definition of ‘statistical model’, for example in specifying logical or functional dependencies on, or between, unobserved/unobservable quantities.

– What do you mean by ‘statistical model’?

— What do you mean by ‘parameter’ in a statistical model?

By ‘statistical model’ I mean a mathematical object that statisticians defined as “a specification of the probability distribution for a set of observations.” The syntactic representation of the object does not matter; only the interpretation counts. The parameters may represent unobservables or unobserved variables, or they can be just indices used to distinguish one distribution from another, etc. What matters is what the object means and, according to all the gurus of statistical philosophy, what it means is “a specification of the probability distribution for a set of observations.” Namely, two statistical models are deemed “equivalent” if they specify the same distribution for a set of observations.

A typical example: y = ax + eps is equivalent to x = by + eps* when X and Y are the observed variables.
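To make the equivalence concrete, here is a small sketch of my own (with arbitrary, illustrative parameter values) showing that the forward model y = ax + eps and a suitably chosen reverse model x = by + eps* induce the identical bivariate normal distribution over the observed pair (X, Y):

```python
import numpy as np

# Forward model: y = a*x + eps, with x ~ N(0, sx2) and eps ~ N(0, se2).
# The parameter values below are arbitrary, for illustration only.
a, sx2, se2 = 2.0, 1.0, 0.5

# Joint covariance of (x, y) implied by the forward model.
cov_xy = a * sx2
var_y = a**2 * sx2 + se2
Sigma_forward = np.array([[sx2, cov_xy],
                          [cov_xy, var_y]])

# Reverse model: x = b*y + eps*, with b and Var(eps*) derived from that joint.
b = cov_xy / var_y
sstar2 = sx2 - b * cov_xy  # residual variance of x given y
Sigma_reverse = np.array([[b**2 * var_y + sstar2, b * var_y],
                          [b * var_y, var_y]])

# The two "different" structural equations specify the same distribution,
# which is why they count as equivalent *statistical* models.
assert np.allclose(Sigma_forward, Sigma_reverse)
print(Sigma_forward)
```

The two equations tell very different causal stories, but as specifications of a distribution over (X, Y) they are indistinguishable.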

[Note: most statisticians distinguish between specifying ONE distribution vs. a SET of distributions, as Sargan did. But all of them see the distribution(s) as the final judge(s) of the meaning of the model.]

[BTW, the parameters are functionals of a probability distribution only when they are identifiable, so this is not a general requirement.]

— As to the distinction between ‘|’ and ‘;’, this takes us to the old Bayes vs. frequentist debate, which is orthogonal to the statistical-vs.-causal distinction that is at the heart of Simpson’s paradox.

I’m wondering if you could clarify a couple of things related to this post for me.

Specifically

– What do you mean by ‘statistical model’?

For example, must a statistical model, for you, refer only to *observable/observed* quantities, or can it refer to unobserved or unobservable quantities?

– What do you mean by ‘parameter’ in a statistical model?

Do you distinguish these from observable variables? Do they just label models or model components? Or are they e.g. functionals of a probability distribution (over observables?)?

– Do you see a relation between your approach and the Likelihoodist/sometimes Frequentist distinction captured in notation like p(y|x;theta) where ‘x’ is a random variable and ‘theta’ is a parameter and where ‘|’ represents probability conditioning of y on x while ‘;’ captures logical or functional dependence of the distribution p(y|x) on a parameter theta? (This notation also goes back a long way but is seldom made explicit).
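The ‘|’ vs. ‘;’ distinction in p(y|x;theta) can be sketched in code: theta is an index that picks out one member of a family of conditional densities, not a quantity being conditioned on. A toy Gaussian illustration of my own (the model y|x ~ N(a*x, sigma^2) is an assumed example, not from the post):

```python
import math

def p_y_given_x(y, x, theta):
    """Density of y given x under the family member indexed by
    theta = (a, sigma): here y | x ~ N(a*x, sigma^2)."""
    a, sigma = theta
    z = (y - a * x) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same conditioning event (y = 1.0 given x = 0.5), two different values of
# theta, i.e., two different members of the family of distributions.
print(p_y_given_x(1.0, 0.5, (2.0, 1.0)))
print(p_y_given_x(1.0, 0.5, (0.0, 1.0)))
```

Changing x moves us within one conditional distribution; changing theta swaps the distribution itself, which is exactly what the ‘;’ is meant to flag.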
