You got the point; it is exactly the question this blog started with. The only answer I can propose (for a simple situation like “Does X causally affect Y or not?”, without a more complicated causal graph) is either to use the kind of empirical analysis at the end of the article https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2984045 or to try something from this book https://mitpress.mit.edu/books/elements-causal-inference, where the authors, at the beginning, apply two regressions (X on Y and Y on X) to detect the direction of the relationship. I may add that I am going to publish the results of new experiments soon, which will contain some more practical recommendations about separating causal from non-causal relations.

I don’t know how to add a separate post here, so I’m posting the question in this comment.

If the admin thinks it appropriate, please help move this comment to a separate post. Thanks for the help!

There is a seemingly fundamental question in causal inference.

Please forgive my ignorance. Any answer / suggestion / reference / link would be helpful!

==============================

Question:

To decide whether X causes Y, one has to use the do-calculus to compute the average causal effect P(Y|do(X=1)) − P(Y|do(X=0)). To use the do-calculus, one has to know the causal graph. But this becomes a chicken-or-egg question: if we know the causal graph, we already know from the graph whether X causes Y. So how do we decide whether X causes Y?
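For concreteness, here is a minimal numerical sketch (all CPT numbers invented for illustration) of why the graph matters. Assuming the binary confounded graph Z → X, Z → Y, X → Y, the interventional contrast P(Y|do(X=1)) − P(Y|do(X=0)) obtained by backdoor adjustment over Z differs from the naive observational contrast P(Y|X=1) − P(Y|X=0):

```python
# Hypothetical CPTs for the binary DAG  Z -> X, Z -> Y, X -> Y
# (all numbers invented for illustration).
p_z = {0: 0.5, 1: 0.5}                               # P(Z=z)
p_x_given_z = {0: 0.2, 1: 0.8}                       # P(X=1 | z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,
                (1, 0): 0.3, (1, 1): 0.7}            # P(Y=1 | x, z)

def p_y_do(x):
    # Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(z) * P(Y=1 | x, z)
    return sum(p_z[z] * p_y_given_xz[(x, z)] for z in (0, 1))

def p_y_obs(x):
    # Plain observational conditioning: P(Y=1 | X=x) via the joint distribution
    num = sum(p_z[z] * (p_x_given_z[z] if x else 1 - p_x_given_z[z])
              * p_y_given_xz[(x, z)] for z in (0, 1))
    den = sum(p_z[z] * (p_x_given_z[z] if x else 1 - p_x_given_z[z])
              for z in (0, 1))
    return num / den

ace = p_y_do(1) - p_y_do(0)        # causal contrast: 0.2
naive = p_y_obs(1) - p_y_obs(0)    # observational contrast: 0.44
print(ace, naive)
```

The same joint distribution, read off a different graph (say, one without the confounder Z), would license a different adjustment and a different answer — which is precisely the chicken-or-egg point of the question.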

Could you explain to me how to form a DAG of a negative expression?

like in “A=B*-C+B*-C”

How will “-C” be represented?

Igor

Causal discovery algorithms are a big part of the structural causal models/DAG literature; you can get started on them in Judea’s own book, Chapter 2 of Causality.

The point here is just that any algorithm of causal discovery *must* impose *causal assumptions*.

Just a correction about your understanding of Scholkopf et al.: their work *is* grounded in the DAG/SCM framework; you can check their new book below (free download):

https://mitpress.mit.edu/books/elements-causal-inference

Best, Carlos


It would take some time to explain to them that if we fail to express our query in the form of a probabilistic estimand, we can forget about estimating it from data.

[In other words, if we cannot estimate our query from an infinite sample, we surely cannot estimate it from finite samples.]

It is axiomatic to us, true, but not all researchers come to the problem with the same background.

Judea

Igor

I have a hard time understanding your proposal. So I will let other readers respond, if they can.

My main problem is with sentences such as: “If we make a regression estimation of the influence of X1 and X2 on Y.”

I don’t know of any method that can do that. What you probably mean is: if we thoughtlessly substitute the regression of Y on X2 for b, then….

But then, why do we need both X2 and X1? Why not just say: I have invented a method that distinguishes b = 0 from b > 0?

You seem to be claiming to have found such a method, and that it requires no extra information beside the data.

I can’t follow your referenced paper, sorry, but this blog is wide open for you to expose your method in the language of modern causal inference, namely the language of input-output, or assumptions-to-conclusions. It is an extremely effective language, so it should not take more than 3 sentences to convey the essential ideas.

Friendly advice: When you describe your method, do not comment on other people’s works; it breaks the flow of logic. Focus on your proposal, and leave others to a later discussion.

Judea

Let me clarify my position a bit about the distinction between two types of variables.

Let us assume we have X1 and X2, both numerical, and also some Y = bX1 + R, where R is some unknown source of Y’s values (not associated with any X). I would call b a “generative” parameter, in the sense that via b any value of X1 generates a certain value of Y (1 g of carbs in a food example generates 4 calories, so b = 4; R, in this case, would be whatever values are dictated by unobserved fats and protein; I would ignore measurement errors). X2, in turn, generates nothing, but it is correlated with Y (for whatever reasons). The parameter b, in common language, could also be called “causal”, but I will not discuss here the difference between causal and generative (I did in the article).

If we make a regression estimation of the influence of X1 and X2 on Y, we’ll have Y` = b`X1 + c`X2, where Y` is an approximation of Y, b` an estimate of the real b, and c` an estimate of the real 0. The problem is: could we say that b` estimates (however badly) the real b, while c` estimates (however well) just a zero? The difference between the two is fundamental: one generates Y, the other does not, while the estimates b` and c` could both be very misleading (or not).
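To make the setup concrete, here is a small simulation (my own sketch, with invented numbers: b = 4, and X2 constructed as a mere correlate of Y through R). Ordinary least squares happily assigns a stable non-zero coefficient c` to the non-generative X2:

```python
import random

random.seed(0)
n = 10_000

# Hypothetical generative model (numbers are illustrative):
#   Y = b*X1 + R  with b = 4; X1 "generates" Y.
#   X2 = R + noise: correlated with Y only through R, generates nothing.
b_true = 4.0
x1 = [random.gauss(0, 1) for _ in range(n)]
r  = [random.gauss(0, 1) for _ in range(n)]
y  = [b_true * a + s for a, s in zip(x1, r)]
x2 = [s + random.gauss(0, 1) for s in r]

# OLS of Y on X1 and X2 (no intercept; all variables are zero-mean),
# solved via the 2x2 normal equations with Cramer's rule.
s11 = sum(a * a for a in x1)
s22 = sum(a * a for a in x2)
s12 = sum(a * c for a, c in zip(x1, x2))
s1y = sum(a * v for a, v in zip(x1, y))
s2y = sum(a * v for a, v in zip(x2, y))
det = s11 * s22 - s12 * s12
b_hat = (s1y * s22 - s12 * s2y) / det   # ~4: tracks the generative b
c_hat = (s11 * s2y - s12 * s1y) / det   # ~0.5: non-zero despite a zero effect

print(f"b` = {b_hat:.2f} (generative b = 4), c` = {c_hat:.2f} (true effect = 0)")
```

The point of the sketch: c` is a perfectly stable, significantly non-zero regression coefficient, yet X2 generates nothing; nothing in the fitted coefficients alone flags the difference.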

In the article, different heuristic criteria for the distinction were proposed. On many types of generated data, they distinguished coefficients like b` from those like c` successfully, but not with 100% accuracy. Fig. 15 there (which, unfortunately, I can’t technically reproduce here) shows clearly that some statistics behave very differently for b` and c`. All this is in the process of further verification and does not seem hopeless. I would not be surprised if stronger formal criteria for the distinction were proposed.

And here, it seems, is the main point. You consider joint probabilities like P(X, Y) the main enemy of causality (a representative of the damned statistical paradigm, etc., to be overcome). And, indeed, it is a huge body, almost impenetrable to alien ideas, as your struggle over the last 30 years has shown clearly. But my modest food example (and countless other similar things) is not about P(X, Y). It is about Y being generated (in a rough physical sense) by X. In a certain sense, it is a literal (“all too literal”, as Nietzsche would say) interpretation of regression: if b stands for the link between X and Y, each Y should contain “a piece” born due to X, equal to bX (of course, this piece could be random, as in special regressions with randomized individual coefficients). It looks like we are addressing different things, in a sense. But I recognize perfectly well that this generative approach IS NOT the only interpretation of causality, and this is the reason why, exactly like you, I do not believe in the possibility of an all-embracing theory of causality based on data alone. So, for the narrow class of causality-related (generative-like) problems I may expect progress, but in general, no, unless another (imitational) paradigm is fully applied.

Back to the main point: “Can a DAG distinguish variables with zero causal effects (on Y) from those having non-zero effects? Of course not. No method in the world can do that without further assumptions.” “It’s hard to say, ….”

I believe this is the source of our disconnect. It is actually NOT HARD to say. And the answer is a plain NO; no ifs, no buts, no “it’s hard to say”, but a plain NO. I tried to explain why, writing:

We know from first principles that no causal query can be answered from data alone, without causal information that lies outside the data: “No causes in, no causes out” (N. Cartwright, 1993).

But evidently my explanation was taken as a handwaving argument, not as a logical truth. So here is a hard proof.

Suppose you invent some criterion, CR, that distinguishes X->Y from X<-Y and, based on CR, you decide that X has a causal influence on Y, i.e., that the arrow X<-Y is wrong. I will now construct a causal model containing the arrow X<-Y and a bunch of latent variables Z that matches your data perfectly. I know that I can do it because, if the model's graph (on X, Y and Z) is a complete graph, we know that we can tweak the parameters of the graph in such a way that it will perfectly fit EVERY joint probability function P(X,Y,Z), and certainly every observed marginal of P, like P(X,Y). This means that no criterion based on P(X,Y) is capable of distinguishing the model containing X->Y from the one containing X<-Y. Hence, if you have such a criterion, it must be based on information NOT in P(X,Y).
QED
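The argument can be illustrated numerically even without latent variables, for two binary variables: any joint P(X,Y) produced by a model with arrow X -> Y is matched exactly by a model with arrow Y -> X whose parameters are obtained by Bayes inversion (the CPT numbers below are arbitrary):

```python
# Model A: X -> Y, with arbitrary illustrative parameters.
pA_x = {0: 0.7, 1: 0.3}                                   # P(X=x)
pA_y_given_x = {0: {0: 0.9, 1: 0.1},
                1: {0: 0.2, 1: 0.8}}                      # P(Y=y | x)
jointA = {(x, y): pA_x[x] * pA_y_given_x[x][y]
          for x in (0, 1) for y in (0, 1)}

# Model B: Y -> X, parameters obtained by Bayes inversion of model A.
pB_y = {y: sum(jointA[(x, y)] for x in (0, 1)) for y in (0, 1)}
pB_x_given_y = {y: {x: jointA[(x, y)] / pB_y[y] for x in (0, 1)}
                for y in (0, 1)}
jointB = {(x, y): pB_y[y] * pB_x_given_y[y][x]
          for x in (0, 1) for y in (0, 1)}

# The two opposite-arrow models induce the same joint distribution.
print(jointA)
print(jointB)
```

Hence no criterion computable from P(X,Y) alone can tell the two arrows apart; any criterion that does must rest on information outside P(X,Y).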
True, many decent researchers who accept “correlation does not imply causation” have not yet internalized the crispness and generality of Cartwright’s mantra: “No causes in, no causes out.” From your hesitations (“it is hard to say, … It is also hard to say… I still would not be that categorical; at least some attempts have been made…”) I conclude that you are not convinced that the indistinguishability barrier is so tall and so impenetrable.
Well, I would invite you then to articulate carefully the criterion that you use in your linked article (which I could not parse) and apply it to the link X---->Y, and I can promise you that I will find your criterion contaminated with some causal information, however mild and/or obscured.

The same happened in the case of faithfulness, which is a mild causal assumption, but still causal. The same is true for Scholkopf’s assumptions of additive noise or non-Gaussian noise. These assumptions are imposed on the generative model, hence they are causal.

I now have the feeling that you (or perhaps others) would like to ask me: how do we distinguish between statistical and causal assumptions? Fortunately, we have a crisp Chinese Wall between the two, so we cannot be blamed for ambiguity or, God forbid, hand-waving.

A statistical assumption is any assumption that can be expressed in terms of a joint distribution of observed variables. Any assumption that cannot be expressed in terms of a joint distribution of observed variables is EXTRA-statistical, in our context CAUSAL. (See Causality, pages 38-40.)

Many prominent and highly revered statisticians have fumbled on this point. Some claimed that “confounding” is a well-defined statistical concept. Others were ready to prove to me that “randomization”, “instrumental variables”, and so forth “have clear statistical definitions” [Causality, pp. 387-388]. I stopped getting these proposals in the past 15 years, after asking the proposers to express their definitions in terms of a joint distribution of observed variables. But perhaps the time is ripe for another rehearsal. I am ready.

If you still believe that you have an assumption-free criterion for telling “a variable with zero causal effects (on Y) from those having non-zero effects”, please post it on this blog, and try to express it in terms of the joint distribution of observed variables.

Judea
