I am answering the question that I posed above.

Yes, the probability function Pr(W,X, Y(0), Y(1)) does exist.

The fact that we have not seen it represented explicitly does not mean that it does not exist as an abstract

mathematical object, postulated for the purpose of maintaining coherence among properties, such as ignorability,

that are needed to justify the use of available statistical methods.

A simple proof that Pr(*) exists is that we can derive it, or its needed properties,

from structural equations (using the First Law, see eq. (1)) and be assured that those

derived properties cohere, as though they came directly from some Pr(*). This is nice.

What is still a puzzle to me is why authors who revere Pr(*) as “the science” (with or w/o quotes)

do not rejoice in glee at this capability of structural equations to represent “the science” so

compactly and meaningfully, and why they shun this capability with such zeal.

I have my own explanation for this puzzle, but I would rather leave it to future historians of statistics

to analyze and be amused by.

My agenda for the next week or so is to return to the miracle of the First Law and share with readers the

clarity and unification that shine from its wrinkles.

Judea

Three brief comments

1. If I understand you correctly, you seem to be saying that, in order to move from Science-1 (classical econometrics)

to Science-2 (Potential Outcomes) all one needs to do is go over the classical expressions for expected-utility-maximization and change the action index of the utility term (U_x) to a potential-outcome parenthesis U(L). Easy! And, if this is so,

I would gladly join Science-2 (if deemed eligible). But, then, I do not understand why you and Rubin write chapters on the

advantages of Science-2 over Science-1; they seem identical save for a minor change in parenthesis.

2. In non-parametric settings, it does not matter if an agent maximizes her expected utility or minimizes

that utility or just obeys instructions. All that matters is that the agent responds to certain signals

and not to others; the response function itself need not be specified. Therefore, your equation

L^obs_i=argmax_L {p Q_i(L)-w_i L} might as well be written L^obs_i= f_i(p, W_i); the L is merely the variable maximized over.

3. We still have not seen a single example of Science-2, namely Pr(X,W,Y(0), Y(1)).

Does this probability function exist?

Trying to keep this discussion focused (on Pr),

Judea

This discussion is rapidly losing focus again, like the discussion in the previous thread. I will make one last attempt to clarify things.

In my previous comment I wrote that in Kitagawa’s example the observed labor input L^obs_i was determined by the key equation

L^obs_i=argmax_L {p Q_i(L)-w_i L}

Thus, in a characterization that makes perfect sense to economists, the level of the input is chosen to maximize profits p Q_i(L)-w_i L. The profits depend on the production function Q_i(L), which is the set of potential values for production as a function of the labor input. In other words, the realized value for the input depends on the full set of potential outcomes, that is, the Q_i(L), for all values of L, not just on the observed value Q_i(L^obs_i).

You write that in “Science-1 are equations expressed in terms of OBSERVABLES, X, Y, Z, W.” (as opposed to being expressed in terms of potential outcomes). Your Science-1 definition clearly does not fit the above equation characterizing the value of the labor input, so your claim that my example supports your claim that “to specify a problem one needs to resort to structural equations, namely, to Science-1” rather than potential outcomes, makes so little sense that it is unlikely to convince the informed readers of your blog.

Sincerely,

Guido Imbens

The example you posted further supports my claim that it is cognitively impossible to work with Science-2,

(namely Pr(X,W,Y(0), Y(1))) and that, to specify a problem one needs to resort to structural equations,

namely, to Science-1.

Kitagawa clearly recognized this fact, when he said: “In econometrics terminology, equation (1.2) [in the

paper] is interpreted as a structural equation in the sense that it can generate any counterfactual outcomes of unit i

with respect to any manipulations of x.”

In other words, we can take ANY textbook structural equation y = f(x,u) , put an “(x)” after the y, then read it:

Y(x) = f(x,u). Lo and behold, as if by miracle, we obtained the potential outcome Y(x). This is indeed part of

the First Law, for the single equation case.
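The reading described here can be illustrated with a minimal sketch. The linear form of f, the function names, and the two units with their u values are hypothetical, chosen only to make the step from y = f(x,u) to Y(x) = f(x,u) concrete; nothing in the discussion fixes them.

```python
def f(x, u):
    """Hypothetical structural equation y = f(x, u); the linear form is an assumption."""
    return 2.0 * x + u


def potential_outcome(x, u):
    """Y(x) for a unit with background factors u: evaluate f with x held at the chosen value."""
    return f(x, u)


# Hypothetical units, each characterized by its own value of u.
units = {"unit_a": 0.5, "unit_b": -1.0}

# For every unit, the pair (Y(0), Y(1)) is derived FROM the equation.
table = {name: (potential_outcome(0, u), potential_outcome(1, u))
         for name, u in units.items()}

for name, (y0, y1) in table.items():
    print(name, "Y(0) =", y0, "Y(1) =", y1)
```

Both potential outcomes of a unit are available at once because u is held fixed while x is varied, even though only one of them would ever be observed.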

The First Law miracle goes a bit further, guaranteeing that we can generate the entirety of Science-2

Pr(X,W,Y(0), Y(1)) from Science-1. But, at this point, it suffices to note that your example does not

start with Science-2, but with the language of Science-1. That a researcher may choose not to write W_i

(production activity) explicitly, but absorb it into U, does not negate the fact that

the equation is structural, namely, all its components are observables; the potential outcome

Y(x) is not IN the equation but is derived FROM the equation, precisely as dictated by the First Law.

Where are we now?

To be convinced that the potential outcome science (namely, Pr(X,W,Y(0), Y(1))) really

acts like a “Science”, namely, a mathematical object that represents a researcher’s perception of

reality, we need to choose a problem that can be presented and solved in Science-2 without borrowing

equations from Science-1.

May I suggest that we start with the IV setting, a setting that we all know fairly well, present it in

Science-2 language and we can then compare Science-1 to Science-2 on various dimensions of comparison.

Any other problem would do as well, but it needs to be presented in Science-2 language.

The distinction between Science-1 and Science-2 was made crisply clear by Don Rubin; Science-2 is

Pr(X,W,Y(0), Y(1)) and Science-1 are equations expressed in terms of OBSERVABLES, X, Y, Z, W…

as in classical econometric texts, where Y(0) and Y(1) do not appear explicitly, but are replaced with

“error terms”, “shocks”, “disturbances”, “omitted factors”, “latent drivers”, “exogenous variables”,

in terms of which economists encode what they know about the world.

We know a lot about Science-1; can you show us how to start a problem with Science-2?

Judea

Happy to oblige. Let me take a classic example from the comment by Toru Kitagawa on my Statistical Science paper that we discussed in the earlier thread (the comment, which I highly recommend, is also published in Statistical Science). Toru considers the “classical problem of estimation of a production function. Q denotes the quantity of a homogeneous good produced and L is a measure of an input used. For simplicity let us consider only a labor input (e.g., total hours worked by the employees.” In addition to the quantity produced Q_i and the labor input L_i we also observe the wage rate w_i that firm i faces for a number of firms, indexed by i running from 1 to N.

In line with my comments about my preferences for potential outcomes, Kitagawa does not start by specifying a model for the three variables (Q_i, L_i, w_i). He does not say why, but my guess would be that this would be very difficult, and unnatural for an economist.

Instead Kitagawa starts with the production function Q_i(L), which describes the potential outcomes for quantity produced as a function of the labor input. He writes down a model for these potential outcomes. Specifically, Kitagawa writes: “let us assume that the production technology of firm i is given by the following function: Q_i(L)=exp(b0+a_i)L^b1,” followed by: “This equation can be indeed interpreted as the causal relationship between output and input in the production process of firm i.” Here a_i represents unobserved differences between the firms, e.g., the quality of the management or the fixed capital, for example soil quality if the firms were farms. This particular model is of course very simple, but we often have credible assumptions about the relation between the potential outcomes. For example, in the production function example it is generally reasonable to assume that the production function is monotone in its inputs. (Elias and Bryant in the earlier discussion argued there was no place for such assumptions in their early “nonparametric” analysis; in contrast, such assumptions are viewed as very natural in many economics settings.) These assumptions are part of what Don and I meant when we referred to “the science” (lowercase please!).

So, the starting point is this set of potential outcomes, Q_i(L). Kitagawa then considers the decision by the firm regarding the quantity of the labor input, and proposes that this decision follows the rule

L_i=arg max_L {p Q_i(L)-w_iL}

where p is the price of the good. In words, the firms choose the labor input to maximize profits. The value of the labor input variable is determined by the full set of potential outcomes and other variables. One may want to consider alternative decision rules by the firm, but this is a common one in economics. Unconfoundedness here would correspond to L_i being independent of all the Q_i(L) conditional on other stuff, but that would correspond to a very unusual and inefficient firm that would be unlikely to survive for long in a competitive environment.
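To make the decision rule concrete, here is a small sketch under the quoted production function Q_i(L)=exp(b0+a_i)L^b1. It assumes 0 < b1 < 1 (diminishing returns), which is not stated above but is needed for an interior maximum; in that case the first-order condition gives L* = (p b1 exp(b0+a_i)/w_i)^(1/(1-b1)). The function names and numerical values are hypothetical.

```python
import math


def production(L, a, b0=0.0, b1=0.5):
    """Potential output Q_i(L) = exp(b0 + a_i) * L**b1 for labor input L."""
    return math.exp(b0 + a) * L ** b1


def profit(L, a, p, w, b0=0.0, b1=0.5):
    """Profit p * Q_i(L) - w_i * L at labor input L."""
    return p * production(L, a, b0, b1) - w * L


def optimal_labor(a, p, w, b0=0.0, b1=0.5):
    """Closed-form argmax of profit; valid only under the assumption 0 < b1 < 1."""
    return (p * b1 * math.exp(b0 + a) / w) ** (1.0 / (1.0 - b1))


# Hypothetical price and wage; two firms that differ only in the unobserved a_i.
p, w = 2.0, 1.0
for a in (0.0, 0.5):
    L_star = optimal_labor(a, p, w)
    print(f"a_i = {a}: chosen labor = {L_star:.4f}, profit = {profit(L_star, a, p, w):.4f}")
```

Because the chosen input moves with a_i, which also shifts every Q_i(L), the observed labor input is not independent of the potential outcomes in this sketch.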

This is precisely what I mean when I say that it is easier or more natural for me, and I think for many economists, to think of a model for the potential outcomes than for the realized values. It would be difficult to specify directly the link between the observed labor input and the quantity produced, because the choice for the labor input depends on the entire set of potential outcomes. Although in many econometrics textbooks the potential outcomes are not explicitly introduced, they are explicit in the economic theory texts that all economists are exposed to and therefore resonate well with us.

Of course this way of thinking about these problems may be more natural in economics than in, say, epidemiology, and this is why I disagreed with Judea’s blithe dismissal of differences between the disciplines when he wrote that “ Or, are problems in economics different from those in epidemiology? I have examined the structure of typical problems in the two fields, the number of variables involved, the types of data available, and the nature of the research questions. The problems are strikingly similar.” No, they are not!

Sincerely,

Guido

I think we are finally converging towards a substantive discussion of the real issue. We have two concepts

of “science” which are now displayed before us explicitly. Let us call them Science-1 and Science-2.

In Science-1 we have 2 or 3 structural equations, like

Z=h(U1)

X=f(Z,U2)

Y=g(X,U3)

In Science-2 we have the joint distribution of X, Z and the potential outcomes Y(0), Y(1):

Pr(X,Y(1), Y(0), Z)
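One way to connect the two displays, offered as a sketch rather than as either party's formulation: if we assume that the background factors U1, U2, U3 are discrete and carry a joint distribution P(u1,u2,u3) (an assumption not written above), then the three equations induce the joint distribution as follows.

```latex
\Pr\bigl(X=x,\; Y(1)=y_1,\; Y(0)=y_0,\; Z=z\bigr)
  \;=\; \sum_{u_1,u_2,u_3} P(u_1,u_2,u_3)\,
        \mathbf{1}[h(u_1)=z]\;
        \mathbf{1}[f(z,u_2)=x]\;
        \mathbf{1}[g(1,u_3)=y_1]\;
        \mathbf{1}[g(0,u_3)=y_0]
```

Here each indicator simply checks that a given value of the background factors produces the stated value of Z, X, Y(1) or Y(0).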

I have already posted two pages on why Science-1 is computationally and cognitively more suitable for

causal inference and, by extension, more suitable to start econometric textbooks with.

You now have a golden opportunity to leverage the level of concreteness that we have achieved and show

why you “prefer starting with the potential outcomes and think about the joint distribution of

the potential outcomes and X.” (quoted from your last posting).

For example, you can tell us how you represent Pr() formally (if you do) or, if you do not represent it

explicitly, how you use a mental representation of it to decide on its properties, for example,

whether the following ignorability conditions hold in Pr():

X_||_{Y(0),Y(1)}

X_||_{Y(0),Y(1)} | Z

Z_||_{Y(0),Y(1)} | X

Z_||_{Y(0),Y(1)}
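The four conditions above can be checked mechanically once a concrete Pr() is in hand. The sketch below builds Pr(X, Z, Y(0), Y(1)) exactly from a hypothetical binary version of the equations Z=h(U1), X=f(Z,U2), Y=g(X,U3), once with independent background factors and once with U2 and U3 being the same factor (a confounded case). The specific mechanisms, probabilities, and function names are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)


def joint(confounded):
    """Exact Pr(X, Z, Y(0), Y(1)) for a hypothetical binary model."""
    pr = defaultdict(Fraction)
    for u1, u2, u3 in product((0, 1), repeat=3):
        if confounded and u2 != u3:
            continue                      # confounded case: U2 and U3 are one factor
        w = HALF * (QUARTER if u2 else 1 - QUARTER)
        if not confounded:
            w *= QUARTER if u3 else 1 - QUARTER
        z = u1                            # Z = h(U1)
        x = z | u2                        # X = f(Z, U2)
        y0, y1 = 0 ^ u3, 1 ^ u3           # Y(x) = g(x, U3)
        pr[(x, z, y0, y1)] += w
    return dict(pr)


def marginal(pr, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = defaultdict(Fraction)
    for key, p in pr.items():
        out[tuple(key[i] for i in idx)] += p
    return out


def independent(pr, a, b, c=()):
    """True if coordinates a are independent of coordinates b given coordinates c."""
    p_abc, p_ac = marginal(pr, a + b + c), marginal(pr, a + c)
    p_bc, p_c = marginal(pr, b + c), marginal(pr, c)
    values = lambda n: product((0, 1), repeat=n)
    for va, vb, vc in product(values(len(a)), values(len(b)), values(len(c))):
        if p_abc[va + vb + vc] * p_c[vc] != p_ac[va + vc] * p_bc[vb + vc]:
            return False
    return True


X, Z, Y = (0,), (1,), (2, 3)              # coordinate positions; Y means {Y(0), Y(1)}


def check(pr):
    """The four conditions, in the order listed in the text."""
    return [independent(pr, X, Y), independent(pr, X, Y, Z),
            independent(pr, Z, Y, X), independent(pr, Z, Y)]


print(check(joint(confounded=False)))     # [True, True, True, True]
print(check(joint(confounded=True)))      # [False, False, False, True]
```

In the confounded variant only the last condition survives, which is the pattern one expects in an instrumental-variable setting; the point of the sketch is only that the answers follow from the equations and the assumed distribution of the background factors.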

Again, I hope you will tell us WHY you prefer one science over the other, not merely that you happen

to prefer it, or that “I think that is easier and less likely to lead to mistakes” [your quote]. Moreover, if

you want to invoke “agents making choices/decision based on (perception of) potential outcomes”, go ahead,

add those agents to Science-2 and proceed. But, eventually, we need to hear how you reason about Pr(),

and how you go about confirming or dis-confirming ignorability conditions such as those above,

because no inference can proceed without such conditions.

The floor is yours,

Judea

“I am sure one day you will appreciate the substance too.” I was always under the impression that even if I did not always agree with computer scientists on causality, that at least they were good at prediction. Now I am not even sure about that anymore!

I do think you need to be honest about your writing and that it is intended to subtly mock people you disagree with. Even now you write that Don and I label the joint distribution “The Science.” I don’t know where you get that from. In our book we write about “the science.” Capitalizing “science” changes the tone. It is not capitalized in the book, and that is done for a reason. It’s not my style. Similarly I don’t personally like statements like “The First Law of Causal Inference” and agree with Hernando that it rubs people the wrong way. Being enthusiastic about your own work is one thing (and a good thing!). Accusing others of “doing harm to their students” is not necessary, and if you do so you should not be surprised if people take offense.

Re the last part. You write to Hernando “just put down the equations that connect X, W and Y (all are observables)”. Therein lies the rub. I think it is hard to write down that set of equations (“no need to sweat”???) and prefer starting with the potential outcomes and think about the joint distribution of the potential outcomes and X. I think that is easier and less likely to lead to mistakes. In economics we often think about agents making choices/decisions based on (perceptions of) potential outcomes, which leads naturally to those formulations. Again, in my world view many roads lead to Rome, and if you want to do things differently that is fine, but I do not find your road as sweat-free as you make it out to be.

Sincerely,

Guido Imbens