Causal Analysis in Theory and Practice

May 6, 2015

David Freedman, Statistics, and Structural Equation Models

Filed under: Causal Effect,Counterfactual,Definition,structural equations — moderator @ 12:40 am

(Re-edited: 5/6/15, 4 pm)

Michael A Lewis (Hunter College) sent us the following query:

Dear Judea,
I was reading a book by the late statistician David Freedman and in it he uses the term “response schedule” to refer to an equation which represents a causal relationship between variables. It appears that he’s using that term as a synonym for “structural equation,” the term you use. In your view, am I correct in regarding these as synonyms? Also, Freedman seemed to be of the belief that response schedules only make sense if the causal variable can be regarded as amenable to manipulation. So variables like race, gender, maybe even socioeconomic status, etc. cannot sensibly be regarded as causes since they can’t be manipulated. I’m wondering what your view is of this manipulation perspective.
Michael


My answer is: Yes. Freedman’s “response schedule” is a synonym for “structural equation.” The reason why Freedman did not say so explicitly has to do with his long and rather bumpy journey from statistical to causal thinking. Freedman, like most statisticians in the 1980’s, could not make sense of the Structural Equation Models (SEM) that social scientists (e.g., Duncan) and econometricians (e.g., Goldberger) had adopted for representing causal relations. As a result, he criticized and ridiculed this enterprise relentlessly. In his (1987) paper “As others see us,” for example, he went as far as “proving” that the entire enterprise is grounded in logical contradictions. The fact that SEM researchers at that time could not defend their enterprise effectively (they were as confused about SEM as statisticians, judging by the way they responded to his paper) only intensified Freedman’s criticism. It continued well into the 1990’s, with renewed attacks on anything connected with causality, including the causal search program of Spirtes, Glymour and Scheines.

I have had a long and friendly correspondence with Freedman since 1993 and, going over a file of over 200 emails, it appears that it was around 1994 when he began to convert to causal thinking. First through the do-operator (by his own admission) and, later, by realizing that structural equations offer a neat way of encoding counterfactuals.

I speculate that the reason Freedman could not say plainly that causality is based on structural equations was that it would have been too hard for him to admit that he was in error criticizing a model that he misunderstood, and one that is so simple to understand. This oversight was not entirely his fault; for someone trying to understand the world from a statistical viewpoint, structural equations do not make any sense; the asymmetric nature of the equations and those slippery “error terms” stand outside the prism of the statistical paradigm. Indeed, even today, very few statisticians feel comfortable in the company of structural equations. (How many statistics textbooks do we know that discuss structural equations?)

So, what do you do when you come to realize that a concept you ridiculed for 20 years is the key to understanding causation? Freedman decided not to say “I erred”, but to argue that the concept was not rigorous enough for statisticians to understand. He thus formalized “response schedule” and treated it as a novel mathematical object. The fact is, however, that if we strip “response schedule” of its superlatives, we find that it is just what you and I call a “function,” i.e., a mapping from the states of one variable onto the states of another. Some of Freedman’s disciples admire this invention (see R. Berk’s 2004 book on regression), but most people that I know just look at it and say: This is what a structural equation is.

The story of David Freedman is the story of statistical science itself and the painful journey the field has taken through the causal reformation. Starting with the structural equations of Sewall Wright (1921), and going through Freedman’s “response schedule”, the field still can’t swallow the fundamental building block of scientific thinking, in which Nature is encoded as a society of sensing and responding variables. Funny, econometrics has yet to start its reformation, though it has been housing SEM since Haavelmo (1943). (How many econometrics textbooks do we know which teach students how to read counterfactuals from structural equations?)


I now go to your second question, concerning the mantra “no causation without manipulation.” I do not believe anyone takes this slogan as a restriction nowadays, including its authors, Holland and Rubin. It will remain a relic of an era when statisticians tried to define causation with the only mental tool available to them: the randomized controlled trial (RCT).

I summed it up in Causality, 2009, p. 361: “To suppress talk about how gender causes the many biological, social, and psychological distinctions between males and females is to suppress 90% of our knowledge about gender differences.”

I further elaborated on this issue in (Bollen and Pearl 2014 p. 313) saying:

“Pearl (2011) further shows that this restriction has led to harmful consequence by forcing investigators to compromise their research questions only to avoid the manipulability restriction. The essential ingredient of causation, as argued in Pearl (2009: 361), is responsiveness, namely, the capacity of some variables to respond to variations in other variables, regardless of how those variations came about.”

In (Causality 2009 p. 361) I also find this paragraph: “It is for that reason, perhaps, that scientists invented counterfactuals; they permit them to state and conceive the realization of antecedent conditions without specifying the physical means by which these conditions are established;”

All in all, you have touched on one of the most fascinating chapters in the history of science, featuring a respectable scientific community that clings desperately to an outdated dogma, while resisting, adamantly, the light that shines around it. This chapter deserves a major headline in Kuhn’s book on scientific revolutions. As I once wrote: “It is easier to teach Copernicus in the Vatican than discuss causation with a statistician.” But this was in the 1990’s, before causal inference became fashionable. Today, after a vicious 100-year war of reformation, things are beginning to change (see http://www.nasonline.org/programs/sackler-colloquia/completed_colloquia/Big-data.html). I hope your upcoming book further accelerates the transition.

2 Comments »

  1. In a private email that I received, I was reminded that some people think that structural equation models are based on manipulations. Specifically, the perception is that in positing a structural equation model we imply that every variable on the graph has associated with it well defined counterfactuals that are obtained by setting that variable to specific values, and that this assumes, so the argument goes, that all variables on the graph can be intervened on.
    ————-

    This is not the case. The structural equation model that we deal with says nothing about intervention. It is merely a society of sensors and responders. Each member of this society (a variable) can sense the state of others and has the capacity to respond to others. All responses are specified passively: If you see xxxx, respond with yyyy. No intervention whatsoever. From this system of specifications we can, if we want, read counterfactuals (using the First Law of causal inference), like we read partial derivatives from a set of equations. Again, no intervention involved. If it so happens that we have a policy in mind, and we believe that implementing that policy modifies the environment in a way that corresponds to one of those counterfactuals, we rejoice and say: “The causal effect of this policy is none other than the counterfactual that we computed earlier without thinking about the policy.” If it so happens that the policy in mind does not correspond to any of the computed counterfactuals, but to some mathematical combination of counterfactuals, we rejoice twice and use the combination. If it does not correspond to any combination, we say: “Sorry, the policy in question is not well defined, namely, we cannot predict the effect of this policy with the model at hand.” But counterfactuals exist without the policy.

    For example, the sentence “Joe would be alive today if it were not for yesterday’s volcano eruption” is well defined, though I have no idea how to manipulate volcanoes, nor can I imagine such manipulations. The sentence evaluates a counterfactual, not a policy. Related example: “Joe would be alive today had our prayers been answered.” This sentence refers to a specific policy (i.e., praying) and it is subject to dispute; some people may pray for God to keep the volcano peaceful, others may pray to keep the volcano peaceful with a tiny side effect, that Joe dies in his sleep. Still others may doubt the power of prayers to prevent volcano eruptions. To settle these differences we need to augment the model with prayer-related information, not with information about how humans behave under volcano eruptions.

    Conclusion: I do not identify causation with manipulation, but with counterfactuals, which can be read from structural models by a well defined mathematical operation which sometimes matches a policy in question and sometimes does not. The counterfactual exists there whether or not we have a policy in mind. Some interventions can unearth properties of counterfactuals; this is what empirical science is all about. But Ohm’s Law exists in nature regardless of whether we manipulate the current and measure the voltage. The electrons obey the same laws regardless of what the experimenter does.

    I touched on these issues in my response to Dawid’s 2000 article http://ftp.cs.ucla.edu/pub/stat_ser/R269.pdf as well as in my comment on the consistency rule http://ftp.cs.ucla.edu/pub/stat_ser/r358.pdf

    So, anyone who believes that “no causation without manipulation” should be taken seriously should tell us how this restriction helps us assign truth value to the sentence: “Joe would have been alive if it were not for yesterday’s volcano eruption”.
    Judea

    Comment by judea pearl — May 7, 2015 @ 9:56 pm

  2. […] second part of our latest post “David Freedman, Statistics, and Structural Equation Models” (May 6, 2015) has stimulated a lively email discussion among colleagues from several disciplines. […]

    Pingback by Causal Analysis in Theory and Practice » Causation without Manipulation — May 14, 2015 @ 8:19 pm
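
Editorial illustration: the “society of sensors and responders” described in the first comment, and the way a counterfactual is read from it, can be sketched in a few lines of Python. This is only a toy sketch under stated assumptions: the variable names, the Boolean equations, and the helper functions `solve` and `counterfactual` are hypothetical and do not come from the post. The sketch follows the usual three steps (keep the background conditions consistent with the evidence, replace one equation by a constant, recompute), and nothing in it refers to a physical means of stopping a volcano.

```python
from itertools import product

# Hypothetical toy model. Each variable "senses" the variables listed before
# it (v) and the background conditions (u), and responds passively.
EQUATIONS = {
    "eruption": lambda v, u: u["u_eruption"],
    "healthy": lambda v, u: u["u_health"],
    "alive": lambda v, u: v["healthy"] and not v["eruption"],
}


def solve(u, replaced=None):
    """Evaluate every equation in order; variables named in `replaced`
    take the given constant instead of their own equation."""
    replaced = replaced or {}
    v = {}
    for name, respond in EQUATIONS.items():  # dicts keep insertion order
        v[name] = replaced[name] if name in replaced else respond(v, u)
    return v


def counterfactual(evidence, replaced, query):
    """Return the set of values `query` takes in the modified model,
    over all background conditions consistent with the evidence."""
    outcomes = set()
    for u_e, u_h in product([False, True], repeat=2):
        u = {"u_eruption": u_e, "u_health": u_h}
        actual = solve(u)
        if all(actual[k] == val for k, val in evidence.items()):
            outcomes.add(solve(u, replaced)[query])
    return outcomes


# Evidence: the volcano erupted, Joe was healthy, Joe is not alive.
evidence = {"eruption": True, "healthy": True, "alive": False}
print(counterfactual(evidence, {"eruption": False}, "alive"))
```

In this toy model the query “would Joe be alive had the eruption not occurred?” has a single answer once Joe’s health is part of the evidence. If `healthy` is dropped from the evidence, both answers remain possible, which shows that the truth value depends on the model and the evidence, not on whether anyone can manipulate the volcano.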
