On model-based vs. ad-hoc methods
A lively discussion flared up early this month on Andrew Gelman’s blog (garnering 114 comments!), which should be of some interest to readers of this blog.
The discussion started with a quote from George Box (1979) on the advantages of model-based approaches, and drifted into related topics such as:
(1) What is a model-based approach,
(2) Whether mainstream statistics encourages this approach,
(3) Whether statistics textbooks and education have given a face to reality,
(4) Whether a practicing statistician should invest time learning causal modeling, or wait until it “proves itself” in the real, messy world?
I share highlights of this discussion here, because I believe many readers have faced similar disputations and misunderstandings in conversations with pre-causal statisticians.
The entire discussion can be accessed here:
http://andrewgelman.com/2014/07/03/great-advantage-model-based-ad-hoc-approach-seems-given-time-know/#comment-178421
My comments start with the statement:
“The great advantage of the model-based over the ad hoc approach, it seems to me, is that at any given time we know what we are doing” (George Box, 1979)
This is a great quote, which I wish would be taken seriously by those who think that “models” means models of data, as opposed to models of the “data-generation process”. Box wrote it in 1979, before people realized that the models that tell us “what we are doing”, or what data to collect, are not really models of the data but of the “data-generation process”. The latter, unfortunately, has not been given a face in statistics textbooks or in the mainstream statistics literature.
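To make the distinction concrete, here is a minimal sketch in Python (the variable names and coefficients are purely illustrative): two different data-generation processes that induce exactly the same data distribution. A model of the data cannot tell them apart; a model of the process tells us precisely what an intervention would do.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Process A: X causes Y.   X ~ N(0, 1),  Y := 2*X + N(0, 1)
x_a = rng.normal(size=n)
y_a = 2.0 * x_a + rng.normal(size=n)

# Process B: Y causes X.   Y ~ N(0, 5),  X := 0.4*Y + N(0, 0.2)
y_b = rng.normal(scale=np.sqrt(5.0), size=n)
x_b = 0.4 * y_b + rng.normal(scale=np.sqrt(0.2), size=n)

# As models of the data, A and B are indistinguishable:
# both imply the same bivariate normal joint distribution.
print(np.cov(x_a, y_a))   # approximately [[1, 2], [2, 5]]
print(np.cov(x_b, y_b))   # approximately [[1, 2], [2, 5]]

# As models of the data-generation process, they differ sharply.
# Simulate the intervention do(X = 1) by replacing X's equation:
y_do_a = 2.0 * 1.0 + rng.normal(size=n)          # under A, E[Y | do(X=1)] = 2
y_do_b = rng.normal(scale=np.sqrt(5.0), size=n)  # under B, Y is unaffected, E[Y] = 0
print(y_do_a.mean(), y_do_b.mean())

The numbers are arbitrary; the moral is not. The joint distribution alone leaves “what we are doing” undetermined, which is why the model must describe the data-generation process, not merely the data.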
My final comment deals with “Toys vs. Guns” and reads:
“There have been several discussions today on the usefulness of toy models and on whether toy tests are necessary and/or sufficient when the ultimate test is success in the wild world, with its large, real, messy, practical problems with incomplete noisy data.”
I think the working assumptions in many of these comments were unrealistic. Surely, passing the toy test is insufficient, but if this is the ONLY test available before critical decisions are to be made, then speaking about insufficiency instead of conducting the test is surrealistic if not dangerous. What I tried to explain before, and will try again, is that, when it comes to causal inference, there is no such thing as testing a method in the “wild world”, because we cannot get any feedback from that world, nor any indication of success or failure, save for the “roar of the crowd” and future disputes on whether success or failure was due to the method implemented or to some other factors. Under such circumstances, I don’t understand what alternative we have except testing candidate methods in the laboratory, namely on toy models (a minimal example of such a laboratory test is sketched below). And I don’t understand the logic of refraining from toy testing until enough people use one method or another on “real life problems.” It is like shooting untested guns into highly populated areas, in foggy weather, and waiting for wisdom to come from the gun manufacturer.
And I would also be very wary of “alternative methods” whose authors decline to submit them to the scrutiny of laboratory tests. In fact, what “alternative methods” do we have, if their authors decline to divulge their names?
The second working assumption that I find mistaken is that DAG-based methods are not used in practice, but are waiting passively for DAG-averse practitioners to try them out. Anyone who reads the literature in applied health science knows that DAGs have become a second language in epidemiology, biostatistics, and the enlightened social sciences. DAG-averse practitioners should therefore ask themselves whether they are not missing precious opportunities by waiting for their peers to make the first move. First, it is an opportunity to catch up with the wave of the future and, second, it is an opportunity to be guided by models of reality so as “to know at any given time what they are doing” (G. Box, slightly paraphrased).
No, DAG-based methods are not a panacea, and you would not know whether your method was successful even if you saw it in use. But one thing you would know is that, at any given time, you acted in accordance with your knowledge of the world and in concordance with the logic of that knowledge.
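To give a flavor of what such a laboratory test can look like, here is a minimal sketch in Python (the toy model, variable names, and coefficients are chosen purely for illustration): we simulate data from a known model in which Z confounds the effect of X on Y, then check whether a candidate estimator recovers the effect we built in.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_effect = 1.5

# Toy data-generation process:  Z -> X,  Z -> Y,  X -> Y
z = rng.normal(size=n)                               # confounder
x = 0.8 * z + rng.normal(size=n)                     # treatment
y = true_effect * x + 2.0 * z + rng.normal(size=n)   # outcome

def ols_slopes(y, *columns):
    # Least-squares slopes of y on the given columns (intercept fitted, then dropped).
    X = np.column_stack((np.ones(len(y)),) + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

naive = ols_slopes(y, x)[0]        # ignores the confounder Z
adjusted = ols_slopes(y, x, z)[0]  # back-door adjustment for Z

print("true effect :", true_effect)
print("naive       :", round(naive, 3))     # noticeably biased (about 2.5 here)
print("adjusted    :", round(adjusted, 3))  # close to 1.5

Because the data were generated in the laboratory, the target quantity is known exactly, and each candidate estimator can be declared right or wrong; this is precisely the feedback that the “wild world” refuses to provide.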
May I suggest that there is no such thing as “NO model”. Generally, people tend to think they can approach any set of data without a bias or a model in their head. However, this is not true. Were it so, they would not even get out of bed in the morning, because they have a model of the world in which the spot where they put their foot last night, getting into bed, was solid floor, and when they get out of bed they cling to the theory (it is nothing but a theory until proven anew each morning!) that there still is a solid floor where they plan to set their foot. No wonder people are traumatized, e.g. by earthquakes, when things are so far off that they don’t have a ready model to map the data onto. It is these hidden models that even control what data we decide to collect, because there are certainly many more data we could collect but don’t, like the shoe size of milk buyers. We ONLY ever collect data if we have a foregoing model (some call it a hunch), but nevertheless it is a model.
Comment by Darragh McCurragh — July 21, 2014 @ 5:28 am
I certainly concur with the value of model-based approaches. Indeed, this is one of the strongest arguments for using graphical models in applied settings — they help researchers clarify the range of possible models they are considering and they provide a formal method for deriving tests of those alternative models.
That said, I have to disagree with your contention that “…there is no such thing as testing a method in the ‘wild world’ because we cannot get any feedback from this world, nor any indication of its success or failure, save for the ‘roar of the crowd’…” This seems manifestly false. Scientists test the validity of causal models every day by testing whether they can be used to correctly infer the effects of interventions. These tests are not “toys” and they don’t use synthetic data. Such tests are difficult, but they allow us to understand whether the models (and the methods used to derive those models) are accurate. There are several good examples, though still far too few to evaluate the range of methods developed in CS and elsewhere. For instance:
Duvenaud, D. K., Eaton, D., Murphy, K. P., & Schmidt, M. W. (2010). Causal learning without DAGs. In NIPS Causality: Objectives and Assessment (pp. 177-190). (“We therefore consider evaluating causal learning methods in terms of predicting the effects of interventions on unseen test data… Our experiments on synthetic and biological data indicate that some non-DAG models perform as well or better than DAG-based methods at causal prediction tasks.”)
Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750. (“This paper analyzes 12 recent within-study comparisons contrasting causal estimates from a randomized experiment with those from an observational study sharing the same treatment group.”)
If the methods we are developing are to be taken seriously by the larger research community, we need to take empirical evaluation far more seriously.
Comment by David Jensen — July 25, 2014 @ 12:33 pm