{"id":1164,"date":"2014-07-14T19:30:04","date_gmt":"2014-07-15T02:30:04","guid":{"rendered":"http:\/\/www.mii.ucla.edu\/causality\/?p=1164"},"modified":"2014-07-14T19:30:04","modified_gmt":"2014-07-15T02:30:04","slug":"on-model-based-vs-ad-hoc-methods","status":"publish","type":"post","link":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/2014\/07\/14\/on-model-based-vs-ad-hoc-methods\/","title":{"rendered":"On model-based vs. ad-hoc methods"},"content":{"rendered":"<p>A lively discussion flared up early this month on Andrew\u00a0Gelman&#8217;s blog (garnering 114 comments!) which should\u00a0be of some interest to readers of this blog.<\/p>\n<p>The discussion started by a quote from George Box (1979) on the\u00a0advantages of model-based approaches, and drifted into\u00a0related topics such as<\/p>\n<p>(1) What is a model-based approach,<\/p>\n<p>(2) Whether mainstream statistics encourages this approach,<\/p>\n<p>(3) Whether statistics textbooks and education have given face to reality,<\/p>\n<p>(4) Whether a\u00a0practicing statistician should invest time learning causal\u00a0modeling,<\/p>\n<p>or wait till it &#8220;proves itself&#8221; in the real\u00a0messy world?<\/p>\n<p>I share highlights of this discussion here,\u00a0because I believe many readers have faced similar\u00a0disputations and misunderstandings in conversations with\u00a0pre-causal statisticians.<\/p>\n<p><!--more To read more, click here.--><\/p>\n<p>The entire discussion can be accessed here<br \/>\n<a href=\"http:\/\/andrewgelman.com\/2014\/07\/03\/great-advantage-model-based-ad-hoc-ap-proach-seems-given-time-know\/#comment-178421\" target=\"_blank\"><\/a><a href=\"http:\/\/andrewgelman.com\/2014\/07\/03\/great-advantage-model-based-ad-hoc-ap-proach-seems-given-time-know\/#comment-178421\">http:\/\/andrewgelman.com\/2014\/07\/03\/great-advantage-model-based-ad-hoc-ap-proach-seems-given-time-know\/#comment-178421<\/a><\/p>\n<p>My comments start with the statement:<\/p>\n<p>&#8220;The great 
advantage of the model-based over the ad hoc\u00a0approach, it seems to me, is that at any given time we know\u00a0what we are doing&#8221; (George Box, 1979)<\/p>\n<p>This is a great quote, which I wish would be taken seriously\u00a0by those who think that &#8220;models&#8221; means models of data\u00a0as opposed to models of the &#8220;data-generation process&#8221;. Box wrote it in 1979, before people realized that the\u00a0models that tell us &#8220;what we are doing&#8221; or what data to\u00a0collect are not really models of the data but of the\u00a0&#8220;data-generation process&#8221;. The latter, unfortunately,\u00a0has not been given face in statistics textbooks or\u00a0mainstream statistics literature.<\/p>\n<p>My final comment deals with &#8220;Toys vs. Guns&#8221; and reads:<\/p>\n<p>&#8220;There have been several discussions today on the\u00a0usefulness of toy models and on whether toy tests are\u00a0necessary and\/or sufficient when the ultimate test\u00a0is success in the wild world, with its large, real,\u00a0messy, practical problems with incomplete noisy data.&#8221;<\/p>\n<p>I think the working assumptions\u00a0in many of these comments were unrealistic.\u00a0Surely, passing the toy test is insufficient, but if this\u00a0is the ONLY test available before critical\u00a0decisions are to be made, then speaking about\u00a0insufficiency instead of conducting the test\u00a0is surrealistic if not dangerous. What I tried to explain before, and will try again,\u00a0is that, when it comes to causal inference, there is no\u00a0such thing as testing a method in the &#8220;wild world&#8221;\u00a0because we cannot get any feedback from this world,\u00a0nor any indication of its success or failure, save\u00a0for the &#8220;roar of the crowd&#8221; and future disputes on\u00a0whether success or failure was due to the\u00a0method implemented or due to some other factors. 
Under such circumstances, I don&#8217;t understand what alternative we have except for testing candidate methods in the laboratory, namely on toy models. And I don&#8217;t understand the logic of refraining from toy testing until enough people use one method or another on &#8220;real life problems.&#8221; It is like shooting untested guns into highly populated areas, in foggy weather, and waiting for wisdom to come from the gun manufacturer.<br \/>\nAnd I would also be very wary of &#8220;alternative methods&#8221;\u00a0whose authors decline to submit them to\u00a0the scrutiny of laboratory tests. In fact, what\u00a0&#8220;alternative methods&#8221; do we have, if their authors\u00a0decline to divulge their names?<\/p>\n<p>The second working assumption that I find to be\u00a0mistaken is that DAG-based methods are not\u00a0used in practice, but are waiting passively for DAG-averse\u00a0practitioners to try them out. Anyone who reads the\u00a0literature in applied health science knows that\u00a0DAGs have become a second language in epidemiology, biostatistics\u00a0and the enlightened social sciences. DAG-averse\u00a0practitioners should therefore ask themselves whether they\u00a0are not missing precious opportunities by waiting for their\u00a0peers to make the first move. First, it is an opportunity\u00a0to catch up with the wave of the future and, second, it is\u00a0an opportunity to be guided by models of reality so as &#8220;to\u00a0know at any given time what they are doing&#8221;\u00a0(G. Box, slightly paraphrased).<\/p>\n<p>No, DAG-based methods are not a panacea, and you would\u00a0not know if your method is successful even if you see\u00a0it in use. 
But one thing you would know, that at any\u00a0given time you acted in accordance with your knowledge\u00a0of the world and in concordance with the logic of that knowledge.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A lively discussion flared up early this month on Andrew\u00a0Gelman&#8217;s blog (garnering 114 comments!) which should\u00a0be of some interest to readers of this blog. The discussion started by a quote from George Box (1979) on the\u00a0advantages of model-based approaches, and drifted into\u00a0related topics such as (1) What is a model-based approach, (2) Whether mainstream statistics [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,11,16],"tags":[],"class_list":["post-1164","post","type-post","status-publish","format-standard","hentry","category-definition","category-discussion","category-general"],"_links":{"self":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=1164"}],"version-history":[{"count":0,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1164\/revisions"}],"wp:attachment":[{"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=1164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=1164"},{"taxonomy":"post_tag","embedd
able":true,"href":"https:\/\/causality.cs.ucla.edu\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=1164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}