### More on Pearl’s and Rubin’s frameworks for Causal Inference

**Andrew Gelman wrote a follow up to his original post:**

To follow up on yesterday's discussion, I wanted to go through a bunch of different issues involving graphical modeling and causal inference.

Contents:

– A practical issue: poststratification

– 3 kinds of graphs

– Minimal Pearl and Minimal Rubin

– Getting the most out of Minimal Pearl and Minimal Rubin

– Conceptual differences between Pearl's and Rubin's models

– Controlling for intermediate outcomes

– Statistical models are based on assumptions

– In defense of taste

– Argument from authority?

– How could these issues be resolved?

– Holes everywhere

– What I can contribute

A practical issue: poststratification

I'll start with an issue where Pearl disagrees with Rubin (and also with me, and I expect with Paul Rosenbaum, Rod Little, and many others). I'll repeat a bit from my earlier entry. Pearl writes:

For example, if we merely wish to predict whether a given person is a smoker, and we have data on the smoking behavior of seat-belt users and non-users, we should condition our prior probability P(smoking) on whether that person is a "seat-belt user" or not. Likewise, if we wish to predict the causal effect of smoking for a person known to use seat-belts, and we have separate data on how smoking affects seat-belt users and non-users, we should use the former in our prediction. . . . However, if our interest lies in the average causal effect over the entire population, then there is nothing in Bayesianism that compels us to do the analysis in each subpopulation separately and then average the results. The class-specific analysis may actually fail if the causal effect in each class is not identifiable.

Pearl seems to take this as an example of where Bayesian inference–the rule to condition on all observed data–gives the wrong answer. But I think he's missing the point. At the technical level, yes you definitely can estimate the treatment effect in two separate groups and then average. Pearl is worried that the two separate estimates might not be identifiable–in Bayesian terms, that they will individually have large posterior uncertainties. But, if the study really is being done in a setting where the average treatment effect is identifiable, then the uncertainties in the two separate groups should cancel out when they're being combined to get the average treatment effect. If the uncertainties don't cancel, it sounds to me like there must be some additional ("prior") information that you need to add.

I'm pretty sure about this. Of all the stuff I'm talking about in this blog, Bayesian regression and poststratification is the area in which I'm most truly an expert. To get a sense of some of the gains from this approach, check out some of the recent work by Jeff Lax and Justin Phillips (here and here).
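To make the arithmetic of "estimate in each group, then average" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population shares, the numbers, and above all the joint posterior draws, which are constructed by hand so that the weighted average is well determined while each subgroup effect is not. Whether a real posterior behaves this way in a given problem is exactly the point under dispute, so this shows only how the cancellation would work if it occurs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population shares of the two subgroups (assumed known).
weights = np.array([0.75, 0.25])

# Hypothetical joint posterior draws, built by hand for illustration:
# the weighted average is tightly determined, but one direction in the
# (effect_a, effect_b) plane is left almost unconstrained.
n_draws = 10_000
average = rng.normal(2.5, 0.1, size=n_draws)  # well-identified average effect
slack = rng.normal(0.0, 5.0, size=n_draws)    # poorly identified direction
effect_a = average + weights[1] * slack
effect_b = average - weights[0] * slack
draws = np.column_stack([effect_a, effect_b])

# Poststratification: for each draw, weight the subgroup effects
# by their population shares.
poststratified = draws @ weights

subgroup_sd = draws.std(axis=0)    # wide uncertainty in each subgroup
average_sd = poststratified.std()  # narrow uncertainty in the average

print("posterior sd by subgroup:", subgroup_sd)
print("posterior sd of poststratified average:", average_sd)
```

The key design choice is to poststratify draw by draw rather than to combine two marginal summaries: the cancellation lives in the negative correlation between the subgroup effects, which is lost if each subgroup is summarized separately first.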

3 kinds of graphs

I can think of three different ways that directed graphs have been applied to statistical modeling.

1. Graphing the structure of a probability model. For example, consider a simple hierarchical model (the 8-schools example in chapter 5 of Bayesian Data Analysis), with a "likelihood" of y ~ N (theta, sigma^2), a "prior" of theta ~ N (mu, tau^2), and a "hyperprior" on (mu,tau). (For simplicity I'm assuming sigma is known and unmodeled; we can discuss this point later, if you'd like, but for now I'm trying to keep things clean so as to be able to use Ascii graphics.) The graph for this model is

(mu, tau) –> theta –> y

The arrows don't represent causation or anything like that–it doesn't make sense to me to talk about (mu, tau) "causing" theta, or theta "causing" y. The parameter mu, for example, simply represents the mean of the population of theta values; it has no meaning as a causal factor.

2. Graphing a hypothesized causal pattern. Cyrus gave an example in his blog comment yesterday:

(X, Y) –> L –> C

These are causal relations, as Cyrus has defined them: X and Y cause L, and so forth.

3. Graphing relations between real-world variables. Here I'm thinking of models with variables such as inflation, unemployment, and interest rates; or schooling, socioeconomic status, test scores, and delinquency.

These three sorts of graphs can look similar, but they have different interpretations for causality. In particular, graphs of type 1 can be helpful for complex hierarchical models even if they are purely descriptive. For example, I recently estimated public support for school vouchers among voters, characterized by religion, ethnicity, income, and state of residence. I'm not trying to understand whether being a rich person in Texas causes you to have a certain opinion–it's an interesting question, but I'm answering some more basic descriptive questions. Nonetheless, my model has a graphical structure.
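A graph of type 1 can be read as nothing more than the order in which quantities are drawn. The sketch below samples once along (mu, tau) –> theta –> y. The hyperprior distributions and the value of sigma are placeholders chosen for illustration; they are not the ones used in Bayesian Data Analysis, and the function name `simulate_hierarchy` is hypothetical.

```python
import numpy as np

def simulate_hierarchy(sigma, rng):
    """Draw once from the model (mu, tau) -> theta -> y.

    sigma: known data-level standard deviations, one per group.
    The hyperprior below is an illustrative assumption only.
    """
    sigma = np.asarray(sigma, dtype=float)
    mu = rng.normal(0.0, 10.0)        # hyperprior draw for the population mean
    tau = abs(rng.normal(0.0, 5.0))   # hyperprior draw for the population sd
    theta = rng.normal(mu, tau, size=sigma.size)  # "prior": theta ~ N(mu, tau^2)
    y = rng.normal(theta, sigma)                  # "likelihood": y ~ N(theta, sigma^2)
    return mu, tau, theta, y

rng = np.random.default_rng(1)
sigma = np.full(8, 10.0)  # hypothetical known sigma for eight groups
mu, tau, theta, y = simulate_hierarchy(sigma, rng)
print("mu:", mu, "tau:", tau)
print("theta:", theta)
print("y:", y)
```

The arrows here fix the sampling order and the conditional-independence structure; nothing in the code treats mu as a cause of theta.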

"Minimal Pearl" and "Minimal Rubin"

I'd like to separate each of Rubin's and Pearl's theories into a key conceptual part and a more elaborate analytical part. I'll argue that, whatever you think of the analytical parts, the conceptual core of each theory represents a useful insight.

Minimal Rubin: Defining causal inference as potential outcomes (not necessarily "counterfactuals" because the notation can be used before an experiment is actually done, in which case any of the outcomes might be possible). I have found this to be an extremely useful way of focusing my understanding. To take just one example (which I mentioned in my earlier blog entry), when Gary and I started working on incumbency advantage twenty years ago, there was already a bit of literature on the topic: different articles with different definitions of incumbency advantage, and a near-complete confusion between estimands and estimates–that is, between the formulas used to compute "incumbency advantage" numbers from data, and the underlying quantities being estimated. The potential-outcome framework allowed us to formulate the estimand–our definition of incumbency advantage–clearly, and then we were able to move to the estimation phase.

Full Rubin: The research programme under which all causal inference problems can be framed in terms of potential outcomes. The Full Rubin has had some successes (for example, the paper with Angrist and Imbens on instrumental variables) but it also creates some new difficulties, notably when dealing with intermediate outcomes.

Minimal Pearl: Displaying causal relations as a directed graph, and using graph-theoretical ideas to understand patterns such as backdoor causality and colliders. I have certainly found it useful to use graphs to explore causality, and lots and lots of people have found Pearl's ideas helpful in understanding the roles of different variables in a graph (see, for example, Cyrus's comments in the earlier blog entry). As with Minimal Rubin (but in a different way), a key contribution of Minimal Pearl is to separate the causal structure from the specifics of a model. There had been lots of literature on graphs for path analysis, structural equation models, and so forth–but Pearl detached the graphical ideas from the specific correlation-based models that were out there.

Full Pearl: The research programme under which all causal inference problems can be framed in terms of graphs, colliders, the do operator, and the like. It doesn't quite work for me, but many people feel the Full Pearl is the way to go. A good argument in favor of Full Pearl is that, by handling dependence structures in a compact way, this framework frees up the researcher to think about more complicated structures of variables, to not be limited to the very simple structures that we can hold in our heads. One reason that I'm sympathetic to Full Pearl–and, at the very least, why I'd like to better understand minimal Pearl–is to see if I can improve my own modeling in this way.

Getting the most out of Minimal Pearl and Minimal Rubin

I'd argue that all of us–Pearl, myself, and Rubin included–would benefit from consistently using the insights of Minimal Pearl and Minimal Rubin, in particular,

from Pearl: Write models as directed graphs and, where necessary, explain exactly what the links mean and how their strength can be measured.

from Rubin: Be explicit about data collection. For example, if you're interested in the effect of inflation on unemployment, don't just talk about using inflation as a treatment; instead, specify specific treatments you might consider (adding these to the graphs, in keeping with Pearl's principles). This also goes for missing data. For example, Cyrus presented an example in which the variable Y is missing when a different variable, C, is observed. I recommend adding a new variable, I_Y, an indicator for whether Y is observed. The graphical model can then show that I_Y depends only on C.

Conceptual differences between Pearl's and Rubin's models

I'll just list a few differences that I've seen in this discussion:

– Following Rubin's perspective, I define the causal effect of a treatment at a unit level. For simplicity I'll stick with two levels of the treatment, 0 and 1 (for example, incumbency or an open-seat election). I define the treatment effect for a single unit as y^1 – y^0, or, if you prefer subscripts, y^1_i – y^0_i. In contrast, Pearl (and Wasserman, in his comment) define treatment effects as expectations: E(y|x) or, with more labor, E(y|x=1) – E(y|x=0). Pearl et al. can feel free to use this definition, but it's different from mine.

– Here's another example. Pearl describes the following problem:

Let X and Y be the outcomes of two fair coins, and let Z be a bell that rings if at least one of X and Y comes up head. We wish to estimate the causal effect of X on Y after collecting a huge number of samples, each in the form of a triplet (X, Y, Z). Should we include Z in the analysis? If so how? Would our favorite estimate of E(Y_x) be biased? Will it give us what we expect, namely, that X has no causal effect on Y, i.e., E(Y_x) = E(Y).

I think I may be missing something in this example, but if I understand it correctly, it doesn't fit into the Rubin framework at all: in the Rubin framework, decisions, not outcomes, have causal effects. I'm not saying that Pearl shouldn't be working on this problem–there's clearly a lot of interest out there in methods for estimating causal effects of things that are not decisions/interventions/treatments. What I am doing is illustrating that Pearl's and Rubin's methods are different.
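For readers who want to see the coins-and-bell setup in numbers, here is a small simulation sketch. It only reproduces the data-generating story in Pearl's description and compares a marginal comparison of Y across X with the same comparison restricted to cases where the bell rang. It does not settle how a full Bayesian or potential-outcomes analysis should treat Z; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Two independent fair coins and a bell that rings if at least one shows heads.
x = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
z = x | y

# Marginal comparison: difference in mean Y between X = 1 and X = 0.
marginal = y[x == 1].mean() - y[x == 0].mean()

# Same comparison restricted to the samples where the bell rang.
ring = z == 1
within_ring = y[ring & (x == 1)].mean() - y[ring & (x == 0)].mean()

# When the bell is silent both coins are tails, so there are no X = 1
# cases at all and the comparison cannot even be computed there.
silent_treated = int(np.sum(~ring & (x == 1)))

print("marginal difference:", marginal)                 # close to 0
print("difference given bell rang:", within_ring)       # close to -0.5
print("X = 1 cases with silent bell:", silent_treated)  # 0
```

Given that the bell rang, X = 0 forces Y = 1, so the within-stratum difference is about -0.5 even though the coins are independent. That is the collider pattern Minimal Pearl is meant to flag.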

Controlling for intermediate outcomes

Pearl writes, "Let us focus on the easier example of an intermediate variable (Z) between treatment (X) and outcome (Y). Has anyone seen a proof that adjusting for Z would introduce bias?" There is no proof that an adjustment will always introduce bias (I think Corey's right that, by "unbiased," Pearl means "asymptotic consistency," in statistical jargon). The theorem is that the adjustment can introduce bias, and to prove this theorem, we only need a single example, such as is given on page 191 of my book with Jennifer.

In any given example, there can be all sorts of other problems going on, measurement error, missing data, key unmeasured predictors, and so forth. And it's always possible that "doing something wrong" (for example, controlling for an intermediate outcome) can actually make the estimate better. Just as, to borrow Pearl's example, we might do better by adding 17.5 to any of our estimates. Except in trivial examples, we can't prove that this is a bad idea.

Even while living in a world of uncertainty, we make assumptions and, from there, try to do things that are optimal (or nearly so) within our assumptions. (A big part of my work is thinking about how to check and improve these assumptions, but that's another story.)

I realize I stated Rubin's view incorrectly. He doesn't actually say, "do not control for post-randomization variables". What he does say is: do not try to control for them ignoring the fact that they are post-randomization–that is, do not treat them as fully observed covariates. This is how his instrumental variables stuff works. So there is no contradiction with his view that you should ideally condition on all observed values, even on post-randomization variables (which can be thought of as partially observed, in the potential outcome sense).

From my perspective, the point is that Rubin's fully-Bayesian approach gets difficult in complex settings. Rubin would argue that this difficulty is inherent and should not be avoided. And Pearl correctly pointed out the sloppiness of my statement that "Jennifer and I recommend not controlling for intermediate outcomes." A better way to put it is that it is appropriate to control for intermediate outcomes, but not if your only tool is unadjusted regression on available data.

I agree with Pearl that, "If you incorporate an intermediate variable M as a predictor in your propensity score and continue to do matching as if it is just another evidence-carrying predictor, no post processing will ever help you, except of course, redoing the estimation afresh." More sophisticated adjustment–whether using Rubin's framework, Pearl's, or some other approach–is needed. My allusion to "post-processing" was too vague.

One place where my glib advice ("don't control for intermediate outcomes") can break down is in longitudinal studies of the sort where Robins and others have developed weighting methods to estimate causal effects. Again, the real point is, yes, it's best to condition on all data; we just have to go beyond simple regression.
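The claim in this section that adjusting for an intermediate outcome can introduce bias needs only one example. The simulation below is one such sketch; it is not the example from page 191 of the book, and the coefficients, the unobserved variable `u`, and the helper `ols` are all assumptions made up for illustration. Treatment X is randomized, it affects Y only through Z, and an unobserved U affects both Z and Y, so the true total effect of X on Y is 2.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

x = rng.integers(0, 2, size=n).astype(float)  # randomized treatment
u = rng.normal(size=n)                        # unobserved common cause of Z and Y
z = 1.0 * x + u + rng.normal(size=n)          # intermediate outcome
y = 2.0 * z + 3.0 * u + rng.normal(size=n)    # outcome; X acts only through Z

def ols(predictors, response):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(response))] + predictors)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

unadjusted = ols([x], y)[1]   # regression of y on x only
adjusted = ols([x, z], y)[1]  # regression of y on x and the intermediate z

print("true total effect of x: 2.0")
print("coefficient on x, unadjusted:", unadjusted)
print("coefficient on x, adjusting for z:", adjusted)
```

In this setup the unadjusted regression recovers roughly 2, while the coefficient on x after adding z as an ordinary covariate comes out near -1.5, which is neither the total effect nor the direct effect (zero here). This matches the narrower statement above: the trouble is with treating the intermediate outcome as a fully observed covariate in a simple regression.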

Statistical models are based on assumptions

Pearl refers to himself as a half-Bayesian and refers to "big-brother Bayes" as an impediment to clear thinking. I consider myself a Bayesian, pretty much. I don't always use Bayesian methods, but when I don't, I think of what I'm doing as an approximation to a more laborious full Bayesian approach.

Bayesian inference has only two steps: (1) set up a joint probability distribution for everything involved in your problem, (2) condition on observed data to get a joint posterior distribution for everything unobserved. Everything else in Bayes fits into these steps: model checking can be formulated in terms of step 2 (as posterior inference on replications), and model expansion goes into step 1.

The most important objection to Bayesian statistics, in my opinion, is that, in realistic examples, the joint probability distribution is going to be arbitrary–often based on whatever data you happen to have at hand–and wrong. Is this a mortal flaw in Bayes? This is an empirical question; it depends on the example. Evaluation is complicated by the fact that statistical modeling, like all scientific activity, has some wiggle room. I tried a lot of models on the way to getting estimates of opinion on school vouchers.

Anyway, inference within the Bayesian framework is straightforward mathematics. We don't have to go to the grave of Thomas Bayes and ask what he would do in a situation; we just set up the model and go from there. One advantage of Bayesian methods, to me, is that it puts the focus on the model rather than on the estimation procedure.
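The two steps can be written out literally for a toy problem. In the sketch below the unknown is a coin's probability of heads on a coarse grid and the data are the number of heads in ten flips. The grid, the flat prior, and the observed count of 7 are all illustrative assumptions rather than anything from the discussion above.

```python
import numpy as np
from math import comb

# Step 1: a joint distribution over everything, here (theta, y).
theta = np.linspace(0.05, 0.95, 19)            # grid of candidate head probabilities
prior = np.full(theta.size, 1.0 / theta.size)  # flat prior over the grid (assumption)

n_flips = 10
outcomes = np.arange(n_flips + 1)              # possible numbers of heads
binom = np.array([comb(n_flips, k) for k in outcomes], dtype=float)
likelihood = binom * theta[:, None] ** outcomes * (1 - theta[:, None]) ** (n_flips - outcomes)
joint = prior[:, None] * likelihood            # joint[i, k] = P(theta_i, y = k)

# Step 2: condition on the observed data to get the posterior for theta.
y_obs = 7                                      # hypothetical observed number of heads
posterior = joint[:, y_obs] / joint[:, y_obs].sum()
posterior_mean = float(theta @ posterior)

print("joint sums to:", joint.sum())
print("posterior mode:", theta[np.argmax(posterior)])
print("posterior mean:", posterior_mean)
```

Conditioning is just slicing the joint table at the observed value and renormalizing. Changing the prior or the likelihood is a change to step 1, which is where the modeling assumptions live.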

In defense of taste

In discussing different sorts of models, I wrote that, while some statisticians like to use discrete models, "I have a taste for continuity and always like setting up my model with smooth parameters." Personally, I think discrete models make very little sense in most social science settings (I will make exceptions in some latent-variable settings such as conceptual models, personality traits, and party identification, along with social measurements that are highly correlated with discrete biological variables such as sex), and I think most of the discrete modeling in social science is a vestige of classical significance testing ideas. But I recognize that others have different tastes than I on this matter.

Pearl very rightly queries this statement of mine, writing: "The general attitude in this discussion has been to treat the issue as if it was a personal dispute about a wine tasting contest . . . both sides quote good reasons, so it must be a matter of taste, style, focus, perspective, interest, method etc. It isn't." In this case, Pearl is referring to a question of adjusting for pre-treatment variables rather than a question of model choice, but the same issue arises: why should applied mathematicians and research scientists (which is, ultimately, what we are) care about "taste"?

Perhaps it would help if I use the word "experience" instead. For example, I have experience using hierarchical models, interactions, graphical model checking, and all the other fun stuff that's in my book, whereas Rob Tibshirani has experience with generalized additive models, lasso estimation, bootstrapping, and all the great stuff in his books. You could say that I have a taste for probability models and Rob has a taste for direct data-manipulation procedures, or that I have different experiences than Rob does. However you put it, I think Rob is going to do better statistical analyses using nonparametric methods than using hierarchical Bayes, just as I'll do better the other way.

I don't know if it really takes 10,000 hours to learn a new method, but it's not always a bad idea to work with what works for you.

Argument from authority?

As Pearl notes, if a theorem is true, it's true, and if it's false, it's false. It does sound silly to say that someone should use a certain method just because Gelman and Hill do, or because Rosenbaum does it in his book.

But what if you frame it slightly differently, and say that Gelman and King, for example, solved some existing problems in quantitative political science (incumbency advantage, the seats-votes curve, the effects of redistricting) using certain methods. So, just maybe, there's something good about these methods, right? This sort of inductive reasoning is the basis of much of my work, which bounces back and forth between applications and methodology. Pearl is right that we should be able to focus on specific questions and not just rely on authorities (from Neyman and Fisher to Pearl and Rubin), but I don't think it's so unreasonable for me to hold up my applied successes as some validation of the methods I use.

Again, this is not to criticize Pearl's approach on his problems, just to explain why I bristle at the suggestion that I'm doing the wrong thing, for example, by poststratifying.

How could these issues be resolved?

As I've already stated a few times, I think there's room in the world for Minimal Pearl, Full Pearl, Minimal Rubin, and Full Rubin–even if I don't think all four can be used at the same time on the same analysis of the same problem! Pearl and his colleagues (such as Wasserman and the anonymous author of the last section of the Wikipedia page on the Rubin Causal Model) believe Full Rubin to be a special case of Full Pearl, but as I've argued above, I don't think so.

How could the methods be compared? One could imagine some sort of data competition–in fact, I think something like this was done recently (perhaps by Pearl himself, I don't remember). The examples I have in mind are Dehejia and Wahba's reanalysis of LaLonde's analysis of a subset of data from a job-training experiment, or the notorious Harvard Nurses Study, which led to an observational analysis purportedly discovering a positive effect of hormone replacement therapy, a finding that was revealed to be in error from a later randomized experiment. The question in these examples is whether methods such as Rubin's propensity score or Greenland and Robins's g-estimation could reliably get good answers from observational data.

I don't think such a competition would be conclusive though, in that different statisticians can do well using different methods. Much depends on details of implementation.

Holes everywhere

As I discussed earlier, scientific disagreement can be frustrating, and I think I very well understand the frustration that both Pearl and Rubin feel on this, in their own ways. I continue to believe that all the analytical tools we have–Rubin's framework, Pearl's framework, the normal distribution, the logistic distribution, and, yes, even Bayesian inference–are incomplete. They all have holes. Just to quickly list them:

– When you consider multiple treatment factors, Rubin's framework leads to a proliferation of potential outcomes that has always left me confused. In addition, Rubin (in chapter 7 of Bayesian Data Analysis) recommends controlling for all variables that can affect treatment assignment. This is part of the more general recommendation to include all variables in a model, something that we realistically cannot generally do.

– Pearl's framework seems to me to assume that each node in the network corresponds, via the do operator, to a particular treatment or manipulation. Realistically there can be many ways of altering a variable; including this in the model can lead to proliferation of nodes and no sense of how to proceed to estimate a causal model.

– The normal, logistic, etc., distributions can be useful but basically never fit real data. We always have the question of when to make our models more complicated and realistic.

– Bayesian inference–realistically, almost all useful statistical inference–depends on models. We are getting better at checking these models, but the theory can never be even close to airtight except in very simple examples.

What I can contribute

I've done some work on causal inference (notably, my 1990 article with King on incumbency advantage, and also my book chapter from 2004 on treatment effects in before-after data), but considering the other participants in this discussion, causality is clearly not my area of expertise. Most of my work is on statistical modeling, graphics, and model checking.

But . . . statistical modeling can contribute to causal inference. In an observational study with lots of background variables to control for, there is a lot of freedom in putting together a statistical model–different possible interactions, link functions, and all the rest. Further complexities arise in modeling missing data and latent factors. Better modeling, and model checking, can lead to better causal inference. This is true in Rubin's framework as well as in Pearl's: the structure may be there, but the model still needs to be built and tested. And, as we become more confident (without being overconfident) in our models, we can make them more complex and realistic.

**Judea Pearl replies:**

Dear Andrew,

Thanks for a comprehensive summary of the discussion which, I am sure, would help many readers understand the fundamentals of causal analysis.

There are five brief comments I would like to add.

- I would really like to see how a Bayesian method estimates the treatment effect in two subgroups where it is not identifiable, and then, by averaging the two results (with two huge posterior uncertainties), gets the correct average treatment effect, which is identifiable, hence has a narrow posterior uncertainty. The reason I declared myself a "half Bayesian" is that, from my perspective, non-identifiability is more than just large posterior uncertainty. I see non-identifiability as "irredeemable" posterior uncertainty. Which means that on a certain surface in probability space the shape of the prior uncertainty remains the same no matter how many samples you take. Simple example: Consider two competing causal models X –> Y and X <– Y. We assign prior probability p to the left model and 1-p to the right, and start updating: p will not change. We might continue by defining a prior over p, prior on the prior etc.; still nothing will change. From this perspective, I would really like to learn from you how a linear combination of two wide and irredeemable uncertainties can cancel them out and reduce them to a point estimate. I have no doubt that it can be done by fine-tuned tweaking because, after all, the information is there: one can prove (even in Rubin's language) that the correct thing to do is to ignore the subgroup identity. So, whenever the information is there, a clever analyst can set up the variables in such a way that subgroup identity will essentially be ignored. But I am talking about doing it the honest way, as you described it: "the uncertainties in the two separate groups should cancel out when they're being combined to get the average treatment effect." If I recall my happy days as a Bayesian, the only operation allowed in combining uncertainties from two subgroups is taking a linear combination of the two, weighted by the (given) relative frequencies of the groups. But, I am willing to learn new methods.
- This brings me to question 2. Why not demonstrate this Bayesian method on the coins-and-bell example? It is essentially identical to the M-structure. Treatment effect is non-identifiable for patients "for whom the bell tolled", and also for those for whom the bell remained silent. Yet we know from the physics of the story that the average treatment effect is identifiable and is zero. (It is actually zero for every patient.) I hope you don't let yourself be diverted by externalities, such as the perception that "causal effect" in Rubin's writings (not his model) is applicable strictly to decisions, not to events. The language of causality has graduated from this restriction; we naturally say "the accident caused his death" – no decisions here (the graduation happened in the Renaissance, see epilogue to my book). But, even if you think this restriction is profound, it does not exonerate us from computing the causal effect of one coin on the other. Just think of a "decision maker" who decides to sing Hallelujah if coin-1 comes out head; another "decision maker" decides to administer poison to a patient if coin-2 comes out head. Now we got ourselves a fully authorized Rubin-type "causal effect" problem. And, again, the bell rings if one (or two) of the coins shows head. So, what is the causal effect of the "decision" to sing Hallelujah on the patient's death? I would love to see how the solution evolves from conditioning on the bell ringing. It is a simple problem. We all understand the physics. We all understand what answer to expect. Why go to philosophy to get things messy?
- Your description of "Full Pearl" is not accurate. "Full Pearl" is not framing problems in "graphs, colliders, and do operators." Full Pearl calls for framing problems the way an investigator perceives Nature to work, regardless of whether one can estimate the parameters involved; then, once you set your model, you mark down what you know and what you do not know, what you are sure about and what you are not sure about, and do the analysis in this space of "Nature models". If it so happens that you are only sure about the structure of the graph and nothing else, then and only then, you do "graphs, colliders, and do operators." Look at the way I described the coins-bell problem – no graph, no collider, no do-operator, just story – the way you and I and most scientists communicate. Or take, for example, the analysis of "probability of causation" (defined as P(Y_0 = 1| X=1, Y=0), chapter 9 of my book). If you do not know anything, you get a broad bound on that quantity. If you have both experimental and observational data, you get a narrower bound. If you can assume monotonicity, you get identifiability from experimental data and corrected bound from observational data. Finally, if you know the graph structure, you get full identifiability even not assuming monotonicity. The difference between "Full Pearl" and "Full Rubin" is that the last phase (using graphs) is somehow missing from the latter. The reason I spend several chapters on graphs and how to use them without even going to the Y_x notation is because I strongly believe that, if scientists have any idea about their science, they encode it in the form of a graph. So, from my perspective, Rubin's prohibition on graphs denies scientists the use of the language in which they feel most comfortable to communicate knowledge, and this knowledge is indispensable in both Rubin's and my frameworks. I allow them to say plainly: treatment does not change gender, while Rubin forces them to express it in the form of an "ignorability" condition. My solemn pledge: if anyone shows me a scientist who speaks "ignorability" I will convert to Rubinism.
- Another cosmetic difference that you emphasize is the difference between defining treatment effect for a single unit as y_1 – y_0 while, in contrast, Pearl and Larry ALLEGEDLY define treatment effect as expectation E(y|x=1) – E(y|x=0). My whole book is dedicated to the difference between conditional expectations and causal expectations. So, I don't think you meant it that way. You probably meant to focus on the difference between unit-based causal effect y_1(u) – y_0(u) and average causal effect P(Y_1 = 1) – P(Y_0 = 1) (which I obtain from P(Y_x = 1), x = 0, 1, 2, 3, 4, 5, …). I call it a cosmetic difference because chapters 7-10 of my book deal with the unit-based counterfactual Y_x(u), from which you can get average causal effects, differences, ratios, direct effects, probabilities of causation and more and more… Moreover, in contrast to Rubin's framework, the counterfactual quantities Y_x(u) are not treated as primitives, partially invisible entities. No! They are DERIVED (let me repeat: DERIVED) from more fundamental quantities specified in the model. It is good to derive things from the model because this keeps you honest and coherent, and allows you to prove theorems of general applicability. This shows up, for example, in proving that one should refrain from conditioning (stratifying, controlling etc.) on intermediate variables. If you just get one counterexample to conditioning on an intermediary Z, you do not know if the blame is with your parameters, or with Z being on the pathway, or Z being just any outcome of the treatment. Perhaps it would be OK to condition on Z when it is only a proxy for an intermediary, or a consequence of the outcome? (The answer, by the way, is in chapter 11 of my book.) When Y_0 and Y_1 are derivable from the model, you can tell precisely when "ignorability" is compromised by conditioning.
- Finally, you say: "Pearl and his colleagues … believe Full Rubin to be a special case of Full Pearl, but as I've argued above, I don't think so." Andrew, in the age of mathematics, it is no longer for Pearl or Gelman to believe or not believe whether one system of inference subsumes another. A system of inference, in the age of logic and computer science, can be characterized formally and completely like any formal language. In chapter 7 of my book I offer a complete axiomatic characterization of the Structural Causal Model that I propose. Lo and behold, it coincides with the rules of inference used in Rubin's model (e.g., consistency, composition and effectiveness). What does it mean? It means that a theorem in one system is a theorem in the other (the rules of translation are in chapter 7 of my book, and are posted in a review article on my website, http://ftp.cs.ucla.edu/pub/stat_ser/Test_pea-final.pdf). It means that Full Rubin is a special case of Full Pearl, suffering only from two syntactic deficiencies: don't use graphs, don't use structural equations; i.e., express all knowledge in the language of "ignorability" sentences.

In computer science, we can look back and imagine (counterfactually) what the world would be like had we disallowed compilers and forced everyone to program in machine language. The analogy is clear. The two blunders I mentioned earlier (1. inappropriate conditioning and 2. paradoxical direct effects) are the first concrete manifestations of the harm caused by such prohibition. Let us hope they are the last ones.

Pearl’s framework does not require that each node in the network corresponds, via the do operator, to a particular treatment or manipulation. Rather, each node is a random variable, and the do operator fixes a node X to X=x without affecting the joint distribution of the ancestors of X (in sharp contrast to standard conditioning).

Comment by Olivia Fair — June 28, 2017 @ 10:17 am

Perhaps some discussion of _arguably reasonable/defensible_ specifications for the joint distributions (priors and likelihoods) may be helpful (for me anyways).

(the do operator operates on the specification??? and should be separate from usual Bayes conditioning of the chosen joint probability model on “all” the data ??? as well as the choice of Larry’s functional on the posterior? )

But in his response to the letters Rubin does seem concerned about the specification of indefensible models …

Fisher in his later writings seemed to be arguing that the most important task in the design of studies was to lessen dependence on assumptions that were difficult to make true/check. And Rubin seems to be concerned that the Pearl approach will lead to more likely use of less defensible specifications. Pearl’s admirable encouragement for investigators to explicate their beliefs may need to be better managed.

This is not math, but _pragmatics_ and of large concern in applications.

Comment by Anna — July 1, 2017 @ 10:08 am