Causal Analysis in Theory and Practice

July 19, 2012

A note posted by Elias Bareinboim

Filed under: covariate selection,Discussion,General,Identification,Opinion,Selection Bias — eb @ 10:00 pm

In the past week, I have been engaged in a discussion with Andrew Gelman and his blog readers regarding causal inference, selection bias, confounding, and generalizability. I was trying to understand how his method, which he calls “hierarchical modeling,” would handle these issues and what guarantees it provides. Unfortunately, I could not reach an understanding of Gelman’s method (probably because no examples were provided).

Still, I think that this discussion, having touched on core issues of scientific methodology, will be of interest to readers of this blog. The link follows:
http://andrewgelman.com/2012/07/long-discussion-about-causal-inference-and-the-use-of-hierarchical-models-to-bridge-between-different-inferential-settings/

Earlier discussions took place regarding the dispute between Rubin and Pearl. Here are some interesting links:
http://andrewgelman.com/2009/07/disputes_about/
http://andrewgelman.com/2009/07/more_on_pearlru/
http://andrewgelman.com/2009/07/pearls_and_gelm/
http://andrewgelman.com/2012/01/judea-pearl-on-why-he-is-only-a-half-bayesian/

If anyone understands how “hierarchical modeling” can solve a simple toy problem (e.g., M-bias, control of confounding, mediation, generalizability), please share with us.

Cheers,
Bareinboim

November 30, 2009

Measurement Cost and Estimator’s Variance

Filed under: Back-door criterion,covariate selection,Discussion,measurement cost — moderator @ 4:06 pm

Sander Greenland from UCLA writes:

The machinery in your book addresses only issues of identification and unbiasedness. Of equal concern for practice is variance, which comes to the fore when (as usual) one has a lot of estimators with similar bias to choose from, for within that set of estimators the variance becomes the key driver of expected loss (usually taken as MSE, i.e., mean squared error = variance + bias^2). Thus, for example, you may identify a lot of (almost-) sufficient subsets in a graph, but the minimum MSE attainable with each may span an order of magnitude. On top of that, the financial costs of obtaining each subset may span orders of magnitude. So your identification results, while important and useful, are just a start on working out which variables to spend the money to measure and adjust for. The math of the subsequent MSE and cost considerations is harder, but no less important.
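
For reference, the decomposition invoked in the letter can be written out as below. The symbols are notation introduced here for illustration, not taken from the letter: \hat\theta stands for any estimator of a causal parameter \theta. The cross term vanishes because \hat\theta - E[\hat\theta] has mean zero.

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= \mathbb{E}\big[(\hat\theta - \theta)^2\big] \\
  &= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]
     + \big(\mathbb{E}[\hat\theta] - \theta\big)^2 \\
  &= \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2 .
\end{align*}
```

When several candidate estimators have roughly the same bias, the second term is roughly constant across them, so the ranking by MSE is driven by the variance term alone.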

Judea Pearl replies:

You are absolutely right, it is just a start, as is stated in Causality page 95. The reason I did not emphasize the analysis of variance in this book was my assumption that, after a century of extremely fruitful statistical research, one would have little to add to this area.

My hypothesis was:

Once we identify a causal parameter, and produce an estimand of that parameter in closed mathematical form, a century of statistical research can be harnessed to the problem, and render the estimation task a routine exercise in data analysis. Why spend energy on areas well researched when so much needs to be done in areas of neglect?

However, the specific problem you raised, that of choosing among competing sufficient sets, happens to be one that Tian, Paz and Pearl (1998) did tackle and solve. See Causality page 80, reading: “The criterion also enable the analyst to search for an optimal set of covariates — a set Z that minimizes measurement cost or sampling variability (Tian et al, 1998).” [Available at http://ftp.cs.ucla.edu/pub/stat_ser/r254.pdf] By “solution” I mean, of course, an analytical solution, assuming that cost is additive and well defined for each covariate. The paper provides a polynomial-time algorithm that identifies the minimal (or minimum-cost) sets of nodes that d-separate two nodes in a graph. When applied to a graph purged of outgoing arrows from the treatment node, the algorithm will enumerate all minimal sufficient sets, i.e., sets of measurements that de-confound the causal relation between treatment and outcome.
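
To make the idea concrete, here is a minimal Python sketch of the task described above. It is not the polynomial-time algorithm of Tian, Paz and Pearl; it is a brute-force enumeration (exponential in the number of covariates) that is only meant for toy graphs. The function names, the example graph, and the cost figures are all hypothetical. The sketch removes the arrows leaving the treatment, excludes descendants of the treatment as candidates (as the back-door criterion requires), and tests d-separation through the standard moralized ancestral graph construction.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(parents, x):
    """Return all proper descendants of x."""
    found, stack = set(), [x]
    while stack:
        node = stack.pop()
        for child, ps in parents.items():
            if node in ps and child not in found:
                found.add(child)
                stack.append(child)
    return found


def d_separated(parents, x, y, z):
    """Test whether z d-separates x from y (y must not be in z)."""
    # Restrict to the ancestral set, then moralize: link each child to its
    # parents and link every pair of parents sharing a child.
    keep = ancestors(parents, {x, y} | set(z))
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # x and y are d-separated iff every undirected path passes through z.
    seen, stack = {x} | set(z), [x]
    while stack:
        for m in adj[stack.pop()]:
            if m == y:
                return False
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return True


def minimal_sufficient_sets(parents, x, y):
    """Brute-force list of inclusion-minimal back-door adjustment sets."""
    # Purge the arrows leaving the treatment node.
    backdoor = {c: [p for p in ps if p != x] for c, ps in parents.items()}
    candidates = sorted(set(parents) - {x, y} - descendants(parents, x))
    minimal = []
    for size in range(len(candidates) + 1):
        for z in combinations(candidates, size):
            # Skip supersets of a sufficient set that was already found.
            if any(set(m).issubset(z) for m in minimal):
                continue
            if d_separated(backdoor, x, y, z):
                minimal.append(z)
    return minimal


# Hypothetical toy graph, written as child -> list of parents.
graph = {
    "Z1": [],
    "Z2": [],
    "Z3": ["Z1", "Z2"],
    "X": ["Z1", "Z3"],
    "Y": ["X", "Z2", "Z3"],
}
# Hypothetical additive measurement costs per covariate.
cost = {"Z1": 5.0, "Z2": 1.0, "Z3": 2.0}

sets = minimal_sufficient_sets(graph, "X", "Y")
cheapest = min(sets, key=lambda s: sum(cost[v] for v in s))
print(sets)      # [('Z1', 'Z3'), ('Z2', 'Z3')]
print(cheapest)  # ('Z2', 'Z3')
```

In this toy graph, adjusting for Z3 alone is not enough, because conditioning on Z3 opens the path through Z1 and Z2; either Z1 or Z2 must be added, and the assumed costs favor Z2. Ranking the same sets by estimator variance rather than cost would require a model of the data, which the sketch does not attempt.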

Readers who deem such an algorithm useful should have no difficulty implementing it from the description given in the paper; introducing variance considerations, though, would require some domain-specific expertise.
