In response to an email discussion between Sander Greenland and Tyler VanderWeele concerning the semantics of causal dags, I have offered the following comment:
The question "what is causal about … ?" can be answered at three distinct levels of discussion: 1. Interpretation, 2. Construction, and 3. Validation.
1. Interpretation. A causal dag is a model and, like any other model, it is a symbolic object that acts as an oracle; that is, it produces answers to a set of queries of interest. In our case, the queries of interest are those concerning interventions and counterfactuals.
A distinction should be made here between "Causal Bayesian Networks," which are oracles for interventional queries only (see Causality, Def. 1.3.1, p. 23), and functional or counterfactual dags, which are smarter oracles (given all the functions), capable of answering counterfactual as well as interventional queries, via Eq. (3.51), p. 98. For example, Fig. 1.6(a) is a Causal Bayesian Network but not a functional dag of the process described.
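The two oracle levels can be sketched in a few lines of code. The toy model below is my own construction (not from the book): an exogenous factor U, a mechanism f_X, and a mechanism f_Y. The interventional query is answerable from the induced interventional distributions alone, while the counterfactual query requires the functions themselves, because it must reuse the observed unit's own U.

```python
import random

# Toy functional ("counterfactual") dag: exogenous U plus two
# deterministic mechanisms f_X and f_Y.  (My own example.)
random.seed(0)

def sample_unit():
    u = random.random() < 0.5    # background factor U
    x = int(u)                   # f_X: X := U
    y = x ^ int(u)               # f_Y: Y := X xor U  (observationally, Y is always 0)
    return u, x, y

# Interventional query P(Y=1 | do(X=1)): replace f_X by the constant 1
# and keep f_Y.  A Causal Bayesian Network can answer this level of query.
def p_y1_do_x1(n=100_000):
    hits = 0
    for _ in range(n):
        u = random.random() < 0.5
        hits += 1 ^ int(u)       # do(X=1) overrides the mechanism for X
    return hits / n              # approximately 1/2

# Counterfactual query "what would Y have been, had X been 0, for THIS
# unit?": requires the functions, since we keep the unit's own u
# (abduction), replace f_X (action), and recompute Y (prediction).
def y_had_x_been_0(u):
    return 0 ^ int(u)

u, x, y = sample_unit()          # observe one unit
y_cf = y_had_x_been_0(u)         # its counterfactual outcome under X = 0
```

Note that the observed Y equals 0 for every unit, yet P(Y=1 | do(X=1)) is 1/2, and the unit-level counterfactual Y under X = 0 depends on the unit's own u: the three kinds of queries genuinely come apart.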
2. Construction. What questions should investigators ask themselves during the construction of a causal dag, so as to minimize the likelihood of misspecification?
Given the judgmental nature of the assumptions embedded in the dag, this is a cognitive question, touching on the way scientists encode experience.
2.1 Scientists encode experience in the form of scientific laws, that is, counterfactual value-assignment processes. To determine the value x that variable X takes on, Nature is envisioned as examining the values s of some other variables, say S, and deciding according to some function x=f(s) what value X would be assigned. This is a more fundamental conception of causation than intervention, and it applies to non-manipulable variables as well, hence my favorite counter-slogan: "Of course causation without manipulation," or "Causation precedes manipulation."
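This value-assignment conception can be made concrete in a few lines. The variables and numbers below are my own illustration: Nature examines the value s of another variable and assigns x = f(s), and the root variable need not be manipulable for the assignment to be causal.

```python
# A minimal sketch (my own names and numbers) of Nature as a
# value-assignment process: each variable gets its value by a function
# x = f(s) of other variables S.

def f_season():
    # A non-manipulable root variable: no one intervenes on the season,
    # yet it participates in causal value assignment all the same.
    return "winter"

def f_temperature(season):
    # Nature examines s and decides x = f(s).
    return 2 if season == "winter" else 25

season = f_season()
temperature = f_temperature(season)   # Nature "examines" s, assigns x
```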
2.2 To match the nature of scientific thought, the construction of causal dags is best done in counterfactual vocabulary. People's judgments about statistical dependencies emanate from their judgments about cause-effect relationships, and people's judgments about causation emanate from counterfactual thinking (otherwise, why would David Hume and David Lewis be tempted to define the former in terms of the latter, and not the other way around? See pp. 238-9). Accordingly, questions such as "What other factors determine X besides S?" or "Are the omitted factors determining X correlated with those determining Y?" can be quite meaningful to investigators and, hence, can be answered reliably, which explains why students find it easier to think in terms of "the existence of a hidden common parent of two or more nodes."
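What "a hidden common parent" buys the student can be seen in a small simulation (my own toy numbers): neither X nor Y causes the other, yet the omitted factor U that determines both makes them dependent in the data.

```python
import random

# Toy illustration of a hidden common parent: U determines both X and Y,
# with no arrow between X and Y.  (My own construction.)
random.seed(1)

def sample():
    u = random.random() < 0.5                              # hidden U
    x = int(u) if random.random() < 0.9 else 1 - int(u)    # X := noisy copy of U
    y = int(u) if random.random() < 0.9 else 1 - int(u)    # Y := noisy copy of U
    return x, y

n = 100_000
pairs = [sample() for _ in range(n)]
p_x = sum(x for x, _ in pairs) / n
p_y = sum(y for _, y in pairs) / n
p_xy = sum(x and y for x, y in pairs) / n

# P(X=1, Y=1) ~ 0.41, well above P(X=1) * P(Y=1) ~ 0.25:
# dependence without any causation between X and Y.
```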
2.3 There is, I admit, some finger-crossing in such judgments, as there is in any judgment, but the amount of guesswork is much, much less than in the highly respected "Let us assume strong ignorability," which no mortal understands except through translation into "no correlated hidden factors."
3. Validation. Given a causal dag, are its predictions compatible with the set of observations and experiments that one can perform? The three conditions of Def. 1.3.1, p. 23, are sufficient to guarantee that ALL observational and interventional queries be answered correctly. What does this mean? It means that, once conditions (i)-(iii) are satisfied, we can predict the effects of any policy, atomic as well as compound, deterministic as well as stochastic, that can be specified in terms of direct changes to a given set of variables in the study. (By this we exclude unanticipated side effects.) In reality, students who construct dags do not think in terms of interventions, nor is it reasonable to assume that all interventional distributions would be available to us (as is assumed in Def. 1.3.1). Still, it is healthy to have a sufficient set of empirical validation criteria such as Def. 1.3.1.
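One standard reading of how a dag satisfying Def. 1.3.1 predicts policy effects is the truncated factorization: under do(X = x), drop the factor P(x | pa_X) and keep every other conditional. The discrete dag and numbers below are mine, chosen only to show that the interventional answer differs from naive conditioning when Z confounds X and Y.

```python
# Toy dag (my own numbers): Z -> X, Z -> Y, X -> Y, all binary.
P_z = {0: 0.5, 1: 0.5}
P_x_given_z = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}
P_y_given_xz = {(0, 0): {0: 0.8, 1: 0.2},
                (0, 1): {0: 0.6, 1: 0.4},
                (1, 0): {0: 0.4, 1: 0.6},
                (1, 1): {0: 0.1, 1: 0.9}}

def p_y_do_x(y, x):
    # Truncated factorization: P(y | do(X=x)) = sum_z P(z) P(y | x, z);
    # the factor P(x | z) is removed because X is set by fiat.
    return sum(P_z[z] * P_y_given_xz[(x, z)][y] for z in (0, 1))

def p_y_given_x(y, x):
    # Observational conditioning: P(y | x) = sum_z P(z) P(x|z) P(y|x,z) / P(x).
    num = sum(P_z[z] * P_x_given_z[z][x] * P_y_given_xz[(x, z)][y]
              for z in (0, 1))
    den = sum(P_z[z] * P_x_given_z[z][x] for z in (0, 1))
    return num / den

# With these numbers, P(Y=1 | do(X=1)) = 0.75 while P(Y=1 | X=1) ~ 0.867:
# seeing X=1 is evidence about Z, setting X=1 is not.
```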