Counterfactual causality
Hill [5] avoided defining exactly what he meant by a causal effect:
"I have no wish, nor the skill to embark upon a philosophical discussion of the meaning of 'causation'."
However, it seems that he applied the counterfactual model because he then writes:
"... the decisive question is whether the frequency of the undesirable event B would be influenced by a change in the environmental feature A."
Counterfactual causality dates back at least to the 18th-century Scottish philosopher David Hume [15] but only became standard in epidemiology from the 1980s. As the inventor of randomised clinical trials [3, 4], Hill was strongly influenced by the idea of randomised group assignment, which precludes confounding. The idea of randomisation, invented by R.A. Fisher in the 1920s and 1930s, was, in turn, stimulated by Hume [3]. As Fisher and Hill were friends for at least some years [3], it seems likely that Hill was strongly influenced by counterfactual thinking. Today, the counterfactual, or potential outcome, model of causality has become more or less standard in epidemiology, and it has been argued that counterfactual causality captures most aspects of causality in health sciences [13, 14].
To define a counterfactual effect, imagine an individual $i$ at a fixed time. In principle, we assume that
(a) this individual could be assigned to both exposure levels we aim to compare ($X = 0$ and $X = 1$, respectively) and
(b) the outcome $Y_i$ exists under both exposure levels (denoted by $Y_{i0}$ and $Y_{i1}$, respectively) [[14] and references therein].
The causal effect of $X = 1$ versus $X = 0$ within an individual $i$ at the time of treatment or exposure assignment can be defined as [13–20]:

$$Y_{i1} - Y_{i0}$$
Note that the use of the difference measure is not exclusive; for strictly positive outcomes one can also use the ratio measure $Y_{i1}/Y_{i0}$. For a binary outcome, a non-zero effect under this definition means that the outcome event occurs under one exposure level but not under the other. Therefore, a causal effect of a binary event is a necessary condition without which the event would not have occurred; it is not necessarily a sufficient condition. Clearly, the outcome is not observable under at least one of the two exposure levels of interest. Thus, the outcome under the unobserved, or counterfactual, condition has to be estimated; it is known as the counterfactual or potential outcome.
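As a minimal sketch of this definition, the following Python snippet represents four hypothetical individuals by both of their potential outcomes. The class name, the method names and the outcome values are illustrative assumptions and are not taken from the cited literature. In real data, only one of the two outcomes per individual would be available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Hypothetical individual with both potential outcomes (binary)."""
    y0: int  # outcome that would occur under X = 0
    y1: int  # outcome that would occur under X = 1

    def causal_effect(self) -> int:
        # Individual causal effect on the difference scale: Y_i1 - Y_i0
        return self.y1 - self.y0

    def observed(self, x: int) -> int:
        # Only the outcome under the exposure actually received is seen;
        # the other one stays counterfactual.
        return self.y1 if x == 1 else self.y0


# Illustrative values only: caused, always occurs, never occurs, prevented
people = [Individual(0, 1), Individual(1, 1), Individual(0, 0), Individual(1, 0)]
effects = [p.causal_effect() for p in people]
print(effects)
```

A non-zero entry in `effects` corresponds to an individual for whom the event occurs under one exposure level but not under the other.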
According to Rothman [21], a comprehensive causal mechanism is defined as a set of factors that are jointly sufficient to induce a binary outcome event, and that are minimally sufficient; that is, under the omission of just one factor the outcome would change. Rothman [21] called this the sufficient-component cause model. A similar idea can be found in an earlier paper by Lewis [22]. Since several causal mechanisms are in line with the same specific counterfactual difference for a fixed individual at a fixed time, the sufficient-component cause model can be regarded as a finer version of the counterfactual model [[14], and references therein].
As there are often no objective criteria to determine individual counterfactual outcomes, the best option is usually to estimate population average effects. The population average effect is defined as the average of individual causal effects over all individuals in the target population on whom inference is to be made. The estimation of average causal effects in epidemiology is subject to various biases [23]. These biases are determined both by the study design and the mechanism that generates the data. In a randomised controlled trial (RCT), bias due to confounding cannot occur, but confounders might be distributed unequally across treatment levels by chance, especially in small samples. If compliance is perfect, there is no measurement error in the treatment. Other biases might still occur, however, such as bias due to measurement error in the outcome and selection bias (because individuals in the RCT might not represent all individuals in the target population). Observational studies are prone to all kinds of biases, and these depend on the causal mechanism underlying the data. For instance, bias due to confounding is determined by the factors that affect both exposure and outcome, and the distribution of these factors.
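The contrast between the population average effect and an observed association can be illustrated with a small deterministic sketch. The population below, the confounder `c` and both assignment patterns are hypothetical; the "balanced" assignment merely stands in for what randomisation would achieve in expectation and is not a simulation of an actual trial.

```python
from statistics import mean

# Hypothetical population: (confounder c, outcome under X=0, outcome under X=1)
population = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 1),
]


def average_causal_effect(pop):
    # Average of the individual differences Y_i1 - Y_i0 over the population
    return mean(y1 - y0 for _, y0, y1 in pop)


def observed_difference(pop, assignment):
    # Each individual reveals only the outcome under the exposure received
    exposed = [y1 for (_, _, y1), x in zip(pop, assignment) if x == 1]
    unexposed = [y0 for (_, y0, _), x in zip(pop, assignment) if x == 0]
    return mean(exposed) - mean(unexposed)


# Confounded assignment: exposure coincides with the confounder (X = c)
confounded = [c for c, _, _ in population]
# Balanced assignment: half of each confounder stratum is exposed
balanced = [1, 0, 1, 0, 1, 0, 1, 0]

true_effect = average_causal_effect(population)
assoc_confounded = observed_difference(population, confounded)
assoc_balanced = observed_difference(population, balanced)
print(true_effect, assoc_confounded, assoc_balanced)
```

In this constructed example, the association under the confounded assignment overstates the average causal effect, whereas the balanced assignment recovers it.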
I shall demonstrate that most of Hill's considerations involve more than the X – Y association and biases in that association; their application depends on assumptions about a comprehensive causal system, of which the X – Y effect is just one component. I argue that the heuristic value of Hill's considerations converges to zero as the complexity of a causal system and the uncertainty about the true causal system increase.
2. Consistency
A relationship is observed repeatedly
For Hill [5], the repeated observation of an association included "different persons, places, circumstances and time". The benefit of this rule was that consistently finding an association with different study designs (e.g. in both retrospective and prospective studies) reduced the probability that an association would be due to a "constant error or fallacy" in the same study design. On the other hand, he pointed out that shared flaws in different studies would tend to replicate the same wrong conclusion. Likewise, differing results in different investigations might indicate that some studies correctly showed a causal relationship, whereas others failed to identify it.
This point is explained by Rothman and Greenland [[18], p. 25]: causal agents might require that another condition was present; for instance, transfusion could lead to infection with the human immunodeficiency virus only if the virus was present. Now, according to the sufficient-component cause model [20, 21], and as stated by Rothman and Greenland [[18], p. 25], whether and to what extent there is a causal effect on average depends on the prevalences of complementary causal factors.
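The dependence on complementary factors can be made concrete with a toy sufficient cause. The sketch below assumes, purely for illustration, a single mechanism in which the outcome occurs only if the exposure X and one complementary factor U are both present; the function name and the two populations are hypothetical.

```python
def average_effect_of_x(u_values):
    """Average causal effect of X when Y = 1 requires both X = 1 and U = 1."""
    effects = []
    for u in u_values:
        y1 = 1 if u == 1 else 0  # outcome under X = 1 needs the complement U
        y0 = 0                   # without X the mechanism is never completed
        effects.append(y1 - y0)
    return sum(effects) / len(effects)


# Two hypothetical populations of 100 individuals each
rare_complement = [1] * 1 + [0] * 99      # U present in 1%
common_complement = [1] * 50 + [0] * 50   # U present in 50%

effect_rare = average_effect_of_x(rare_complement)
effect_common = average_effect_of_x(common_complement)
print(effect_rare, effect_common)
```

Under this assumed mechanism, the average effect of X equals the prevalence of U, so two studies of the same mechanism could report very different average effects.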
Cox and Wermuth [[28], p. 225] have added the consideration that an association that does not vary strongly across the values of intrinsic variables would be more likely to be causal. If an association were similar across individuals with different immutable properties, such as sex and birth date, the association would be more likely to have a stable substantive interpretation. Variables other than X and Y might change as a consequence of interventions among other factors in a comprehensive causal system. One should be careful when applying this guideline, because effect heterogeneity depends on the choice of the effect measure. This choice should be based on a relevant substantive theory and on correspondence with the counterfactual and sufficient-component cause models (the latter two indicating that differences rather than ratios should be used); theory and models may, however, contradict each other [29].
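That heterogeneity depends on the effect measure can be seen from a short worked example. The risks below are invented for illustration; sex is used as the stratifying variable only because it is mentioned above as an intrinsic variable.

```python
from fractions import Fraction

# Hypothetical risks per stratum: (risk under X = 0, risk under X = 1)
risks = {
    "women": (Fraction(1, 10), Fraction(2, 10)),
    "men": (Fraction(3, 10), Fraction(6, 10)),
}

# Effect on the difference scale and on the ratio scale, per stratum
risk_difference = {s: r1 - r0 for s, (r0, r1) in risks.items()}
risk_ratio = {s: r1 / r0 for s, (r0, r1) in risks.items()}

for stratum in risks:
    print(stratum, risk_difference[stratum], risk_ratio[stratum])
```

With these numbers the ratio is identical in both strata while the difference is three times larger among men, so the same data would appear homogeneous on one scale and heterogeneous on the other.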
From the counterfactual perspective, the following questions arise when asking whether to apply the consideration on consistency:
a) If the causal effect was truly the same in all studies, would one expect to observe different associations in different studies (possibly involving different persons, places, circumstances and time)? To what degree would the associations be expected to differ?
b) If the causal effect varied across the studies, would one expect to observe equal or different associations? What magnitude of differences would one expect?
Note that in the presence of effect modifiers there exists no such thing as "the causal effect"; the effect modifiers need to be fixed at suitable values. Also note that only one of a) and b) is actually counterfactual, depending on whether the effect truly varies across the different studies or not. Answering these questions requires a comprehensive causal theory that indicates how different entities (individual factors, setting, time, etc.) act together in causing Y. Within such a causal system one can predict how the X – Y association should change if one used different persons, places, circumstances and times in different studies. As one can only observe associations, this also involves bias, and bias might operate differently in different studies.
An observed pattern of association across the different studies that is in line with the expected pattern would provide evidence for an effect of X on Y if the underlying causal theory applies. Another pattern would indicate that there is either no effect of X on Y or that the supposed theory is false. In complex situations and bias-prone designs, the probability might be substantial that a causal theory does not include important features that change the expected X – Y association. Here, the uncertainty regarding whether or not to demand an association (or which magnitude of association) could be high, and so the consistency consideration might bring more harm than benefit.
5. Biological gradient
The outcome increases monotonically with increasing dose of exposure or according to a function predicted by a substantive theory
Hill [5] favoured linear relationships between exposure level and outcome, for instance, between the number of cigarettes smoked per day and the death rate from cancer. If the shape of the dose-response relationship were a more complex, especially a non-monotonic, function, this would require a more complex substantive explanation.
Others have been less demanding and more specific in their definition of a dose-response relation, requiring only a particular shape of relationship (not necessarily linear or monotonic), which is predicted from a substantive theory [[28], p. 225]. Rothman and Greenland [[18], p. 26] have argued that parts of J-shaped dose-response curves might be caused by the respective exposure levels while others might be due to confounding only. They also provided a counter-example for a non-causal dose-response association. To demand a dose-response relationship could be misleading if such an assumption contradicted substantive knowledge. No dose-response relationship in presumably causal effects has been found, for example, between the intake of inhaled corticosteroids and lung function among asthma patients [33] or in the pharmacotherapy of mental disorders [34]. Further examples and arguments similar to those above were provided previously by Lanes and Poole [35].
Counterfactual causality defines the difference between each pair of exposure levels as a distinct causal effect. The consideration on biological gradient is therefore again not a consideration on a specific causal difference but a consideration on a broader causal system involving several exposure levels. It requires a substantive theory that predicts how the outcome should change when the exposure varies over several levels. If there are k exposure levels, then this theory has to predict k-1 counterfactual differences. Some theories demand a gradient over the levels and others do not, while different theories might demand different gradients.
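A small sketch may help to show what such a theory has to specify. It assumes, as one possible reading, that the k-1 differences are taken against the lowest exposure level as the common reference; the function names and the outcome values (for example, events per 1000) are hypothetical.

```python
def differences_vs_reference(mean_outcomes):
    """Return the k-1 differences of each exposure level versus the first."""
    reference = mean_outcomes[0]
    return [level - reference for level in mean_outcomes[1:]]


def is_monotone_increasing(values):
    # True if the outcome never decreases from one exposure level to the next
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical mean outcomes over k = 4 ordered exposure levels
linear = [10, 20, 30, 40]
j_shaped = [15, 10, 20, 35]

linear_diffs = differences_vs_reference(linear)
j_diffs = differences_vs_reference(j_shaped)
print(linear_diffs, is_monotone_increasing(linear))
print(j_diffs, is_monotone_increasing(j_shaped))
```

Both patterns contain non-zero differences between specific levels; only the first would satisfy a demand for a monotonic gradient, whereas a theory predicting a J-shape would be supported by the second.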
When applying this consideration, bias has to be properly corrected for each of the k-1 observed associations. If the observed sequence of associations over the exposure levels is in line with a theory and bias is properly addressed for each comparison, this provides evidence for the theory; otherwise, the theory, or at least one bias model, is false. Here, causal differences between specific exposure levels might in any case exist.
Several exposure levels are required to establish a dose-response relationship. On the other hand, the more exposure levels there are, the greater is the danger of misapplying this consideration, because a single wrong conclusion (among k-1 possible wrong conclusions) about the existence of a specific causal difference might be sufficient for a wrong conclusion on the overall theory. RCTs are particularly useful for assessing dose-response relationships, because they avoid some biases that are sometimes difficult to correct for using other study designs.
8. Experiment
Causation is more likely if evidence is based on randomised experiments
Hill [5] argued that a causal interpretation of an association from a non-experimental study was supported if a randomised prevention derived from the association confirmed the finding. For instance, after finding that certain events were related to the number of people smoking, one might forbid smoking to see whether the frequency of the events subsequently decreases.
To Rothman and Greenland [[7], p. 27], it was not clear whether Hill meant evidence from animal or human experiments. Human experiments were hardly available in epidemiology, and results from animal experiments could not easily be applied to human beings. To Susser [11], Hill's examples suggested that he meant intervention and active change rather than research design. Both Susser [11] and Rothman and Greenland [[7], p. 27] stated that results from randomised experiments provided stronger evidence than results based on other study designs, but always had several possible explanations. Cox and Wermuth [[28], p. 225f.] relaxed that criterion by replacing the qualitative difference between experimental and non-experimental studies with the rather quantitative conception of "strength of intervention": an observed difference would be more likely to be causal if it followed a massive intervention. This is motivated by the possibility that a change following a modest intervention could result from the circumstances of a treatment rather than from the treatment itself. One might add that the Cox and Wermuth consideration requires a modest intervention to be precluded from having a strong influence, an assumption that is certainly context-dependent to a high degree.
In terms of counterfactual causality, the distinction between massive and modest interventions is irrelevant, because a causal effect is only defined for a fixed index and a fixed reference condition. Hence, if interpreted in terms of strength of intervention, this is again not a consideration on a specific causal difference, but rather a consideration on a comprehensive causal theory (as the one on biological gradient). Such a theory is required in order to decide what is a modest and what is a strong intervention.
If the consideration on experiment is interpreted in terms of avoiding some biases in estimating a specific causal effect by conducting an RCT, it should be generalised as follows: observed associations should equal the true counterfactual difference as closely as possible (apart from random error). Bias is reduced either by using a study design that avoids major biases or by properly correcting for bias. Clearly, avoiding bias is preferable to correcting for it, but it is often impossible to avoid some biases. As already mentioned, in RCTs with perfect compliance, confounding cannot occur (although confounders might be distributed unequally by chance) and there is no measurement error in the exposure. However, bias due to measurement error could still occur in the outcome, and there may be bias due to selection, missing data, etc. [22]. Thus, Hill's original formulation [5] covered only one or two among a variety of possible biases.
Instead, two more general questions arise: which study design is likely to validly identify a presumed causal effect? And, if the optimal study design is not possible, how can bias be accurately corrected for? As in the consideration on strength, this can be summarised by: which results would be expected to be observed if the data were free of any bias? A causal effect is more likely if, after bias adjustment, the interval estimate excludes the null value, and it is even more likely if the lower boundary is far from the null value. If adjustment is done properly, systematic error in the corrected interval estimate decreases as knowledge about biases increases. As a consequence, one can hardly ever demonstrate a causal effect if biases are poorly understood; this is the case even in large samples, because the associated systematic error in the results would remain even while random error decreases.
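The last point can be illustrated with an idealised calculation. The sketch assumes a mean difference between two groups of n individuals each, a known outcome standard deviation, a normal-approximation 95% interval and a fixed uncorrected bias; the function name and all numbers are hypothetical, and the interval is centred at the expected estimate rather than at a simulated one.

```python
import math


def expected_interval(true_effect, bias, sd, n, z=1.96):
    """Expected 95% interval for a mean difference, two groups of size n."""
    centre = true_effect + bias            # systematic error shifts the centre
    half_width = z * sd * math.sqrt(2 / n)  # random error shrinks with n
    return centre - half_width, centre + half_width


# No true effect, but an uncorrected bias of 0.1 standard deviations
small_low, small_high = expected_interval(0.0, 0.1, 1.0, n=100)
large_low, large_high = expected_interval(0.0, 0.1, 1.0, n=10_000)

print(round(small_low, 3), round(small_high, 3))
print(round(large_low, 3), round(large_high, 3))
```

In the small sample the interval still covers the null value, whereas in the large sample it excludes the null value although the true effect is zero: the interval narrows around the biased value, not around the truth.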