Cates' illustrative example
Cates used for illustration a subset of the trials included in a systematic review published on the Cochrane Library [10]. He used the ten trials of high intensity nursing interventions to encourage smoking cessation – his paper shows the actual results of each trial. He demonstrates that the treat-as-one-trial approach gives an answer in the opposite direction to that from standard meta-analysis, and attributes this to the fact that the analysis is not immune to Simpson's paradox. Moore and colleagues believe that these trials should not all be grouped together and that the paradoxical answer arose from this inappropriate pooling. They therefore split the trials by setting: seven trials done in hospital settings and three in primary care. We agree that it may be wise to investigate whether setting affects effectiveness before combining all of these trials, but we do not agree with the method by which Moore et al undertake this investigation, nor with their implication that the methodological problem could only occur if one pools trials inappropriately.
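The reversal that Cates observed can be reproduced with a small numerical sketch. The counts below are invented for illustration (they are not from the review): two trials in which the intervention improves quit rates within each trial, yet naive "treat as one trial" pooling of the raw counts reverses the direction of effect because allocation is unbalanced across trials with different baseline rates.

```python
# Invented counts illustrating Simpson's paradox in pooled trials.

def quit_rate(quitters, total):
    return quitters / total

# Trial A: high baseline quit rate, few intervention patients.
a_int = (30, 50)      # 60% quit with intervention
a_ctl = (100, 200)    # 50% quit in control

# Trial B: low baseline quit rate, many intervention patients.
b_int = (40, 200)     # 20% quit with intervention
b_ctl = (5, 50)       # 10% quit in control

# Within each trial the intervention does better:
assert quit_rate(*a_int) > quit_rate(*a_ctl)
assert quit_rate(*b_int) > quit_rate(*b_ctl)

# Treat-as-one-trial pooling of the raw counts:
pooled_int = quit_rate(30 + 40, 50 + 200)   # 70/250  = 28%
pooled_ctl = quit_rate(100 + 5, 200 + 50)   # 105/250 = 42%
print(pooled_int, pooled_ctl)               # intervention now looks worse
```

Standard meta-analysis avoids this because it compares arms only within each trial before combining.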
Moore et al show that the results are somewhat different in the two subgroups (their table 1). They argue that among hospital patients the relative treatment benefit was statistically significant (RR = 1.3, 95% CI 1.1 to 1.6) and that the NNT of 14 (95% CI 9 to 26) is a useful result. In unselected primary care patients (their definition) the result was not statistically significant and the NNT was 222. They conclude that nursing interventions are "probably ineffective" in these patients. Our comments are:
1 The evidence of an effect in hospital patients is fairly weak, being only marginally statistically significant.
2 The estimated effect in primary care has a confidence interval that extends well above the whole CI for secondary care, so it is quite inappropriate to dismiss the intervention on such slight evidence. (However, it seems that the result for this group is based on a random effects analysis.)
3 The comparison of the subgroups should not be based on comparison of P values (one significant and one not), whether explicit or, as here, implicit [
11]. By a formal test of interaction the pooled results from the two groups of trials are not significantly different.
4 No account is taken of the quality of these trials. For example, two trials (including the largest) were not properly randomised and another was a cluster randomised trial that was analysed wrongly [
10].
5 Interested readers should consult the Cochrane review [10] to get the 'real' results. The review includes data from additional trials and analyses stratified by patient type and type of intervention. The authors conclude that the intervention is beneficial in both hospitalised and non-hospitalised patients (RR = 1.28, 95% CI 1.03 to 1.61, random effects model), there being no significant difference in RR between primary care and secondary care patients (P = 0.42). Applying the overall relative increase in cessation rates of 28% gives: (a) an NNT of 89 for primary care trials, based on the median placebo quit rate of 4%; (b) an NNT of 12 for secondary care trials, based on the median placebo quit rate of 30%.
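The arithmetic behind point 5 can be sketched as follows: the pooled relative risk is applied to a setting-specific baseline (control) quit rate, and the NNT is the reciprocal of the resulting absolute risk increase.

```python
# NNT derived from a relative risk and a control event rate (CER):
# absolute risk increase = CER * (RR - 1);  NNT = 1 / absolute risk increase.

def nnt(control_event_rate, relative_risk):
    arr = control_event_rate * (relative_risk - 1)  # absolute risk increase
    return 1 / arr

rr = 1.28  # pooled relative risk from the Cochrane review

print(round(nnt(0.04, rr)))  # primary care, median placebo quit rate 4%   -> 89
print(round(nnt(0.30, rr)))  # secondary care, median placebo quit rate 30% -> 12
```

The same 28% relative benefit thus translates into very different NNTs purely because of the difference in baseline quit rates.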
In addition, one of the trials that Moore et al included as a trial of "unselected primary care patients" was in fact done in patients with cardiovascular problems [12]. Our common sense tells us to exclude that trial. We summarise the results of meta-analyses in Table 2, here using the risk ratio (relative risk) as Moore et al did. (We cannot exactly reproduce the results given by Moore et al as we are not sure which method they used to obtain the relative risks.)
Table 2
Results of meta-analyses of trials of high intensity nursing to reduce smoking using standard§ and 'treat-as-one-trial' methods, with relative risk as effect measure

| Setting (no. of trials) | Quit/total, intervention | Quit/total, control | RR (95% CI) | NNT (95% CI) |
|---|---|---|---|---|
| Hospital (7) | 435/1367 | 318/1295 | | |
| Meta-analysis | | | 1.30 (1.16 to 1.47) | 13.6 (8.7 to 25.5)¶¶ |
| Treat-as-one-trial | | | 1.30 (1.15 to 1.47) | 13.8 (9.4 to 25.9) |
| Primary care (3)* | 111/2453 | 41/1006 | | |
| Meta-analysis | | | 1.01 (0.71 to 1.42) | 2454 (58.4 to H84.6¶)¶¶ |
| Treat-as-one-trial | | | 1.11 (0.78 to 1.58) | 222.5 (52.0 to H97.7¶) |
| Primary care (2)** | 87/2246 | 25/958 | | |
| Meta-analysis | | | 1.54 (0.97 to 2.44) | 71.0 (26.6 to H1277¶)¶¶ |
| Treat-as-one-trial | | | 1.48 (0.96 to 2.30) | 79.1 (39.2 to H4369¶) |
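The 'treat-as-one-trial' rows in Table 2 come from simply summing the arms across trials. As a check, the hospital row can be reproduced from the pooled counts:

```python
# Treat-as-one-trial calculation for the 7 hospital trials in Table 2.

quit_int, n_int = 435, 1367   # intervention arm, all hospital trials pooled
quit_ctl, n_ctl = 318, 1295   # control arm

p_int = quit_int / n_int
p_ctl = quit_ctl / n_ctl

rr = p_int / p_ctl            # relative risk
nnt = 1 / (p_int - p_ctl)     # reciprocal of the risk difference

print(round(rr, 2), round(nnt, 1))   # 1.3 13.8
```

For these hospital trials the treat-as-one-trial answer happens to agree closely with the standard meta-analysis; as discussed above, that agreement is not guaranteed in general.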
We agree with Moore et al that it helps to split the trials by setting to gauge the differential impact of the intervention – the NNTs in the two settings are clearly different. Notably, though, once the trial that Moore et al inappropriately include as a primary care trial is excluded, the results expressed as risk ratios are surprisingly similar for both settings, both in the subset of data presented by Cates [1] and in the full results of the review [10]. There is also little difference between the results using standard meta-analysis and the treat-as-one-trial method; but, as we noted above, although use of the treat-as-one-trial method increases the risk of bias, bias will not always be seen.
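A sketch of the formal test of interaction mentioned in point 3 above, applied to the meta-analysis rows of Table 2: the two subgroup log relative risks are compared, with standard errors recovered from their 95% confidence intervals (SE = (ln upper − ln lower) / (2 × 1.96)). This is our own illustration of the standard method, not a reproduction of Moore et al's analysis.

```python
from math import log, sqrt, erf

def se_from_ci(lower, upper):
    # Standard error of the log RR recovered from a 95% CI.
    return (log(upper) - log(lower)) / (2 * 1.96)

# Meta-analysis results from Table 2.
rr_hosp, se_hosp = 1.30, se_from_ci(1.16, 1.47)   # hospital (7 trials)
rr_prim, se_prim = 1.01, se_from_ci(0.71, 1.42)   # primary care (3 trials)

# z statistic for the difference between the two log relative risks.
z = (log(rr_hosp) - log(rr_prim)) / sqrt(se_hosp**2 + se_prim**2)
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided normal P value

print(round(z, 2), round(p, 2))
```

On these figures z falls well inside ±1.96 and P is well above 0.05, consistent with the statement that the pooled results for the two settings do not differ significantly.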
Other points
Moore's method of grouping trials with similar control-group event rates does appear to reduce the problem, as would be predicted, but it cannot eradicate it. It is important to note that this is a 'results based' categorisation that is not based on a priori clinical criteria. Also, grouping by control group event rate reduces the bias only for analyses of risk differences, not for relative effect measures. Moreover, grouping by control group event rate can lead to worse problems, because the estimated treatment effect is correlated with the control group event rate [14].
Moore et al also say that an analysis based on pooling risk differences assumes that the control group event rate is the same in all trials. This statement is incorrect – what is assumed (in a fixed effect analysis) is that the true treatment effect, expressed as a risk difference, is the same in all trials. There is no statistical assumption that the event rates per arm are similar across trials. There may be other reasons to worry about this issue, as discussed above.
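The distinction can be made concrete with a minimal sketch of a fixed-effect (inverse-variance) pooled risk difference. The two trials below are hypothetical: their control event rates differ greatly (10% versus 40%), yet both have the same underlying risk difference, and that common effect is all the fixed-effect analysis assumes.

```python
# Fixed-effect inverse-variance pooling of risk differences (hypothetical data).

def risk_difference(e1, n1, e0, n0):
    p1, p0 = e1 / n1, e0 / n0
    rd = p1 - p0
    var = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0   # variance of the RD
    return rd, var

trials = [
    (45, 300, 30, 300),    # control event rate 10%, RD = 0.05
    (135, 300, 120, 300),  # control event rate 40%, RD = 0.05
]

num = den = 0.0
for e1, n1, e0, n0 in trials:
    rd, var = risk_difference(e1, n1, e0, n0)
    w = 1 / var            # inverse-variance weight
    num += w * rd
    den += w

print(round(num / den, 3))  # pooled RD recovers the common value 0.05
```

The pooled estimate matches the common risk difference despite the very different baseline rates, which is precisely why the statement criticised above is incorrect.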
Cates comments on the choice of effect measure for binary data. While it is true that there is empirical evidence that relative effect measures are more likely to be homogeneous across trials, this does not mean that absolute measures should never be used. More seriously, it does not help us resolve the choice between odds ratio and risk ratio, where the empirical evidence shows no such dominance of one measure [5, 6].
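The distinction between the two relative measures can be seen from a single 2×2 table (the counts below are hypothetical): at low event rates the odds ratio and risk ratio nearly coincide, but at high event rates they diverge markedly, which is one reason the choice between them matters.

```python
# Odds ratio vs risk ratio from the same 2x2 table (hypothetical counts).

def rr_and_or(e1, n1, e0, n0):
    p1, p0 = e1 / n1, e0 / n0
    rr = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return rr, odds_ratio

print(rr_and_or(4, 100, 2, 100))    # low event rates:  RR 2.0, OR ~2.04
print(rr_and_or(80, 100, 40, 100))  # high event rates: RR 2.0, OR 6.0
```

Neither measure is 'right' here; the point is simply that the empirical evidence does not single one out, as noted above.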
Systematic reviews involve subjectivity, for example in deciding which studies to analyse. It is essential that reviews include the summary data from each study so that readers can examine the implications of some of these judgements [
15]. The methods of analysis should also be specified, including the method to derive an estimated NNT. For example, it would be misleading not to report the use of the treat-as-one-trial method.