Background
Often the outcomes measured in medical research are count outcomes. Typically, these measure the number of times a particular event happens to an individual in a defined period. Examples of count outcomes include the number of falls by the individual, the number of asthma exacerbations or the number of incontinence episodes. These outcomes are commonly measured in randomised controlled trials (RCTs) to determine the effect of an intervention.
There are many ways of summarising the difference between interventions when the outcome is a count outcome [1-3], such as:
- A simple rate ratio: the ratio of the number of events per person-time at risk in each of the treatment groups.
- A rate ratio calculated from the Poisson regression family, such as Poisson and negative binomial regression.
- A risk ratio after the data are dichotomised into those with and without the event.
- A hazard ratio using the time to the event, either the time to the first event or using a method that copes with multiple event times.
- A difference in means that treats the data as continuous, compared using a t test or linear regression. More recently, the ratio of means has been used [4]. These analyses cause few problems for count outcomes with a high mean, such as pulse rate, as the Poisson distribution with a high mean approximates a normal distribution. In practice, however, this approach is often used on data with lower means.
- A difference in medians tested by a non-parametric test such as the Wilcoxon rank sum test, or the ratio of medians.
The variety of analytic methods used in RCTs with count outcomes causes difficulties when carrying out a meta-analysis. In addition to the usual problems of heterogeneity arising from populations and treatments, there is heterogeneity in outcomes and analysis methods used across RCTs to evaluate the effect of the intervention. This raises a key question of whether the results from these alternative methods of analysis are comparable enough (exchangeable) to be combined in a meta-analysis.
This paper describes a simulation study designed to see whether mixing the results of different methods of analysis could give reasonable answers in a meta-analysis.
Falling is a major health problem for older people: approximately 30 % of people over the age of 65 fall each year, and many falls result in injury and hospitalisation. The 2009 Cochrane systematic review “Interventions for preventing falls in older people living in the community” included 43 trials that assessed the effect of exercise programmes [5]. The two primary outcomes in this review were the rate of falls and the proportion of fallers. Twenty-six of the 43 studies contributed to the rate of falls meta-analysis, and 31 to the number of fallers. Some studies could not be used because of the way the data were analysed and presented. We asked for individual patient data from the randomised trials included in this systematic review, analysed them in different ways and compared the resulting meta-analyses.
Discussion
The results of this study suggest that it may well be possible in many situations to combine in a meta-analysis the estimates of intervention effects for count outcomes analysed in various ways, as the results from the different analysis methods were very similar. Apart from a few instances, most analyses gave estimates that were on average close to the RaR from a negative binomial regression. Further, examination of the range of data from the simulations showed that the confidence intervals of most of the methods were similar. Therefore, pooling intervention estimates calculated by different methods is likely to be generally reasonable. This has been shown using both simulations and actual data from a meta-analysis of RCTs.
When events were rare, or there was no treatment effect, all methods of analysis provided very similar estimates of the intervention effect, with similar variation. An exception is the ratio of medians, which is impossible to calculate unless both groups have more than 50 % of participants with events. As events become more common, dichotomising the results into those with and without the event increasingly loses the ability to discriminate between treatments, and the confidence interval becomes narrower. Intuitively, as events become more common, it is likely that all, or almost all, of the participants will experience one or more events. Similarly, time to the first event loses the ability to discriminate with increasing event rates, but this happens more slowly than with dichotomising the data.
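Under a Poisson model, this attenuation of the dichotomised comparison can be sketched numerically. Assuming a true rate ratio of 0.7 (an invented value), the risk ratio for “any event” moves towards 1 as the control-group mean count grows:

```python
from math import exp

true_rate_ratio = 0.7  # assumed treatment effect on the event rate

def any_event_risk_ratio(control_mean):
    """Risk ratio for >=1 event when counts are Poisson in each arm."""
    p_control = 1 - exp(-control_mean)
    p_treated = 1 - exp(-control_mean * true_rate_ratio)
    return p_treated / p_control

# Risk ratio approaches 1 as the control mean grows
for m in (0.2, 1.0, 5.0, 20.0):
    print(m, round(any_event_risk_ratio(m), 3))
```

With a control mean of 0.2 the risk ratio is close to the true rate ratio; by a mean of 5 nearly everyone in both arms has at least one event, and the dichotomised comparison can no longer distinguish the treatments.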
Poisson regression and negative binomial regression models gave very similar results for the RaR, even when there was a significant amount of overdispersion. This was expected, given that these distributions have the same expected value [13, 26]. The standard error of the RaR estimated from Poisson regression will be too small in the presence of overdispersion, which has implications for the weights in meta-analytic models. In this simulation, the underestimation of the standard error was only slight but was most noticeable with both a high mean and a lot of overdispersion. Trials analysed using Poisson regression in the presence of overdispersion will receive too much weight in the meta-analysis. The impact of not allowing for overdispersion, and the subsequent underestimation of the variance of the intervention effect, was evident when comparing the fixed effect meta-analysis confidence intervals calculated using Poisson regression with those from negative binomial regression in the empirical study.
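The mechanics can be sketched with a small simulation. Counts are generated from a gamma-Poisson (negative binomial) mixture, and the naive Poisson-model standard error of the log rate ratio, sqrt(1/Σy_t + 1/Σy_c), is compared with the empirical sampling variability. All parameter values here are invented; this sketch deliberately uses heavy overdispersion (small gamma shape), where the understatement is marked, whereas milder overdispersion gives a smaller gap:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, k = 200, 4.0, 0.5  # per-arm size; k: gamma shape (small k = heavy overdispersion)

def nb_counts(mean, size):
    """Negative binomial counts via a gamma-Poisson mixture."""
    return rng.poisson(rng.gamma(shape=k, scale=mean / k, size=size))

# Overdispersion: variance far exceeds the mean (theory: mu + mu**2 / k)
sample = nb_counts(mu, 100_000)
print(sample.mean(), sample.var())

# Poisson-model SE of the log rate ratio vs its empirical sampling SD
log_rr, poisson_se = [], []
for _ in range(2000):
    c, t = nb_counts(mu, n), nb_counts(0.7 * mu, n)
    log_rr.append(np.log(t.sum() / c.sum()))
    poisson_se.append(np.sqrt(1 / t.sum() + 1 / c.sum()))

# The Poisson SE understates the true sampling variability here
print(np.std(log_rr), np.mean(poisson_se))
```

With these assumed parameters the empirical standard deviation of the log rate ratio is well above the average Poisson-model standard error, so a meta-analysis weighting by the Poisson SE would give such a trial too much weight.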
Adjusting the survival analyses for multiple events also gave estimates close to those from the negative binomial regression, although the confidence intervals were wider, especially as the mean increased. An exception was the Andersen-Gill method, which gave an estimate of the HR that was, on average, slightly further from 1 than the negative binomial RaR. The difference between the estimates increased as the mean increased, which may lead to a different interpretation of the intervention effect and make it unreasonable to combine Andersen-Gill HR estimates with those from negative binomial regression. All survival models in these simulations assume proportional hazards. In our simulations, the proportional hazards assumption is likely to hold because of the way the data were generated, but it may not hold for any particular RCT.
The ratio of medians is clearly inappropriate where the event rate is low, as the medians in one or both groups are likely to be zero. As the event rate increases, the average difference between estimates calculated from the ratio of medians and negative binomial regression is small. However, in any particular trial the difference could be large, as indicated by the large standard deviation of the differences. Especially when the mean is low, the distribution of the ratio of medians is highly concentrated at discrete values, becoming smoother as the mean increases. This could lead to different variances compared with the other models. In practice, it is difficult to use the ratio of medians as the standard error cannot be computed from commonly reported statistics. There is a formula for the 95 % confidence interval of the ratio of medians, but calculation requires the original data [27]. An alternative to this formula, still requiring the original data, is to use a method such as bootstrapping to compute the standard errors. More commonly, trial authors will report one of the other effect measures, such as the simple RaR (or at least the raw data that allow this ratio to be calculated). Calculation of the ratio of means is likely to be possible for many studies where the means are reported. There is a standard formula that calculates an approximate standard error from the mean, standard deviation and number of individuals in each of the arms of the study [4].
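The standard formula referred to in [4] is a delta-method approximation on the log scale, SE(log RoM) ≈ √(sd_t²/(n_t·m_t²) + sd_c²/(n_c·m_c²)). A sketch using invented summary statistics:

```python
from math import exp, log, sqrt

# Hypothetical reported summary statistics for each arm
m_t, sd_t, n_t = 1.8, 2.1, 120   # treated: mean, SD, sample size
m_c, sd_c, n_c = 2.6, 2.9, 115   # control

log_rom = log(m_t / m_c)
# Delta-method approximation to the SE of the log ratio of means
se = sqrt(sd_t**2 / (n_t * m_t**2) + sd_c**2 / (n_c * m_c**2))

# 95 % confidence interval for the ratio of means, back-transformed
ci = (exp(log_rom - 1.96 * se), exp(log_rom + 1.96 * se))
print(round(m_t / m_c, 3), tuple(round(x, 3) for x in ci))
```

Because all four inputs (mean, SD and sample size per arm) are routinely reported, this standard error can usually be reconstructed for a meta-analysis without the original data, unlike the ratio of medians.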
It is perhaps unsurprising that the estimates and their distributions are similar. The simple RaR and Poisson regression estimate the same parameter; any differences are likely to be due to rounding errors, as the Poisson regression requires more calculations to be performed. The expected values of the estimates from Poisson regression and negative binomial regression are the same. Survival analysis and Poisson regression estimate the same parameter when the baseline hazard is constant [28], which holds in these simulations and should hold for many RCTs. The ratio of means is the coefficient for group assignment from a linear regression of the log of the count outcome on group. This is similar to the coefficient in a Poisson regression, except that linear regression does not cope well with zero counts in the outcome, the error structure is different and it is unable to adjust for different follow-up periods.
We chose the negative binomial model as the reference model as it seems appropriate for this sort of data, especially in the presence of overdispersion. This does not allow for the estimation of bias in any of the methods, as we do not have the “true” value. As the question we wanted to answer was whether the results of the different methods could be combined in a meta-analysis, looking at the difference from one of the methods was more appropriate.
There are other possibilities for the analysis of count outcomes, such as zero-inflated Poisson, zero-inflated negative binomial and Poisson regression with robust errors, which allows for overdispersion by relaxing the requirement that the mean and variance are equal. However, we did not evaluate these methods since they are rarely used in practice.
Previously, it has been established that, to prevent bias, it is important to account for the length of exposure, which may differ because of dropouts that are not missing at random [12, 29]. The simple rate ratio, Poisson regression and negative binomial regression are all able to adjust for varying follow-up times, as are the survival analysis methods. Thus, it is surprising that the ratio of means and the ratio of medians yield effect estimates similar to those from negative binomial regression. This may be because the generated data sets assumed similar attrition across groups, with participants missing completely at random. Under other scenarios (e.g. varying attrition rates and different missing data mechanisms, such as not missing at random), the ratio of means and ratio of medians may yield effect estimates that differ from those estimated by negative binomial regression.
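The importance of adjusting for exposure can be sketched with a toy simulation (all parameters invented): with no true treatment effect but differential follow-up between arms, the simple rate ratio stays near 1 while the unadjusted ratio of means is pulled towards the relative exposure:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rate_c, true_rr = 50_000, 2.0, 1.0  # falls per person-year; no true effect

# Control: full 1-year follow-up; treated: half drop out at 6 months
fu_c = np.ones(n)
fu_t = np.where(rng.random(n) < 0.5, 0.5, 1.0)

# Counts proportional to each participant's exposure
y_c = rng.poisson(rate_c * fu_c)
y_t = rng.poisson(rate_c * true_rr * fu_t)

# Person-time-adjusted rate ratio vs unadjusted ratio of means
rate_ratio = (y_t.sum() / fu_t.sum()) / (y_c.sum() / fu_c.sum())
ratio_of_means = y_t.mean() / y_c.mean()
print(rate_ratio, ratio_of_means)  # ~1.0 vs ~0.75 (biased by shorter exposure)
```

Here the mean follow-up in the treated arm is 0.75 years, so the ratio of means is biased towards 0.75 even though the underlying rates are identical.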
The choice of a uniform distribution to pick the times that the events occurred may not be the most realistic option. Events may be more likely to occur closer together or further apart than a uniform distribution would give. They also may not be independent of each other, particularly as having an event may increase or decrease the time to the next event and this may depend on the nature of the event.
The fact that intervention effect estimates from RCTs using different analytical methods can, in some circumstances, be pooled in a meta-analysis should not make the method of analysis a random choice in any particular trial. The analysis should match the hypothesis and the study design. We have previously advocated the use of negative binomial regression in evaluating falls prevention studies [11], as have others for this type of data [14]. Negative binomial regression allows all events to be included (thus using all the information), allows the length of exposure to vary and more appropriately accounts for overdispersed data. However, it treats individuals who have multiple events in quick succession, and then none for the rest of the follow-up period, the same as those who have the same number of events spread evenly throughout the period.
We have concentrated on the point estimates, with no detailed examination of their variances. Thus, more questions remain to be answered about the meta-analysis of count outcomes analysed using alternative methods. The impact of the trial analytical method on meta-analytic intervention effects, their standard errors and heterogeneity needs to be investigated. The impact is likely to vary by the chosen meta-analysis model (random effects versus fixed effect), so any investigation should examine both models. This simulation only examined data that were missing completely at random. This is overly simplistic, and research examining the impact of different missing data mechanisms, and how these interact with the trial and meta-analysis methods, would be valuable.
The focus of this paper is on RCTs, but these methods of analysis are used for other types of studies (non-randomised trials, observational studies), which may also be included in meta-analyses. For study types other than RCTs, it would be critical to examine the impact of covariates and missing data, in addition to the examination we have undertaken in this paper.
Competing interests
All authors declare that they have no conflicts of interest. CR was an investigator on two of the studies that provided data for the empirical study.
Authors’ contributions
PH conceived of the idea in conjunction with CR, carried out the simulations and wrote the first draft of the manuscript. CR helped with the original idea and revised the manuscript. JM refined the idea, helped with the simulations and revised the manuscript. All authors read and approved the final manuscript.