Background
Regression analyses of time series of disease counts on putative environmental determinants, especially air pollution and weather, have been a prominent component of environmental epidemiology over the past quarter century, with no sign of diminishing [1–5]. The units (temporal resolution) are often days, but sometimes weeks, months, or years, and duration can range from a year or less to many decades. For planning such studies, it may be useful to predict the precision of coefficients that will be estimated, or the power to identify non-null associations. Sometimes the available data are fixed, so that the question (for example, for research funders) is whether the new information obtainable from analysing them is worth the cost of doing so, or gives adequate protection against false positives [6]. Sometimes new data can be obtained at a cost, and a choice must be made as to the number of years or the number of locations from which to collect data.
Most epidemiologists will understand, as we confirm below, that precision and power depend on the number of observations (eg days), the total number of disease events or mean events/day, and, for multi-series (eg multi-city) studies, the number of series. However, the nature of that dependence, and whether other factors matter, is not generally known and has been little addressed in the time series regression literature.
The only published method we have found that addresses these questions specifically for time series regression of counts focused on power estimation and used simulations [7]. In addition, there are long-published methods for sample size determination and power estimation in generalised linear models more generally [8–10], some focused on Poisson regression [11]. At least two computer packages have implemented some of these: G*Power, which is free [12], and NCSS:PASS, which is commercially available [13]. However, it has been noted that applying them to the count time series context is not straightforward [7]. Calculations need to be tailored to the specifics of each study, and the algorithmic nature of these approaches does not facilitate insight into the primary determinants of precision and power in typical time series regression contexts. It is also not clear that existing generic methods extend to multi-series studies analysed in the typical two-stage approach, comprising series-specific regressions followed by meta-analysis of coefficient estimates.
In this paper we propose simple approximate formulae for the standard error (SE) of the coefficient of interest (and thus its precision), requiring only quantities likely to be estimable at the planning stage of a study. This SE formula also allows estimation of confidence interval (CI) width, power to detect an association of specified size, and the number of days or disease events required for given precision or power. We do not, however, consider our focus to be exclusively the prediction of power, as has traditionally been the case in sample size discussions, following current trends in epidemiology to reduce emphasis on significance testing [14]. After considering single-series studies, we extend the approach to estimate precision in a multi-series study that aims to estimate the mean association (or effect) in a two-stage analysis. We illustrate use of the approximations in a worked planning scenario, and finally evaluate their accuracy with existing data sets.
Discussion
We have provided simple approximate estimators for the precision of estimates of coefficients of interest, and hence power, that can be helpful in advance of undertaking a study. The simplicity of the estimators allows some general guidelines to be identified. For single series, the total number of deaths and the usable variation (SD) of exposure are the dominant factors unless overdispersion is severe. For multiple series from which a meta-analytic mean is to be estimated, the aggregate of these factors over series remains dominant if there is little underlying heterogeneity of coefficients between series; but if there is such heterogeneity, its extent and the number of series become important.
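These guidelines can be sketched numerically. The formula below is our paraphrase of the dependence just described (SE driven by total event count, usable exposure SD, and overdispersion), not necessarily the exact published expression; the power step is the standard normal approximation.

```python
import math
from statistics import NormalDist

def approx_se(total_events, usable_sd, overdispersion=1.0):
    """Approximate SE of the coefficient of interest.

    A sketch of the dependence described in the text (not necessarily the
    paper's exact expression): SE shrinks with the square root of the total
    event count and with the usable exposure SD, and grows with the square
    root of the overdispersion.
    """
    return math.sqrt(overdispersion) / (math.sqrt(total_events) * usable_sd)

def approx_power(beta, se, alpha=0.05):
    """Normal-approximation power for a two-sided test of beta = 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(beta) / se - z_crit)

# 10 000 total deaths, usable exposure SD of 1, no overdispersion:
se = approx_se(10_000, 1.0)        # SE on the log-rate scale
power = approx_power(0.03, se)     # power to detect beta = 0.03
```

Note that `approx_se` depends on the number of days and the mean daily count only through their product, the total event count, which is the guideline stated above.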
Applying the single-series estimators to examples in which we knew the precision actually realised, and comparing power estimates to those from algorithms implemented in the package G*Power, suggested that our approximations were good to the extent that the total number of deaths and usable exposure variation could be predicted. For meta-analytic means of multiple series, estimates were less reliable, in particular if heterogeneity was present and not allowed for.
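Although our evaluation used existing data sets, the kind of check involved can be sketched in illustrative code (not the code used in the paper): simulate Poisson counts with a known coefficient, refit the single-exposure model repeatedly, and compare the empirical spread of estimates with the approximation SE ≈ 1/√(total events × SD(x)²), our paraphrase of the single-series result with no overdispersion or covariates.

```python
import math
import random

def rpois(mu, rng):
    """Knuth's Poisson sampler (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_poisson_slope(x, y, iters=12):
    """Newton-Raphson MLE of b in log mu_i = a + b * x_i."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        a += (h11 * u0 - h01 * u1) / det
        b += (h00 * u1 - h01 * u0) / det
    return b

rng = random.Random(1)
n_days, beta, base = 500, 0.02, 5.0       # 500 days, ~5 deaths/day
x = [rng.gauss(0, 1) for _ in range(n_days)]
estimates = []
for _ in range(200):
    y = [rpois(base * math.exp(beta * xi), rng) for xi in x]
    estimates.append(fit_poisson_slope(x, y))

mean_b = sum(estimates) / len(estimates)
emp_se = math.sqrt(sum((e - mean_b) ** 2 for e in estimates)
                   / (len(estimates) - 1))

mx = sum(x) / n_days
sd_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n_days)
formula_se = 1.0 / math.sqrt(n_days * base * sd_x ** 2)  # ~ emp_se
```

Under these assumed settings the empirical SE of the 200 estimates falls close to the formula value, consistent with the evaluation reported in the text.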
The finding that precision and power are dominated by the total number of events is consistent with, but goes further than, the power simulations published previously [7], where it was observed that “increasing time-series length and average daily outcome counts both increased power to a similar extent”. That publication also commented: “reduction in power […] can accompany use of multipollutant models”. Although we do not address multi-pollutant models explicitly in this paper, when interest is in the effect of each pollutant adjusting for that of others, those other pollutants can be considered covariates in our formulation. Because inclusion of such additional covariates reduces the usable SD of the exposure of interest, this statement is also consistent with our results.
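The effect of adjustment on usable exposure SD can be illustrated directly: SD(x|z) is the residual SD of the exposure after regressing it on the covariate, and it shrinks as the correlation between pollutants grows. The data below are simulated stand-ins, not from any study.

```python
import math
import random

def residual_sd(x, z):
    """SD of x after removing its linear dependence on a single covariate z,
    ie the 'usable' SD(x|z) under a simple linear adjustment."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / n
    var_z = sum((zi - mz) ** 2 for zi in z) / n
    slope = cov / var_z
    resid = [(xi - mx) - slope * (zi - mz) for xi, zi in zip(x, z)]
    return math.sqrt(sum(r * r for r in resid) / n)

rng = random.Random(7)
z = [rng.gauss(0, 1) for _ in range(5000)]              # co-pollutant
x = [0.8 * zi + 0.6 * rng.gauss(0, 1) for zi in z]      # exposure, SD = 1
mx = sum(x) / len(x)
sd_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / len(x))  # ~ 1.0
sd_x_given_z = residual_sd(x, z)                            # ~ 0.6
# The SE of the coefficient of interest is inflated by sd_x / sd_x_given_z.
```

Here a correlation of 0.8 with the co-pollutant cuts the usable SD from about 1.0 to about 0.6, inflating the SE (and reducing power) accordingly.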
The conclusion that the length of the series influences precision only through the total event count may seem surprising to some. It does have limitations, but we believe they are minor. Some exposures may have larger variation in longer series, but because time series regressions routinely include covariates to model longer-term variation, such as trend or seasonality, the usable standard deviation often changes little. For example, SD(x|z) of heat exposure was on average 1.07 over the full 21 years, and 1.06 over just 1 year (for convenience the first). However, aspects other than precision, such as robustness against bias, may diminish in short series. As an extreme example, an estimate from a “series” of 10 days with 1000 deaths/day on average may have the same precision as one from a series of 1000 days with 10 deaths/day on average, but would seem more subject to bias, for example from some other risk factor happening to coincide with the most exposed day or two.
The temporal resolution of the series (days, weeks, etc) affects the precision of coefficients of interest in ways not directly evident in our expressions. The usable variability of exposure, which does influence precision, often changes with temporal resolution. Overdispersion of the outcome may also change, in our experience often increasing with longer time units. There may also be epidemiological considerations: for example, fine resolution (eg days) is optimal for estimating acute exposure effects, whereas coarser resolution (eg years) has advantages for longer-term effects.
There are alternatives to the approach we have proposed. If all counts are large enough to assume that they or their logarithms are normally distributed, and vary only modestly, simpler methods might be used. Otherwise, as we have illustrated, programs such as G*Power can be used for a single series with Poisson deaths to find power given the number of days, baseline death rate per day, standard deviation of exposure, and variance inflation factor. But this misses the insights of the simpler approximate expressions, in particular that the number of days and baseline death rate influence precision and power only through the total number of deaths. If actual linked outcome and exposure data exist, one can simply run the regression to find the precision and hence deduce power. But our approach does not need such detail, and with limited estimated summaries can be used to illustrate, for example, how choices of different cities or series lengths lead to different precision and power, allowing better informed design decisions.
A problem with predicting precision and power with any method is that they depend on parameters unknown at the planning stage, in particular mean counts, usable exposure variation, and overdispersion. Our worked example illustrated some approaches to this, such as making estimates from other studies or preliminary data. The Spanish multi-city comparison also suggested that in similar contexts overdispersion would be low (< 1.1) if mean counts were below about 40. Unfortunately few studies report overdispersion quantitatively, but it is our experience that, at least for deaths or hospitalisations due primarily to non-infectious diseases, this pattern is common. This was one reason we considered that the estimate we made for the infant deaths example need not allow for overdispersion. There may of course be exceptions, and each context should be considered on its merits.
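Where preliminary linked data do exist, overdispersion can be estimated with the usual Pearson scale statistic from a fitted model. A minimal sketch (intercept-only fit to simulated counts, purely illustrative):

```python
import math
import random

def rpois(mu, rng):
    """Knuth's Poisson sampler (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pearson_overdispersion(y, mu, n_params=1):
    """Pearson estimate of the scale parameter:
    sum((y - mu)^2 / mu) / (n - p). Values near 1 suggest no overdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

rng = random.Random(3)
y = [rpois(10.0, rng) for _ in range(2000)]   # preliminary daily counts
mu_hat = [sum(y) / len(y)] * len(y)           # intercept-only fitted means
phi = pearson_overdispersion(y, mu_hat)       # close to 1 for Poisson data
```

In practice `mu_hat` would come from the full covariate-adjusted model; estimates of phi much above 1 would then feed directly into the precision approximation.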
The issue of not knowing the parameters needed to estimate precision and power is particularly acute for multiple series, where, as our Spanish cities example illustrated, the extent of heterogeneity in effect estimates over series can be critical. Other simplifying assumptions were also needed: all approximations assumed that the usable exposure variation was constant across series, expression (5) assumed constant overdispersion, and expression (7) constant SE(\( {\hat{\beta}}_j \)). Uncertainty in the approximations will increase with uncertainty in these assumptions. Where such uncertainty is great, it may be useful to estimate the precision of the desired effect measure under varying assumptions (possibly using different approximations) as a sensitivity analysis.
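The role of heterogeneity can be sketched with the standard equal-variance random-effects result, used here as an illustration rather than as the paper's exact expression (7): if each of m series yields an estimate with the same within-series SE s, and true coefficients vary between series with SD tau, the pooled mean has SE √((s² + τ²)/m), so with τ > 0 no amount of extra events within series pushes the SE below τ/√m.

```python
import math

def meta_mean_se(within_se, tau, n_series):
    """SE of the random-effects pooled mean when all m series share the same
    within-series SE s and true coefficients vary with SD tau:
    sqrt((s^2 + tau^2) / m)."""
    return math.sqrt((within_se ** 2 + tau ** 2) / n_series)

m = 20
homogeneous = meta_mean_se(0.005, 0.0, m)     # shrinks freely with more events
heterogeneous = meta_mean_se(0.005, 0.02, m)  # floored near tau / sqrt(m)
floor = 0.02 / math.sqrt(m)
```

This makes concrete why, once heterogeneity is present, its extent and the number of series, rather than the aggregate event count, become the critical inputs.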
There are several limitations to the approach we present. First, our estimators of precision are approximate, and we have not undertaken a comprehensive evaluation of them by simulation. However, the evaluation on real data suggests that the primary source of error is not in the expressions but in the limited precision with which input parameters, in particular the usable exposure SD, can be predicted at the planning stage. This is an issue whatever the accuracy of algorithms to find precision and power for given input parameters. Second, we do not discuss non-linear or distributed lag models. Both could be addressed to some extent by estimating precision and power for a simplified model. For example, the analysis of Spanish data above approximated the curved temperature-mortality relationship with a linear-threshold model. To approximate a distributed lag model estimating cumulative risk over all lags, we could estimate the usable standard deviation of the running mean of temperature over the most important lags, for example a three-day running mean for heat.
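The distributed lag simplification amounts to computing the SD of a k-day running mean of the exposure and using it as the usable SD in the single-lag formula. A sketch with a simulated autocorrelated "temperature" series (an illustrative stand-in, not real data):

```python
import math
import random

def running_mean(series, k):
    """Trailing k-term running mean; returns len(series) - k + 1 values."""
    s = sum(series[:k])
    out = [s / k]
    for i in range(k, len(series)):
        s += series[i] - series[i - k]
        out.append(s / k)
    return out

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

rng = random.Random(11)
# AR(1) stand-in for daily temperature anomalies (lag-1 correlation 0.7)
temp, prev = [], 0.0
for _ in range(5000):
    prev = 0.7 * prev + rng.gauss(0, 1)
    temp.append(prev)

sd_daily = sd(temp)
sd_3day = sd(running_mean(temp, 3))  # smaller: use as the usable SD for
                                     # the cumulative three-day effect
```

Because daily temperatures are positively autocorrelated, the running-mean SD is only modestly below the daily SD, so the loss of precision for cumulative effects is limited.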
We have also not addressed auto-correlation in residuals. If allowed for in the model this would somewhat diminish precision, but since it has usually been found small if present at all, the impact is not generally expected to be large. Neither have we considered distributional models other than Poisson with scale overdispersion, for example negative binomial or zero-inflated Poisson. Finally, we do not discuss case cross-over analysis. However, given the equivalence between fixed-stratum versions of that approach and time series regression with stepped time functions [21], there seems no reason to believe that the results presented here would not apply, provided that the “usable exposure SD” is that conditional on time stratum as well as any other covariates.