Our main purpose is to describe the conditional Poisson model, but before doing this we introduce the illustrative data and terminology, and briefly review the conditional logistic regression and unconditional Poisson regression formulations for case cross-over studies.
Conditional logistic model for case cross-over data
Since the model formulation is standard and can be found elsewhere [2], only a summary is given here. Data are expanded to include each case and all other days in the stratum, as if forming a matched set in a case–control study or a risk set in Cox regression. Thus if there are k deaths in a stratum, the stratum data must appear k times in the expanded data set. If there are on average K deaths per stratum, the dataset size will be multiplied by K.
With this expanded data and the notation described above, the conditional logistic model can be written

\[ P(D_{i,s}) = \frac{\exp(\beta^{T} x_i)}{\sum_{j \in s} \exp(\beta^{T} x_j)} \qquad (1) \]

where D_{i,s} is the event that the death in stratum s occurs on day i, β is a row vector of parameters, and superscript T denotes transpose.
If there are multiple deaths on the same day, the data duplication can be reduced (to a format we call “semi-expanded”) by multiplying the likelihood contribution from that day by the number of deaths on the case day (weighting). However, even in the semi-expanded form, strata with deaths on more than one day must be repeated in the data as many times as there are days with cases, with a different “case” day in each replicate.
Excerpts from the London data in the original count and semi-expanded case crossover formats are presented in Tables 2 and 3. In the semi-expanded format each day is repeated four (or five) times, once as a “case” day and three (or four) times as a control day.
Table 2. Excerpt from example daily data in original format
2002 1 Sun | 06 jan 2002 | 2.4 | 7.1 | 198 |
2002 1 Sun | 13 jan 2002 | 17.6 | 8.2 | 204 |
2002 1 Sun | 20 jan 2002 | 49.9 | 8.9 | 167 |
2002 1 Sun | 27 jan 2002 | 42.5 | 10.5 | 169 |
2002 1 Mon | 07 jan 2002 | 4.1 | 5.2 | 180 |
. . . . |
Table 3. Excerpt from example data in semi-expanded format for case crossover conditional logistic analysis
2002 1 Sun | 2002 1 Sun 1 | 06 jan 2002 | 2.4 | 7.1 | 1 | 198 |
2002 1 Sun | 2002 1 Sun 1 | 13 jan 2002 | 17.6 | 8.2 | 0 | 198 |
2002 1 Sun | 2002 1 Sun 1 | 20 jan 2002 | 49.9 | 8.9 | 0 | 198 |
2002 1 Sun | 2002 1 Sun 1 | 27 jan 2002 | 42.5 | 10.5 | 0 | 198 |
2002 1 Sun | 2002 1 Sun 2 | 06 jan 2002 | 2.4 | 7.1 | 0 | 204 |
2002 1 Sun | 2002 1 Sun 2 | 13 jan 2002 | 17.6 | 8.2 | 1 | 204 |
2002 1 Sun | 2002 1 Sun 2 | 20 jan 2002 | 49.9 | 8.9 | 0 | 204 |
2002 1 Sun | 2002 1 Sun 2 | 27 jan 2002 | 42.5 | 10.5 | 0 | 204 |
2002 1 Sun | 2002 1 Sun 3 | 06 jan 2002 | 2.4 | 7.1 | 0 | 167 |
2002 1 Sun | 2002 1 Sun 3 | 13 jan 2002 | 17.6 | 8.2 | 0 | 167 |
2002 1 Sun | 2002 1 Sun 3 | 20 jan 2002 | 49.9 | 8.9 | 1 | 167 |
2002 1 Sun | 2002 1 Sun 3 | 27 jan 2002 | 42.5 | 10.5 | 0 | 167 |
2002 1 Sun | 2002 1 Sun 4 | 06 jan 2002 | 2.4 | 7.1 | 0 | 169 |
2002 1 Sun | 2002 1 Sun 4 | 13 jan 2002 | 17.6 | 8.2 | 0 | 169 |
2002 1 Sun | 2002 1 Sun 4 | 20 jan 2002 | 49.9 | 8.9 | 0 | 169 |
2002 1 Sun | 2002 1 Sun 4 | 27 jan 2002 | 42.5 | 10.5 | 1 | 169 |
2002 1 Mon | 2002 1 Mon 1 | 07 jan 2002 | 4.1 | 5.2 | 1 | 180 |
2002 1 Mon | 2002 1 Mon 1 | 14 jan 2002 | 18.7 | 9.3 | 0 | 180 |
2002 1 Mon | 2002 1 Mon 1 | 21 jan 2002 | 38.1 | 10.8 | 0 | 180 |
2002 1 Mon | 2002 1 Mon 1 | 28 jan 2002 | 56.1 | 10.3 | 0 | 180 |
. . . . |
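The step from the original format to the semi-expanded format can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `semi_expand`, the dictionary keys, and the covariate names `x1` and `x2` are hypothetical (the column headers of the excerpt did not survive), and the input is the first stratum of Table 2.

```python
from collections import defaultdict

def semi_expand(days):
    """Build one weighted case-control set per day with deaths, within each stratum.

    `days` is a list of dicts with keys 'stratum', 'date', 'deaths' and any covariates.
    """
    by_stratum = defaultdict(list)
    for day in days:
        by_stratum[day["stratum"]].append(day)

    rows = []
    for stratum, members in by_stratum.items():
        set_no = 0
        for case_day in members:
            if case_day["deaths"] == 0:
                continue  # a day without deaths never acts as a "case" day
            set_no += 1
            for day in members:
                row = {k: v for k, v in day.items() if k != "deaths"}
                row["set"] = f"{stratum} {set_no}"
                row["case"] = 1 if day is case_day else 0
                row["weight"] = case_day["deaths"]  # deaths on the case day
                rows.append(row)
    return rows

# First stratum of Table 2 (x1, x2 are placeholder names for the two covariate columns).
days = [
    {"stratum": "2002 1 Sun", "date": "06 jan 2002", "x1": 2.4, "x2": 7.1, "deaths": 198},
    {"stratum": "2002 1 Sun", "date": "13 jan 2002", "x1": 17.6, "x2": 8.2, "deaths": 204},
    {"stratum": "2002 1 Sun", "date": "20 jan 2002", "x1": 49.9, "x2": 8.9, "deaths": 167},
    {"stratum": "2002 1 Sun", "date": "27 jan 2002", "x1": 42.5, "x2": 10.5, "deaths": 169},
]
rows = semi_expand(days)
```

Four days in the stratum, each with deaths, give four sets of four rows, which matches the sixteen "2002 1 Sun" rows of Table 3.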
The unconditional Poisson regression model
It has been shown that a standard (unconditional) Poisson model applied to data in the original time series format (Table 2), with indicator variables for strata, gives identical estimates and inference to conditional logistic regression on expanded data: the two models are equivalent [2, 4]. The association of pollution with mortality can be thought of as inferred from the extent to which daily death counts within strata are explained by daily exposure concentrations. Because it provides a familiar starting point from which to describe the conditional Poisson regression model, we describe this model algebraically here.
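Because control of factors changing across strata is no longer achieved by design, in addition to the regressors x_i we also include stratum indicator variables (a vector z_i):

\[ Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \beta^{T} x_i + \alpha^{T} z_i \qquad (2) \]

Here Y_i is the death count on day i and μ_i its expected value.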
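To understand the conditional variant of this model it helps to rewrite the term α^T z_i as α_s, where day i falls in stratum s (thus the vector α = (α_1, …, α_S)). Then the model for the count Y_{i,s} on day i in stratum s is

\[ Y_{i,s} \sim \mathrm{Poisson}(\mu_{i,s}), \qquad \log \mu_{i,s} = \alpha_s + \beta^{T} x_i \qquad (3) \]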
The conditional Poisson regression model
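The conditional Poisson model is the same as model (3), except that instead of the parameters {α_s} being estimated they are “conditioned out” by conditioning on the sum of events Y_{·,s} = Σ_{i∈s} Y_{i,s} in each stratum. Technically, the conditional Poisson model is actually a multinomial model, with

\[ \{Y_{i,s}\}_{i \in s} \mid Y_{\cdot,s} \sim \mathrm{Multinomial}\big(Y_{\cdot,s}, \{\pi_{i,s}\}\big), \qquad \pi_{i,s} = \frac{\exp(\beta^{T} x_i)}{\sum_{j \in s} \exp(\beta^{T} x_j)} \qquad (4) \]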
However, describing it as a conditional Poisson model emphasizes its connections with the Poisson model and has proved convenient in formulating algorithms for packages to fit the parameters, so it is generally implemented under the conditional Poisson name. Where both can be fit, the conditional Poisson model gives identical estimates and inferences to the unconditional Poisson model and hence to the conditional logistic model (illustrated in the Results section).
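Why conditioning leaves the estimates unchanged can be checked numerically for a single stratum. The sketch below assumes one scalar covariate and uses the first stratum of Table 2 as input; the function names are hypothetical. It compares the unconditional Poisson log-likelihood, with the stratum intercept set to its maximising value for each β, against the multinomial kernel with day probabilities proportional to exp(βx_i). If the two differ only by a constant that does not involve β, they are maximised at the same β.

```python
import math

def profile_poisson_loglik(beta, x, y):
    """Poisson log-likelihood for one stratum with the intercept profiled out."""
    # For fixed beta the maximising intercept satisfies exp(alpha) = sum(y) / sum(exp(beta*x)).
    alpha = math.log(sum(y) / sum(math.exp(beta * xi) for xi in x))
    return sum(
        yi * (alpha + beta * xi) - math.exp(alpha + beta * xi) - math.lgamma(yi + 1)
        for xi, yi in zip(x, y)
    )

def conditional_loglik(beta, x, y):
    """Multinomial kernel: day probabilities proportional to exp(beta * x)."""
    log_denom = math.log(sum(math.exp(beta * xi) for xi in x))
    return sum(yi * (beta * xi - log_denom) for xi, yi in zip(x, y))

x = [2.4, 17.6, 49.9, 42.5]   # one covariate column, first stratum of Table 2
y = [198, 204, 167, 169]      # daily death counts in that stratum

# The gap between the two log-likelihoods should not depend on beta.
gaps = [
    profile_poisson_loglik(b, x, y) - conditional_loglik(b, x, y)
    for b in (-0.02, 0.0, 0.01)
]
```

The gap works out algebraically to Y log Y − Y − Σ log(y_i!), where Y is the stratum total, so it carries no information about β.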
The conditional Poisson model was first proposed in the econometrics literature, illustrated by a study of the dependence of the annual number of patents registered by companies on their R&D expenditure [5]. It has also been proposed for the self-controlled case series design, initially for vaccine safety studies, in a series of papers by Farrington and co-workers [6‐8]. In this literature “exposure” typically varies between study subjects as well as over time, but a special case is where many subjects share the same exposure series, as in a typical case crossover study [9]. We are not aware of published use of the model for environmental stratified time series analyses, where conditional logistic analyses in a case crossover formulation overwhelmingly predominate.
The authors are familiar with implementations of the conditional Poisson model in Stata (xtpoisson with the fe option) and in R (gnm with the eliminate option). Examples of using these two implementations are given in Additional file 1. Strata that have no cases may be dropped, because they do not contribute to the likelihood. The EPICURE AMFIT package [10] implements the conditional Poisson model for stratified survival data under the label “background stratified Poisson”, and this has been used quite extensively in studies of cancer effects of ionizing radiation. Richardson [11] comments that the AMFIT implementation has an unnecessary limitation on the number of strata, and proposed a method without that limitation using the SAS procedures NLP or NLMIXED. Xu [12] presents an approach to fitting conditional Poisson models in SAS, but as this is effectively done by re-formulating the model as a conditional logistic model we class it as a conditional logistic formulation (discussed below). Many packages have programs that fit multinomial models, but these do not allow exposures x to vary within each stratum s (e.g. pollution to vary within strata), so they cannot be used as an alternative for case crossover analyses or others that concern us here.
The conditional Poisson model, like the unconditional Poisson and conditional logistic formulations, can incorporate potentially confounding covariates that are not homogeneous within strata, for example temperature (if air pollution is the focus). All the models can also explore modification of associations of exposure with outcomes, either by such covariates or by those that are homogeneous within strata. In the case crossover context, modifiers may be individual (e.g. age) or, in multi-city studies, ecological (city-level). Analyses of multi-city studies may be single-step (pooling all strata across cities) as well as the conventional multi-step (city-specific at step 1, meta-analysis at step 2). The simplicity of the conditional Poisson formulation makes the single-step approach straightforward to apply (simply pool all cities into one dataset and define the strata by city as well as by month and day-of-week). However, the implicit assumptions of this approach (no random or systematic between-city effects) would need investigating. A single-step analysis is particularly attractive when exposure series are available for small areas within cities.
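Defining strata for a pooled single-step analysis amounts to adding the city to the stratum identifier. A minimal sketch, assuming the year, month and day-of-week stratification shown in Table 2; the function name `stratum_key` and the city label are hypothetical.

```python
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday() order

def stratum_key(day, city=None):
    """Return a year / month / day-of-week stratum, optionally nested within city."""
    key = (day.year, day.month, DAY_NAMES[day.weekday()])
    return key if city is None else (city,) + key

single_city = stratum_key(date(2002, 1, 6))
pooled = stratum_key(date(2002, 1, 13), city="London")
```

Days sharing a key form one stratum, so the same fitting call can be used for single-city and pooled data once the key is used as the stratum variable.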
The original event counts may have variation greater than that predicted by a Poisson distribution, and so be “overdispersed” in a Poisson model. This overdispersion is not apparent in a conditional logistic analysis because in each “case–control” set in the expanded data the outcomes are binary (0 or 1), for which overdispersion has no meaning. However, the assumption of independence between case–control sets in a conditional logistic model implicitly assumes no overdispersion of counts. If the binary outcomes (in the case crossover formulation) are clustered by day, then the observed daily counts will be overdispersed relative to the Poisson variation around the values predicted from that model [2]. Where there is such overdispersion in counts, a conditional logistic regression will therefore underestimate uncertainty in estimated coefficients.
Like the unconditional Poisson model with strata, the conditional model can be extended to a quasi-Poisson (overdispersed Poisson) variant, in which scale overdispersion within strata is allowed for. In either case the overdispersion ψ is best estimated from the Pearson chi-squared statistic, though neither this nor other estimates are consistent when data are sparse (few events per stratum) [13]. Quasi-Poisson is an option in the R implementation, and can be implemented in Stata with some post-processing (see Additional file 1).
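The Pearson-based estimate of ψ can be sketched as follows. This is a generic illustration, not the post-processing of Additional file 1: the function name is hypothetical, and the residual degrees of freedom (days minus regression parameters minus strata) are an assumed convention that should be checked against the software in use.

```python
def pearson_dispersion(y, mu, n_params, n_strata):
    """Estimate the overdispersion psi as Pearson chi-squared / residual df.

    y: observed daily counts; mu: fitted means from the Poisson model.
    Assumes one degree of freedom is used per regression parameter and per stratum.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    resid_df = len(y) - n_params - n_strata
    return chi2 / resid_df

# First stratum of Table 2 with a null model (no covariates): fitted mean = stratum mean.
y = [198, 204, 167, 169]
mu = [sum(y) / len(y)] * len(y)
psi = pearson_dispersion(y, mu, n_params=0, n_strata=1)
```

Under the quasi-Poisson variant, model-based standard errors are multiplied by the square root of ψ.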
Similarly, the methods discussed by Brumback [14] for allowing for autocorrelation in count time series in general can be applied to the conditional as well as the unconditional Poisson model. We are not aware of any off-the-shelf software implementation, but ad hoc implementations in Stata and R are described in Additional file 1. As with overdispersion, it is sometimes thought that a case crossover analysis, especially if stratified by day of week, is not affected by autocorrelation. However, the case crossover formulation assumes that observations (in the expanded data format) are independent both within and across strata, an assumption that is violated if there is residual autocorrelation in counts.
The Poisson models can also accommodate studies where rate denominators (durations of time intervals or numbers of subjects at risk) vary between study units (“days”) by using an appropriate offset. Residual and influence analysis is also possible with the Poisson models.
The conditional logistic formulation does not easily allow any of these extensions apart from the incorporation of covariates.
Comparing processor time taken in fitting each model
To compare the processor time taken to fit each of the three models described above, we simulated datasets with a range of sizes, corresponding to possible scenarios. For each scenario we simulated ten years of daily data. Baseline mortality rates of 1, 10, and 100 deaths/day represented small, medium, and large cities. Three more datasets included multiples of this baseline number of days to illustrate multi-city or multi-area studies analysed in one stage. Outcome counts were generated to follow a Poisson distribution with mean given by the exponential of a linear sum of seven covariates (exposures and confounders). The covariates were distributed as multivariate normal, mutually correlated at r = 0.25, and scaled so that one standard deviation of each covariate was associated with a rate ratio of 1.05. Two types of case crossover stratification were considered: by month and day-of-week, as described above, and by month only.
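A data-generating scheme of this kind can be sketched with the Python standard library alone. This is one possible reading of the description, not the authors' simulation code: the function names are hypothetical, the covariates are given unit variance and a common pairwise correlation r through a shared normal component, and the "baseline" rate is taken to be the mean count when all covariates are zero.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw a Poisson count by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(n_days, baseline, n_cov=7, r=0.25, rr_per_sd=1.05, seed=1):
    """Simulate daily (covariates, count) pairs under a log-linear Poisson model."""
    rng = random.Random(seed)
    beta = math.log(rr_per_sd)  # log rate ratio per standard deviation of each covariate
    shared_w, own_w = math.sqrt(r), math.sqrt(1 - r)
    data = []
    for _ in range(n_days):
        shared = rng.gauss(0, 1)
        # Each covariate has variance r + (1 - r) = 1; any two have correlation r.
        x = [shared_w * shared + own_w * rng.gauss(0, 1) for _ in range(n_cov)]
        mean = baseline * math.exp(beta * sum(x))
        data.append((x, poisson_draw(rng, mean)))
    return data

# Ten years of daily data for a "medium" city (about 10 deaths/day).
data = simulate(n_days=3650, baseline=10)
```

Calling `simulate` with baselines of 1, 10 and 100, or with larger `n_days`, would give datasets of the sizes described, to which each of the three models could then be fitted and timed.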