Background stratified Poisson regression analysis of cohort data

  • Original Paper
  • Radiation and Environmental Biophysics

Abstract

Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as ‘nuisance’ variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this ‘conditional’ regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

Acknowledgments

This project was supported by grant R01-CA117841 from the National Cancer Institute, National Institutes of Health.

Author information

Corresponding author

Correspondence to David B. Richardson.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 35.5 kb)

Appendices

Appendix 1

A standard log-linear unconditional Poisson regression model of the form \( \lambda (\alpha ,\beta ) = \exp (\alpha_{1} S_{1} + \alpha_{2} S_{2} + \alpha_{3} S_{3} + \alpha_{4} S_{4} + \beta_{1} Z) \) may be fitted to the data in Table 1 via the SAS statistical package as follows:

The variables P and c denote counts of person-time and events, respectively, in the grouped data structure. The ‘parms’ statement defines the parameters to be estimated, and the ‘profile’ statement requests associated 95% likelihood-based confidence intervals. The term ‘lambda’ specifies that the rate of disease conforms to an exponential function of the model covariates. The ‘LL’ statement specifies the expression for the unconditional Poisson likelihood, and the statement ‘max LL’ defines the function to be maximized.
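The unconditional fit described above can be sketched outside SAS. The following Python sketch is not the authors' PROC NLP code: the data are hypothetical (three strata rather than the four in Table 1), and the names `STRATA`, `unconditional_loglik`, and `fit_unconditional` are introduced here for illustration only. It maximizes the unconditional Poisson log-likelihood with one intercept per stratum, alternating an exact update of each intercept with a Newton step in β.

```python
import math

# Hypothetical grouped data (not Table 1 of the paper): one list per background
# stratum; each cell is (cases c, person-time P, exposure z).
STRATA = [
    [(2, 1000.0, 0.0), (3, 800.0, 1.0), (5, 600.0, 2.0)],
    [(4, 1500.0, 0.0), (6, 1200.0, 1.0), (9, 900.0, 2.0)],
    [(1, 700.0, 0.0), (2, 500.0, 1.0), (4, 450.0, 2.0)],
]


def unconditional_loglik(alphas, beta, strata):
    """Poisson log-likelihood, omitting the constant -ln(c!), with
    rate lambda = exp(alpha_s + beta * z) in stratum s."""
    ll = 0.0
    for alpha, cells in zip(alphas, strata):
        for c, pt, z in cells:
            mu = pt * math.exp(alpha + beta * z)  # expected events in the cell
            ll += c * math.log(mu) - mu
    return ll


def fit_unconditional(strata, tol=1e-12, max_iter=1000):
    """Maximize over all stratum intercepts and beta by alternating an exact
    update of each alpha_s with one Newton step in beta (alphas held fixed)."""
    beta = 0.0
    alphas = [0.0] * len(strata)
    for _ in range(max_iter):
        # Setting the alpha_s score to zero gives a closed-form update.
        alphas = [
            math.log(sum(c for c, _, _ in cells)
                     / sum(pt * math.exp(beta * z) for _, pt, z in cells))
            for cells in strata
        ]
        score = info = 0.0
        for alpha, cells in zip(alphas, strata):
            for c, pt, z in cells:
                mu = pt * math.exp(alpha + beta * z)
                score += (c - mu) * z
                info += mu * z * z
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return alphas, beta


alphas_hat, beta_hat = fit_unconditional(STRATA)
print(beta_hat, alphas_hat)
```

Every stratum contributes one extra parameter to this fit, which is the cost that the background stratified approach avoids.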

A log-linear Poisson regression model may be fitted to the data structure in Table 2, with background stratification on covariates A and B, via the SAS statistical package as follows:

The analytical data structure has one record per stratum. The variables _ncovals and _totcases denote the total number of exposure values, and total number of cases, in each stratum. The arrays _cases, _pt, and _z index the values for the counts of events, person-time, and levels of the exposure variable(s) of interest in each stratum of the analytical data structure. The length of the arrays will depend upon the analytical data structure. The variables caseprod and sum, which are the numerator and denominator, respectively, of the expression for the conditional likelihood, are initialized at each new record in the analysis. The term ‘phi’ defines the relative rate function of the regression model. In the example above, the rate ratio function conforms to a standard log-linear model. The ‘parms’ statement defines the parameter(s) to be estimated, and the ‘profile’ statement requests associated 95% profile likelihood confidence bounds. The ‘LL’ statement specifies the expression for the log likelihood in this model, and the statement ‘max LL’ defines the function to be maximized.
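As an illustration of the likelihood described above, here is a minimal Python sketch, assuming hypothetical data and hypothetical function names (`conditional_loglik`, `fit_conditional`); it is not the authors' SAS code. Each stratum contributes the log of a numerator (the product over cells of person-time times the rate ratio, raised to the cell's case count) minus the stratum's total cases times the log of the summed denominator. Only β is estimated, here by Newton-Raphson for the log-linear rate ratio.

```python
import math

# Hypothetical grouped data: one list per background stratum;
# each cell is (cases c, person-time P, exposure z).
STRATA = [
    [(2, 1000.0, 0.0), (3, 800.0, 1.0), (5, 600.0, 2.0)],
    [(4, 1500.0, 0.0), (6, 1200.0, 1.0), (9, 900.0, 2.0)],
    [(1, 700.0, 0.0), (2, 500.0, 1.0), (4, 450.0, 2.0)],
]


def conditional_loglik(beta, strata):
    """Sum over strata of ln[prod_z (P*phi)^c / (sum_z P*phi)^(total cases)],
    with the log-linear rate ratio phi = exp(beta * z)."""
    ll = 0.0
    for cells in strata:
        total_cases = sum(c for c, _, _ in cells)
        denom = sum(pt * math.exp(beta * z) for _, pt, z in cells)
        log_numer = sum(c * (math.log(pt) + beta * z) for c, pt, z in cells)
        ll += log_numer - total_cases * math.log(denom)
    return ll


def fit_conditional(strata, tol=1e-12, max_iter=50):
    """Newton-Raphson for beta; no stratum intercepts are estimated."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for cells in strata:
            total_cases = sum(c for c, _, _ in cells)
            weights = [pt * math.exp(beta * z) for _, pt, z in cells]
            denom = sum(weights)
            mean_z = sum(w * z for w, (_, _, z) in zip(weights, cells)) / denom
            mean_z2 = sum(w * z * z for w, (_, _, z) in zip(weights, cells)) / denom
            score += sum(c * z for c, _, z in cells) - total_cases * mean_z
            info += total_cases * (mean_z2 - mean_z ** 2)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # Wald-type standard error
    return beta, se


beta_hat, se = fit_conditional(STRATA)
print(beta_hat, beta_hat - 1.96 * se, beta_hat + 1.96 * se)
```

Because the number of estimated parameters does not grow with the number of strata, the same loop applies unchanged when there are many strata.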

PROC NLP is part of the SAS/OR package. Some SAS users may have access to SAS/STAT but not SAS/OR. Therefore, below, we also provide sample code for fitting background stratified Poisson regression models via PROC NLMIXED, which is part of SAS/STAT. PROC NLMIXED does not directly output profile likelihood confidence intervals for estimated parameters, but it does report Wald-type confidence intervals.
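The difference between the two kinds of interval can be illustrated with a short Python sketch. It assumes hypothetical data and helper names (`golden_max`, `bisect_root`, `profile_interval`, `wald_se`) and does not reproduce the output of either SAS procedure. The likelihood-based bounds are the values of β at which the conditional log-likelihood falls 1.92 units (half the 0.95 chi-square quantile on 1 degree of freedom) below its maximum; the Wald-type bounds use the curvature at the maximum.

```python
import math

# Hypothetical grouped data: each cell is (cases c, person-time P, exposure z).
STRATA = [
    [(2, 1000.0, 0.0), (3, 800.0, 1.0), (5, 600.0, 2.0)],
    [(4, 1500.0, 0.0), (6, 1200.0, 1.0), (9, 900.0, 2.0)],
    [(1, 700.0, 0.0), (2, 500.0, 1.0), (4, 450.0, 2.0)],
]
HALF_CHI2_95 = 3.841458820694124 / 2.0  # half the 0.95 chi-square quantile, 1 df


def conditional_loglik(beta, strata):
    """Background stratified log-likelihood with phi = exp(beta * z)."""
    ll = 0.0
    for cells in strata:
        total_cases = sum(c for c, _, _ in cells)
        denom = sum(pt * math.exp(beta * z) for _, pt, z in cells)
        ll += sum(c * (math.log(pt) + beta * z) for c, pt, z in cells)
        ll -= total_cases * math.log(denom)
    return ll


def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if f(x1) < f(x2):
            a = x1
        else:
            b = x2
    return (a + b) / 2.0


def bisect_root(g, lo, hi, tol=1e-10):
    """Bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def profile_interval(strata, lo=-5.0, hi=5.0):
    """Return (beta_hat, lower, upper) for the likelihood-based 95% interval."""
    ll = lambda b: conditional_loglik(b, strata)
    beta_hat = golden_max(ll, lo, hi)
    target = ll(beta_hat) - HALF_CHI2_95
    drop = lambda b: ll(b) - target
    return beta_hat, bisect_root(drop, lo, beta_hat), bisect_root(drop, beta_hat, hi)


def wald_se(strata, beta_hat, h=1e-3):
    """Standard error from a finite-difference second derivative at the maximum."""
    ll = lambda b: conditional_loglik(b, strata)
    curvature = (ll(beta_hat + h) - 2.0 * ll(beta_hat) + ll(beta_hat - h)) / (h * h)
    return 1.0 / math.sqrt(-curvature)


beta_hat, lower, upper = profile_interval(STRATA)
se = wald_se(STRATA, beta_hat)
print("profile:", lower, upper)
print("Wald:   ", beta_hat - 1.96 * se, beta_hat + 1.96 * se)
```

The Wald-type interval is symmetric about the estimate by construction, whereas the likelihood-based interval need not be.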

This approach accommodates a variety of functional forms for the relative rate function, ϕ. For example, a linear excess relative rate model of the form ϕ = (1 + βz) would be fitted by replacing the statement that defines ‘phi’ as a log-linear function of z with a statement that defines it as 1 + βz.
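The same idea can be sketched in Python by passing the rate ratio function as an argument, so that changing the model form changes one function and nothing else. The data, the function names, and the use of a golden-section search are assumptions made for this illustration; they are not taken from the authors' SAS code.

```python
import math

# Hypothetical grouped data: each cell is (cases c, person-time P, exposure z).
STRATA = [
    [(2, 1000.0, 0.0), (3, 800.0, 1.0), (5, 600.0, 2.0)],
    [(4, 1500.0, 0.0), (6, 1200.0, 1.0), (9, 900.0, 2.0)],
    [(1, 700.0, 0.0), (2, 500.0, 1.0), (4, 450.0, 2.0)],
]


def log_linear(z, beta):
    """Log-linear rate ratio, phi = exp(beta * z)."""
    return math.exp(beta * z)


def linear_err(z, beta):
    """Linear excess relative rate, phi = 1 + beta * z."""
    return 1.0 + beta * z


def conditional_loglik(beta, strata, phi):
    """Background stratified log-likelihood for an arbitrary positive phi."""
    ll = 0.0
    for cells in strata:
        total_cases = sum(c for c, _, _ in cells)
        denom = sum(pt * phi(z, beta) for _, pt, z in cells)
        ll += sum(c * math.log(pt * phi(z, beta)) for c, pt, z in cells)
        ll -= total_cases * math.log(denom)
    return ll


def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if f(x1) < f(x2):
            a = x1
        else:
            b = x2
    return (a + b) / 2.0


# phi = 1 + beta*z must stay positive, so beta > -1/max(z) = -0.5 for these data;
# the search interval is chosen to respect that bound.
beta_err = golden_max(lambda b: conditional_loglik(b, STRATA, linear_err), -0.4, 20.0)
beta_log = golden_max(lambda b: conditional_loglik(b, STRATA, log_linear), -5.0, 5.0)
print("linear ERR beta:", beta_err)
print("log-linear beta:", beta_log)
```

The two estimates are on different scales: under the linear model β is the excess relative rate per unit of z, while under the log-linear model it is the log rate ratio per unit of z.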

Appendix 2

With the model as in (2), the unconditional log-likelihood contribution from stratum s is \( L_{s} (\alpha_{s} ,\beta ) = c_{s} \alpha_{s} + \sum\nolimits_{{z \in R_{s} }} {c_{sz} \ln (P_{sz} \varphi (z;\beta ))} - \exp (\alpha_{s} )\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi (z;\beta )} . \) With U denoting the score (the first derivative of the log-likelihood with respect to the indicated parameter), the observed information at \( \hat{\alpha } \), \( I(\hat{\alpha },\beta ) \), is given by

$$ - dU\left( {\hat{\alpha }_{s} } \right)/d\alpha_{s} = c_{s} $$
$$ - dU\left( {\hat{\alpha }_{s} } \right)/d\alpha_{l} = 0 \quad (l \ne s) $$
$$ - dU\left( {\hat{\alpha }_{s} } \right)/d\beta = c_{s} \sum\limits_{{z \in R_{s} }} {P_{sz} \varphi^{\prime } (z;\beta )} /\sum\limits_{{z \in R_{s} }} {P_{sz} \varphi (z;\beta )} $$
$$ - dU(\beta )/d\beta = - \sum\limits_{s} {\left[ {\sum\limits_{{i \in D_{s} }} {\left[ {\frac{{\varphi^{\prime \prime } (Z_{i} ;\beta )}}{{\varphi (Z_{i} ;\beta )}} - \left( {\frac{{\varphi^{\prime } (Z_{i} ;\beta )}}{{\varphi (Z_{i} ;\beta )}}} \right)^{2} } \right] - c_{s} \frac{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi^{\prime \prime } (z;\beta )} }}{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi (z;\beta )} }}} } \right]} . $$

The variance estimate for \( \hat{\beta } \) is the corner of the inverse of the observed information (evaluated at \( \hat{\alpha }, \) \( \hat{\beta } \)), which may be obtained using the well-known matrix formula

$$ (I^{ - 1} )_{\beta ,\beta } = \left[ {I_{\beta ,\beta } - I_{\beta ,\alpha } I_{\alpha ,\alpha }^{ - 1} I_{\alpha ,\beta } } \right]^{ - 1} . $$

Since \( I_{\alpha ,\alpha } \) is a diagonal matrix, it is easy to compute that

$$ \text{var} \left( {\hat{\beta }} \right)^{ - 1} = - \sum\limits_{s} {\left[ {\sum\limits_{{i \in D_{s} }} {\left[ {\frac{{\varphi^{\prime \prime } (Z_{i} ;\beta )}}{{\varphi (Z_{i} ;\beta )}} - \left( {\frac{{\varphi^{\prime } (Z_{i} ;\beta )}}{{\varphi (Z_{i} ;\beta )}}} \right)^{2} } \right] - c_{s} \left[ {\frac{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi^{\prime \prime } (z;\beta )} }}{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi (z;\beta )} }} - \left( {\frac{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi^{\prime } (z;\beta )} }}{{\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi (z;\beta )} }}} \right)^{2} } \right]} } \right]} $$

This expression is the negative of the second derivative of the ‘conditional’ Poisson log-likelihood, that is, the observed information from the conditional likelihood; consequently, estimated standard errors and associated Wald-type confidence intervals will be the same. For simplicity, we derive the expression for a single parameter β. The expressions apply to a column vector β where the derivatives are as in standard vector calculus and squared terms are replaced by outer products, i.e., replace \( a^{2} \) by \( aa^{T} \), where \( a^{T} \) is the transpose of a.
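The equivalence of the two variance estimates can be checked numerically. The Python sketch below assumes hypothetical data, a log-linear rate ratio ϕ = exp(βz), and an illustrative function name (`information_check`). For each stratum it builds the blocks of the unconditional observed information with the intercept set to its maximizing value given β, forms \( I_{\beta ,\beta } - I_{\beta ,\alpha } I_{\alpha ,\alpha }^{ - 1} I_{\alpha ,\beta } \) using the fact that \( I_{\alpha ,\alpha } \) is diagonal, and compares the result with the information computed directly from the conditional likelihood.

```python
import math

# Hypothetical grouped data: each cell is (cases c, person-time P, exposure z).
STRATA = [
    [(2, 1000.0, 0.0), (3, 800.0, 1.0), (5, 600.0, 2.0)],
    [(4, 1500.0, 0.0), (6, 1200.0, 1.0), (9, 900.0, 2.0)],
    [(1, 700.0, 0.0), (2, 500.0, 1.0), (4, 450.0, 2.0)],
]


def information_check(beta, strata):
    """Return (unconditional information for beta after eliminating the
    stratum intercepts, conditional information) for phi = exp(beta * z)."""
    i_bb = 0.0        # I_{beta,beta} of the unconditional likelihood
    correction = 0.0  # I_{beta,alpha} I_{alpha,alpha}^{-1} I_{alpha,beta}
    cond_info = 0.0   # minus the second derivative of the conditional log-likelihood
    for cells in strata:
        c_s = sum(c for c, _, _ in cells)
        s0 = sum(pt * math.exp(beta * z) for _, pt, z in cells)
        s1 = sum(pt * z * math.exp(beta * z) for _, pt, z in cells)
        s2 = sum(pt * z * z * math.exp(beta * z) for _, pt, z in cells)
        alpha_hat = math.log(c_s / s0)       # maximizing intercept given beta
        i_aa = math.exp(alpha_hat) * s0      # equals c_s
        i_ab = math.exp(alpha_hat) * s1
        i_bb += math.exp(alpha_hat) * s2
        correction += i_ab ** 2 / i_aa       # diagonal I_{alpha,alpha}: invert per stratum
        cond_info += c_s * (s2 / s0 - (s1 / s0) ** 2)
    return i_bb - correction, cond_info


schur, conditional = information_check(0.7, STRATA)
print(schur, conditional)
```

For the log-linear model the sum over cases vanishes, because ϕ″/ϕ equals (ϕ′/ϕ)², so only the person-time weighted variance of z within each stratum remains.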

About this article

Cite this article

Richardson, D.B., Langholz, B. Background stratified Poisson regression analysis of cohort data. Radiat Environ Biophys 51, 15–22 (2012). https://doi.org/10.1007/s00411-011-0394-5

  • DOI: https://doi.org/10.1007/s00411-011-0394-5
