Abstract
Background stratified Poisson regression is an approach that has been used in the analysis of data from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fitting Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach uses an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as ‘nuisance’ parameters, avoiding the need to estimate these stratum-specific coefficients explicitly. Log-linear models, as well as other general relative rate models, are accommodated. The approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this ‘conditional’ regression approach are identical to those obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, we show that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as of regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
Acknowledgments
This project was supported by grant R01-CA117841 from the National Cancer Institute, National Institutes of Health.
Appendices
Appendix 1
A standard log-linear unconditional Poisson regression model of the form \( \lambda (\alpha ,\beta ) = \exp (\alpha_{1} S_{1} + \alpha_{2} S_{2} + \alpha_{3} S_{3} + \alpha_{4} S_{4} + \beta_{1} Z) \) may be fitted to the data in Table 1 via the SAS statistical package as follows:
The variables P and c denote the person-time and the number of events, respectively, in each record of the grouped data structure. The ‘parms’ statement defines the parameters to be estimated, and the ‘profile’ statement requests the associated 95% likelihood-based confidence intervals. The term ‘lambda’ specifies that the disease rate is an exponential function of the model covariates. The ‘LL’ statement specifies the expression for the unconditional Poisson log-likelihood, and the statement ‘max LL’ defines the function to be maximized.
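To make the quantities behind the ‘lambda’ and ‘LL’ statements concrete, here is a minimal Python sketch (not the authors' SAS code) of the unconditional Poisson log-likelihood for grouped data. The function name `unconditional_loglik`, the (S, z, P, c) row layout, and the example cells are hypothetical; the constant −ln(c!) is omitted because it does not affect estimation.

```python
import math


def unconditional_loglik(alpha, beta, rows):
    """Unconditional Poisson log-likelihood for grouped data (hypothetical helper).

    alpha: stratum coefficients (alpha_1, ..., alpha_4)
    beta:  coefficient for the exposure Z
    rows:  (S, z, P, c) tuples, where S holds the stratum indicators
           (S1, ..., S4), z the exposure level, P the person-time and
           c the number of events. The constant -ln(c!) is omitted.
    """
    ll = 0.0
    for S, z, P, c in rows:
        # log-linear rate: exp(alpha_1*S1 + ... + alpha_4*S4 + beta*z)
        lam = math.exp(sum(a * s for a, s in zip(alpha, S)) + beta * z)
        mu = P * lam  # expected number of events in this cell
        ll += (c * math.log(mu) if c > 0 else 0.0) - mu
    return ll


# Hypothetical cells (not the Table 1 data).
rows = [((1, 0, 0, 0), 0.0, 1000.0, 5), ((0, 1, 0, 0), 1.0, 500.0, 6)]
print(unconditional_loglik([math.log(0.005), math.log(0.01), 0.0, 0.0], 0.2, rows))
```

Writing the stratum terms as indicator products mirrors the form of λ(α, β) above; with many strata, the number of α parameters to be estimated grows accordingly.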
A log-linear Poisson regression model may be fitted to the data structure in Table 2, with background stratification on covariates A and B, via the SAS statistical package as follows:
The analytical data structure has one record per stratum. The variables _ncovals and _totcases denote, respectively, the number of exposure values and the total number of cases in each stratum. The arrays _cases, _pt, and _z hold the event counts, person-time, and levels of the exposure variable(s) of interest within each stratum of the analytical data structure; the length of the arrays depends on the analytical data structure. The variables caseprod and sum, which are, respectively, the numerator and denominator of the expression for the conditional likelihood, are re-initialized at each new record. The term ‘phi’ defines the relative rate function of the regression model; in the example above, the rate ratio function conforms to a standard log-linear model. The ‘parms’ statement defines the parameter(s) to be estimated, and the ‘profile’ statement requests the associated 95% profile likelihood confidence bounds. The ‘LL’ statement specifies the expression for the log-likelihood of this model, and the statement ‘max LL’ defines the function to be maximized.
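The Python sketch below (not the authors' SAS code; the names and the data are hypothetical and are not the Table 2 data) mirrors the one-record-per-stratum layout. For each stratum it accumulates the log of the numerator, the product of \( (P_{sz} \varphi (z,\beta ))^{c_{sz}} \), and the ‘sum’ term \( \sum_{z} P_{sz} \varphi (z,\beta ) \), raised to the stratum case total. It also evaluates the unconditional log-likelihood with each stratum intercept set to its closed-form maximizer \( \hat{\alpha }_{s} (\beta ) = \ln (c_{s} /\sum_{z} P_{sz} \varphi (z,\beta )) \), to show that the two differ only by the constant \( \sum_{s} (c_{s} \ln c_{s} - c_{s} ) \), which does not depend on β.

```python
import math

# Hypothetical stratum records: one record per background stratum, with
# parallel lists analogous to the _cases, _pt and _z arrays.
STRATA = [
    {"cases": [5, 6, 4], "pt": [1000.0, 500.0, 200.0], "z": [0.0, 1.0, 2.0]},
    {"cases": [20, 15], "pt": [2000.0, 800.0], "z": [0.0, 1.0]},
]


def phi_loglinear(z, beta):
    return math.exp(beta * z)  # standard log-linear relative rate


def conditional_loglik(beta, strata, phi=phi_loglinear):
    """Sum over strata of ln[prod_z (P*phi)^c / (sum_z P*phi)^(total cases)]."""
    ll = 0.0
    for rec in strata:
        totcases = sum(rec["cases"])
        log_caseprod = 0.0  # log of the numerator
        denom = 0.0         # the 'sum' term
        for c, p, z in zip(rec["cases"], rec["pt"], rec["z"]):
            w = p * phi(z, beta)
            if c > 0:
                log_caseprod += c * math.log(w)
            denom += w
        ll += log_caseprod - totcases * math.log(denom)
    return ll


def profiled_unconditional_loglik(beta, strata, phi=phi_loglinear):
    """Unconditional log-likelihood with each alpha_s at its maximiser
    alpha_s(beta) = ln(c_s / sum_z P_sz * phi(z, beta))."""
    ll = 0.0
    for rec in strata:
        totcases = sum(rec["cases"])
        if totcases == 0:
            continue  # alpha_s -> -infinity and the contribution tends to 0
        denom = sum(p * phi(z, beta) for p, z in zip(rec["pt"], rec["z"]))
        alpha_s = math.log(totcases / denom)
        for c, p, z in zip(rec["cases"], rec["pt"], rec["z"]):
            mu = p * math.exp(alpha_s) * phi(z, beta)
            ll += (c * math.log(mu) if c > 0 else 0.0) - mu
    return ll


# The difference is the same constant, sum_s (c_s ln c_s - c_s), for every beta.
for beta in (-0.5, 0.0, 0.7):
    print(beta, profiled_unconditional_loglik(beta, STRATA) - conditional_loglik(beta, STRATA))
```

Because the difference is constant in β, both functions are maximized at the same \( \hat{\beta } \) and have the same curvature, which is why the two approaches yield identical point estimates and likelihood-based confidence intervals.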
The SAS procedure PROC NLP is part of the SAS/OR package, and some SAS users may have access to the SAS/STAT package but not to SAS/OR. Therefore, below, we also provide sample code for fitting background stratified Poisson regression models with the SAS procedure PROC NLMIXED, which is part of SAS/STAT. PROC NLMIXED does not directly output profile likelihood confidence intervals for the estimated parameters, but it does report Wald-type confidence intervals.
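To illustrate the difference between the two kinds of interval, the Python sketch below computes both for a single hypothetical stratum with two exposure levels (z = 0 and z = 1) and a log-linear relative rate: a Wald-type interval \( \hat{\beta } \pm 1.96\,\mathrm{SE} \), and a profile (likelihood-ratio) interval, i.e., the values of β at which twice the drop in log-likelihood from its maximum equals 3.84, the 95th percentile of a χ² distribution with 1 degree of freedom. The data and function names are assumptions. For this two-level case \( \hat{\beta } = \ln [(c_{1} /P_{1} )/(c_{0} /P_{0} )] \) and \( \widehat{\text{var}}(\hat{\beta }) = 1/c_{0} + 1/c_{1} \).

```python
import math

Z95 = 1.959963984540054      # standard normal 97.5th percentile
CHI2_95 = 3.841458820694124  # chi-square (1 df) 95th percentile, equal to Z95 ** 2


def loglik(beta, c0=20, p0=2000.0, c1=15, p1=800.0):
    """Conditional log-likelihood for one hypothetical stratum with exposure
    levels z = 0 and z = 1 and a log-linear relative rate exp(beta * z)."""
    w0, w1 = p0, p1 * math.exp(beta)
    return c0 * math.log(w0) + c1 * math.log(w1) - (c0 + c1) * math.log(w0 + w1)


def wald_ci(beta_hat, se):
    return beta_hat - Z95 * se, beta_hat + Z95 * se


def _bisect(g, a, b, tol=1e-10):
    """Root of g between a and b, assuming g(a) and g(b) differ in sign."""
    ga = g(a)
    while abs(b - a) > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if (gm > 0) == (ga > 0):
            a, ga = m, gm
        else:
            b = m
    return 0.5 * (a + b)


def profile_ci(f, beta_hat, width=10.0):
    """Values of beta where 2 * (f(beta_hat) - f(beta)) equals CHI2_95."""
    g = lambda b: f(b) - (f(beta_hat) - CHI2_95 / 2.0)
    return _bisect(g, beta_hat - width, beta_hat), _bisect(g, beta_hat, beta_hat + width)


# Closed-form MLE and Wald standard error for this two-level stratum.
beta_hat = math.log((15 / 800.0) / (20 / 2000.0))
se = math.sqrt(1 / 20 + 1 / 15)
print("Wald:   ", wald_ci(beta_hat, se))
print("Profile:", profile_ci(loglik, beta_hat))
```

The Wald interval is symmetric about \( \hat{\beta } \) by construction, whereas the likelihood-based interval generally is not.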
This approach accommodates a variety of functional forms for the relative rate function, ϕ. For example, a linear excess relative rate model of the form ϕ = 1 + βz would be fitted by replacing the statement that defines phi as the log-linear function exp(βz) with a statement that defines phi as 1 + βz.
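In a Python sketch of the same idea (hypothetical names and data, not the authors' code), changing the relative rate model amounts to passing a different ϕ to the same conditional log-likelihood. For a single stratum with exposure levels 0 and 1, the log-linear model gives \( \hat{\beta } = \ln (RR) \) and the linear excess relative rate model gives \( \hat{\beta } = RR - 1 \), where RR is the crude rate ratio. Golden-section search is used here only as a simple one-dimensional maximizer; it is not the optimizer used by SAS.

```python
import math


def conditional_loglik(beta, strata, phi):
    """Conditional log-likelihood; strata holds (cases, pt, z) list triples."""
    ll = 0.0
    for cases, pt, z in strata:
        w = [p * phi(zi, beta) for p, zi in zip(pt, z)]
        ll += sum(c * math.log(wi) for c, wi in zip(cases, w) if c > 0)
        ll -= sum(cases) * math.log(sum(w))
    return ll


def phi_loglinear(z, beta):
    return math.exp(beta * z)  # phi = exp(beta * z)


def phi_linear_err(z, beta):
    return 1.0 + beta * z  # phi = 1 + beta * z; requires 1 + beta * z > 0


def golden_max(f, a, b, tol=1e-9):
    """Maximise a unimodal function f on [a, b] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c, d = b - r * (b - a), a + r * (b - a)
    return 0.5 * (a + b)


# Hypothetical single stratum with exposure levels z = 0 and z = 1;
# further strata would simply be appended to the list.
STRATA = [([20, 15], [2000.0, 800.0], [0.0, 1.0])]

beta_loglinear = golden_max(lambda b: conditional_loglik(b, STRATA, phi_loglinear), -5.0, 5.0)
# The search interval keeps 1 + beta * z > 0 for every z in the data.
beta_linear_err = golden_max(lambda b: conditional_loglik(b, STRATA, phi_linear_err), -0.99, 10.0)
print(beta_loglinear, beta_linear_err)
```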
Appendix 2
With the model as in (2), the unconditional log-likelihood contribution from stratum s is \( L_{s} (\alpha_{s} ,\beta ) = c_{s} \alpha_{s} + \sum\nolimits_{{z \in R_{s} }} {c_{sz} \ln (P_{sz} \varphi (z,\beta ))} - \exp (\alpha_{s} )\sum\nolimits_{{z \in R_{s} }} {P_{sz} \varphi (z,\beta )} \), where \( c_{s} = \sum\nolimits_{{z \in R_{s} }} {c_{sz} } \) is the total number of cases in stratum s. The observed information at \( \hat{\alpha } \), denoted \( I(\hat{\alpha },\beta ) \), is given by
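Differentiating \( L_{s} \) twice gives the following entries (a worked derivation from the expression for \( L_{s} \), with \( \varphi ' \) and \( \varphi '' \) denoting the first and second derivatives of \( \varphi (z,\beta ) \) with respect to β, and the arguments \( (z,\beta ) \) suppressed):

\[
\begin{aligned}
I_{\alpha_{s} \alpha_{s} } &= \exp (\alpha_{s} )\sum\nolimits_{z \in R_{s} } P_{sz} \varphi , \qquad I_{\alpha_{s} \alpha_{t} } = 0 \;\; (s \ne t), \\
I_{\alpha_{s} \beta } &= \exp (\alpha_{s} )\sum\nolimits_{z \in R_{s} } P_{sz} \varphi ', \\
I_{\beta \beta } &= \sum\nolimits_{s} \left[ - \sum\nolimits_{z \in R_{s} } c_{sz} \left( \frac{\varphi ''}{\varphi } - \frac{(\varphi ')^{2} }{\varphi^{2} } \right) + \exp (\alpha_{s} )\sum\nolimits_{z \in R_{s} } P_{sz} \varphi '' \right].
\end{aligned}
\]

At \( \hat{\alpha }_{s} \), where \( \exp (\hat{\alpha }_{s} ) = c_{s} /\sum\nolimits_{z \in R_{s} } P_{sz} \varphi (z,\beta ) \), the diagonal entry reduces to \( I_{\alpha_{s} \alpha_{s} } = c_{s} \).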
The variance estimate for \( \hat{\beta } \) is the \( (\beta ,\beta ) \) block (the lower-right corner) of the inverse of the observed information, evaluated at \( \hat{\alpha } \) and \( \hat{\beta } \), which may be obtained using the well-known formula for the inverse of a partitioned matrix
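Written out, the standard partitioned-inverse (Schur complement) formula for this block is:

\[
\widehat{\text{var}}(\hat{\beta }) = \left[ I^{ - 1} \right]_{\beta \beta } = \left( I_{\beta \beta } - I_{\beta \alpha } I_{\alpha \alpha }^{ - 1} I_{\alpha \beta } \right)^{ - 1} , \qquad I_{\beta \alpha } = I_{\alpha \beta }^{t} .
\]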
Since \( I_{\alpha ,\alpha } \) is a diagonal matrix (each \( \alpha_{s} \) enters only the contribution from stratum s), it is easy to compute that
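As a worked step, substituting the second derivatives of \( L_{s} \) into the partitioned-inverse formula (arguments \( (z,\beta ) \) suppressed, and \( \exp (\hat{\alpha }_{s} ) = c_{s} /\sum\nolimits_{z} P_{sz} \varphi \) used in the second line) gives:

\[
\begin{aligned}
I_{\beta \beta } - \sum\nolimits_{s} \frac{I_{\alpha_{s} \beta }^{2} }{I_{\alpha_{s} \alpha_{s} } }
&= \sum\nolimits_{s} \left[ - \sum\nolimits_{z \in R_{s} } c_{sz} \left( \frac{\varphi ''}{\varphi } - \frac{(\varphi ')^{2} }{\varphi^{2} } \right) + \exp (\alpha_{s} )\left( \sum\nolimits_{z \in R_{s} } P_{sz} \varphi '' - \frac{\left( \sum\nolimits_{z \in R_{s} } P_{sz} \varphi ' \right)^{2} }{\sum\nolimits_{z \in R_{s} } P_{sz} \varphi } \right) \right] \\
&= \sum\nolimits_{s} \left[ - \sum\nolimits_{z \in R_{s} } c_{sz} \left( \frac{\varphi ''}{\varphi } - \frac{(\varphi ')^{2} }{\varphi^{2} } \right) + c_{s} \left( \frac{\sum\nolimits_{z} P_{sz} \varphi ''}{\sum\nolimits_{z} P_{sz} \varphi } - \frac{\left( \sum\nolimits_{z} P_{sz} \varphi ' \right)^{2} }{\left( \sum\nolimits_{z} P_{sz} \varphi \right)^{2} } \right) \right].
\end{aligned}
\]

For comparison, the stratum-s contribution to the ‘conditional’ log-likelihood is \( \ell_{s} (\beta ) = \sum\nolimits_{z \in R_{s} } c_{sz} \ln (P_{sz} \varphi ) - c_{s} \ln \sum\nolimits_{z \in R_{s} } P_{sz} \varphi \), and \( - \partial^{2} \ell_{s} /\partial \beta^{2} \) equals the bracketed term in the second line.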
This expression is the same as the negative of the second derivative of the ‘conditional’ Poisson log-likelihood (that is, the conditional observed information); consequently, the estimated standard errors and associated Wald-type confidence intervals will be the same. For simplicity, we derive the expression for a single parameter β. The expressions also apply when β is a column vector, with derivatives as in standard vector calculus and squared terms replaced by outer products, i.e., \( a^{2} \) is replaced by \( aa^{t} \), where \( a^{t} \) is the transpose of a.
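The equivalence can also be checked numerically. The Python sketch below (hypothetical data and function names; a log-linear ϕ is assumed) approximates the observed information of the unconditional log-likelihood by central finite differences at \( \hat{\alpha }(\beta ) \), forms \( I_{\beta \beta } - \sum\nolimits_{s} I_{\alpha_{s} \beta }^{2} /I_{\alpha_{s} \alpha_{s} } \), and compares it with the finite-difference curvature of the ‘conditional’ log-likelihood. It is a numerical sanity check rather than a proof, and it uses finite differences rather than the analytic expressions.

```python
import math

# Hypothetical grouped data: per stratum, a list of (z, person_time, cases) rows.
STRATA = [
    [(0.0, 1000.0, 5), (1.0, 500.0, 6), (2.0, 200.0, 4)],
    [(0.0, 2000.0, 20), (1.0, 800.0, 15)],
]


def phi(z, beta):
    return math.exp(beta * z)  # log-linear relative rate (assumed)


def uncond_ll(theta):
    """Unconditional log-likelihood; theta = [alpha_1, ..., alpha_S, beta]."""
    *alpha, beta = theta
    ll = 0.0
    for a, rows in zip(alpha, STRATA):
        for z, p, c in rows:
            mu = p * math.exp(a) * phi(z, beta)
            ll += c * math.log(mu) - mu
    return ll


def cond_ll(theta):
    """'Conditional' log-likelihood; theta = [beta]."""
    beta = theta[0]
    ll = 0.0
    for rows in STRATA:
        denom = sum(p * phi(z, beta) for z, p, _ in rows)
        ll += sum(c * math.log(p * phi(z, beta) / denom) for z, p, c in rows if c > 0)
    return ll


def alpha_hat(beta):
    """Maximiser of uncond_ll over each alpha_s for fixed beta."""
    return [math.log(sum(c for _, _, c in rows) / sum(p * phi(z, beta) for z, p, _ in rows))
            for rows in STRATA]


def hess(f, theta, i, j, h=1e-3):
    """Central finite-difference approximation to d^2 f / (d theta_i d theta_j)."""
    def at(di, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return f(t)
    return (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4.0 * h * h)


def schur_information(beta):
    """I_bb - sum_s I_{a_s b}^2 / I_{a_s a_s}, evaluated at alpha_hat(beta)."""
    theta = alpha_hat(beta) + [beta]
    k = len(theta) - 1  # index of beta
    info = -hess(uncond_ll, theta, k, k)
    for s in range(k):
        i_ab = -hess(uncond_ll, theta, s, k)
        i_aa = -hess(uncond_ll, theta, s, s)
        info -= i_ab ** 2 / i_aa
    return info


def conditional_information(beta):
    return -hess(cond_ll, [beta], 0, 0)


for b in (0.0, 0.5, 1.2):
    print(b, schur_information(b), conditional_information(b))
```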
About this article
Cite this article
Richardson, D.B., Langholz, B. Background stratified Poisson regression analysis of cohort data. Radiat Environ Biophys 51, 15–22 (2012). https://doi.org/10.1007/s00411-011-0394-5
DOI: https://doi.org/10.1007/s00411-011-0394-5