Statistical Methods in Cancer Epidemiological Studies

Xue, Xiaonan; Hoover, Donald R.

doi:10.1007/978-1-59745-416-2_13

Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 471)

Summary

In this chapter, we discuss statistical methods for study designs that are commonly used in epidemiological research, particularly in cancer epidemiology. After a brief review of basic concepts in epidemiological studies, we cover statistical methods for case-control and cohort studies. We also discuss methods for nested case-control and case-cohort studies, which are increasingly used in cancer epidemiology. The chapter is written for cancer epidemiologists who already understand basic statistical methods for common epidemiological study designs and are able to initiate power and sample size calculations. It therefore emphasizes newly developed statistical methods for epidemiological studies, as well as study planning.

Author information

Authors and Affiliations

Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
Xiaonan Xue
Department of Statistics and Institute for Health Care Policy and Aging Research, Rutgers University, New Brunswick, NJ, USA
Donald R. Hoover

Editor information

Editors and Affiliations

Division of Cancer Control and Population Sciences, Bethesda, Maryland, 20892, USA
Mukesh Verma PhD

About this protocol

Cite this protocol

Xue, X., Hoover, D.R. (2009). Statistical Methods in Cancer Epidemiological Studies. In: Verma, M. (eds) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_13

DOI: https://doi.org/10.1007/978-1-59745-416-2_13
Publisher Name: Humana Press
Print ISBN: 978-1-58829-987-1
Online ISBN: 978-1-59745-416-2
eBook Packages: Springer Protocols
