Summary
In this chapter, we discuss statistical methods for study designs commonly used in epidemiological research, particularly cancer epidemiology. After a brief review of basic concepts in epidemiological studies, we cover statistical methods for case-control and cohort studies, followed by methods for nested case-control and case-cohort studies, which are increasingly used in cancer epidemiology. The chapter is intended for cancer epidemiologists who understand basic statistical methods for common epidemiological study designs and are able to initiate power and sample size calculations. It therefore emphasizes newly developed statistical methods for epidemiological studies as well as study planning.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Xue, X., Hoover, D.R. (2009). Statistical Methods in Cancer Epidemiological Studies. In: Verma, M. (eds) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_13
DOI: https://doi.org/10.1007/978-1-59745-416-2_13
Publisher Name: Humana Press
Print ISBN: 978-1-58829-987-1
Online ISBN: 978-1-59745-416-2
eBook Packages: Springer Protocols