Abstract
Genome-wide association studies (GWAS) search for associations between genetic variants and disease status, typically via logistic regression. Often there are covariates, such as sex or well-established major genetic factors, that are known to affect disease susceptibility and are independent of tested genotypes at the population level. We show theoretically and with data from recent GWAS on multiple sclerosis, psoriasis and ankylosing spondylitis that inclusion of known covariates can substantially reduce power for the identification of associated variants when the disease prevalence is lower than a few percent. Whether the inclusion of such covariates reduces or increases power to detect genetic effects depends on various factors, including the prevalence of the disease studied. When the disease is common (prevalence of >20%), the inclusion of covariates typically increases power, whereas, for rarer diseases, it can often decrease power to detect new genetic associations.
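The prevalence dependence described above can be illustrated with a small numerical sketch. It is not the authors' method. It assumes a logistic disease model with one SNP, coded as allele dosage under Hardy-Weinberg equilibrium, and one binary covariate that is independent of genotype in the population. All function names and parameter values are hypothetical and chosen only for illustration: SNP odds ratio 1.1, covariate odds ratio 5, allele frequency 0.3, covariate frequency 0.3, and 1,000 cases and 1,000 controls.

Rather than simulating random samples, the sketch builds the *expected* case-control counts for every genotype-covariate cell. It then fits a weighted logistic regression by Newton-Raphson and reports the Wald z statistic for the SNP, once with the covariate and once without it.

```python
import math


def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def wald_z(rows, index):
    """Fit weighted logistic regression by Newton-Raphson; return the Wald z of coefficient `index`."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(100):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # observed = expected information for the logit link
        for x, y, w in rows:
            p = expit(sum(bj * xj for bj, xj in zip(beta, x)))
            for i in range(k):
                grad[i] += w * (y - p) * x[i]
                for j in range(k):
                    info[i][j] += w * p * (1.0 - p) * x[i] * x[j]
        cov = invert(info)
        step = [sum(cov[i][j] * grad[j] for j in range(k)) for i in range(k)]
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta[index] / math.sqrt(cov[index][index])  # model-based standard error


def expected_z(prevalence, adjust, snp_or=1.1, cov_or=5.0, maf=0.3, cov_freq=0.3,
               n_cases=1000, n_controls=1000):
    """Hypothetical helper: SNP Wald z on expected case-control data, with or without the covariate."""
    b, c = math.log(snp_or), math.log(cov_or)
    geno = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]  # HWE genotype frequencies
    covar = [1 - cov_freq, cov_freq]  # covariate independent of genotype in the population

    def pop_prev(a):
        return sum(geno[g] * covar[x] * expit(a + b * g + c * x)
                   for g in range(3) for x in range(2))

    lo, hi = -30.0, 30.0  # bisection for the intercept that gives the target prevalence
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pop_prev(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)

    rows = []  # (features, case status, expected count in the case-control sample)
    for g in range(3):
        for x in range(2):
            pop = geno[g] * covar[x]
            risk = expit(a + b * g + c * x)
            feats = [1.0, float(g), float(x)] if adjust else [1.0, float(g)]
            rows.append((feats, 1.0, n_cases * pop * risk / prevalence))
            rows.append((feats, 0.0, n_controls * pop * (1 - risk) / (1 - prevalence)))
    return wald_z(rows, index=1)  # index 1 is the SNP dosage in both models


if __name__ == "__main__":
    for k in (0.001, 0.5):
        print(k, round(expected_z(k, adjust=False), 3), round(expected_z(k, adjust=True), 3))
```

Under these assumptions, the two prevalences behave differently.

- **Rare disease (0.1%).** The SNP log odds ratio is nearly collapsible over the covariate, so adjustment leaves the estimate almost unchanged. Meanwhile, the covariate's strong association with case status in the ascertained sample inflates the SNP's standard error, and the unadjusted z comes out larger.
- **Prevalence of 50%.** The unadjusted estimate is attenuated towards zero. Adjustment recovers more signal than it loses in precision, so the adjusted z comes out larger.

The expected-data approach is used here only so the comparison is free of sampling noise.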
Acknowledgements
We thank G. Nicholson for helpful comments. This work was funded by the Wellcome Trust, as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z) and through the Wellcome Trust core grant for the Wellcome Trust Centre for Human Genetics (090532/Z/09/Z). P.D. was supported in part by a Wolfson Royal Society Merit Award and a Wellcome Trust Senior Investigator Award (095552/Z/11/Z). C.C.A.S. was supported in part by a Wellcome Trust Career Development Fellowship (097364/Z/11/Z).
Author information
Contributions
M.P., P.D. and C.C.A.S. jointly designed the study and wrote the paper. M.P. derived the mathematical results and carried out the example analyses.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Note (PDF 259 kb)
About this article
Cite this article
Pirinen, M., Donnelly, P. & Spencer, C. Including known covariates can reduce power to detect genetic effects in case-control studies. Nat Genet 44, 848–851 (2012). https://doi.org/10.1038/ng.2346
DOI: https://doi.org/10.1038/ng.2346