Abstract
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and with multiple rare and common variants at some of those loci contributing to disease risk. Data from genome-wide association studies (GWAS), together with metadata such as known gene functions and pathways, make it possible to identify genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants explain only a small percentage of the heritability. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link that genomic similarity to phenotype similarity, have proved to be very useful for testing the association between a set of single nucleotide polymorphisms and a phenotype. Compared with single-marker analysis, the advantage afforded by U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links between these statistics and other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and in rare-variant association studies are discussed.
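The idea of linking pairwise genomic similarity to pairwise phenotype similarity can be illustrated with a minimal sketch. It is not one of the specific formulations reviewed in the article. The identity-by-state similarity, the centered phenotype cross-product, the permutation test, the function names and the toy data are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def ibs_similarity(g1, g2):
    """Identity-by-state similarity in [0, 1] between two genotype
    vectors coded as minor-allele counts (0, 1 or 2) per SNP."""
    m = len(g1)
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * m)


def u_statistic(genotypes, phenotypes):
    """Average, over all pairs of individuals, of genomic similarity
    multiplied by the centered phenotype cross-product."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    total = 0.0
    for i, j in combinations(range(n), 2):
        k_ij = ibs_similarity(genotypes[i], genotypes[j])
        s_ij = (phenotypes[i] - mean_y) * (phenotypes[j] - mean_y)
        total += k_ij * s_ij
    return total / (n * (n - 1) / 2)


def permutation_p_value(genotypes, phenotypes, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle phenotypes to break any
    genotype-phenotype link, then compare with the observed statistic."""
    rng = random.Random(seed)
    observed = u_statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if u_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 individuals, 3 SNPs in one gene, binary phenotype.
genotypes = [[0, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 2], [1, 0, 0], [2, 2, 2]]
phenotypes = [0, 0, 1, 1, 0, 1]

u_obs = u_statistic(genotypes, phenotypes)
p_val = permutation_p_value(genotypes, phenotypes)
print(f"U = {u_obs:.4f}, permutation p = {p_val:.3f}")
```

A large positive value of the statistic means that individuals with similar genotypes across the SNP set also tend to have similar phenotypes. Because all SNPs in the set enter through a single pairwise similarity, the test yields one p-value for the set rather than one per marker.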
Acknowledgments
This research was supported by NIH grant CA127334. I thank the editor Dr. David-Alexandre Trégouët for inviting me to contribute this review.
Cite this article
Li, H. U-statistics in genetic association studies. Hum Genet 131, 1395–1401 (2012). https://doi.org/10.1007/s00439-012-1178-y