U-statistics in genetic association studies

  • Review Paper
Human Genetics

Abstract

Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and, at some of those loci, multiple rare and common variants contributing to disease risk. Data from genome-wide association studies (GWAS), together with metadata such as known gene functions and pathways, make it possible to identify genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants explain only a small percentage of the heritability. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link the genomic similarity to phenotype similarity, have proved to be very useful for testing the association between a set of single nucleotide polymorphisms and a phenotype. Compared to single-marker analysis, the advantage afforded by U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links between these statistics and other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and in rare-variant association studies are discussed.
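To make the idea of linking pairwise genomic similarity to phenotype similarity concrete, the following is a minimal Python sketch. It is an illustration only and not one of the specific formulations reviewed in the article. The assumptions are: genotypes are coded 0/1/2, genomic similarity is measured by average identity-by-state sharing, phenotype similarity is the product of mean-centered trait values, and significance is assessed by permuting phenotypes. All function names are hypothetical. Because the trait is centered at the sample mean, the statistic is only approximately a U-statistic in the strict sense.

```python
import random
from itertools import combinations


def ibs_similarity(g1, g2):
    """Average identity-by-state sharing across markers (genotypes coded 0/1/2).

    Returns 1.0 for identical genotype vectors and 0.0 for opposite homozygotes
    at every marker.
    """
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def similarity_u_statistic(genotypes, phenotypes):
    """Average, over all pairs of individuals, of
    (genomic similarity) x (product of mean-centered phenotypes).

    Assumption: this kernel is chosen for illustration only.
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    centered = [y - mean_y for y in phenotypes]
    total = 0.0
    for i, j in combinations(range(n), 2):
        total += ibs_similarity(genotypes[i], genotypes[j]) * centered[i] * centered[j]
    return total / (n * (n - 1) / 2)


def permutation_p_value(genotypes, phenotypes, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle phenotypes, keep genotypes fixed."""
    rng = random.Random(seed)
    observed = similarity_u_statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if similarity_u_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (fabricated for illustration): four individuals, two markers.
toy_genotypes = [[0, 0], [0, 0], [2, 2], [2, 2]]
toy_phenotypes = [0.0, 0.0, 1.0, 1.0]
u_value = similarity_u_statistic(toy_genotypes, toy_phenotypes)
p_value = permutation_p_value(toy_genotypes, toy_phenotypes)
```

In this toy example, individuals with similar genotypes also have similar phenotypes, so the statistic is positive. With a marker set of realistic size, the pairwise similarities would normally be computed once and reused across permutations.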

Acknowledgments

This research was supported by NIH grant CA127334. I thank the editor Dr. David-Alexandre Trégouët for inviting me to contribute this review.

Author information

Corresponding author

Correspondence to Hongzhe Li.

About this article

Cite this article

Li, H. U-statistics in genetic association studies. Hum Genet 131, 1395–1401 (2012). https://doi.org/10.1007/s00439-012-1178-y

  • DOI: https://doi.org/10.1007/s00439-012-1178-y