U-statistics in genetic association studies

Li, Hongzhe

doi:10.1007/s00439-012-1178-y

Review Paper
Published: 20 May 2012

Human Genetics, Volume 131, pages 1395–1401 (2012)

Abstract

Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci, and multiple rare and common variants at some of these loci, contributing to disease risk. Data from genome-wide association studies (GWAS), together with metadata such as known gene functions and pathways, make it possible to identify genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants explain only a very small percentage of the heritability. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of GWAS data analysis. U-statistics, which summarize the genomic similarity between pairs of individuals and link genomic similarity to phenotypic similarity, have proved very useful for testing the association between a set of single nucleotide polymorphisms (SNPs) and phenotypes. Compared with single-marker analysis, the advantage afforded by U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links between these statistics and other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and in rare-variant association studies are discussed.
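The idea of linking pairwise genomic similarity to pairwise phenotypic similarity can be illustrated with a short Python/NumPy sketch. This is a minimal illustration under stated assumptions, not the specific formulation reviewed in the paper: it assumes an average identity-by-state (IBS) similarity S_ij over a set of SNPs coded as minor-allele counts (0/1/2), uses the product of mean-centered trait values as the phenotypic similarity, and averages over all pairs, U = (1 / C(n,2)) Σ_{i<j} S_ij (y_i − ȳ)(y_j − ȳ). Because the trait is centered at its sample mean, this is a U-statistic-type average rather than a textbook U-statistic, so significance is assessed here by permuting the trait. The function names `ibs_similarity`, `u_statistic` and `permutation_test` are hypothetical.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Average IBS similarity for every pair of individuals (assumed similarity measure).

    genotypes: (n_individuals, n_snps) array of minor-allele counts 0/1/2.
    Returns an (n, n) matrix in [0, 1]; 1 means identical genotypes at every SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    # |g_ik - g_jk| counts 0, 1 or 2 mismatching alleles at SNP k; /2 rescales to [0, 1].
    diff = np.abs(g[:, None, :] - g[None, :, :]) / 2.0
    return 1.0 - diff.mean(axis=2)


def u_statistic(similarity, phenotype):
    """Average over pairs i < j of genomic similarity times phenotypic similarity."""
    y = np.asarray(phenotype, dtype=float)
    yc = y - y.mean()  # centered trait: the product is positive when both lie on the same side of the mean
    i, j = np.triu_indices(len(y), k=1)  # each unordered pair counted once
    return float(np.mean(similarity[i, j] * yc[i] * yc[j]))


def permutation_test(genotypes, phenotype, n_perm=999, seed=0):
    """Observed statistic and a one-sided permutation p-value (large U = association)."""
    s = ibs_similarity(genotypes)  # computed once; only the trait is permuted
    y = np.asarray(phenotype, dtype=float)
    observed = u_statistic(s, y)
    rng = np.random.default_rng(seed)
    null = np.array([u_statistic(s, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Toy data: 5 individuals homozygous for the major allele at 3 SNPs and 5 homozygous
# for the minor allele, with a trait that tracks the genotype group.
G = np.array([[0, 0, 0]] * 5 + [[2, 2, 2]] * 5)
y = np.array([0.0] * 5 + [1.0] * 5)
stat, p = permutation_test(G, y)
# stat equals 1/9 here: 20 of the 45 pairs are genetically identical with matching
# traits, each contributing 0.5 * 0.5; all other pairs have zero IBS similarity.
print(f"U = {stat:.4f}, permutation p = {p:.3f}")
```

A single statistic summarizes all SNPs in the set, which is why such tests avoid the multiple-testing burden of analyzing each marker separately. The pairwise broadcast in `ibs_similarity` uses memory proportional to n² times the number of SNPs, which is fine for toy data but would need chunking or a different similarity computation at GWAS sample sizes.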

Acknowledgments

This research was supported by NIH grant CA127334. I thank the editor Dr. David-Alexandre Trégouët for inviting me to contribute this review.

Author information

Authors and Affiliations

Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
Hongzhe Li

Corresponding author

Correspondence to Hongzhe Li.

About this article

Cite this article

Li, H. U-statistics in genetic association studies. Hum Genet 131, 1395–1401 (2012). https://doi.org/10.1007/s00439-012-1178-y

Received: 17 April 2012
Accepted: 07 May 2012
Published: 20 May 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s00439-012-1178-y