Bioinformatics Challenges in Genome-Wide Association Studies (GWAS)

Clinical Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1168))

Abstract

Genome-wide association studies (GWAS) are a powerful tool for investigators to examine the human genome to detect genetic risk factors, reveal the genetic architecture of diseases, and open up new opportunities for treatment and prevention. However, despite their successes, GWAS have not been able to identify genetic loci that are effective classifiers of disease, which limits their value for genetic testing. This chapter highlights the challenges that lie ahead for GWAS in better identifying disease risk predictors and how they may be addressed. To this end, we review basic concepts of GWAS, the technologies used to capture genetic variation, the missing heritability problem, the need for efficient study design (especially for replication efforts), ways to reduce the bias introduced into a dataset, and how to use newly available resources such as electronic medical records. We also look at what lies ahead for the field and the approaches that can be taken to realize the full potential of GWAS.

Abbreviations

EMR: Electronic medical record

GWAS: Genome-wide association study/studies

LD: Linkage disequilibrium

MAF: Minor allele frequency

SNP: Single nucleotide polymorphism

Corresponding author

Correspondence to Jason H. Moore.

Copyright information

© 2014 Springer Science+Business Media New York

About this protocol

Cite this protocol

De, R., Bush, W.S., Moore, J.H. (2014). Bioinformatics Challenges in Genome-Wide Association Studies (GWAS). In: Trent, R. (ed) Clinical Bioinformatics. Methods in Molecular Biology, vol 1168. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0847-9_5

  • DOI: https://doi.org/10.1007/978-1-4939-0847-9_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-0846-2

  • Online ISBN: 978-1-4939-0847-9

  • eBook Packages: Springer Protocols
