Epistasis, Complexity, and Multifactor Dimensionality Reduction

Genome-Wide Association Studies and Genomic Prediction

Part of the book series: Methods in Molecular Biology (MIMB, volume 1019)

Abstract

Genome-wide association studies (GWASs) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology. Converting this wealth of new information about genomic variation into knowledge about public health and human biology will depend critically on the complexity of the genotype-to-phenotype mapping relationship. We review here computational approaches to genetic analysis that embrace, rather than ignore, the complexity of human health. We focus on multifactor dimensionality reduction (MDR) as an approach for modeling one of these complexities: epistasis, or gene–gene interaction.



Copyright information

© 2013 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Pan, Q., Hu, T., Moore, J.H. (2013). Epistasis, Complexity, and Multifactor Dimensionality Reduction. In: Gondro, C., van der Werf, J., Hayes, B. (eds) Genome-Wide Association Studies and Genomic Prediction. Methods in Molecular Biology, vol 1019. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-447-0_22

  • DOI: https://doi.org/10.1007/978-1-62703-447-0_22

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-446-3

  • Online ISBN: 978-1-62703-447-0

  • eBook Packages: Springer Protocols
