Elsevier

Biological Psychiatry

Volume 61, Issue 10, 15 May 2007, Pages 1121-1126

Original article
Spurious Genetic Associations

https://doi.org/10.1016/j.biopsych.2006.11.010

Background

Genetic association studies are widely used in biomedical research and yet only a minority of positive findings stand the test of replication. I explored the capacity of association studies to produce false positive findings and the impact of various definitions of replication.

Methods

Genetically realistic data were simulated for 10 single nucleotide polymorphisms (SNPs) in COMT, a commonly studied candidate gene, and analyzed with a typical genotyping/analytic approach.

Results

Candidate gene studies like those simulated here are highly likely to produce one or more false positive findings at α ≤ .05; the pattern of findings can often be “compelling” or “intriguing”; and false positive findings propagate and confuse the literature unless the definition of replication is precise.

Conclusions

Findings from single association studies constitute “tentative knowledge” and must be interpreted with exceptional caution. For the association method to function as intended, every statistical comparison must be tracked and reported, and integrated replication is essential. Precise replication (the same SNPs, phenotype, and direction of association) is required in the interpretation of multiple association studies.

Section snippets

Simulating Genetic Data

While it is easy to simulate genotypes for single markers, it is more difficult to simulate genotypes across multiple markers in a genomic region that have realistic patterns of linkage disequilibrium. Approaches like the coalescent require unverifiable assumptions about human population history. A recent alternative is HapSample, which produces simulated marker genotypes at actual SNP loci that mirror human allele frequencies and linkage disequilibrium patterns (Wright et al, submitted).
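To illustrate why the single-marker case is considered easy, here is a minimal sketch in Python. It is not HapSample and does not model linkage disequilibrium. It assumes Hardy-Weinberg equilibrium, an arbitrary minor allele frequency of 0.3, and a simple 1-df allelic chi-square test. The function names `simulate_genotypes` and `allelic_p_value` are hypothetical and chosen only for this example.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    """Draw n genotypes (0, 1, or 2 minor alleles) assuming Hardy-Weinberg equilibrium."""
    # Each genotype is the sum of two independent allele draws.
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def allelic_p_value(cases, controls):
    """1-df chi-square test on the 2x2 table of allele counts (cases vs. controls)."""
    a = sum(cases)                   # minor alleles in cases
    b = 2 * len(cases) - a           # major alleles in cases
    c = sum(controls)                # minor alleles in controls
    d = 2 * len(controls) - c        # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                   # monomorphic marker or empty group: no evidence
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Null scenario: cases and controls share the same allele frequency (assumed 0.3).
rng = random.Random(1)
cases = simulate_genotypes(500, 0.3, rng)
controls = simulate_genotypes(500, 0.3, rng)
p = allelic_p_value(cases, controls)
```

Because each marker is drawn independently here, repeating this for 10 SNPs would not reproduce the correlation between neighboring markers, which is the difficulty the paragraph above describes.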

The Probability of a False Positive in an Initial Study

In genetically realistic simulations of 500 cases and 500 control subjects for 10 COMT SNPs and using the analytic package described in Methods and Materials, 968 of 1000 simulations (96.8%) produced at least one false positive at the p ≤ .05 level of significance.
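A back-of-the-envelope sketch can help put the 96.8% figure in context. It assumes fully independent tests with uniformly distributed null p-values, which is not the case in the simulations reported here (SNPs in linkage disequilibrium and overlapping analyses give correlated tests). Under that simplifying assumption, the chance of at least one p ≤ α among m tests is 1 − (1 − α)^m. The helper names below are hypothetical.

```python
import math
import random


def fwer_independent(m, alpha=0.05):
    """P(at least one p <= alpha) among m independent null tests."""
    return 1.0 - (1.0 - alpha) ** m


def effective_tests(fwer, alpha=0.05):
    """Number of independent tests that would yield the given family-wise error rate."""
    return math.log(1.0 - fwer) / math.log(1.0 - alpha)


def simulate_fwer(m, alpha=0.05, n_studies=1000, seed=0):
    """Monte Carlo check: fraction of null 'studies' with at least one p <= alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() <= alpha for _ in range(m))  # null p-values are uniform on [0, 1]
        for _ in range(n_studies)
    )
    return hits / n_studies


print(round(fwer_independent(10), 3))   # 10 independent tests at alpha = .05
print(round(effective_tests(0.968), 1)) # independent-test equivalent of 96.8%
```

Under the independence assumption, 10 tests alone would give a false positive rate of about 40%, and a rate of 96.8% would correspond to roughly 67 independent tests. This suggests, without establishing it, that the number of statistical comparisons per simulated study was far larger than the number of SNPs.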

Moreover, the pattern of p-values was often compelling. The median number of p-values ≤ .05 per study was six (interquartile range 3–9 and range 0–36), and the median lowest p-value per study was .0095 (interquartile range .0037–.0190

Conclusions

Using genetically realistic simulation data of a typical genotyping approach for a commonly studied candidate gene, I explored the capacity of association studies to produce false positive findings and the impact of various definitions of replication. The conclusions were striking and unambiguous: 1) candidate gene studies like those simulated here are highly likely to produce one or more false positive findings at α ≤ .05; 2) the pattern of false positive findings from a study can often be
