Sequence Diversity and Large-Scale Typing of SNPs in the Human Apolipoprotein E Gene

  1. Deborah A. Nickerson1,7,
  2. Scott L. Taylor1,
  3. Stephanie M. Fullerton2,3,
  4. Kenneth M. Weiss2,
  5. Andrew G. Clark3,
  6. Jari H. Stengård4,
  7. Veikko Salomaa4,
  8. Eric Boerwinkle5, and
  9. Charles F. Sing6
  1Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195, USA; 2Department of Anthropology, 3Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; 4National Public Health Institute, Department of Epidemiology and Health Promotion, Helsinki, Finland; 5Human Genetics Center, University of Texas Health Science Center, Houston, Texas 77225, USA; 6Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA

Abstract

A common strategy for genotyping large samples begins with the characterization of human single nucleotide polymorphisms (SNPs) by sequencing candidate regions in a small sample for SNP discovery. This is usually followed by typing, in a large sample, those sites observed to vary in the smaller sample. We present results from a systematic investigation of variation at the human apolipoprotein E locus (APOE), as well as an evaluation of the two-tiered sampling strategy based on these data. We sequenced 5.5 kb spanning the entire APOE genomic region in a core sample of 72 individuals, including 24 each of African-Americans from Jackson, Mississippi; European-Americans from Rochester, Minnesota; and Europeans from North Karelia, Finland. This sequence survey detected 21 SNPs and 1 multiallelic indel, 14 of which had not been previously reported. Alleles varied in relative frequency among the populations, and 10 sites were polymorphic in only a single population sample. Oligonucleotide ligation assays (OLA) were developed for 20 of these sites (omitting the indel and a closely linked SNP). These were then scored in 2179 individuals sampled from the same three populations (n = 843, 884, and 452, respectively). Relative allele frequencies were generally consistent with estimates from the core sample, although variation was found in some populations in the larger sample at SNPs that were monomorphic in the corresponding smaller core sample. Site variation in the larger samples showed no systematic deviation from Hardy-Weinberg expectation. The large OLA sample clearly showed that variation in many, but not all, of the OLA-typed SNPs is significantly correlated with the classical protein-coding variants, implying that there may be important substructure within the classical ɛ2, ɛ3, and ɛ4 alleles. Comparison of the levels and patterns of polymorphism in the core samples with those estimated for the OLA-typed samples shows how nucleotide diversity is underestimated when only a subset of sites is typed and underscores the importance of adequate population sampling at the polymorphism discovery stage.
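
The observation that some SNPs were monomorphic in a core sample yet variable in the larger sample can be illustrated with a simple sampling calculation. The following is a minimal sketch, not the analysis used in this study: it assumes chromosomes are drawn independently at random from the population, and the function name `detection_probability` and the example allele frequencies are hypothetical choices made for illustration. Under that assumption, the chance of seeing at least one copy of an allele with frequency p among the 2n chromosomes of n diploid individuals is 1 - (1 - p)^(2n).

```python
def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """Probability that an allele is seen at least once in a sample.

    Assumes independent random sampling of 2 * n_individuals chromosomes
    (an illustrative simplification, not the study's method).
    """
    n_chromosomes = 2 * n_individuals
    # Probability that every sampled chromosome lacks the allele,
    # subtracted from 1 to give the probability of at least one copy.
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    core_sample_size = 24  # individuals per population in the core sample
    # Hypothetical allele frequencies chosen only to show the trend.
    for freq in (0.01, 0.05, 0.10):
        prob = detection_probability(freq, core_sample_size)
        print(f"allele frequency {freq:.2f}: detection probability {prob:.3f}")
```

With 24 individuals per population, alleles at frequencies of a few percent have an appreciable chance of going unobserved at the discovery stage, which is consistent with the emphasis above on adequate population sampling.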

[The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF261279.]

Footnotes

  • 7 Corresponding author.

  • E-MAIL debnick{at}washington.edu; FAX (206) 685-7301

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.146900.

    • Received May 3, 2000.
    • Accepted August 17, 2000.