Building a forensic ancestry panel from the ground up: The EUROFORGEN Global AIM-SNP set
Introduction
The prospects for typing 200–300 single nucleotide polymorphisms (SNPs) in one multiplexed sequencing analysis are now much more realistic with the emergence of fast, compact next-generation sequencing (NGS) systems, such as Life Technologies Ion Torrent and Illumina MiSeq [1], [2]. SNPs have the benefit of complementing conventional forensic STR analysis by providing information about the DNA donor that can progress a criminal investigation lacking any leads beyond knowledge of gender. Principal amongst the complementary data generated by SNP analysis is the inference of genetic ancestry and prediction of common physical traits, with SNP-based analysis of pigmentation now established as a viable investigative tool [3], [4], [5]. Until the development of compact NGS approaches, forensic ancestry analysis centered on small-scale multiplexes of carefully chosen SNPs and Indels, exemplified by a 34-SNP SNaPshot multiplex and a 46-Indel dye-labeled PCR multiplex [6], [7], [8]. Once these tests were optimized, we successfully applied them to a variety of challenging DNA cases [9], [10], [11], [12]. Their combination into 80-marker profiles provides good data depth, short-amplicon PCRs sensitive to degraded DNA, and complementary features including the enhanced ability of Indels to detect mixed DNA. However, the original choice of ancestry informative markers, particularly components of the 34-plex SNP test, reflected the state of knowledge of human SNP variation some nine years ago. Much more extensive SNP catalogs can now be screened for suitable candidate markers, since major human genome initiatives including HapMap, 1000 Genomes and Complete Genomics have publicly released project data that allow identification of the best markers for ancestry inference purposes.
We decided to build, from a completely refreshed list of candidates, a new ancestry SNP (AIM-SNP) panel using our own bio-informatics search tools [13], [14] that front-end public genome data. Reconfiguring a forensic AIM-SNP set allows several characteristics to be prioritized: (i) identifying the most powerful differentiators for each population comparison; (ii) finding alternative loci with near-identical frequency distributions due to LD-block correlations [15] when SNP multiplexing problems arise; and (iii) carefully balancing marker combinations to give equivalent levels of differentiation between the population groups: Africans, Europeans, East Asians, Native Americans and Oceanians. The third characteristic is the most desirable for ensuring less biased assessments of admixture proportions in individuals with detectable co-ancestry, a significant demographic feature of urban populations and regions with histories of population movement (see Chapter 14 of [16]). However, population differentiation balance is also the most challenging characteristic to achieve, since, of the above five groups, Native American and Oceanian variation is not represented in any of the full human SNP catalogs. Fortunately, more than 650,000 SNPs have been characterized for the CEPH Human Genome Diversity Panel (HGDP-CEPH) with two Oceanian populations and five American populations [17], so suitable SNPs can be identified for differentiating these two groups, albeit from much smaller sample sizes.
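The idea of differentiation balance in (iii) can be illustrated with a minimal sketch. It assumes a deliberately simple, hypothetical divergence measure: for each SNP, the absolute difference between one group's allele frequency and the mean frequency of the other four groups, summed across the panel. This is not the metric used in the study, and the function names, group labels and toy frequencies are illustrative assumptions only.

```python
from statistics import mean

# Group labels are shorthand assumed for this sketch.
GROUPS = ["AFR", "EUR", "EAS", "AME", "OCE"]


def group_divergence(freqs, group):
    """Absolute difference between one group's allele frequency
    and the mean frequency of the remaining groups (assumed measure)."""
    others = [f for g, f in freqs.items() if g != group]
    return abs(freqs[group] - mean(others))


def cumulative_divergence(panel):
    """Sum the per-SNP divergence of each group across the whole panel."""
    return {
        g: sum(group_divergence(freqs, g) for freqs in panel.values())
        for g in GROUPS
    }


def balance_ratio(totals):
    """Smallest cumulative divergence divided by the largest.
    A value near 1.0 means the groups are equally well differentiated."""
    return min(totals.values()) / max(totals.values())


def toy_snp(high_group):
    """Made-up SNP: allele frequency 0.9 in one group, 0.1 in the rest."""
    return {g: (0.9 if g == high_group else 0.1) for g in GROUPS}


# One informative toy SNP per group, then the same panel without
# the Oceanian-informative SNP to show how balance degrades.
balanced_panel = {f"snp_{g}": toy_snp(g) for g in GROUPS}
unbalanced_panel = {k: v for k, v in balanced_panel.items() if k != "snp_OCE"}

for name, panel in [("balanced", balanced_panel), ("unbalanced", unbalanced_panel)]:
    totals = cumulative_divergence(panel)
    print(name, {g: round(t, 2) for g, t in totals.items()},
          round(balance_ratio(totals), 2))
```

In this toy setting, dropping the only Oceanian-informative marker lowers the ratio well below 1, which mirrors why groups absent from the full SNP catalogs are the hardest to keep in balance.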
This paper outlines the AIM-SNPs chosen to construct a set of 128 markers suitable for inclusion in forensic NGS tests. The set maintains near-identical population differentiation balance between admixture contributors originating from the five main continentally defined population groups. Therefore, the AIM-SNPs together allow analysis of admixed individuals, provided the co-ancestry contributors themselves are not admixed. The AIMs are applicable to a large proportion of the worldwide distribution of human populations, including regions where populations meet and admixture contributors are not necessarily confined to Europeans, Africans or East Asians, e.g. American contributors in the USA and South America or Oceanians in Australia. However, differentiation of European from Middle East or South Asian sub-groups of Eurasia was ignored in favor of ensuring Oceanian differentiation comparable to the other groups. Allele frequency bias in the populations used to select AIM-SNPs can still exist, so we attempted to minimize this by using at least two geographically separated populations per group. Four populations likely to be divergent from those used for selection were also tested to gauge the degree of allelic heterogeneity they exhibited for the same SNPs. Because size constraints can still apply to PCR multiplexes in all technologies (forensic NGS tests may include STRs as main components), we also reduced the SNP set to smaller-scale subsets while maintaining the population differentiation balance at each stage of reduction. Lastly, we describe Sequenom iPLEX® MALDI-TOF genotyping tests used to validate additional population variation in the AIM-SNPs chosen and to assess each SNP's multiplex performance ahead of porting them to larger-scale NGS chemistries.
Section snippets
Sources of AIM-SNPs and allele frequencies in the five main global population groups
Candidate AIM-SNPs were compiled from three sources: (i) SNP sets previously developed for a range of forensic ancestry test initiatives at Santiago (USC); (ii) allele frequency screens of the Stanford HGDP-CEPH 650 K SNP dataset [17], [18] – identifying SNPs with the highest divergence between targeted population comparisons by finding the top 5% most differentiated in each case, and (iii) AIM-SNP lists published both before and after availability of whole genome scan (WGS) high-density SNP
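The top-5% screen described in source (ii) can be sketched as follows. The sketch assumes absolute allele frequency difference as the divergence measure for one pairwise population comparison; that measure, the function name `top_percent` and the toy SNP identifiers and frequencies are assumptions made for illustration, not details taken from the study.

```python
def top_percent(freq_a, freq_b, percent=5):
    """Return the SNP ids in the top `percent` of loci ranked by
    absolute allele frequency difference between two populations.

    freq_a, freq_b: dicts mapping SNP id -> allele frequency (0..1).
    Only SNPs present in both dicts are compared.
    """
    shared = freq_a.keys() & freq_b.keys()
    ranked = sorted(
        shared,
        key=lambda snp: abs(freq_a[snp] - freq_b[snp]),
        reverse=True,
    )
    # Integer arithmetic avoids floating-point rounding; keep at least one SNP.
    n_keep = max(1, len(ranked) * percent // 100)
    return ranked[:n_keep]


# Toy data: 40 hypothetical SNPs with identical frequencies in both
# populations, except two loci that are strongly differentiated.
pop_a = {f"snp{i}": 0.5 for i in range(40)}
pop_b = {f"snp{i}": 0.5 for i in range(40)}
pop_a["snp3"], pop_b["snp3"] = 0.95, 0.05
pop_a["snp7"], pop_b["snp7"] = 0.10, 0.90

candidates = top_percent(pop_a, pop_b)
print(candidates)
```

Running the same function once per targeted population comparison would give one candidate list per comparison, which could then be pooled with candidates from the other sources.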
Characteristics of the ancestry informative SNPs selected
A final set of 122 bi-allelic and 6 tri-allelic SNPs was selected from a total candidate pool of 189 loci (and 12 tri-allelic) and is detailed in Table 1 and Supplementary Table S2A. All candidate SNPs from sources detailed in Section 2.1 are listed in Supplementary Table S2B. Global AIM-SNP allele frequency distributions in five population groups are summarized in Fig. 2. The cumulative PSD values in each group required a smaller number of AFR-informative SNPs and for 28 candidates, Oceanian
Discussion
This study shows that the current extensive human genome variation catalogs can be easily accessed and their allele frequency data used to select highly differentiating ancestry informative SNPs. We were able to build sets with a range of sizes that meet the statistical power demands of forensic analysis, while focusing on the key characteristic of population differentiation balance. Although prompted by the previous study of Galanter, which addressed AFR-EUR-AME populations [22], for all but the
Acknowledgements
This work was funded by the EUROFORGEN Node of Excellence (Grant Agreement No. 285487). Studies leading to the reported results were financially supported by the Austrian Science Fund (FWF, P22880-B12). CS is supported by funding awarded by the Portuguese Foundation for Science and Technology (FCT) and co-financed by the European Social Fund (Human Potential Thematic Operational Program SFRH/BD/75627/2010).
References (37)
- et al. STRait Razor: A length-based forensic STR allele-calling tool for use with second generation sequencing data. Forensic Sci. Int. Genet. (2013)
- et al. DNA-based eye colour prediction across Europe with the IrisPlex system. Forensic Sci. Int. Genet. (2011)
- et al. The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA. Forensic Sci. Int. Genet. (2013)
- et al. Further development of forensic eye color predictive tests. Forensic Sci. Int. Genet. (2013)
- et al. The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. Forensic Sci. Int. Genet. (2007)
- et al. Revision of the SNPforID 34-plex forensic ancestry test: assay enhancements, standard reference sample genotypes and extended population studies. Forensic Sci. Int. Genet. (2013)
- et al. Case report: identification of skeletal remains using short-amplicon marker analysis of severely degraded DNA extracted from a decomposed and charred femur. Forensic Sci. Int. Genet. (2008)
- et al. Eurasiaplex: A forensic SNP assay for differentiating European and South Asian ancestries. Forensic Sci. Int. Genet. (2013)
- et al. Human genome-wide screen of haplotype-like blocks of reduced diversity. Gene (2005)
- et al. The SNPforID consortium, Evaluation of the Genplex SNP typing system and a 49-plex forensic marker panel. Forensic Sci. Int. Genet. (2007)
- Informativeness of genetic markers for inference of ancestry. Am. J. Hum. Genet.
- Ancestry informative markers
- Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet.
- An evaluation of potential linkage disequilibrium between the STRs vWA and D12S391 with implications in criminal casework. Forensic Sci. Int. Genet.
- Single nucleotide polymorphism typing with massively parallel sequencing for human identification. Int. J. Legal Med.
- Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing. PLoS One
- Applications of autosomal SNPs and Indels in forensic analysis. Forensic Sci. Rev.
- A 34-plex autosomal SNP single base extension assay for ancestry investigations. Methods Mol. Biol.