Helena, the hidden beauty: Resolving the most common West Eurasian mtDNA control region haplotype by massively parallel sequencing an Italian population sample
Introduction
Forensic DNA analyses are routinely performed by determining the “genetic fingerprint”, i.e. the alleles of polymorphic nuclear microsatellite markers that display high diversity, stability, and Mendelian inheritance, and thus allow identification, individualization and pedigree reconstruction [1]. However, these markers do not reliably yield results from compromised samples containing degraded or low quantities of nuclear DNA. The haploid, maternally inherited mitochondrial (mt)DNA has come to fill a vital niche in analyzing such samples, owing to its abundance and stability as a multi-copy circular molecule protected within organelles. As a lineage marker, it can be used to exclude identity or to corroborate (even distant) maternal relatedness [2], [3], [4], [5], [6].
The outcome of (forensic) mtDNA investigations, besides precise base calling and the availability of high-quality databases [7], [8], [9], [10], depends mainly on the amount of information generated from the individual sample [11], [12]. Because of financial, technical and legal restrictions, the current standard is to sequence (hypervariable parts of) the ∼1.1 kbp non-coding control region (CR) of the ∼16.5 kbp mitochondrial genome (mtGenome). The CR contains densely concentrated variation owing to its higher mutation rate compared with the remaining segment, the coding region (codR) [7], [13], [14]. CR data enable coarse discrimination of numerous maternal lineages [15] and, in many forensic cases, may establish non-identity (“exclusion”) of donors and of their maternal relatives, respectively [16], [17]. However, identical CR haplotypes are found on different haplogroup backgrounds across the mtDNA phylogeny, as several lineages are poorly defined in the CR [18]. Consequently, a shared CR haplotype (“non-exclusion”) does not necessarily imply that two mtDNAs are identical in their codR, or even that they belong to the same lineage.
For these reasons, the current partial analyses of the mtDNA molecule greatly restrict and possibly bias (forensic) interpretation, and are particularly problematic for populations with extremely few CR lineages (e.g., Ref. [19]). For example, when working with samples of West Eurasian (sometimes referred to as “Caucasian”) maternal background, one is very likely to encounter diverse clades of haplogroup H, which accounts for ∼40% of mtDNAs in many European populations [20], [21], [22]. The currently available CR data convey a rather uniform distribution, whereas pivotal studies at higher levels of phylogenetic resolution have demonstrated a cluster of >100 distinct radiating lineages within this haplogroup, with significant differences in dispersal and frequency (e.g., Refs. [20], [21], [23], [24], [25], [26], [27]). So far, these investigations have mostly focused on few or non-random samples, involved limited codR sequencing, contained no (detailed) information on the donors’ geographic origin, or were derived from small regions [10]. Only about one fifth of the H lineages can be distinguished within the CR, and even then the distinguishing markers are often homoplasic [14], [15], [18], [23] (Fig. 1).
Correspondingly, many West Eurasian individuals exhibit identical CR haplotypes clustering within haplogroup R0 (the CR-MRCA of haplogroup H) merely for lack of further sequence data. This is particularly the case for the most common West Eurasian CR haplotype, 16519C 263G 315.1C (relative to the revised Cambridge Reference Sequence, rCRS [28]), which is observed in numerous sub-clades of haplogroup H [15] at a frequency of 3–4% across western Eurasian populations [22]. Coding-region analysis can therefore be expected to reveal a high number of different lineages behind this single CR haplotype (cf. [12], [29]).
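Why a shared CR haplotype need not imply identical mtGenomes can be illustrated with a small sketch. It splits each sample's variants (relative to the rCRS) into CR and codR parts and groups samples by CR motif only versus by complete mtGenome. It assumes the conventional rCRS CR boundaries 16024–16569 and 1–576 and maps insertions such as 315.1C to the position they follow. The sample IDs and the codR variants (3010A and 6776C, chosen as illustrative H1- and H3-type markers) are hypothetical, not data from this study.

```python
import re
from collections import defaultdict

def position(variant: str) -> int:
    """rCRS position of a variant; an insertion such as '315.1C' maps to 315."""
    return int(re.match(r"\d+", variant).group())

def in_control_region(variant: str) -> bool:
    # Assumed CR boundaries on the circular rCRS: 16024-16569 and 1-576.
    pos = position(variant)
    return pos >= 16024 or pos <= 576

def group_by(samples: dict[str, set[str]], cr_only: bool) -> list[list[str]]:
    """Group sample IDs sharing a haplotype, optionally looking at the CR only."""
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for sample_id, variants in samples.items():
        kept = {v for v in variants if in_control_region(v)} if cr_only else variants
        groups[frozenset(kept)].append(sample_id)
    return sorted(sorted(ids) for ids in groups.values())

# Hypothetical samples that all share the CR motif 16519C 263G 315.1C.
CR_MOTIF = {"16519C", "263G", "315.1C"}
samples = {
    "S1": CR_MOTIF | {"3010A"},  # illustrative H1-type codR marker
    "S2": CR_MOTIF | {"6776C"},  # illustrative H3-type codR marker
    "S3": CR_MOTIF | {"3010A"},
}

print(group_by(samples, cr_only=True))   # [['S1', 'S2', 'S3']]
print(group_by(samples, cr_only=False))  # [['S1', 'S3'], ['S2']]
```

Looking at the CR alone, all three samples fall into one group; adding the codR splits them, which is the effect whole-mtGenome sequencing exploits.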
Different strategies that circumvent laborious complete mtGenome Sanger-type sequencing [6], [31], [32] have been applied to access codR information in forensic casework [14], [30]. These have included sequencing only short segments comprising variation considered relevant [33], [34] or, more commonly, targeting distinct markers of a few principal clades in a (usually minisequencing) multiplex assay, in order to determine the main haplogroups at the macro-regional (e.g., Refs. [35], [36]) or even global level [37], [38], [39], or to dissect identical samples or a specific lineage (for haplogroup H, e.g., Refs. [24], [40], [41] (compared in Ref. [42]), [43], [44]). The number of markers that can be included in such an assay is limited; therefore, markers considered more frequent are usually selected. In any such approach, countless SNPs are left undetermined and the resulting resolution is consequently rather coarse; only complete mtGenome sequencing reveals all untargeted private or phylogenetic variation.
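The limitation of targeted assays can be made concrete with a toy example. Assuming a hypothetical two-marker panel (3010A for H1 and 6776C for H3, chosen only for illustration and not the panel of any study cited here), the sketch below reports the haplogroup call such an assay would return together with the codR variants it never inspects. Two samples that differ by an untargeted variant receive the same call.

```python
# Hypothetical targeted panel: defining marker -> sub-haplogroup label.
# 3010A (H1) and 6776C (H3) are illustrative only; real assays target more markers.
PANEL = {"3010A": "H1", "6776C": "H3"}

def type_with_panel(variants: set[str]) -> tuple[str, list[str]]:
    """Return the panel's haplogroup call and the codR variants it cannot see."""
    hits = sorted(PANEL[v] for v in variants if v in PANEL)
    call = "+".join(hits) if hits else "unresolved at panel level"
    untyped = sorted(v for v in variants if v not in PANEL)
    return call, untyped

# codR variants only; both samples are assumed to share the CR motif 16519C 263G 315.1C.
sample_a = {"3010A"}
sample_b = {"3010A", "8000T"}  # 8000T: hypothetical private variant

print(type_with_panel(sample_a))  # ('H1', [])
print(type_with_panel(sample_b))  # ('H1', ['8000T'])
```

Extending the panel only moves the boundary: any variant not listed in `PANEL` stays invisible, which is why untargeted private variation can only be captured by sequencing the whole molecule.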
This loss in discrimination power cannot be compensated for, even when highly fluctuating “individualizing” markers are included in some assays (cf. [42]). Emerging benchtop high-throughput massively parallel sequencing (MPS; or next generation sequencing) solutions, which are very promising in terms of sample throughput, speed, amount of data generated and cost per sample, now make obtaining complete mtDNA information relatively easy. In this pilot study, we investigated the power of entire mtGenome MPS on a sample set sharing the most common West Eurasian mtDNA CR haplotype, carefully considering methodological challenges [45], [46] and taking advantage of insights from systematic data reviews during the recent generation of complete mtGenome etalon (i.e. carefully selected high-quality reference, cf. [47]) datasets by both Sanger-type sequencing [6], [48] and MPS [49].
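One methodological challenge in MPS data is deciding, position by position, whether a minor allele reflects a point heteroplasmy or noise. The sketch below is a simplified illustration, not the pipeline used in this study: it assumes read counts for A, C, G and T only, a hypothetical minimum coverage of 100 reads and a hypothetical minor-allele threshold of 10%. Thresholds differ between laboratories and platforms. Two-base mixtures are reported with IUPAC ambiguity codes (e.g. Y for C/T, R for A/G).

```python
# IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

MIN_COVERAGE = 100      # hypothetical minimum read depth for a call
MINOR_THRESHOLD = 0.10  # hypothetical minor-allele fraction reported as heteroplasmy

def call_base(counts: dict[str, int]) -> str:
    """Call one position from per-base read counts, e.g. {'C': 850, 'T': 150}."""
    depth = sum(counts.values())
    if depth < MIN_COVERAGE:
        return "N"  # insufficient coverage: no call
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / depth >= MINOR_THRESHOLD:
        # Minor allele above threshold: report a point heteroplasmy.
        return IUPAC[frozenset(major + ranked[1][0])]
    return major

print(call_base({"C": 850, "T": 150}))  # Y
print(call_base({"C": 950, "T": 50}))   # C
print(call_base({"A": 30, "G": 20}))    # N
```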
DNA samples
An earlier population study [44] included 884 randomly selected individuals representing eight macro-areas of Italy, who donated blood samples after giving informed consent. DNA was extracted using a modified salting-out method [50], sequenced for the mtDNA CR and typed for 22 codR SNPs to determine the main West Eurasian haplogroups. The samples found to belong to haplogroup R0 were further subjected to a SNP multiplex analysis designed to resolve 17 distinct haplogroup H lineages [44]. In this study, we
Results
Entire mtGenome sequencing revealed an extremely high degree of variation in the samples that shared the CR motif 16519C 263G 315.1C. In total, 28 different haplotypes were discerned within the set of 29 samples analyzed in this study. Two (seemingly) unrelated donors from the same province ([44], Table S1), DB525 and DB559, shared identical mtGenomes. Haplotype diversity based on entire mtGenome information reached 99.8%, corresponding to a random match probability of 0.037 (Table 1). The
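Both statistics follow from the haplotype counts alone. As a consistency check, the sketch below assumes the standard estimators: haplotype diversity HD = n/(n-1) · (1 - Σp_i²) and random match probability RMP = Σp_i², where p_i is the frequency of haplotype i among n samples. The count structure (27 singletons plus one haplotype shared by DB525 and DB559) is taken from the text above; the haplotype labels and the function name are placeholders.

```python
from collections import Counter

def haplotype_stats(haplotypes: list[str]) -> tuple[float, float]:
    """Return (haplotype diversity, random match probability) for one sample set."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    sum_p2 = sum(p * p for p in freqs)   # random match probability
    hd = n / (n - 1) * (1 - sum_p2)      # Nei's unbiased haplotype diversity
    return hd, sum_p2

# 29 samples, 28 distinct mtGenomes: 27 singletons plus one shared haplotype.
mtgenomes = [f"hap{i}" for i in range(27)] + ["shared", "shared"]
hd, rmp = haplotype_stats(mtgenomes)
print(f"HD = {hd:.3f}, RMP = {rmp:.3f}")  # HD = 0.998, RMP = 0.037

# Restricted to the CR, all 29 samples share a single haplotype by design.
print(haplotype_stats(["16519C 263G 315.1C"] * 29))  # (0.0, 1.0)
```

Here Σp_i² = (27 + 4)/29² = 31/841, which reproduces the reported 0.037 and, through the n/(n-1) correction, the 99.8% diversity.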
MtDNA forensics at its highest resolution
This study strikingly demonstrates the significance of complete mtGenome sequencing in forensic genetic practice: the highest mtDNA resolution allowed almost complete discrimination of haplotypes identical in their CR, rendering virtually every one unique in this randomly selected sample. In the previous study, which included 39 codR SNPs in addition to the CR [44], a power of discrimination of 72.9% had been reached using the same sample set, compared with 99.8% in this study (Table 1). A
Conclusion and outlook
This study makes the forensic (mito-)geneticist's ultimate desire come true: to discern the “identical” by entering the final genetic phase of mtDNA resolution, the analysis of entire mtGenomes. However, MPS appears – for now – out of reach for most forensic casework laboratories, despite its clear advantages in terms of discrimination power, heteroplasmy detection and phylogenetic assignment, which can act as quality control (cf. [7], [49]). Complete mtGenome Sanger-type sequencing is tedious
Acknowledgments
The authors wish to thank the donors who gave their blood for research and the two anonymous reviewers for their helpful comments. This work was supported by the intramural funding program of the Medical University Innsbruck for young scientists MUI-START, Project 2013042025, the Theodor Körner Fonds zur Förderung von Wissenschaft und Kunst, the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 285487 and the Austrian Science Fund (FWF) [P22880-B12].
References (63)
- Molecular genetic investigations on Austria's patron saint Leopold III. Forensic Sci. Int. Genet. (2013).
- Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens. Forensic Sci. Int. Genet. (2014).
- Extended guidelines for mtDNA typing of population data in forensic science. Forensic Sci. Int. Genet. (2007).
- Phylogeographic investigations: the role of trees in forensic genetics. Forensic Sci. Int. (2007).
- MtGenome reference population databases and the future of forensic mtDNA analysis. Forensic Sci. Int. Genet. (2011).
- The application of mtDNA SNPs to a forensic case. Forensic Sci. Int. Genet. Suppl. Ser. (2008).
- Titanic's unknown child: the critical role of the mitochondrial DNA coding region in a re-identification effort. Forensic Sci. Int. Genet. (2011).
- Is it possible to differentiate mtDNA by means of HVIII in samples that cannot be distinguished by sequencing the HVI and HVII regions? Forensic Sci. Int. (2000).
- Considerations by the European DNA profiling (EDNAP) group on the working practices, nomenclature and interpretation of mitochondrial DNA profiles. Forensic Sci. Int. (2001).
- The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool. Am. J. Hum. Genet. (2004).
- EMPOP – a forensic mtDNA database. Forensic Sci. Int. Genet.
- A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet.
- An economical mtDNA SNP assay detecting different mitochondrial haplogroups in identical HVR 1 samples of Caucasian ancestry. Mitochondrion.
- Forensic mitochondrial coding region analysis for increased discrimination using pyrosequencing technology. Forensic Sci. Int. Genet.
- A mitochondrial DNA SNP multiplex assigning Caucasians into 36 haplo- and subhaplogroups. Forensic Sci. Int. Genet. Suppl. Ser.
- Evaluation of the 124-plex SNP typing microarray for forensic testing. Forensic Sci. Int. Genet.
- Developmental validation of mitochondrial DNA genotyping assays for adept matrilineal inference of biogeographic ancestry at a continental level. Forensic Sci. Int. Genet.
- Evaluating the forensic informativeness of mtDNA haplogroup H sub-typing on a Eurasian scale. Forensic Sci. Int.
- Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis. Forensic Sci. Int. Genet.
- Testing the performance of mtSNP minisequencing in forensic samples. Forensic Sci. Int. Genet.
- Current next generation sequencing technology may not meet forensic standards. Forensic Sci. Int. Genet.
- Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM). Forensic Sci. Int. Genet.
- Application of a west Eurasian-specific filter for quasi-median network analysis: sharpening the blade for mtDNA error detection. Forensic Sci. Int. Genet.
- High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq. Forensic Sci. Int. Genet.
- A modular real-time PCR concept for determining the quantity and quality of human nuclear and mitochondrial DNA. Forensic Sci. Int. Genet.
- Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA). Forensic Sci. Int. Genet.
- Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am. J. Hum. Genet.
- Inspecting close maternal relatedness: towards better mtDNA population samples in forensic databases. Forensic Sci. Int. Genet.
- Application of mtDNA SNP analysis in forensic casework. Forensic Sci. Int. Genet.
- Fundamentals of Forensic DNA Typing.
- ‘Mitominis’: multiplex PCR analysis of reduced size amplicons for compound sequence analysis of the entire mtDNA control region in highly degraded samples. Int. J. Legal Med.