Helena, the hidden beauty: Resolving the most common West Eurasian mtDNA control region haplotype by massively parallel sequencing an Italian population sample

https://doi.org/10.1016/j.fsigen.2014.09.012

Highlights

  • We analyzed the most common West Eurasian mtDNA CR haplotype for the coding region.

  • We found 28 different coding region haplotypes in 29 samples identical in the CR.

  • We therefore increased forensic power of discrimination from 0% to 99.8%.

  • We dissected the most common West Eurasian haplotype into numerous haplogroups.

Abstract

The analysis of mitochondrial (mt)DNA is a powerful tool in forensic genetics when nuclear markers fail to give results or maternal relatedness is investigated. The mtDNA control region (CR) contains highly condensed variation and is therefore routinely typed. Some samples exhibit an identical haplotype in this restricted range. Thus, they convey only weak evidence in forensic queries and limited phylogenetic information. However, a CR match does not imply that the mtDNA coding regions are also identical or that the samples belong to the same phylogenetic lineage. This is especially the case for the most frequent West Eurasian CR haplotype 263G 315.1C 16519C, which is observed in various clades within haplogroup H and occurs at a frequency of 3–4% in many European populations.

In this study, we investigated the power of massively parallel complete mtGenome sequencing in 29 Italian samples displaying the most common West Eurasian CR haplotype and found unexpectedly high diversity. Twenty-eight different haplotypes falling into 19 described sub-clades of haplogroup H were revealed in the samples with identical CR sequences. This study demonstrates the benefit of complete mtGenome sequencing for forensic applications: maximum discrimination, more comprehensive heteroplasmy detection, and the highest phylogenetic resolution.

Introduction

Forensic DNA analyses are routinely performed by determining the “genetic fingerprint”, i.e. the alleles of polymorphic nuclear microsatellite markers that display high diversity, stability, and Mendelian inheritance, and thus allow identification, individualization and pedigree reconstruction [1]. These markers, however, often fail to yield results from compromised samples containing degraded or low quantities of nuclear DNA. The haploid, maternally inherited mitochondrial (mt)DNA has become vital for analyzing those samples owing to its abundance and stability as a multi-copy circular molecule protected in organelles. As a lineage marker, it can be used to exclude identity or corroborate (even distant) maternal relatedness [2], [3], [4], [5], [6].

The outcome of (forensic) mtDNA investigations, besides precise base calling and the availability of high-quality databases [7], [8], [9], [10], mainly depends on the amount of information generated from the individual sample [11], [12]. Because of financial, technical and legal restrictions, the current standard is to sequence (hypervariable parts of) the ∼1.1 kbp non-coding control region (CR) of the ∼16.5 kbp mitochondrial genome (mtGenome), which contains densely concentrated variation due to a higher mutation rate than the remaining segment, the coding region (codR) [7], [13], [14]. CR data enable coarse discrimination of numerous maternal lineages [15] and may establish non-identity (“exclusion”) of donors and their maternal relatives in many forensic cases [16], [17]. However, identical CR haplotypes are found on different haplogroup backgrounds across the mtDNA phylogeny, as several lineages are poorly defined in the CR [18]. Consequently, a shared CR haplotype (“non-exclusion”) does not necessarily imply that two mtDNAs are identical in their codR, or even belong to the same lineage.

For these reasons, the current partial analyses of the mtDNA molecule greatly restrict and possibly bias (forensic) interpretation, and are particularly problematic for populations with extremely few CR lineages (e.g., Ref. [19]). For example, when working with samples of West Eurasian (sometimes referred to as “Caucasian”) maternal background, one is very likely to encounter diverse clades of haplogroup H, which accounts for ∼40% of mtDNAs in many European populations [20], [21], [22]. The currently available CR data convey a rather uniform picture of distribution, while pivotal studies at higher levels of phylogenetic resolution have demonstrated a cluster of >100 distinct radiating lineages within this haplogroup, with significant differences in dispersal and frequency (e.g., Refs. [20], [21], [23], [24], [25], [26], [27]). These investigations have so far mostly focused on only a few or non-random samples, involved limited codR sequencing, contained no (detailed) information on the donors’ geographic origin or were derived from small regions [10]. Only about one fifth of the H lineages can be distinguished within the CR, and even there the markers are often homoplasic [14], [15], [18], [23] (Fig. 1).

Correspondingly, many West Eurasian individuals exhibit identical CR haplotypes clustering within haplogroup R0 (the CR-MRCA of haplogroup H) only for the lack of further sequence data. This is particularly the case for the most common West Eurasian CR haplotype 16519C 263G 315.1C (relative to the revised Cambridge Reference Sequence, rCRS [28]), observed in numerous sub-clades of haplogroup H [15] with a frequency of 3–4% in any western Eurasian population [22]. It can be anticipated that codR analysis would reveal a high number of different lineages (cf. [12], [29]).
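The distinction between a CR match and a full mtGenome match can be illustrated with a minimal Python sketch. It assumes the conventional forensic CR range of rCRS positions 16024–16569 and 1–576 (about 1.1 kbp, in line with the figure given above). The two haplotypes are illustrative only and are not taken from the study data; the function names are hypothetical.

```python
import re

# Assumption: conventional control region range relative to the rCRS,
# which wraps around the origin of the circular molecule.
CR_START, CR_END = 16024, 576


def position(variant: str) -> int:
    """Return the rCRS position of a variant such as '263G' or '315.1C'."""
    match = re.match(r"\d+", variant)
    if match is None:
        raise ValueError(f"cannot parse variant: {variant!r}")
    return int(match.group())


def in_control_region(variant: str) -> bool:
    """True if the variant lies in 16024-16569 or 1-576."""
    pos = position(variant)
    return pos >= CR_START or pos <= CR_END


def split_haplotype(variants):
    """Split a set of variants into (CR part, codR part)."""
    cr = frozenset(v for v in variants if in_control_region(v))
    codr = frozenset(v for v in variants if not in_control_region(v))
    return cr, codr


# Illustrative haplotypes (not study data): both carry the common CR motif,
# but sample_b has one additional coding region variant.
sample_a = {"263G", "315.1C", "750G", "1438G", "4769G", "8860G", "15326G", "16519C"}
sample_b = sample_a | {"3010A"}

cr_a, codr_a = split_haplotype(sample_a)
cr_b, codr_b = split_haplotype(sample_b)

print("CR match:", cr_a == cr_b)
print("codR match:", codr_a == codr_b)
print("codR difference:", sorted(codr_a ^ codr_b))
```

In this sketch the two samples cannot be excluded on CR data alone, while the coding region separates them.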

Different strategies have been applied to access codR information in forensic casework [14], [30] while circumventing laborious complete mtGenome Sanger-type sequencing [6], [31], [32]. These have included sequencing only short segments comprising variation considered relevant [33], [34] or, more commonly, targeting distinct markers of a few principal clades in a (usually minisequencing) multiplex assay in order to determine the main haplogroups on macro-region (e.g., Refs. [35], [36]) or even global level [37], [38], [39], or to dissect identical samples or a specific lineage (for haplogroup H, e.g., Refs. [24], [40], [41] (compared in Ref. [42]), [43], [44]). The number of markers that can be included in such an assay is limited; therefore, a selection toward those considered more frequent is usually made. In any such approach, countless SNPs are left undetermined and consequently only a rough resolution is obtained – only complete mtGenome sequencing would reveal all untargeted private or phylogenetic variation.
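The loss of resolution from a limited marker panel can be sketched as follows. This is a toy illustration, not the assay of any cited study: the panel, the samples and the variant "12345T" are made up, and only coding region variants are listed because the CR is assumed identical in all samples.

```python
def distinct_haplotypes(samples, panel=None):
    """Count distinct haplotypes.

    If `panel` is given, only variants targeted by the panel are observed,
    which mimics a SNP multiplex assay; otherwise all variants are used,
    which mimics complete sequencing.
    """
    seen = set()
    for variants in samples:
        observed = set(variants) if panel is None else set(variants) & set(panel)
        seen.add(frozenset(observed))
    return len(seen)


# Hypothetical panel targeting two coding region markers.
panel = {"3010A", "6776C"}

# Hypothetical samples, all identical in the CR (CR variants omitted).
samples = [
    {"3010A"},
    {"6776C"},
    {"3010A", "12345T"},  # made-up private variant not covered by the panel
    set(),                # no targeted or untargeted codR variant
]

resolved_by_panel = distinct_haplotypes(samples, panel)
resolved_by_sequencing = distinct_haplotypes(samples)
print(resolved_by_panel, resolved_by_sequencing)
```

The first and third samples collapse into one haplotype under the panel, because the variant that separates them is not among the targeted markers.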

This loss in discrimination power cannot be compensated even when highly fluctuating “individualizing” markers are included in some assays (cf. [42]). Emerging benchtop high-throughput massively parallel (MPS; or next generation) sequencing solutions, which are very promising in terms of sample throughput, speed, amount of data generated and costs per sample, now make obtaining complete mtDNA information relatively easy. In this pilot study, we investigated the power of entire mtGenome MPS on a set of samples sharing the most common West Eurasian mtDNA CR haplotype, carefully considering methodological challenges [45], [46] and taking advantage of insights from systematic data reviews during recent complete mtGenome etalon (i.e. carefully selected high quality reference, cf. [47]) dataset generation by both Sanger-type sequencing [6], [48] and MPS [49].

DNA samples

An earlier population study [44] included 884 randomly selected individuals representing eight macro-areas of Italy that donated blood samples after informed consent. DNA was extracted using a modified salting out method [50], sequenced for mtDNA CR and typed for 22 codR SNPs to determine the main West Eurasian haplogroups. The samples found to belong to haplogroup R0 were further subjected to a SNP multiplex analysis designed to resolve 17 distinct haplogroup H lineages [44]. In this study, we

Results

Entire mtGenome sequencing revealed an extremely high degree of variation in the samples that shared the CR motif 16519C 263G 315.1C. In total, 28 different haplotypes were discerned within the set of 29 samples analyzed in this study. Two (seemingly) unrelated donors from the same province ([44], Table S1), DB525 and DB559, revealed identical mtGenomes. The haplotype diversity using entire mtGenome information reached 99.8%, corresponding to a random match probability of 0.037 (Table 1). The
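The reported figures can be checked with a short Python sketch. It assumes the commonly used estimators, random match probability as the sum of squared haplotype frequencies and haplotype diversity as n/(n−1) times one minus that sum; the excerpt above does not state which estimators were used, so this is an assumption. Haplotype labels are placeholders.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared relative haplotype frequencies."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * (1 - random_match_probability(haplotypes))


# 29 samples with entire mtGenome data: 27 unique haplotypes plus one
# haplotype shared by two donors (DB525 and DB559). Labels are placeholders.
full_mtgenome = [f"hap{i}" for i in range(27)] + ["shared", "shared"]

# The same 29 samples seen through the CR only: one identical haplotype.
cr_only = ["263G 315.1C 16519C"] * 29

print(f"mtGenome: HD = {haplotype_diversity(full_mtgenome):.3f}, "
      f"RMP = {random_match_probability(full_mtgenome):.3f}")
print(f"CR only:  HD = {haplotype_diversity(cr_only):.3f}, "
      f"RMP = {random_match_probability(cr_only):.3f}")
```

Under these assumptions the mtGenome values round to a diversity of 99.8% and a random match probability of 0.037, while the CR-only set gives a diversity of 0%, matching the figures stated in the highlights and in the paragraph above.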

MtDNA forensics at its highest resolution

This study strikingly demonstrates the significance of complete mtGenome sequencing in forensic genetic practice: highest mtDNA resolution allowed almost complete discrimination of haplotypes identical in their CR by rendering virtually every one unique in this randomly selected sample. In the previous study that included 39 codR SNPs in addition to the CR [44], a power of discrimination of 72.9% had been reached using the same sample set, compared to 99.8% in this study (Table 1). A

Conclusion and outlook

This study makes the forensic (mito-)geneticist's ultimate desire come true: to discern the “identical” by entering the final genetic phase of mtDNA resolution, the analysis of entire mtGenomes. However, MPS appears – for now – out of reach for most forensic casework laboratories, despite its clear advantages in terms of discrimination power, heteroplasmy detection and phylogenetic assignment, which can act as quality control (cf. [7], [49]). Complete mtGenome Sanger-type sequencing is tedious

Acknowledgments

The authors wish to thank the donors who gave their blood for research and the two anonymous reviewers for their helpful comments. This work was supported by the intramural funding program of the Medical University Innsbruck for young scientists MUI-START, Project 2013042025, the Theodor Körner Fonds zur Förderung von Wissenschaft und Kunst, the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 285487 and the Austrian Science Fund (FWF) [P22880-B12].

References (63)

  • W. Parson et al. EMPOP – a forensic mtDNA database. Forensic Sci. Int. Genet. (2007)
  • D.M. Behar et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. (2012)
  • S. Köhnemann et al. An economical mtDNA SNP assay detecting different mitochondrial haplogroups in identical HVR 1 samples of Caucasian ancestry. Mitochondrion (2009)
  • H. Andréasson et al. Forensic mitochondrial coding region analysis for increased discrimination using pyrosequencing technology. Forensic Sci. Int. Genet. (2007)
  • M. Mikkelsen et al. A mitochondrial DNA SNP multiplex assigning Caucasians into 36 haplo- and subhaplogroups. Forensic Sci. Int. Genet. Suppl. Ser. (2008)
  • K. Krjutškov et al. Evaluation of the 124-plex SNP typing microarray for forensic testing. Forensic Sci. Int. Genet. (2009)
  • L. Chaitanya et al. Developmental validation of mitochondrial DNA genotyping assays for adept matrilineal inference of biogeographic ancestry at a continental level. Forensic Sci. Int. Genet. (2014)
  • L. Pereira et al. Evaluating the forensic informativeness of mtDNA haplogroup H sub-typing on a Eurasian scale. Forensic Sci. Int. (2006)
  • M. Nilsson et al. Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis. Forensic Sci. Int. Genet. (2008)
  • A. Mosquera-Miguel et al. Testing the performance of mtSNP minisequencing in forensic samples. Forensic Sci. Int. Genet. (2009)
  • H.J. Bandelt et al. Current next generation sequencing technology may not meet forensic standards. Forensic Sci. Int. Genet. (2012)
  • W. Parson et al. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM). Forensic Sci. Int. Genet. (2013)
  • B. Zimmermann et al. Application of a west Eurasian-specific filter for quasi-median network analysis: sharpening the blade for mtDNA error detection. Forensic Sci. Int. Genet. (2011)
  • J.L. King et al. High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq. Forensic Sci. Int. Genet. (2014)
  • H. Niederstätter et al. A modular real-time PCR concept for determining the quantity and quality of human nuclear and mitochondrial DNA. Forensic Sci. Int. Genet. (2007)
  • A.W. Röck et al. Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA). Forensic Sci. Int. Genet. (2013)
  • C. Herrnstadt et al. Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am. J. Hum. Genet. (2002)
  • M. Bodner et al. Inspecting close maternal relatedness: towards better mtDNA population samples in forensic databases. Forensic Sci. Int. Genet. (2011)
  • S. Köhnemann et al. Application of mtDNA SNP analysis in forensic casework. Forensic Sci. Int. Genet. (2011)
  • J.M. Butler. Fundamentals of Forensic DNA Typing (2010)
  • C. Eichmann et al. ‘Mitominis’: multiplex PCR analysis of reduced size amplicons for compound sequence analysis of the entire mtDNA control region in highly degraded samples. Int. J. Legal Med. (2008)