Introduction

Autosomal STR markers are well-established and highly effective tools widely used for genetic identity and relationship testing1. X chromosome STRs, a complementary tool to autosomal STR and mitochondrial DNA (mtDNA) markers, can be used in forensic investigations such as complex kinship analysis2. For example, X-STR loci are especially useful for half-sister deficiency paternity cases3,4. Moreover, higher mean exclusion chance (MEC) values are obtained when using X chromosome markers in trios involving daughters4.

The use of X-STRs requires precise knowledge not only of allele and haplotype frequencies but also of the genetic linkage and linkage disequilibrium (LDE) status among markers5. Linkage refers to the co-segregation of closely located loci within a pedigree, whereas LDE measures allele co-segregation at the population level6. In our unpublished data from Southern Han family samples, the 19 X-STR loci multiplex system included seven clusters of closely linked markers: DXS10148-DXS10135-DXS8378, DXS10159-DXS10162-DXS10164, DXS7132-DXS10079-DXS10074-DXS10075, DXS6809-DXS6789, DXS7424-DXS101, DXS10103-HPRTB-DXS10101 and DXS10134-DXS7423 (located at Xp22, the centromere, Xq12, Xq21, Xq22, Xq26 and Xq28, respectively, each spanning less than 3 cM, similar to previous research5). Considering many X-STRs jointly in this way increases the power of discrimination. LDE can be assessed from allele and haplotype frequencies, and alleles of closely linked X-chromosomal loci can be evaluated as a haplotype rather than as single STRs. However, grouping markers into haplotypes may yield partially redundant information in kinship testing, which effectively reduces the number of markers in the multiplex system7. Therefore, it is necessary to investigate the LDE of the 19 above-mentioned markers and to assess the efficacy of these loci through single-locus and haplotype frequency analyses to evaluate their potential use in forensic practice.

Results and Discussion

Polymorphism

The 932 unrelated individuals from the four ethnic groups were successfully typed with the newly developed 19 X-STR loci multiplex system. Based on a Wilcoxon signed-ranks test, allele frequencies did not differ significantly between female and male samples at any of the examined loci in any ethnic group (significance level 0.05). Hardy-Weinberg equilibrium (HWE) tests were performed on female samples. At a significance level of 0.05, the DXS10079 and DXS7424 markers in the Southern Han population; DXS10135 and DXS10134 in the Tibetan population; DXS10148, DXS10159 and DXS101 in the Uighur population; and DXS6809 in the Hui population showed departures from HWE. However, no significant deviations from HWE remained after Bonferroni correction (P = 0.05/171 = 0.00029).
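The corrected threshold quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the divisor 171 is taken from the text, and the observation that it equals the number of distinct pairs among 19 loci is simple combinatorics, not a statement about how the authors chose it. The names `ALPHA`, `N_LOCI` and `bonferroni_threshold` are hypothetical.

```python
from math import comb

ALPHA = 0.05   # nominal significance level used in the text
N_LOCI = 19    # number of X-STR loci in the multiplex


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# 19 loci give C(19, 2) = 171 distinct locus pairs, the divisor quoted in the text.
n_pairs = comb(N_LOCI, 2)
threshold = bonferroni_threshold(ALPHA, n_pairs)
print(n_pairs, f"{threshold:.5f}")
```

A P value is then declared significant only if it falls below `threshold`.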

For these 932 samples, the number of observed alleles varied from 8 to 32 across the different loci. The allele frequencies are shown in Supplementary Tables S1–S10. The power of discrimination in females (PDf) and males (PDm), the polymorphism information content (PIC), the observed heterozygosity (Ho), the expected heterozygosity (He), the mean exclusion chance (MEC), the combined power of discrimination for females (CDPf) and males (CDPm), and the combined mean exclusion chance in duo cases (CMECd) for the 19 loci in the Southern Han, Tibetan, Uighur and Hui ethnic groups are shown in Tables 1–10. The typing results for the 9947A control DNA were consistent with those reported in the X chromosome database (Supplementary Tables S1–S10). Ho and He were both greater than 0.7 for all markers and, specifically, greater than 0.75 for the DXS8378, DXS10162, DXS10164, DXS7424, DXS7423, DXS10148, DXS10135, DXS10159, DXS10101 and DXS10134 markers. The PIC values of all the selected loci were greater than 0.6 except for those of the DXS8378 marker in the Southern Han and Hui populations, the DXS10164 marker in all groups, and the DXS7423 marker in the Southern Han, Tibetan and Hui populations. The low PIC value of DXS7423 was consistent with the result reported for the Guanzhong Han population of Shaanxi province, Western China8. The PIC values for the DXS10134, DXS10135, DXS10148 and DXS10101 markers were all greater than 0.8 across all ethnic groups. Meanwhile, the PIC values for the DXS10164 and DXS7423 markers were less than 0.5, which is consistent with the results of Liu et al.9. We found that DXS10134, DXS10079, DXS10135 and DXS10101 were the most polymorphic loci. All markers possessed high forensic efficiency values within the studied population samples, supporting the benefits of using multiplexes in forensic practice.
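As an illustration of how the single-locus parameters listed above follow from allele frequencies, the sketch below uses the expressions commonly cited for X-chromosomal markers. It is an assumption that these match the exact formulas used in this study (for example, He is computed here without a sample-size correction), and the function name `forensic_parameters` and the example frequencies are hypothetical.

```python
def forensic_parameters(freqs):
    """Compute single-locus X-STR parameters from a list of allele frequencies.

    Assumed formulas (commonly cited for X-STRs, not confirmed by the source):
      He, PDm  = 1 - sum(p^2)
      PIC      = 1 - sum(p^2) - [(sum(p^2))^2 - sum(p^4)]
      PDf      = 1 - 2*(sum(p^2))^2 + sum(p^4)
      MEC trio = 1 - sum(p^2) + sum(p^4) - (sum(p^2))^2
      MEC duo  = 1 - 2*sum(p^2) + sum(p^3)
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return {
        "He": 1 - s2,
        "PIC": 1 - s2 - (s2 ** 2 - s4),
        "PDm": 1 - s2,
        "PDf": 1 - 2 * s2 ** 2 + s4,
        "MEC_trio": 1 - s2 + s4 - s2 ** 2,
        "MEC_duo": 1 - 2 * s2 + s3,
    }


# Hypothetical locus with four equally frequent alleles.
example = forensic_parameters([0.25, 0.25, 0.25, 0.25])
for name, value in example.items():
    print(f"{name}: {value:.6f}")
```

Under these expressions the trio MEC is algebraically identical to the PIC, which is why the two values coincide in the output.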

Table 1 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 2 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 3 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 4 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 5 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 6 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 7 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 8 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 9 Forensic parameters of 19 X-STR loci among the four ethnic populations.
Table 10 Forensic parameters of 19 X-STR loci among the four ethnic populations.

Linkage disequilibrium

A previous study showed that LDE between markers more than 5 Mb apart is unlikely10. To test this, LDE was estimated for all pairs of markers in the four population groups. In addition, gametic associations were tested for all pairs of loci in the male samples11. The P values for the LDE exact tests are listed in Table 11. Significant associations were found for several pairs: DXS10103 and DXS10101 in all four ethnic groups; DXS10159 and DXS10162, DXS6809 and DXS6789, and HPRTB and DXS10101 in the Tibetan population; and DXS10074 and DXS10075 in the Uighur population. These pairs showed significant LDE even after Bonferroni correction (P = 0.05/171 = 0.00029). These results suggest that these locus pairs can be treated as haplotype clusters or blocks. For markers showing strong LDE, haplotype frequencies can be estimated directly from population data. The haplotype frequencies and the forensic parameters for DXS10103-DXS10101 in all four ethnic groups; for DXS10159-DXS10162, DXS6809-DXS6789 and DXS10103-HPRTB-DXS10101 in the Tibetan population; and for DXS10074-DXS10075 in the Uighur population are shown in Supplementary Tables S11–S15. Seventy-five haplotypes were observed for the DXS10103-DXS10101 pair in all 631 male samples, and the PIC and PDm values for this haplotype were both greater than 0.9. The DXS10103-DXS10101 pair had also been treated as a haplotype in Shanghai Han and Taiwanese Han populations in previous studies12,13.
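Because males carry a single X chromosome, their two-locus haplotypes are observed directly, so association between a pair of loci can be tested on the haplotype contingency table. The sketch below uses a Monte Carlo permutation test on a chi-square statistic. This is a swapped-in, simplified technique for illustration and not the exact test implemented in the software used in this study. The function names and the toy allele labels are hypothetical.

```python
import random
from collections import Counter


def chi_square(alleles_a, alleles_b):
    """Chi-square statistic of the two-locus haplotype table (male samples)."""
    n = len(alleles_a)
    count_a = Counter(alleles_a)
    count_b = Counter(alleles_b)
    count_ab = Counter(zip(alleles_a, alleles_b))
    stat = 0.0
    for a, n_a in count_a.items():
        for b, n_b in count_b.items():
            expected = n_a * n_b / n
            observed = count_ab.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def ld_permutation_p(alleles_a, alleles_b, n_perm=999, seed=0):
    """Permutation P value: shuffle locus B alleles to break any association."""
    rng = random.Random(seed)
    observed = chi_square(alleles_a, alleles_b)
    shuffled = list(alleles_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(alleles_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 40 hypothetical males whose alleles at the two loci always co-occur.
locus_a = [10] * 20 + [11] * 20
locus_b = [20] * 20 + [21] * 20
print(ld_permutation_p(locus_a, locus_b))
```

With 999 permutations the smallest attainable P value is 0.001, so a threshold as strict as 0.00029 would need more permutations or an exact test.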

Table 11 P value for LDE in four ethnic groups.

There are 11 X-STR loci that are also used for genetic testing in the Investigator Argus X-12 human identification kit (Qiagen, Hilden, Germany)12. These 11 shared loci were marked with an asterisk in Fig. 1. According to previous studies, even when the physical distance between loci is very small, recombination and crossing-over might still happen14. While DXS101-DXS7424 and DXS6789-DXS7424 were previously reported to be in linkage disequilibrium in a northwestern Italian population and other populations15,16, no evidence for LDE in DXS101-DXS7424 was observed in this study. Further studies should be performed to more thoroughly assess the linkage between markers and better define the proposed linkage groups.

Figure 1: The ideogram of the X-chromosome describes the genetic positions of the 19 X-STR loci and their physical location in the X chromosome.

Distances from the p-telomere are shown in cM and Mb. Asterisks (*) indicate the 11 X-STR loci that are shared with the Investigator Argus X-12 kit (Qiagen, Hilden, Germany).

The forensic statistical parameters found for the five haplogroups are shown in Table 12. PIC values of all loci were greater than 0.95 except for DXS10159-DXS10162 in the Tibetan population and DXS10074-DXS10075 in the Uighur population. The He values are all greater than 0.95, and the haplotype diversity values are greater than 0.95 except for DXS6809-DXS6789 and DXS10103-HPRTB-DXS10101 in the Tibetan population and for DXS10103-DXS10101 in the Hui population. The PDf values are all greater than 0.99, and the MECd values are all greater than 0.9 except for DXS10159-DXS10162 in the Tibetan population. All haplotypes showed high forensic efficiency values that reflect their utility for forensic uses.
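Haplotype frequencies and haplotype diversity of the kind reported above can be obtained by direct counting in male samples. The following is a minimal sketch, assuming Nei's unbiased estimator of haplotype diversity, n/(n-1) × (1 - Σp²); the source does not state which estimator was used, and the function names and allele labels are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each observed haplotype (direct counting)."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {hap: count / n for hap, count in counts.items()}


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity (assumed estimator)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    freqs = haplotype_frequencies(haplotypes)
    return n / (n - 1) * (1 - sum(p ** 2 for p in freqs.values()))


# Hypothetical two-locus haplotypes from four males (allele labels are made up).
sample = [("19", "31"), ("19", "31"), ("18", "30.2"), ("20", "32")]
print(haplotype_frequencies(sample))
print(round(haplotype_diversity(sample), 4))
```

Each haplotype frequency can then be passed to the same single-locus formulas used for alleles, since a haplotype is treated as one allele of a combined locus.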

Table 12 Forensic statistical parameters of the five haplogroups.

Comparisons among the four ethnic groups

Allele frequency distributions were compared among the four ethnic populations. The distributions differed significantly for most of the loci among these four Chinese ethnic groups; based on these results, population analyses were performed separately for each population (Supplementary Table S16). Significant differences were found for 11 loci between the Han and Tibetan populations, for 1 locus between the Han and Hui populations, and for 16 loci between the Han and Uighur populations. Based on these results, the Hui population is genetically closer to the Southern Han population than to the Tibetan and Uighur populations.

The allele frequencies of these four Chinese populations were also compared with those from other populations, including the Chinese Northern Han population17, a Korean population18, a population from Japan19, a population from northern Germany20, the Polish Tatars21, a northern Italian population22, a population from Spain23, and an Ecuadorian Kichwa population24 (Tables S17–S20). We found no significant differences between the Southern Han and Northern Han populations. This result was not consistent with Shin's findings25, probably because of the different loci assayed. Likewise, the allele frequency distributions of the Southern Han and the Guanzhong Han, which were typed with the same panel as ours8, showed no significant differences (Table S22). However, the PIC, He, CDPf, CDPm, CMECt and CMECd values were much greater in the Guanzhong Han8, Tibetan, Uighur and Hui groups than in the Southern Han group (Table S23). We did find significant differences for most of the loci among the Southern Han, Tibetan, Uighur, Japanese, Northern German, Polish Tatar, Northern Italian, Spanish and Ecuadorian Kichwa populations (Supplementary Tables S17–S20). However, we found no significant differences among the Southern Han, Hui and Korean populations, except for the DXS8378 and DXS6789 loci.

The F-statistic (Fst) is often used in forensic sciences to measure population substructure23. The maximum observed Fst value was 0.01142 (p = 0.00000 ± 0.0000) for the Tibetan and Uighur populations, whereas the minimum Fst value was 0.00128 (p = 0.46847 ± 0.0572) for the Southern Han and Hui populations (Table 13). These results were consistent with the existence of population substructure among the above-mentioned populations. However, they differ from previous STR studies, which found the smallest genetic distance between the Southern Han and Uighur populations and the largest between the Tibetan and Hui populations26. A possible explanation for this discrepancy is that the Hui populations assayed in the two studies came from different geographical regions of China (Kansu and Sinkiang in the previous study and the Ningxia Hui Autonomous Region in our study).
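To make the meaning of Fst concrete, the sketch below computes a textbook Wright-style Fst for one locus and two equally weighted populations as (Ht - Hs)/Ht. This is a deliberate simplification and not the estimator implemented in the population genetics software used in this study; the function name and the example frequencies are hypothetical.

```python
def wright_fst(freqs_1, freqs_2):
    """Single-locus Fst = (Ht - Hs) / Ht for two equally weighted populations.

    freqs_1 and freqs_2 map allele labels to frequencies within each population.
    """
    alleles = set(freqs_1) | set(freqs_2)
    # Expected heterozygosity within each population.
    h1 = 1 - sum(freqs_1.get(a, 0.0) ** 2 for a in alleles)
    h2 = 1 - sum(freqs_2.get(a, 0.0) ** 2 for a in alleles)
    hs = (h1 + h2) / 2
    # Expected heterozygosity of the pooled population (mean frequencies).
    ht = 1 - sum(((freqs_1.get(a, 0.0) + freqs_2.get(a, 0.0)) / 2) ** 2
                 for a in alleles)
    if ht == 0:
        return 0.0  # one allele fixed in both populations, no differentiation
    return (ht - hs) / ht


# Hypothetical allele frequencies at one locus in two populations.
pop_a = {"12": 0.5, "13": 0.5}
pop_b = {"12": 1.0}
print(round(wright_fst(pop_a, pop_b), 4))
```

Values near 0, such as those in Table 13, indicate that most of the genetic variation lies within rather than between populations.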

Table 13 Computing conventional F-Statistics from haplotype frequencies in four ethnic groups.

Forensic efficiency parameter data

The forensic efficiency parameters were calculated from the observed haplotype frequencies for loci in LDE and from allele frequencies for the remaining loci in each of the four ethnic groups. Each haplotype was therefore treated as a single allele. The 19 markers were treated as 18 loci in the Southern Han population, 15 loci in the Tibetan population, 17 loci in the Uighur population and 18 loci in the Hui population. The CDPf value was 1.000000000000000, the CDPm value was over 0.999999999997940, the CMECd value was above 0.999999991939326, and the CMECt value was above 0.999999999989069 (Table 14). The CDP and CMEC values declined when loci in LDE were treated as haplotypes rather than as separate loci. Consistent with this, the CDPm and CMEC values were smaller in our Southern Han sample than in the Guanzhong Han study, which calculated the forensic statistical parameters from allele frequencies8. These results showed that the 19 X-STR loci were highly polymorphic and could provide valuable information for forensic analysis13. This set of markers may indeed be very useful for kinship testing as well as for human identification.
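Combined values such as CDP and CMEC are conventionally obtained by multiplying the complements of the per-locus values, treating each locus or haplotype block as independent. The sketch below assumes this standard product rule; the function name `combine` and the per-locus values are hypothetical.

```python
from math import prod


def combine(values):
    """Combined power of discrimination or exclusion chance.

    Assumes independent loci (or haplotype blocks): 1 - prod(1 - value_i).
    """
    return 1 - prod(1 - v for v in values)


# Two hypothetical loci, each with a power of discrimination of 0.9.
print(round(combine([0.9, 0.9]), 6))

# Nineteen hypothetical loci at 0.99 each: the complement (1e-38) is far below
# double-precision resolution near 1, so the result is exactly 1.0.
print(combine([0.99] * 19))
```

The second call shows why a combined value can be reported as 1.000000000000000: the true complement is non-zero but too small to represent at that precision. Reporting the complement itself, `prod(1 - v for v in values)`, avoids this loss.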

Table 14 Combined forensic efficiency parameters calculated from both allele frequencies and haplotype frequencies of the 19 X-STR loci in the four ethnic groups.

A recombination study of two-generation families with two or more children

Pairwise linkage studies and recombination fraction (θ) calculations were performed for the 19 X-STR loci. The maximum LOD scores for all pairwise linkage analyses in females are shown in Supplementary Table S21. Several marker pairs showed significant linkage (maximum LOD scores >3). The number of informative meioses ranged from 48 to 87. LOD scores and recombination fractions for adjacent X-STR markers are listed in Table 15. Estimating the recombination fraction is necessary for calculating likelihood ratios when linked markers are used. It has been previously shown that X-STR recombination rates may differ among populations27,28. In our study, recombination among the STR clusters was inferred from Southern Han families with two or more children. We observed few recombination events between tightly linked markers, although other researchers have reported recombination between the DXS10079-DXS10074 and the DXS6809-DXS6789 markers, which are separated by physical distances <1.0 Mb29. As suggested by previous reports, caution is needed when closely linked X-STRs are treated as stable haplotypes in kinship analysis30. However, no recombination events were observed within the seven linked clusters in our study. The recombination fractions observed for all pairs fell within the 95% CIs. More family samples and/or pedigrees with more generations are needed to obtain a better estimate of recombination events.
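The relationship between recombinant counts, θ and the LOD score can be sketched for the simplest case of phase-known meioses, where LOD(θ) = log10[θ^k (1-θ)^(n-k) / 0.5^n] for k recombinants in n informative meioses. This is a simplification of the full pedigree likelihood evaluated by linkage software. The confidence interval below is the exact (Clopper-Pearson) binomial interval, on the assumption that this is what the cited online tool reports. All function names are hypothetical.

```python
from math import comb, log10


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta."""
    likelihood = theta ** recombinants * (1 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")
    return log10(likelihood) - meioses * log10(0.5)


def max_lod(recombinants, meioses):
    """Maximum-likelihood theta (capped at 0.5) and the LOD score at that theta."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target):
    """Solve func(p) = target on [0, 1] for a function increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson confidence interval for a binomial proportion k/n."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _bisect(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Illustration: no recombinants among 48 informative meioses (the smallest
# number of informative meioses mentioned in the text).
theta_hat, lod = max_lod(0, 48)
print(theta_hat, round(lod, 2), exact_ci(0, 48))
```

Even with zero observed recombinants the upper confidence limit for θ stays well above zero at this sample size, which illustrates why more families are needed for tighter estimates.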

Table 15 The recombination study of 40 two-generation families with two or more children.

Phylogenetic analyses

As shown in Table 16, the Reynolds genetic distance was smallest between the Southern Han and Hui populations (0.00128), followed by the Southern Han and Tibetan populations (0.00631) and the Tibetan and Hui populations (0.00722). The largest genetic distance was between the Tibetan and Uighur populations (0.01149), followed by the Han and Uighur populations (0.01075) and the Hui and Uighur populations (0.00900). Based on the Reynolds distances, multidimensional scaling (MDS) analysis was performed to evaluate the phylogenetic relationships among the four Chinese ethnic groups (Fig. 2); the significance of the MDS plot data was confirmed using a chi-square test. The Tibetan and Uighur populations, in the upper portion of the MDS plot, segregated as distant outliers, whereas the Hui and Han populations were genetically more similar, which may be due to their geographical proximity and historic distributions. A possible explanation is that intra-population marriages are more frequent in Han and Hui populations, while inter-population marriages are more common in Tibetan and Uighur populations.

Table 16 Reynolds genetic distance between populations.
Figure 2: 3-D multidimensional scaling (MDS) plot of the four populations (Han, Tibetan, Uighur and Hui) built using Matlab and based on the Reynolds genetic distances.

Han is short for Southern Han.

Conclusions

In this study, we investigated genetic polymorphisms in four Chinese ethnic groups. We tested linkage disequilibrium among 19 X-STR loci and found that these loci were not all independent of each other. Treating loci in LDE as haplotypes is essential for calculating accurate CDP and CMEC values in relationship identification and kinship testing. Hence, both allele and haplotype frequencies were considered when we calculated forensic parameters in this study. In addition, the results indicated that most X-STR allele frequencies were population-specific. Furthermore, the choice of STR loci used to calculate genetic distances affects the estimated relationships among the ethnic groups. Finally, to achieve a better understanding of genetic structure and inter-population relationships, larger sample sizes from a wider geographic area are needed for further evaluation.

Materials and methods

Sample collection and DNA extraction

In this study, we collected blood, with informed consent, from 932 unrelated individuals from four ethnic groups in Mainland China. Han is the main ethnic group in China, while the Tibetan, Uighur and Hui populations are minorities. Our sample included 308 Han subjects (106 females and 202 males) from Guangdong, Jiangxi, Hunan and the Guangxi Zhuang Autonomous Region in Southern China; 213 Tibetan subjects (61 females and 152 males) from Lhasa City in the Tibet Autonomous Region; 211 Uighur subjects (66 females and 145 males) from Korla City in Xinjiang; and 200 Hui subjects (68 females and 132 males) from the Ningxia Hui Autonomous Region. Additionally, 40 two-generation Southern Han families with two or more children (94 children) were tested for the recombination study. The AmpFlSTR Identifiler PCR kit (Applied Biosystems) was used. Each potential blood donor was asked about their aboriginal ancestry before and after sample collection. Only unrelated individuals were sampled. Human blood samples were collected upon approval by the Ethics Committee at the Institute of Forensic Sciences, Ministry of Justice, PR China. All the methods were carried out in accordance with the approved guidelines of the Institute of Forensic Sciences, Ministry of Justice, PR China.

We extracted DNA from samples with magnetic beads (DNA IQ System) on the Maxwell 16 Research System (Promega, Madison WI, USA) and quantified it on a 7500 Real-time PCR System following the Human DNA Quantification Kit instruction manual (Thermo Fisher Scientific). Co-amplification of the 19 X-STR loci (DXS7423, DXS10148, DXS10159, DXS6809, DXS7424, DXS8378, DXS10164, DXS10162, DXS7132, DXS10079, DXS6789, DXS101, DXS10103, DXS10101, HPRTB, DXS10075, DXS10074, DXS10135 and DXS10134) was performed following the protocol described in the validation research31. For each PCR, 1 μL of template DNA, 4 μL of reaction mix, 2 μL of primers, 0.2 μL of A-Taq DNA polymerase and sdH2O were combined to a final volume of 10 μL. The same cycling parameters were used for direct amplification of our samples31 from a 1.2 mm punch of FTA blood cards.

Markers and genotyping

The amplified products were resolved and detected by capillary electrophoresis (CE) with PO denaturing polymers (Thermo Fisher Scientific) in the AB 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA) following the manufacturer’s manual. The 9947A cell line (Promega, Madison WI, USA) was used as a positive control in all experiments. Negative controls were also included in all experiments. The CE conditions were as follows: sample injection for 5 s at 3 kV, electrophoresis at 15 kV for 1500 s at 60 °C. Gene fragment sizes were determined with GeneMapper ID software (v.3.5) at the detection threshold of 50 RFU.

Analytical method

The allele and haplotype frequencies for the 19 X-STR loci were calculated using PowerStat version 1.2 (Promega, Madison WI, USA)32. For the male samples33, pairwise LDE between all pairs of the 19 loci and HWE were tested for each locus using Powermarker software (version 3.25)34. For the female samples, Fst and Reynolds genetic distances were calculated using ARLEQUIN software (version 3.5)35. MATLAB software (version R2013a) was used to obtain the following forensic parameters from allele and haplotype frequencies: Ho, He, PIC36, PDf and PDm. MEC was calculated according to the methods proposed by Desmarais et al.37, whereas CDPf, CDPm, CMECd, CMECt and the MDS plot were obtained according to Zhang et al.13. The maximum LOD scores and θ were estimated using the Mendel v12 software based on the LOD method described in ref. 38. Then, 95% CIs for θ were computed using the online tool at http://statpages.org/confint.html. Allele and haplotype frequency distributions for the four ethnic groups were compared with a chi-square test using SPSS 16.0 with 10,000 permutations39.

Additional Information

How to cite this article: Yang, X. et al. Genetic analysis of 19 X chromosome STR loci for forensic purposes in four Chinese ethnic groups. Sci. Rep. 7, 42782; doi: 10.1038/srep42782 (2017).
