Introduction

At present, short tandem repeats (STRs) are widely used in forensic DNA laboratories worldwide for paternity testing and individual identification1,2,3,4. Their universal use has led to well-constructed forensic DNA databases and simple typing methodologies. However, STR-based genotyping has several limitations in forensic applications: the relatively long amplicons of STRs hamper the analysis of highly degraded DNA samples from criminal cases5,6,7; artifacts such as stutter peaks add ambiguity to mixture analysis8,9; and the relatively high mutation rate of STRs (approximately 10⁻³)10 can confound kinship results. To overcome these limitations, single nucleotide polymorphisms (SNPs) have been considered as alternative and supplementary markers11. SNPs have smaller amplicons than STRs, do not produce stutter peaks in typing profiles, and have a lower mutation rate (approximately 10⁻⁸)12,13. Nevertheless, SNP-based genotyping is usually complex, expensive and platform-dependent, which makes it hard to conduct in common forensic laboratories14,15.

Insertion/deletion polymorphisms (InDels) have recently gained the attention of forensic scientists. InDels are biallelic polymorphic markers caused by the insertion or deletion of bases, and they combine the desirable characteristics of both SNPs and STRs. Like SNPs, InDels have small amplicons and relatively low mutation rates16. Like STRs, however, they are length polymorphisms and can be genotyped by capillary electrophoresis, which is available in common forensic laboratories. InDels are the second most abundant DNA polymorphisms after SNPs17, and about 20% of the variations in the human genome are InDels18. In 2002, Weber et al.18 reported 2000 biallelic human InDels together with population data for four groups (Europeans, Africans, Japanese and Native Americans). Later, in 2006, Mills et al.19 provided an initial map of InDels with more than 415,000 polymorphisms. Since then, InDels have been used for a wide range of purposes, such as studying the biogeographic ancestry of human populations20, serving as genetic markers in natural populations21,22 and assessing individual interethnic admixture and population substructure23. So far, one commercial InDel kit is available, i.e., the Qiagen Investigator DIPplex kit (Qiagen, Hilden, Germany), and some population data obtained using this kit have been published22,24,25,26. The allele frequencies of InDels differ among population groups in geographically separated areas, which gives InDels potential as ancestry informative markers18,27. At the same time, it also makes it necessary to ascertain the population indices before InDels are used in new populations.

The Xibe are one of the 56 ethnic groups officially recognized by the People's Republic of China. The Xibe group is widely distributed across the northern part of China, from Xinjiang Uygur Autonomous Region in the west to Jilin and Liaoning provinces in the east (http://english.peopledaily.com.cn/102759/7567650.html). According to the sixth national population census of China in 2010, the Xibe population was 190,481, ranking 31st among all ethnic groups in China.

In this study, we used the aforementioned Investigator DIPplex kit to obtain population data for the Xibe ethnic group. To date, no genetic data on these 30 InDel loci have been reported for the Xibe group. The present study provides such population data and enriches the genetic information resources of Chinese minority ethnic groups. The data were then used to calculate forensic and population parameters and to make comparisons with previously reported populations, providing information on the potential use of these loci in forensic cases and furthering the understanding of the genetic relationships between the Xibe group and other groups.

Methods

Sample preparation

Bloodstain samples were collected from 223 unrelated healthy Xibe individuals living in Ili, Xinjiang Uygur Autonomous Region, China. Before sample collection, all participants signed informed consent forms after receiving an explanation of the study. Volunteers were required to have ancestors who had lived in the region for more than three generations and to share no common ancestry going back at least three generations. The study was conducted in accordance with the human and ethical research principles of Xi'an Jiaotong University Health Science Center, China, and was approved by the ethics committee of Xi'an Jiaotong University Health Science Center. Genomic DNA was extracted from the bloodstain samples using the Chelex-100 method as described by Walsh et al.28.

InDel typing

In this study, the commercially available Investigator DIPplex kit (Qiagen, Hilden, Germany) was used to genotype 30 InDel loci. Starting from the manufacturer's protocol, we optimized the PCR volume to 25 μL, containing 5 μL Reaction Mix A, 5 μL Primer Mix, 0.6 μL Multi Taq2 DNA Polymerase, template DNA and nuclease-free water. Amplification was carried out on a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) under the following conditions: initial denaturation at 94°C for 4 min, followed by 30 cycles of 94°C for 30 s, 61°C for 2 min and 72°C for 75 s, and an additional 60 min at 68°C. Electrophoresis was performed on the ABI PRISM 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) with the denaturing polymer POP-4, under the conditions recommended by the manufacturer. Fragments were sized using the BTO 550 (Qiagen, Hilden, Germany) as the internal lane standard. Alleles were genotyped using the GeneMapper® ID software v3.2 (Applied Biosystems, Foster City, CA, USA). Control DNA 9948 was used as the positive amplification control.

Quality control

The study was conducted following ISFG recommendations on the analysis of the DNA polymorphisms as described by Schneider29.

Statistical analyses

Allele frequencies and forensic efficiency parameters, including observed (Ho) and expected (He) heterozygosity, Hardy-Weinberg equilibrium (HWE), polymorphic information content (PIC), power of exclusion (PE), discrimination power (DP) and typical paternity index (TPI), were calculated using the modified PowerStats (version 1.2) spreadsheet. Fst values were used to measure the variance in allele frequencies among different populations. Population structure analysis was conducted using the structure program (version 2.2)30. Principal component analysis (PCA) based on allele frequencies was performed in MATLAB 2007a (MathWorks Inc., USA). Phylogenetic reconstruction was conducted using the genetic distance and phylogenetic analysis (DISPAN) program.
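As a rough illustration of the per-locus parameters listed above, the following Python sketch computes them for one biallelic InDel locus from genotype counts. It assumes the formulas commonly associated with the PowerStats spreadsheet, namely PE = h²(1 − 2hH²) and TPI = 1/(2H), where h and H are the observed heterozygote and homozygote proportions. The function and argument names are hypothetical, and the spreadsheet used in this study may apply corrections that are not reproduced here.

```python
def locus_parameters(n_ins_ins, n_ins_del, n_del_del):
    """Forensic parameters for one biallelic locus from genotype counts.

    Hypothetical helper: the three arguments are the numbers of
    insertion homozygotes, heterozygotes and deletion homozygotes.
    """
    n = n_ins_ins + n_ins_del + n_del_del
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies by gene counting.
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)
    q = 1.0 - p

    h = n_ins_del / n  # observed heterozygosity (Ho)
    H = 1.0 - h        # observed homozygosity

    genotype_freqs = [n_ins_ins / n, n_ins_del / n, n_del_del / n]

    return {
        "Ho": h,
        # Expected heterozygosity under HWE, without small-sample correction.
        "He": 2 * p * q,
        "PIC": 1 - p**2 - q**2 - 2 * p**2 * q**2,
        "DP": 1 - sum(f**2 for f in genotype_freqs),
        # Assumed PowerStats-style formulas (see lead-in).
        "PE": h**2 * (1 - 2 * h * H**2),
        "TPI": 1 / (2 * H) if H > 0 else float("inf"),
    }
```

For instance, an observed heterozygosity of 117/223 ≈ 0.5247 yields PE ≈ 0.2100 and TPI ≈ 1.0519 under these formulas, in line with the HLD92 values reported in the Results; the split of the remaining 106 homozygotes between the two alleles does not affect these three values.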

Results and discussions

Forensic parameter analysis

Allele frequencies and forensic statistical parameters of the 30 InDel loci are shown in Table 1. The Ho ranged from 0.1704 at the HLD118 locus to 0.5247 at the HLD92 locus, while the He ranged from 0.1559 at the HLD118 locus to 0.4997 at the HLD101 locus. In the test of HWE, the genotype frequency distributions showed no significant deviations from expectations, with the lowest p-value being 0.0660 at the HLD97 locus. The PIC of the loci ranged from 0.1437 (HLD118) to 0.3749 (HLD101). The highest PE was found at the HLD92 locus (0.2100) and the lowest at the HLD118 locus (0.0222). The DP values ranged from 0.2827 (HLD118) to 0.6322 (HLD101). The highest, lowest and average TPI were 1.0519 (HLD92), 0.6027 (HLD118) and 0.8725, respectively. The cumulative power of exclusion (the probability of excluding an unrelated man as the putative father using the full-loci data, which indicates the forensic efficiency of the loci in paternity testing in the group) and the combined discrimination power (the probability of distinguishing two randomly chosen individuals using the full-loci data, which indicates the forensic efficiency of the loci in individual identification in the group) were 0.9867 and 0.9999999999902, respectively, for the DIPplex kit in the studied Xibe group. The cumulative power of exclusion was relatively low, which indicated that the DIPplex kit should only be used as a complement to autosomal STRs in paternity cases, for example in cases with mutations at autosomal STR loci. Meanwhile, the combined discrimination power was high enough to give an acceptable level of discrimination in forensic identification cases.

Table 1 Allele frequency distribution and forensic statistical parameters of the 30 InDel loci in the Chinese Xibe ethnic group (n = 223)
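The cumulative power of exclusion and the combined discrimination power follow from the per-locus values under the usual assumption that the loci are independent: each combined value is one minus the product of the per-locus complements. A minimal Python sketch is given below; the function name is hypothetical, and the inputs would be the 30 per-locus PE or DP values.

```python
import math


def combined_power(per_locus_values):
    """Combine per-locus PE (or DP) values, assuming independent loci.

    Returns 1 - prod(1 - x_i) over all loci.
    """
    return 1.0 - math.prod(1.0 - x for x in per_locus_values)
```

Because a combined discrimination power such as 0.9999999999902 lies very close to 1, it can be more informative to inspect the complement, i.e. the product of the (1 − DP) terms, directly.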

Clustering by structure analysis

We analyzed the population structure of the studied Xibe group and 10 reference groups (South Korean31, Madrid (Central Spanish), Basque (Northern Spanish)32, Hungarian33, Dane34, Beijing Han (Northern Han Chinese), Tibetan, Uigur, Kazak22 and Guangdong Han (Southern Han Chinese)35) using the structure program. The result is shown in Fig. 1. At K = 2, the clusters were anchored by Europe and Asia: the 4 European groups and the 5 Asian groups consisted almost entirely of the green and red components, respectively, while the Kazak and Uigur groups showed a mixed constitution of both components. At K > 2, the 4 European groups and the 2 Eurasian groups (Kazak and Uigur) all had a main component representing European descent, while the 5 Asian groups had partial membership in K − 1 clusters with similar membership proportions. According to the structure manual30, this indicated that the 5 groups showed similar membership proportions at the 30 loci. Moreover, as mentioned by Rosenberg et al., groups with similar membership proportions in different clusters might reflect continuous gradations in allele frequencies across regions or admixture of neighboring groups36.

Figure 1

Clustering analysis by structure for the full-loci dataset assuming K = 2–7.

Population names were labeled beneath.

Principal component analysis

PCA was performed on the studied Xibe group and the 10 reference groups at the 30 InDel loci. The result is shown in Fig. 2. The first two principal components explained 85.17% of the total variance, with the first and second components accounting for 77.62% and 7.55%, respectively. In the plot, the 4 European groups and the 5 Asian groups were located in the right and left parts, respectively, with the 2 Eurasian groups (Kazak and Uigur) located between them in the middle. The studied Xibe group clustered in the upper left quadrant, close to the Beijing Han, Guangdong Han and South Korean populations, indicating close genetic relationships between the Xibe group and these groups.

Figure 2

PCA based on 30 InDel loci of Xibe and 10 reference groups.
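The analysis reported here was run in MATLAB; purely as an illustration of what "proportion of variance explained" means, the following pure-Python sketch extracts only the first principal component of a small allele-frequency matrix by power iteration. The function name, the data layout (populations as rows, loci as columns) and the choice of power iteration are assumptions made for this sketch, not the procedure used in the study.

```python
import math


def first_principal_component(rows, iterations=500):
    """Leading principal component of a small data matrix.

    rows: one list of allele frequencies per population (hypothetical
    layout: populations as rows, loci as columns; at least two rows).
    Returns (loadings, explained_fraction, scores).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]

    # Sample covariance matrix between loci (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    total_variance = sum(cov[i][i] for i in range(m))
    if total_variance == 0:
        raise ValueError("all populations have identical frequencies")

    # Power iteration; the start vector is arbitrary and must not be
    # orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance along the component.
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(m) for j in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, eigenvalue / total_variance, scores
```

The returned fraction corresponds to the share of total variance attributed to the first component, and the scores are the coordinates of each population along that component.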

Interpopulation differentiation

The studied Xibe group was compared with previously published groups at the 30 InDel loci using the analysis of molecular variance method. The Fst and p-values are shown in Table 2. Statistically significant differences (p < 0.05) were observed between the studied Xibe group and the South Korean group at 4 loci; the Madrid group at 17 loci; the Basque group at 17 loci; the Hungarian group at 18 loci; the Dane group at 16 loci; the Beijing Han group at 2 loci; the Tibetan group at 6 loci; the Uigur group at 9 loci; the Kazak group at 7 loci; and the Guangdong Han group at 6 loci. According to these results, the East Asian groups (South Korean, Beijing Han and Guangdong Han) and the Eurasian groups (Kazak and Uigur) differed less from the Xibe group (significant differences at fewer than 10 loci) than the European groups did (significant differences at more than 15 loci), which is consistent with the geographic distances between the Xibe group and these groups. Among the 30 loci, the highest ethnic diversity was observed at 6 loci (HLD39, HLD58, HLD64, HLD84, HLD99 and HLD111), with significant differences found between the Xibe group and 7 of the compared groups, followed by HLD81, HLD118 and HLD131 (6 groups). The lowest ethnic diversity was observed at 4 loci (HLD70, HLD92, HLD101 and HLD124), with no significant difference found between the Xibe group and any of the compared groups. These results show that ethnic diversity varies between InDel loci. Some loci showed no significant difference even between groups from different continents, while others showed significant differences between groups with relatively close relationships. Hence, studying more InDel loci in more ethnic groups should be useful for screening InDel loci with different ethnic diversities for different purposes. Such genetic profiles would also help us to gain a better understanding of the national evolutionary history.

Table 2 Pairwise Fst and p-values between the Chinese Xibe group and other groups at the 30 InDel loci
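To make the locus-by-locus comparison more concrete, the sketch below computes a simple heterozygosity-based Fst for one biallelic locus from the allele frequencies of two populations. This is not the analysis of molecular variance estimator used for Table 2: it weights both populations equally, applies no sample-size correction and performs no significance test. The function name is hypothetical.

```python
def pairwise_fst(p1, p2):
    """Fst at one biallelic locus from the frequencies of the same
    allele in two populations, weighting both populations equally.

    Uses Fst = (H_T - H_S) / H_T, where H_S is the mean within-population
    expected heterozygosity and H_T that of the averaged frequencies.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The locus is fixed for the same allele in both populations.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total
```

Identical frequencies give 0, and populations fixed for opposite alleles give 1; intermediate values grow with the difference in allele frequency between the two groups.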

Phylogenetic analysis

Phylogenetic reconstruction was conducted to illustrate the genetic relationships between the studied Xibe group and the reference groups using the unweighted pair-group method with arithmetic means (UPGMA). The UPGMA tree is shown in Fig. 3. The dendrogram showed 2 main clusters. The first cluster was composed of two branches: one included the Hungarian, Madrid (Central Spanish), Dane and Basque (Northern Spanish) groups; the other included the Kazak and Uigur groups. The second cluster was composed of the Beijing Han, Guangdong Han, South Korean, Xibe and Tibetan groups. This result was consistent with the structure and PCA results described above. The Xibe group clustered first with the South Korean group, then with the Beijing Han and Guangdong Han groups, and finally with the Tibetan group. A phylogenetic tree constructed by Zheng et al. from 17 Y-STR loci (AmpFlSTR® Y-filer™ PCR Amplification kit, Applied Biosystems, Foster City, CA, USA) revealed that the Xibe group from Xinjiang clustered with the Chinese Korean group before clustering with the Shandong Han population37. A study of the mitochondrial DNA of the Xibe group also reported an unrooted Neighbor-Joining tree indicating that the Xibe group from Xinjiang was closely related to the Chinese Korean, Northern Han and Southern Han groups, and that the relationship between the Xibe and Korean groups was closer than that between the Xibe and the two Han groups38. Since the ancestors of the Chinese Korean ethnic group migrated from the Korean peninsula from about the late 17th century, the Chinese Korean ethnic group and the South Korean group share the same ancestral origin39. Before the Qing government moved the Xibe, together with people of some other ethnic minorities, to Xinjiang in the mid-18th century to consolidate and reinforce the northwestern border defenses, the Xibe lived in northeast China, which was also home to the Chinese Korean group39. This geographic proximity may have led to intermarriage, and the resulting gene flow could be one reason for the close relationship between the Xibe and Korean groups.

Figure 3

Phylogenetic tree constructed by the unweighted pair-group method with arithmetic means based on the 30 InDel loci of the Xibe group and 10 reference groups.
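The tree in this study was produced by the DISPAN program; the clustering step itself can be sketched in a few lines of Python. The sketch below takes a precomputed matrix of pairwise genetic distances (the distance measure is not specified here and is left to the caller) and repeatedly joins the two closest clusters, averaging distances over all members. The function name and the input layout are hypothetical.

```python
from itertools import combinations


def upgma(labels, distances):
    """Build a UPGMA tree from pairwise distances.

    labels: list of population names.
    distances: dict mapping (name_a, name_b) tuples to distances,
               one entry per unordered pair.
    Returns (newick_like_string, merges), where merges lists
    (cluster_a, cluster_b, node_height) in the order of joining.
    """
    sizes = {label: 1 for label in labels}
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []

    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        merged = f"({a},{b})"

        # Distance to the new cluster is the size-weighted mean of the
        # distances to its two members (the average over all leaves).
        for other in sizes:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])

        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
        merges.append((a, b, height))

    return next(iter(sizes)), merges
```

With four placeholder populations A, B, C and D, where A and B are 2 units apart, C and D are 4 units apart and all other pairs are 8 units apart, the function joins A with B first, then C with D, and finally the two pairs, returning `((A,B),(C,D))`.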

Conclusions

In summary, the results indicated that, in the studied Xibe group, these 30 loci should only be used as a complement to autosomal STRs in paternity cases but could provide an acceptable level of discrimination in forensic identification cases. Structure analysis, principal component analysis, interpopulation differentiation and the phylogenetic tree revealed that the studied Xibe group had close relationships with the South Korean, Beijing Han and Guangdong Han groups. Further comparisons between the Xibe group and more reference groups would help to better understand the Xibe genetic background.