Introduction
In modern forensic science, DNA profiling has become an important tool for human identification and paternity testing. Short tandem repeat (STR) panels, usually comprising 13-17 loci and recently expanded to 21 or more loci, have generally been used for DNA profiling [1-3]. However, advances in sequencing technologies have enabled the production of large amounts of single-nucleotide polymorphism (SNP) data, which has prompted discussion about the usefulness of SNP markers in forensic science.
Compared to STRs, SNP markers offer a low mutation rate, a small amplicon size, which is advantageous for the analysis of degraded samples, and fast, automated analysis [4, 5]. On the other hand, more SNPs are needed to approach the match probability of STR panels because bi-allelic SNPs are less polymorphic than STRs. Krawczak [6] and Gill [7] reported that 50-60 SNPs with allele frequencies close to 0.5 are required to achieve the same discriminatory power as STR panels. Ayres [8] suggested that the number of SNPs with allele frequencies in the range [0.3, 0.5] required for the standard trio (father-mother-child) case and the duo (father-child) case is 50-60 and 70-80, respectively. However, these studies assumed the use of independent markers. As the number of markers increases, so does the probability of genetic linkage between them. Because markers that are not transmitted independently can affect the results of forensic analysis, linkage should be considered in forensic calculations [9, 10].
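The relationship between allele frequency, panel size, and match probability can be illustrated with a short calculation. The following is a minimal sketch, not the method of the cited studies: it assumes Hardy-Weinberg equilibrium, fully independent loci (the very assumption questioned above), and two unrelated individuals. The function names are hypothetical.

```python
def genotype_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one
    bi-allelic locus with allele frequency p (HWE assumed)."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)  # AA, AB, BB
    return sum(f * f for f in genotype_freqs)


def panel_match_probability(allele_freqs) -> float:
    """Product over loci; valid only if the loci are independent."""
    prob = 1.0
    for p in allele_freqs:
        prob *= genotype_match_probability(p)
    return prob


if __name__ == "__main__":
    # Per-locus value is smallest (most discriminating) at p = 0.5.
    for p in (0.1, 0.3, 0.5):
        print(f"p={p}: per-locus match probability {genotype_match_probability(p):.4f}")
    for n in (50, 60):
        print(f"{n} SNPs at p=0.5: {panel_match_probability([0.5] * n):.3e}")
```

At p = 0.5 the per-locus match probability is 0.375, so a panel of 50 such independent SNPs gives a match probability on the order of 10^-22 under these assumptions.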
Several bi-allelic autosomal marker panels, such as the SNPforID multiplex (52 SNPs) [11] and the IISNP panel (86 SNPs) [12-14], were developed for human identification and paternity testing. However, if the alleged father (AF) is a close relative of the true father (TF), the number of SNP loci in existing panels may not be enough to perform paternity testing [15]. In addition, the markers in these panels were selected based on allele frequencies across various human populations. As allele frequencies may vary by population, markers selected based on the allele frequencies of one population may not adequately reflect the allele frequencies of another. Paternity testing using incorrect allele frequencies can lead to erroneous results [16, 17]. Thus, several studies have developed forensic SNP panels for a specific population [18, 19]. Lee et al. [20] and Kim et al. [21] selected highly informative SNPs from Koreans for forensic purposes and provided a database, but their panels contained only 24 and 30 markers, respectively, which is insufficient for paternity testing.
In this study, we aimed to select bi-allelic autosomal SNP markers for paternity testing in Korean individuals based on likelihood ratio (LR) principles, where genetic evidence is evaluated by calculating the LR [22]. Korean SNP data were screened to collect candidate markers. Allele frequencies of the retained SNPs were calculated, and based on this information, we selected the appropriate number of markers using simulated family data. Moreover, we examined the performance of the final set of SNPs in real cases.
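To make the LR principle concrete, the sketch below computes a per-locus LR for a duo case at a bi-allelic SNP, comparing "the AF is the father" against "an unrelated man is the father". It is an illustration under stated assumptions, not the calculation used in this study: Hardy-Weinberg equilibrium, no mutation, an untyped mother drawn from the same population, and independent loci (linkage is ignored). Genotypes are coded as the number of copies of allele A, and all function names are hypothetical.

```python
import math


def duo_lr(father: int, child: int, p: float) -> float:
    """Per-locus LR for a father-child duo.

    father, child: copies of allele A (0, 1 or 2); p: frequency of A.
    Numerator: P(child genotype | this man is the father).
    Denominator: P(child genotype | an unrelated man is the father).
    """
    q = 1.0 - p
    t = father / 2.0  # probability the man transmits allele A
    given_father = {2: t * p, 1: t * q + (1.0 - t) * p, 0: (1.0 - t) * q}
    unrelated = {2: p * p, 1: 2.0 * p * q, 0: q * q}
    return given_father[child] / unrelated[child]


def combined_log10_lr(fathers, children, freqs) -> float:
    """Sum of per-locus log10 LRs (assumes independent loci).
    Returns -inf if any locus excludes the man (no mutation model)."""
    total = 0.0
    for f, c, p in zip(fathers, children, freqs):
        lr = duo_lr(f, c, p)
        if lr == 0.0:
            return float("-inf")
        total += math.log10(lr)
    return total


if __name__ == "__main__":
    print(duo_lr(2, 2, 0.5))  # shared homozygote supports paternity
    print(duo_lr(2, 0, 0.5))  # opposite homozygotes exclude the man
    print(combined_log10_lr([2, 1, 0], [2, 1, 0], [0.5, 0.5, 0.5]))
```

With p = 0.5, a shared homozygous genotype gives an LR of 2, a heterozygote in either individual gives an LR of 1, and opposite homozygotes give an LR of 0, which is why many loci are needed to accumulate strong evidence from bi-allelic markers.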
Discussion
After the usefulness of SNP-based human identification and paternity testing was discussed, several sets of forensic SNP markers were developed. SNPforID [11] and IISNP [14] are universal forensic SNP panels for various populations. However, the SNPforID panel consists of 52 loci, which is too few markers for paternity testing in duo cases [8]. Børsting et al. [28] observed that false association occurred in some duo cases when using the SNPforID panel. In addition, according to NCBI dbSNP (https://www.ncbi.nlm.nih.gov/snp/), 19 SNPs in SNPforID and 7 SNPs in IISNP had an MAF lower than 0.3 in East Asians based on 1000 Genomes Project data. SNPs with a low MAF are less informative and may not be the best choice for forensic analysis in East Asians (Supplementary Fig. 1). Furthermore, because East Asians comprise various populations, it is unclear whether the existing allele frequency database is accurate for Koreans. It is important to estimate the allele frequencies of the population accurately to reduce errors in forensic analysis [16, 17]. Although several studies have selected forensic SNP marker sets for Koreans and provided allele frequency information [20, 21], these panels are expected to be unsuitable for paternity testing because they consist of fewer than 50 SNPs, the number suggested to be needed for the analysis of trio cases [8].
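The effect of a low MAF on duo cases can be illustrated with a simple exclusion calculation. This is a hedged sketch rather than the LR analysis used in this study: it assumes Hardy-Weinberg equilibrium, no mutation, independent loci, and an unrelated man compared with a child, so that the only excluding outcome at a bi-allelic locus is a pair of opposite homozygotes. The 52-locus panel in the usage lines is an idealized panel in which every SNP has the same MAF, not the actual SNPforID frequencies.

```python
def duo_exclusion_probability(p: float) -> float:
    """Chance that an unrelated man and a child are opposite
    homozygotes (AA vs BB) at one bi-allelic locus."""
    q = 1.0 - p
    return 2.0 * (p * p) * (q * q)


def prob_no_exclusion(allele_freqs) -> float:
    """Chance that an unrelated man shows no excluding locus
    across the whole panel (independent loci assumed)."""
    prob = 1.0
    for p in allele_freqs:
        prob *= 1.0 - duo_exclusion_probability(p)
    return prob


if __name__ == "__main__":
    for maf in (0.1, 0.3, 0.5):
        per_locus = duo_exclusion_probability(maf)
        panel = prob_no_exclusion([maf] * 52)
        print(f"MAF={maf}: per-locus exclusion {per_locus:.4f}, "
              f"no exclusion over 52 loci {panel:.3e}")
```

Under these assumptions the per-locus exclusion probability peaks at 0.125 when the MAF is 0.5 and falls to 0.0162 at an MAF of 0.1. Even in the ideal case, roughly one unrelated man in a thousand would show no exclusion across 52 loci, which is consistent with the concern that panels of this size are too small for duo cases.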
In the present study, we selected and tested the appropriate number of bi-allelic autosomal markers for paternity testing in Korean individuals. We considered difficult cases when choosing the number of markers. There were special cases where false inclusion occurred when the TF was a close relative of the AF [29, 30]. These problems were solved by supplementing additional markers [31, 32]. We aimed to solve these problems with only autosomal SNPs by selecting a sufficient number of loci, and we focused on the duo case because there are special cases where the genotype of one of the parents is not available.
Of 352,228 SNPs, 200 candidates were selected from 8621 unrelated Korean samples after filtering. These markers were non-functional and had a high MAF (≥ 0.49) and a low FST value (< 0.01) between Ansan and Ansung. To minimize the effects of genetic linkage and LD, we selected only SNPs located far from each other with a low level of LD (r² < 0.01) between different loci in the population. However, the loci were still not far enough apart to assume that these markers were transmitted independently. Thus, we calculated LRs by considering genetic distances from the genetic map of the East Asian population (Han Chinese in Beijing, China). Based on the allele frequencies and genetic positions of the 200 candidate SNPs, we randomly generated 10,000 families and calculated the LR for parentage. Based on our simulation results, we finally selected 160 highly informative SNP loci to remove falsely included cases. Using this final set of 160 SNPs, all 332,092 comparisons in real cases could be resolved as paternity or non-paternity.
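A simplified version of such a family simulation can be sketched as follows. This is an illustration only and differs from the analysis described above in important ways, all of which are assumptions of the sketch: loci are treated as independent (no genetic map or linkage), every SNP has the same allele frequency, there is no mutation, the mother is untyped, and the false father is an unrelated man rather than a close relative of the TF. All function names and parameter values are hypothetical.

```python
import math
import random


def duo_lr(father: int, child: int, p: float) -> float:
    """Per-locus duo LR (HWE, no mutation); genotypes = copies of allele A."""
    q = 1.0 - p
    t = father / 2.0
    given_father = {2: t * p, 1: t * q + (1.0 - t) * p, 0: (1.0 - t) * q}
    unrelated = {2: p * p, 1: 2.0 * p * q, 0: q * q}
    return given_father[child] / unrelated[child]


def log10_lr(man, child, freqs) -> float:
    """Combined log10 LR over independent loci; -inf if any locus excludes."""
    total = 0.0
    for g, c, p in zip(man, child, freqs):
        lr = duo_lr(g, c, p)
        if lr == 0.0:
            return float("-inf")
        total += math.log10(lr)
    return total


def random_genotype(rng: random.Random, p: float) -> int:
    """Draw two alleles independently from the population."""
    return int(rng.random() < p) + int(rng.random() < p)


def child_of(rng: random.Random, father: int, p: float) -> int:
    """One allele from the father, one from a random (untyped) mother."""
    paternal = int(rng.random() < father / 2.0)
    maternal = int(rng.random() < p)
    return paternal + maternal


def simulate(n_loci: int, n_families: int, p: float = 0.5, seed: int = 1):
    """Return log10 LRs for true fathers and for unrelated men."""
    rng = random.Random(seed)
    freqs = [p] * n_loci
    true_scores, unrelated_scores = [], []
    for _ in range(n_families):
        father = [random_genotype(rng, f) for f in freqs]
        stranger = [random_genotype(rng, f) for f in freqs]
        child = [child_of(rng, g, f) for g, f in zip(father, freqs)]
        true_scores.append(log10_lr(father, child, freqs))
        unrelated_scores.append(log10_lr(stranger, child, freqs))
    return true_scores, unrelated_scores


if __name__ == "__main__":
    for n_loci in (50, 100, 160):
        true_scores, unrelated_scores = simulate(n_loci, 1000)
        not_excluded = sum(math.isfinite(s) for s in unrelated_scores)
        print(f"{n_loci} loci: min true-father log10 LR = {min(true_scores):.2f}, "
              f"unrelated men not excluded = {not_excluded}")
```

Because the sketch omits linkage and close relatives, it understates how many markers are needed in the difficult cases considered here; it is meant only to show the overall shape of a simulation that generates families from allele frequencies and scores each comparison by its LR.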
In summary, we selected 160 SNPs for paternity testing based on allele frequencies in Koreans. Our study showed that using 160 autosomal SNPs with an MAF close to 0.5 in paternity testing would be sufficient to remove the risk of false inclusion. Considering that SNPs have a lower mutation rate than STRs, which reduces the probability of false exclusion, our final set of SNPs appears to be useful for paternity testing.