Introduction

Type 2 diabetes mellitus (T2D) is a chronic multifactorial disease arising from polygenic predisposition and environmental risk factors. The first wave of genome-wide association studies (GWAS) led to the discovery of T2D susceptibility loci.1 Most T2D loci identified so far were derived from populations of European ancestry, with a few from East Asian groups. Recent trans-ethnic fine-mapping has identified additional population-specific signals that increase the proportion of trait variance explained.2 However, identification of the genetic components of T2D has been limited by genetic heterogeneity across populations of diverse ancestry.

As an integrative approach to T2D-risk prediction, genetic-risk scores (GRSs) can be an efficient and effective method of constructing genome-wide risk measures from GWAS findings.3, 4, 5 Recent genetic-risk assessment studies for T2D have evaluated the predictive value of cumulative genetic scores.6, 7 However, the predictive value of GWAS-derived genetic scores is affected by ethnicity-specific differences in genotype frequencies, phenotypic effect sizes and disease incidence.8, 9, 10 Further development and refinement of multi-single-nucleotide polymorphism (SNP) GRSs in independent ethnic populations may therefore improve our understanding of complex disease prediction and prevention. In this study, we evaluated the genetic contribution of a GRS to T2D susceptibility in the Korean population.

Materials and methods

Study participants

The participants in this study were selected from the Korean Association Resource study, which was drawn from the Korean Genome Epidemiologic Study project. The cohort, comprising 10 038 participants aged 40–69 years living in Ansung and Ansan, was established in 2001–2002; details of the study have been reported previously.11, 12 On the basis of the World Health Organization (WHO) diagnostic guidelines, 1042 subjects were included as T2D cases according to the following criteria: (1) past medical and family history of T2D; (2) fasting plasma glucose ⩾7 mmol l−1 or plasma glucose ⩾11.1 mmol l−1 2 h after ingestion of 75 g oral glucose; and (3) age at disease onset ⩾40 years. The inclusion criteria for normal controls (n=2943) were as follows: (1) no past medical or family history of T2D; and (2) fasting plasma glucose <5.6 mmol l−1 and plasma glucose <7.8 mmol l−1 2 h after ingestion of 75 g oral glucose. All procedures followed institutional guidelines and were approved by an institutional review committee. Written informed consent was obtained from all participants. The clinical characteristics of the subjects are shown in Table 1.
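The glycemic thresholds above can be expressed as a small rule-based classifier. This is an illustrative sketch only: the function name `classify_glycemic_status` is hypothetical, glucose values are assumed to be in mmol l−1, and the history and age-at-onset criteria are reduced to a single hypothetical `t2d_history` flag that only disqualifies controls, because the text does not specify how those criteria combine with the glucose thresholds for cases.

```python
def classify_glycemic_status(fpg: float, glu120: float, t2d_history: bool = False) -> str:
    """Classify a subject as 'case', 'control' or 'excluded' from OGTT glucose values.

    fpg         -- fasting plasma glucose (mmol/l)
    glu120      -- plasma glucose 2 h after a 75 g oral glucose load (mmol/l)
    t2d_history -- hypothetical flag for any reported T2D history; in this sketch it
                   only disqualifies a subject from the control group (assumption).
    """
    # Case thresholds: FPG >= 7.0 mmol/l or 2-h glucose >= 11.1 mmol/l
    if fpg >= 7.0 or glu120 >= 11.1:
        return "case"
    # Control thresholds: no T2D history, FPG < 5.6 mmol/l and 2-h glucose < 7.8 mmol/l
    if not t2d_history and fpg < 5.6 and glu120 < 7.8:
        return "control"
    # Intermediate glycemia, or history without diabetic-range glucose: neither group
    return "excluded"
```

For example, `classify_glycemic_status(7.4, 9.2)` meets the fasting criterion for cases, whereas `classify_glycemic_status(6.1, 8.3)` falls between the case and control thresholds and is excluded from both groups.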

Table 1 Descriptive characteristics of T2D case and control subjects at baseline

SNP imputation and GRS construction

Genotypes were imputed with IMPUTE using the 1000 Genomes Phase I integrated variant call set (release version 3; NCBI build 37/hg19) as the reference panel (high imputation quality: proper info >0.5). After imputation, we excluded SNPs with a posterior probability score <0.90, low genotype information content (info <0.5), significant deviation from Hardy–Weinberg equilibrium (HWE; P<1 × 10−7), minor allele frequency (MAF) <0.01 or a SNP missing rate >0.1.

To construct GRSs from previously reported T2D susceptibility loci, we considered 72 GWAS loci from the GWAS catalog (https://www.ebi.ac.uk/gwas/). In addition to the East Asian-specific SNP rs2233580 in PAX4, we selected 54 SNPs showing the same direction of effect as in the original reports, giving a total of 55 SNPs available in this study (Supplementary Table 1). From the 55 loci in the GRS-55, we further constructed the GRS-19 using 19 SNPs that showed nominal significance and a consistent direction of effect in the Korean population. At the Bonferroni threshold (0.05/69=0.0007), four variants (rs4402960, rs7756992, rs10811661 and rs1111875) remained significant after correction for multiple testing (data not shown). To maximize the statistical power of the genetic prediction model, all SNPs with a nominal P-value <0.05 were included in the subsequent risk score analysis.

The weighted GRS was calculated by multiplying each individual's number of risk alleles at each SNP by the corresponding β-coefficient from the association statistics and summing the products across SNPs. In addition, the cumulative number of risk alleles was scored using an additive model (0 for homozygous non-risk, 1 for heterozygous and 2 for homozygous risk genotypes).
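The two scores described above can be sketched as follows, assuming hard genotype calls stored as two-letter strings (for example `"AG"`) and per-SNP β-coefficients on the log-odds scale. The function names are illustrative, and the SNP names, alleles and β values in the usage example are hypothetical placeholders, not estimates from this study.

```python
from typing import Dict, Mapping


def risk_allele_dosage(genotype: str, risk_allele: str) -> int:
    """Additive coding: 0, 1 or 2 copies of the risk allele in a biallelic genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype such as 'AG', got {genotype!r}")
    return genotype.count(risk_allele)


def genetic_risk_scores(
    genotypes: Mapping[str, str],
    risk_alleles: Mapping[str, str],
    betas: Mapping[str, float],
) -> Dict[str, float]:
    """Return the unweighted (risk-allele count) and weighted (beta x dosage) scores."""
    unweighted = 0
    weighted = 0.0
    for snp, allele in risk_alleles.items():
        dosage = risk_allele_dosage(genotypes[snp], allele)
        unweighted += dosage             # cumulative number of risk alleles
        weighted += betas[snp] * dosage  # each dosage weighted by its beta-coefficient
    return {"unweighted": unweighted, "weighted": weighted}


if __name__ == "__main__":
    # Hypothetical three-SNP panel; all names, alleles and betas are placeholders.
    risk_alleles = {"snp_a": "G", "snp_b": "T", "snp_c": "G"}
    betas = {"snp_a": 0.20, "snp_b": 0.10, "snp_c": 0.30}
    person = {"snp_a": "AG", "snp_b": "TT", "snp_c": "CC"}
    print(genetic_risk_scores(person, risk_alleles, betas))
```

Because imputed genotypes were filtered at a posterior probability of 0.90, hard calls are a reasonable simplification here. A version working directly on imputed data would replace the integer count with the expected allele dosage.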

Statistical analysis

Association analyses were performed with R (version 2.15.1), PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) and SAS (version 9.1; SAS Institute Inc., Cary, NC, USA). Associations with T2D were tested by logistic regression under an additive genetic model (1 d.f.), with or without adjustment for age, sex and body mass index as covariates. Fasting plasma glucose and glycated hemoglobin (HbA1c) levels were analyzed as quantitative traits by multiple linear regression.
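The additive-model logistic regression can be illustrated with a dependency-free Newton-Raphson fit. This sketch is for exposition and is not the R, PLINK or SAS routine used in the study. Each design-matrix row is assumed to hold an intercept, the risk-allele dosage (0, 1 or 2, hence 1 d.f.) or a risk score, and any covariates such as age, sex and body mass index. The function names are hypothetical, and the example data are simulated rather than taken from the study.

```python
import math
import random
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], max_iter: int = 50) -> List[float]:
    """Maximum-likelihood logistic regression coefficients via Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # gradient X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta


if __name__ == "__main__":
    # Simulated data (not study data): 10 risk alleles at frequency 0.3, plus sex.
    rng = random.Random(1)
    X, y = [], []
    for _ in range(3000):
        alleles = sum(rng.random() < 0.3 for _ in range(10))
        sex = rng.randint(0, 1)
        eta = -1.0 + 0.5 * alleles + 0.3 * sex
        y.append(int(rng.random() < 1.0 / (1.0 + math.exp(-eta))))
        X.append([1.0, float(alleles), float(sex)])
    beta = fit_logistic(X, y)
    print("log-OR per risk allele:", round(beta[1], 3), "OR:", round(math.exp(beta[1]), 3))
```

Under the additive coding, exp(β) for the dosage column is the odds ratio per additional risk allele, adjusted for the other columns.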

Results

Baseline characteristics of the Korean Association Resource study participants (n=3985) are listed in Table 1. According to the T2D diagnostic criteria, we selected 1042 T2D cases and 2943 normal controls. Most demographic and clinical variables were higher in T2D cases than in controls. Among participants who were non-diabetic at the baseline examination, 651 developed T2D during 10 years of prospective follow-up (cumulative incidence proportion: 12.35%). To assess the combined risk conferred by previously reported GWAS loci, we constructed the GRS-55 by summing risk alleles at the 55 SNPs available from the 1000 Genomes-based imputation data. In addition, the GRS-19 was constructed from 19 association signals that showed nominal significance and a consistent direction of effect in the Korean population.

In tests of mean differences, the mean weighted GRS-19 (wGRS-19) was significantly higher than the mean weighted GRS-55 (wGRS-55) (wGRS-19: P=1.75 × 10−28; wGRS-55: P=5.32 × 10−27) (Table 2). The distribution of the number of risk alleles in T2D cases was shifted to the right relative to that in controls (Figure 1a). Although C-statistics did not differ between the stratified groups, the wGRS-19 was significantly associated with increased T2D risk after controlling for area, age, sex and body mass index (wGRS-19: P=6.11 × 10−32, odds ratio per risk allele=2.60, 95% confidence interval=2.22–3.04; wGRS-55: P=2.00 × 10−28, odds ratio per risk allele=2.02, 95% confidence interval=1.78–2.28) (Table 3). In a weighted quartile analysis, a greater number of risk alleles was associated with an increased odds ratio for T2D (Figure 1b).
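The reported odds ratios and 95% confidence intervals follow from a logistic-regression coefficient β (log-odds per unit of score) and its standard error via OR = exp(β) and 95% CI = exp(β ± 1.96·SE). The hypothetical helpers below compute these Wald quantities and, conversely, back-calculate an approximate SE from a published interval. This assumes a Wald interval that is symmetric on the log scale, which the text does not state explicitly.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard-normal critical value


def odds_ratio_summary(beta: float, se: float) -> Tuple[float, float, float, float]:
    """Return (OR, lower 95% CI, upper 95% CI, two-sided Wald P) for a log-odds estimate."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, lower, upper, p_value


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of log(OR) from a 95% CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
```

Applied to the rounded wGRS-19 interval reported above (2.22–3.04), `se_from_ci` gives an SE of about 0.08 on the log-odds scale. Because the published figures are rounded, a P-value recomputed this way only approximates the reported one.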

Table 2 Comparison of the mean number of risk alleles in the GRS-55 and the GRS-19
Figure 1

Distribution of genotype score and cumulative effect of the wGRS-19. (a) Distribution of the number of risk alleles. Red and blue bars indicate T2D cases and controls, respectively. (b) Quartile-based odds ratios for T2D risk alleles. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 3 Associations of the GRS-55 and the GRS-19 with T2D risk

To assess the improvement in risk prediction provided by the wGRS-19, we further evaluated the cumulative incidence of T2D and fasting plasma glucose and HbA1c levels over 10 years of longitudinal follow-up. T2D incidence increased in the highest wGRS-19 quartile (Figure 2a). Mean values of fasting plasma glucose, GLU60 and GLU120 (plasma glucose 60 and 120 min after the oral glucose load) and HbA1c were calculated by quartile of the genotype-risk score in non-diabetic control subjects only. These results indicate that the wGRS-19 has consistent additive effects on glycemic status (Figure 2b). However, fewer valid (non-missing) observations were available for 2-h glucose (n=5107) than for fasting glucose (n=5907), so the improvement indicated by the 2-h glucose P-values is only marginal.
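The quartile-based summaries can be sketched as below. The following are assumptions, not details stated in the text: quartile cut points are taken from the sample distribution of the score using Python's `statistics.quantiles`; the cumulative incidence proportion is the number of incident cases divided by the number of participants in each quartile, ignoring censoring and loss to follow-up; and the trait values passed in are already restricted to non-diabetic subjects. The function names are hypothetical.

```python
from statistics import mean, quantiles
from typing import Dict, List, Sequence


def assign_quartiles(scores: Sequence[float]) -> List[int]:
    """Label each score 1-4 using the sample quartile cut points of the scores."""
    q1, q2, q3 = quantiles(scores, n=4)
    labels = []
    for s in scores:
        if s <= q1:
            labels.append(1)
        elif s <= q2:
            labels.append(2)
        elif s <= q3:
            labels.append(3)
        else:
            labels.append(4)
    return labels


def summarize_by_quartile(
    scores: Sequence[float],
    incident: Sequence[int],
    trait: Sequence[float],
) -> Dict[int, Dict[str, float]]:
    """Per-quartile size, cumulative incidence proportion and mean trait value."""
    labels = assign_quartiles(scores)
    summary: Dict[int, Dict[str, float]] = {}
    for q in range(1, 5):
        idx = [i for i, lab in enumerate(labels) if lab == q]
        if not idx:  # heavy ties in the score can leave a quartile empty
            continue
        summary[q] = {
            "n": len(idx),
            "cumulative_incidence": sum(incident[i] for i in idx) / len(idx),
            "mean_trait": mean(trait[i] for i in idx),
        }
    return summary
```

A survival-based estimator such as Kaplan–Meier would be needed if follow-up times differed substantially between participants. The simple proportion shown here matches the "cumulative incidence proportion" reported above only under complete follow-up.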

Figure 2

Cumulative incidence rates of T2D over 10 years of follow-up. (a) The 10-year cumulative incidence of T2D according to the quartile-based genotype-risk score. (b) FPG and glycated hemoglobin (HbA1c) levels over 10 years of follow-up according to the quartile-based genotype-risk score. FPG, fasting plasma glucose. A full color version of this figure is available at the Journal of Human Genetics journal online.

Discussion

Conventional GWASs, which focus on the effects of individual SNPs, have typically been limited in explaining disease causality by complexity and underlying etiological heterogeneity.13, 14, 15 In recent years, polygenic-risk scores have generated much interest for assessing the explanatory power of risk variants in the clinical management and prevention of T2D.16 To evaluate population differences in cumulative risk allele load for T2D, we conducted a comprehensive genetic-risk assessment of previously reported T2D loci in the Korean population.

The imputed SNP rs2233580, a missense variant in PAX4 at a novel functional locus, is specific to T2D pathogenesis in East Asian populations.17, 18, 19 In SNP prediction analyses using functional assessment tools,20 rs2233580 was predicted to have a potentially damaging or deleterious functional impact (FS score=0.98). Regulatory motif analyses also showed PAX4 motif enrichment at SNP rs163184 in the KCNQ1 gene.21 Furthermore, we investigated whether risk alleles of the GRS-19 were population-specific across the ancestral super-populations of the 1000 Genomes Project (African (AFR), American (AMR), East Asian (ASN) and European (EUR)).21 Whereas rs2233580 is monomorphic in AFR, AMR and EUR, it has a low, ASN-specific risk allele frequency (RAF=0.09). In addition, most GRS-19 variants showed differential risk allele distributions, with RAFs relatively higher (nine loci, RAF 0.42–0.88) or relatively lower (four loci, RAF 0.28–0.64) in ASN than in the other populations (AFR, AMR and EUR) (Supplementary Table 2). These findings suggest that differences in the cumulative risk allele load of a GRS across populations are associated with heterogeneity in allele frequency, effect size and population structure in the genetic architecture underlying T2D risk.8, 22

Recently, cross-sectional case–control studies using GWAS-derived GRSs have identified population genetic signatures of T2D susceptibility in Japanese,23 Chinese24 and Singaporean25 populations. However, the cumulative risk allele load of a GRS still differs among ethnic groups in East Asia (Supplementary Figure 1 and Supplementary Table 3). Only one SNP, rs10811661 in CDKN2A/B, was associated with T2D risk as a shared variant. Notably, rs7903146 in TCF7L2, a well-known T2D susceptibility variant identified in populations of European ancestry, was not incorporated into the risk assessment model in this study. A genetic-risk evaluation in African Americans gave results similar to those in European-derived populations, primarily driven by the rs7903146 variant of TCF7L2.26 Given substantial differences between ethnic groups in MAF and in linkage disequilibrium (LD), on which indirect mapping relies, constructing a GRS based on population-specific signals may be important for improving the accuracy of risk prediction models.

This study has some limitations. Our study focused on common risk variants, even though the GWAS loci identified so far have only modest genetic effects, and it therefore had limited power to detect effects of low-frequency or rare variants. In addition, we did not account for epigenetic modifications or widespread environmental factors that may contribute to missing heritability.27, 28, 29, 30 A recent study reported that a GRS comprising 14 central obesity loci is associated with increased T2D risk.31 Obesity is both a confounder and an intermediate factor in T2D development,32 and this apparent inseparability can introduce multiple confounding effects on phenotypic variance. Nonetheless, a major strength of our study is that we performed a comprehensive genetic-risk assessment of changes in T2D risk and T2D-related traits over 10 years of follow-up.

In conclusion, the GRS-19 was significantly associated with increased fasting plasma glucose and glycated hemoglobin levels and with incident T2D in a prospective 10-year epidemiological cohort study in Korea. Further integrated analyses of epistatic effects may open new possibilities for improving genetic-risk prediction in evidence-based healthcare and public health.