Introduction
Type 2 diabetes and its associated complications pose a major global healthcare burden. It is estimated that 552 million people worldwide will be affected by diabetes by the year 2030 and a majority of the affected will be Asians [
1,
2]. Owing to exponential population growth, an ageing population and an increased rate of urbanisation, there is a rapidly emerging diabetes epidemic in Asia [
3]. Exploring the underlying genetic architecture of type 2 diabetes in Asian populations may improve our understanding of the pathogenesis of this devastating disease and aid in the development of novel, effective and safe therapeutic alternatives to reduce its risk.
Candidate gene and genome-wide association studies (GWAS) have identified ∼60 loci associated with type 2 diabetes and related traits (fasting glucose, fasting insulin and 2 h glucose), but a majority of the heritability remains unexplained [
4]. Most of these loci were initially identified from studies of European ancestry, with the exception of
KCNQ1,
UBE2E2,
C2CD4A/B,
PTPRD,
SRR,
SPRY2,
PEPD,
KCNK16,
MAEA,
GCC1-PAX4,
PSMD6 and
ZFAND3, which were first discovered in East Asian groups [
5‐
8]. The transferability of type 2 diabetes risk variants across populations has not been consistently observed. In some cases, this discrepancy may reflect substantial differences in the effect allele frequency between racial/ethnic groups. For example, in an earlier study of
TCF7L2, the strongest risk variant associated with type 2 diabetes in multiple European populations was found in the T allele of rs7903146 at the 5′ end of the gene [
9]. However, the T allele of rs7903146 was rare in Asian individuals (minor allele frequency 2–2.5%). Instead, rs290487, located in a linkage disequilibrium (LD) block at the 3′ end of the gene, was associated with type 2 diabetes in Chinese individuals, suggesting a distinct pattern of genetic variation at
TCF7L2 in East Asians compared with that in Europeans [
10]. Similarly, in another study examining the transferability of type 2 diabetes loci from European studies in 10,718 individuals of Chinese, Malay and East Asian Indian ethnicities, there was evidence of a population-specific effect, allelic heterogeneity and LD variations at
CDKAL1 and
HHEX/IDE/KIF11 loci in all three cohorts [
11].
Recent studies suggest that there is a phenotypic distinction in the clinical presentation of type 2 diabetes between East Asians and Europeans [
1,
3,
12,
13], hence the importance of delineating the specific susceptibilities in each group. Thus, examining population-specific signals may help to detect the underlying causal variant(s) that affect(s) different populations and may provide insights into the functional biology that may differ among different ethnic groups [
14].
Fine mapping through dense genotyping of a locus of interest represents one approach for detecting population-specific variants. This approach has been successfully applied on a locus-by-locus basis for different diseases (e.g.
SORT1 at the 1p13 locus for myocardial infarction and LDL-cholesterol [
15] or
ZNF365D in Crohn’s disease [
16]). The Metabochip was developed to fine-map multiple metabolic and cardiovascular-related loci simultaneously in a cost-effective manner [
17]. Of the 196,725 single-nucleotide polymorphisms (SNPs) on the Metabochip, 43,292, including many less-common and rare variants from the 1000 Genomes Project, were selected to fine-map the previously identified type 2 diabetes and related-trait loci.
Here we report the association results for these fine-mapping SNPs on the Metabochip in a case–control study of 4,535 unrelated Chinese individuals with type 2 diabetes and 4,800 non-diabetic controls.
Methods
Genotyping and quality control
Blood samples were obtained from participants and DNA samples were extracted from buffy coats using the QIAamp DNA mini Kit (Qiagen, Valencia, CA, USA). Genotyping with the Metabochip [
17] was performed at the Hudson-Alpha Biotechnology Institute in Huntsville, AL, USA and at the Medical Genetics Institute and the Clinical and Translational Science Institute of CSMC. Infinium technology [
18] was used for genotyping participants on the 200K Metabochip, following the manufacturer’s protocol (Illumina, San Diego, CA, USA). Genotypes were automatically called by GenCall, a clustering algorithm, in Genome Studio as an initial screen, and data from the two genotyping centres were combined before a trained specialist at CSMC manually reviewed the cluster plots.
SNPs were removed if they had missingness >2%, minor allele frequency (MAF) <1% or departure from the Hardy–Weinberg equilibrium (p < 10⁻⁷), or if they were located on the sex chromosomes or were monomorphic (ESM Table
1). While a total of 93,235 SNPs passed quality control (QC) measures, only those related to the 50 type 2 diabetes and related-trait loci on the Metabochip were analysed (
n = 18,638,
n = 9,055 after LD pruning).
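The SNP-level filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names and the genotype-count inputs are hypothetical, and a 1-df chi-square test stands in for whichever Hardy–Weinberg test was actually applied. Only the thresholds (missingness >2%, MAF <1%, p < 10⁻⁷) come from the text.

```python
import math

# Assumption: sex chromosomes are labelled with these strings in the input.
SEX_CHROMOSOMES = {"X", "Y", "XY"}


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele "a"
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: the test is not informative
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing, chrom,
                  max_missing=0.02, min_maf=0.01, hwe_p=1e-7):
    """Return True if a SNP survives the filters listed in the text."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    if chrom in SEX_CHROMOSOMES:
        return False
    if n_missing / n_total > max_missing:
        return False
    p = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(p, 1.0 - p)
    if maf == 0.0 or maf < min_maf:  # monomorphic or too rare
        return False
    if hwe_chi2_pvalue(n_aa, n_ab, n_bb) < hwe_p:
        return False
    return True
```

Whether the Hardy–Weinberg filter was applied to all participants or to controls only is not stated here, so the sketch takes whatever genotype counts it is given.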
Cryptic relatedness was defined as PI-HAT (PI) >0.12. Where family members were present in the cohorts, most were first- or second-degree relatives. In these cases, only one individual from each family is represented in the current study.
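One way to keep a single individual per family is to treat every pair with PI-HAT above the threshold as an edge and retain one sample per connected cluster. The sketch below assumes pairwise PI-HAT estimates are already available; the function name, the input layout and the rule of keeping the alphabetically first sample ID are illustrative assumptions, since the text does not state how the retained individual was chosen.

```python
def prune_related(sample_ids, pairs, threshold=0.12):
    """Keep one sample per cluster of related individuals.

    sample_ids: iterable of sample identifiers (strings).
    pairs: iterable of (id_a, id_b, pi_hat) tuples.
    Returns the sorted list of retained sample IDs.
    """
    parent = {s: s for s in sample_ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, pi_hat in pairs:
        if pi_hat > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # The smaller ID becomes the root, so each root is the
                # alphabetically first member of its cluster.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return sorted({find(s) for s in parent})
```

For example, with samples S1 to S4 where S1–S2 and S2–S3 exceed the threshold but S3–S4 does not, the function retains S1 and S4.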
Principal component analysis (PCA) using EIGENSTRAT was conducted to evaluate for potential population stratification among study participants and also to map the participants with the population panels from the International HapMap 3 dataset [
19]. Participants who did not cluster with the HapMap Chinese samples were excluded from further association analyses. Ten principal components were generated, and participants lying more than 10 SD from the mean on any component were also excluded from the analysis.
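The outlier rule can be illustrated with a single-pass sketch that flags any participant lying more than 10 SD from the mean on any component. The data layout and function name are hypothetical, and the sketch covers only the thresholding step; it does not reproduce EIGENSTRAT or the computation of the components themselves.

```python
import statistics


def pca_outliers(scores, n_sd=10.0):
    """Return the set of sample IDs beyond n_sd standard deviations.

    scores: dict mapping sample ID -> list of principal component scores
            (all lists must have the same length; at least two samples).
    """
    ids = list(scores)
    n_components = len(scores[ids[0]])
    outliers = set()
    for k in range(n_components):
        column = [scores[s][k] for s in ids]
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)  # sample standard deviation
        if sd == 0:
            continue  # no variation on this component
        for sample, value in zip(ids, column):
            if abs(value - mean) > n_sd * sd:
                outliers.add(sample)
    return outliers
```

Because the mean and SD are computed with the outlier included, a single extreme sample can only exceed 10 SD when the cohort is large, which is the case for a study of this size.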
In total, participants with a missingness >2%, excessive heterozygosity, cryptic relatedness (n = 1,324), sex mismatch (n = 151), missing identity numbers (n = 460), ambiguous diabetes status (n = 390) or population outliers (n = 199) as defined by PCA were removed, leaving 9,335 participants for analysis.
Discussion
We have demonstrated that a majority of loci associated with type 2 diabetes discovered in European populations appear to also serve as susceptibility loci for the same trait in the Chinese population. Of the 50 loci tested, 14 of the loci met our locus Bonferroni criteria and another 24 were nominally significant. Furthermore, we identified a total of seven novel ethnic-specific variants for type 2 diabetes in the Chinese population using a fine-mapping approach. Of particular interest, two independent SNPs lie at the 3′ end of the CDKAL1 gene. These latter data thus split the CDKAL1 gene into two loci, the 5′ end of which is seen in both Europeans and East Asians and the 3′ end of which appears to be a novel independent locus for type 2 diabetes in Chinese individuals.
Our most important finding may well be the identification of two peaks on
CDKAL1. All previously reported SNPs of
CDKAL1 in type 2 diabetes (rs7756992 [
26], rs7754840 [
27,
28], rs4712523 [
27,
29,
30], rs10946398 [
31], rs9465871 [
31,
32], rs4712524 [
5], rs9295474 [
11], and rs10440833 [
25]) lie within the 5′ end of the gene, and many of these SNPs are also observed in Chinese individuals [
11,
33,
34]. None of the previously reported SNPs of
CDKAL1 in type 2 diabetes lie within the 3′ end of the gene. Our finding was possible because
CDKAL1 was one of the five selected loci to be fine-mapped on the Metabochip [
17].
CDKAL1 catalyses the transfer of a methylthio group, which possibly causes misfolding of proinsulin [
35] and inhibits pancreatic CDK5/p35 complex [
26], thereby altering beta cell function and insulin production. Earlier GWAS found variants at the 5′ end of
CDKAL1 in individuals with impaired insulin secretion, but the functional variant has yet to be determined. We therefore used an available database to determine whether the two novel SNPs at the 3′ end of
CDKAL1 (rs7773318 and rs9465994) were eSNPs [
24]. Although neither rs7773318 nor rs9465994 is an eSNP or in LD with previously reported eSNPs (rs9460563, rs9460612, rs59633892, rs62404554 and rs10946439) on
CDKAL1, we note that eSNPs located at the 5′ end of
CDKAL1 are mostly
trans-acting regulators, while the eSNPs located at the 3′ end of
CDKAL1 are all
cis-acting. This observation is consistent with the concept that SNPs at the 3′ end of
CDKAL1 regulate the expression of this gene.
In this study, although we chose a locus-specific Bonferroni correction (a less stringent statistical cut-off for association), we also performed a more stringent statistical analysis for unlinked markers. We found that five loci remained significant after correction for multiple testing; however, only one locus had a putative novel SNP in the East Asian population. This locus was later found through conditional analysis to be highly correlated with an SNP previously reported in a European population.
Of the 50 tested loci, 14 loci were significant in the Chinese population after locus Bonferroni correction. Collectively, a total of 38 loci (76%, 38/50 loci) transferred to the Chinese population with at least a nominal significance, highlighting a great deal of genetic homogeneity for type 2 diabetes between the European and Chinese populations.
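The two significance tiers and the transferability proportion can be made concrete with a short sketch. It assumes that the locus-specific Bonferroni threshold is 0.05 divided by the number of independent (LD-pruned) SNPs tested at that locus, and that the stricter analysis divides 0.05 by the total number of unlinked markers (9,055 after LD pruning); neither formula is stated in this section, and the function names and example p values are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni-corrected significance threshold for n_tests tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def classify_locus(min_p, n_independent_snps, alpha=0.05):
    """Classify a locus by the smallest p value observed within it."""
    if min_p < bonferroni_threshold(n_independent_snps, alpha):
        return "locus-significant"
    if min_p < alpha:
        return "nominal"
    return "not significant"


# Counts reported in the text.
N_LOCI = 50
N_LOCUS_SIGNIFICANT = 14
N_NOMINAL = 24

n_transferred = N_LOCUS_SIGNIFICANT + N_NOMINAL   # 38 loci
percent_transferred = 100 * n_transferred / N_LOCI  # 76.0

# Assumed experiment-wide threshold for the unlinked markers.
experiment_wide = bonferroni_threshold(9055)
```

Under these assumptions, a locus with 200 independent SNPs would need a p value below 2.5 × 10⁻⁴ to be locus-significant, whereas the experiment-wide cut-off is roughly 5.5 × 10⁻⁶, which is why fewer loci survive the stricter analysis.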
In this study, 12 of the 50 loci were not observed to be significant. In comparison with earlier studies involving Chinese individuals, our result is similar (i.e. non-significant in both this study and other Chinese cohorts) for
NOTCH2 [
33],
SLC2A2 [
36],
WFS1 [
33,
37],
GCK [
38] and
HNF1A [
39,
40] but different (i.e. non-significant in this study but significant in other Chinese cohorts) for
MTNR1B [
41‐
43],
GCK [
41,
42] and
SREBF1 [
44]. Comparisons could not be made for
KLF14,
TP53INP1,
CHCHD9/TLE4,
HMGA2 and
ZFAND6 as these genes were not tested in other Chinese cohorts. Although our result was not significant for
MTNR1B,
GCK and
SREBF1, the direction of effect was concordant with other Chinese [
41,
42] and European [
25,
45] studies for both
MTNR1B and
GCK, but was unavailable (no proxy) for
SREBF1.
Through conditional analysis, a total of seven potential secondary signals were identified. To illustrate ethnic specificity, we give an example for SNP rs11024184 on KCNQ1. According to HapMap, the frequency of the A allele is 9.2% in East Asians but as high as 53.3% in Europeans. SNP rs11024184 lies 25 kb upstream of rs2237897 (the previously reported European SNP) on KCNQ1, and the two SNPs are neither in LD with each other nor on the same LD block. Furthermore, rs11024184 does not tag any other SNP in the region at r² > 0.8. Collectively, these data suggest this is an independent signal found in the Chinese population.
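The two quantities used in this argument, the population difference in allele frequency and pairwise LD, can be computed as sketched below. The r² formula is the standard one based on the haplotype frequency and the two allele frequencies; the function name and the haplotype counts in the usage note are hypothetical and are not data from this study.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD (r squared) from counts of the four two-SNP haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second SNP
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return 0.0  # r squared is undefined for a monomorphic SNP; 0 by convention here
    d = n_AB / total - p_A * p_B  # disequilibrium coefficient D
    return d * d / denom


def is_tagged(r2, threshold=0.8):
    """True if one SNP tags the other at the threshold used in the text."""
    return r2 > threshold


# Allele frequencies of the rs11024184 A allele quoted in the text.
freq_east_asian = 0.092
freq_european = 0.533
fold_difference = freq_european / freq_east_asian  # roughly 5.8-fold
```

As a usage illustration, haplotype counts of 45, 5, 5 and 45 give r² = 0.64, which falls short of the 0.8 tagging threshold even though the two SNPs are clearly correlated.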
Comparing the Chinese and European populations, among the other six potential secondary signals, the minor allele frequency is similar for GLIS3 (rs12378556; rs10974438) but different for IDE/KIF11/HHEX (rs10882091) and IRS1 (rs2138157). For CDKAL1, the minor allele frequency is similar for rs9465994 but different for rs7773318.
There are several strengths to this study. First, the participants form a homogeneous group of Chinese individuals, recruited at seven principal sites in Taiwan, with well-defined phenotypes and ethnically matched controls. Second, to our knowledge, this is the first Metabochip study using fine mapping of type 2 diabetes and related traits in East Asians. Third, and most importantly, this fine-mapping approach allowed us to refine the association signals at previously established loci and to identify a novel locus at the 3′ end of
CDKAL1, which to date is only observed in the Chinese population. There are also several limitations. The first is the disparity of the regions covered on the Metabochip. Some regions are more extensively fine-mapped than others, thus there is a higher probability and opportunity to uncover independent signals at these regions. Second, in the most recent report from the MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium) and DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) consortia, a number of additional type 2 diabetes loci have been identified in the European population [
46,
47]. In this report, we examined the 50 loci known to be associated with type 2 diabetes or its related traits at the time of the investigation. Last, the Metabochip is a pre-designed genotyping array built around loci for cardiovascular and metabolic traits discovered in European populations. It can therefore test only the SNPs and loci on the platform and is not designed to discover, in a genome-wide fashion, novel SNPs and loci not previously related to cardiovascular or metabolic traits.
In summary, we have identified a few ethnic-specific variants and demonstrated a novel independent type 2 diabetes locus at the 3′ end of CDKAL1 in the Chinese population. These findings provide initial clues to differences in the genetic architecture underlying type 2 diabetes among various ethnic populations.
Acknowledgements
We thank all the investigators and staff who contributed to this study by collecting the data used and presented in this manuscript. We are also grateful to the patients and their families for their participation.