Introduction
Since the introduction of genome-wide association studies (GWAS), over 50 regions in the genome have been associated with susceptibility to type 1 diabetes [1–6], but less research has examined the genetic determinants of age at diagnosis (AAD) of type 1 diabetes. One study, limited to specific SNPs in regions associated with type 1 diabetes, found that the MHC, IL-2 (IL2) and renalase (RNLS) gene regions showed evidence of association with AAD [7, 8]. However, the question has never been examined in a genome-wide fashion. Identification of genes associated with the initiation of anti-islet autoimmunity, which is in most cases established by the age of 3 years [9], could help to establish the earliest events in the disease process.
Here we aimed to identify genetic determinants of AAD using a more powerful approach: we used data from an extensive SNP panel, the custom ImmunoChip array [4], combined independent cases with affected sib-pairs (ASPs) to increase sample size, and improved the genetic map through imputation.
Methods
We analysed data from six cohorts: independent cases from the UK Genetic Resource Investigating Diabetes (GRID) cohort [2], the Northern Irish GRID (NI) cohort (used here for the first time), and the Finnish IDDMGEN (Tyypin 1 Diabetekseen Sairastuneita Perheenjäsenineen) and T1DGEN (Tyypin 1 Diabeteksen Genetiikka) cohorts [10], in addition to ASPs from the Type 1 Diabetes Genetics Consortium (T1DGC) cohort [11] (from North America, Europe, Asia and the UK) and the UK Warren cohort [12]. The majority of individuals were diagnosed in childhood, with 92% diagnosed at less than 20 years of age. Genotyping (see electronic supplementary material [ESM] Genotyping) was performed on 16,015 affected individuals: 8683 (54%) independent participants and 7332 (46%) ASPs (Table 1). Quality control was performed prior to analysis to minimise the risk of reporting false-positive results (ESM Quality control, ESM Figs 1–3).
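Table 1
Baseline characteristics and inclusion in the primary AAD analysis after quality control
Cohort | Country | Cohort type | Genotyped (n) | Included after quality control (n) | AAD, years, median (IQR) | SNPs (n) |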
GRID | UK | Independent | 6799 | 6736 | 8 (4, 11) | 164,953 |
IDDMGEN | Finland | Independent | 1111 | 1073 | 9 (5, 12) | 156,343 |
NI | NI | Independent | 524 | 509 | 7 (4, 10) | 156,343 |
T1DGEN | Finland | Independent | 249 | 249 | 16 (10, 24) | 156,343 |
Warren | UK | ASP | 907 | 839 | 10 (5, 15) | 156,343 |
T1DGC | Asia | ASP | 960 | 919 | 10 (5, 14) | 167,537 |
T1DGC | Europe | ASP | 2521 | 2485 | 11 (6, 17) | 167,537 |
T1DGC | USA | ASP | 2593 | 2544 | 8 (4, 13) | 167,537 |
T1DGC | UK | ASP | 351 | 342 | 8 (4, 11) | 167,537 |
Total | All | All | 16,015 | 15,696 | 9 (5, 12) | 150,381 |
Association discovery using ImmunoChip data
Our study population comprised both related and independent cases, drawn from several cohorts that span multiple countries both between and within cohorts, so it was crucial to account for population structure to avoid reporting spurious associations. We did this by performing an inverse-variance weighted meta-analysis [16], in which we stratified the samples by cohort and examined the effect of each SNP on log_e AAD (assuming an additive mode of inheritance), adjusting for sex and the top five principal components within the cohort.
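The pooling step can be illustrated with a minimal sketch. It assumes a fixed-effect inverse-variance weighted estimator and a normal approximation for the p value; the function name and the per-cohort estimates are hypothetical and are not taken from the study.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted pooling of per-cohort estimates."""
    weights = [1.0 / se ** 2 for se in ses]            # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p value
    return beta, se, p

# Hypothetical per-cohort effects of one SNP on log_e AAD (illustrative only).
cohort_betas = [-0.05, -0.03]
cohort_ses = [0.01, 0.02]
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(cohort_betas, cohort_ses)

# An effect on the log_e scale corresponds to a multiplicative change in AAD.
percent_change = (math.exp(pooled_beta) - 1.0) * 100.0
print(f"beta={pooled_beta:.4f}, se={pooled_se:.4f}, p={pooled_p:.2e}, "
      f"AAD change per allele={percent_change:.1f}%")
```

Because the outcome is log_e AAD, the pooled estimate is most easily read after exponentiation, as a percentage change in AAD per copy of the allele. Cohorts with smaller standard errors dominate the pooled estimate.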
However, in cohorts with related individuals, standard principal components analysis may not correctly identify the population structure, as the population-level clusters are confounded by the relatedness between individuals. Therefore, in these cohorts, we used principal components analysis in related samples (PC-AiR) [17], which identifies a subset of genetically dissimilar individuals, performs principal components analysis on this subset, and then uses these principal components to estimate the ancestry of the remaining individuals in the cohort. This was performed using the GENESIS R package (https://www.bioconductor.org/packages/devel/bioc/html/GENESIS.html, version 2.2.7) [18]. To analyse the effect of each SNP in cohorts of related individuals, we applied a variance-components model that takes into account relatedness between individuals, using the GenABEL R package (http://www.genabel.org/packages/GenABEL, version 1.8-0, Grammar-Gamma method) [19]. ESM Fig. 4 provides a schematic overview of the association discovery meta-analysis procedure.
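The first step of that idea, choosing a set of mutually unrelated individuals, can be sketched as a greedy pruning of a kinship matrix. This is a simplified illustration and not the PC-AiR algorithm as implemented in GENESIS, which also uses measures of ancestry divergence; the function name, the threshold value and the toy kinship matrix are assumptions.

```python
def unrelated_subset(kinship, threshold=0.1):
    """Greedily drop the individual with the most relatives until none remain.

    kinship is a symmetric matrix (list of lists) of pairwise kinship
    coefficients; pairs above `threshold` are treated as related.
    The threshold here is illustrative, not a recommended default.
    """
    keep = set(range(len(kinship)))
    while keep:
        # Count, for each retained individual, the relatives still retained.
        counts = {
            i: sum(1 for j in keep if j != i and kinship[i][j] > threshold)
            for i in keep
        }
        worst = max(counts, key=lambda i: (counts[i], i))
        if counts[worst] == 0:
            break                      # everyone left is mutually unrelated
        keep.remove(worst)
    return sorted(keep)

# Toy example: individuals 0-1 and 3-4 are sib-pairs, individual 2 is a singleton.
toy_kinship = [
    [0.50, 0.25, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.25, 0.50],
]
unrelated = unrelated_subset(toy_kinship)
related = [i for i in range(len(toy_kinship)) if i not in unrelated]
print("unrelated:", unrelated, "projected later:", related)
```

Principal components would then be computed on the retained individuals only, and the remaining relatives would be projected onto those components.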
We also performed the association discovery analysis using an alternative approach. First, we fitted a linear mixed model [20] with log_e AAD as the outcome, adjusting for sex as a fixed effect and including random effects for cohort, country and family identifier. We then used the residuals from the linear mixed model as the outcome variable and tested the association of each SNP using a linear regression model. We called this approach the ‘residual-based model’; it was proposed by Aulchenko et al [21] to make genome-wide analysis of related individuals feasible. Since our aim was eventually to fine-map statistically significant regions, an advantage of the residual-based model is that the residuals can also be used as the outcome variable in a fine-mapping analysis for continuous traits. ESM Fig. 5 illustrates the steps of this second approach to variant discovery.
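The two stages can be sketched as follows. As a deliberate simplification, stage 1 here subtracts stratum means (a hypothetical cohort-by-sex stratum) instead of fitting the linear mixed model with random effects described above, so it illustrates the workflow rather than the study's model. All names and data values are illustrative assumptions.

```python
import math
from collections import defaultdict

def stratum_residuals(values, strata):
    """Stage 1 stand-in: residuals after removing each stratum's mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, strata):
        sums[s] += v
        counts[s] += 1
    return [v - sums[s] / counts[s] for v, s in zip(values, strata)]

def simple_regression(x, y):
    """Stage 2: ordinary least squares slope and its standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Hypothetical data: AAD in years, stratum label, and allele dosage (0, 1 or 2).
aad_years = [4, 9, 12, 6, 10, 15, 3, 8]
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
dosage = [2, 1, 0, 1, 1, 0, 2, 1]

residuals = stratum_residuals([math.log(a) for a in aad_years], strata)
snp_beta, snp_se = simple_regression(dosage, residuals)
print(f"per-allele effect on residual log_e AAD: {snp_beta:.3f} (SE {snp_se:.3f})")
```

The design choice illustrated is that the covariate and structure adjustment is done once, and the cheap per-SNP regression is then repeated across the genome on the same residuals.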
SNPs were declared to be associated with AAD if the p value was less than a genome-wide significance threshold of 5 × 10^−8. We also highlight regions associated at a false discovery rate (FDR) [22] of less than 0.05, to identify the regions next most likely to influence AAD. In this analysis, we used a stringent definition, removing the MHC region before calculating the FDR, as including all the highly associated SNPs from the MHC can inflate the threshold below which SNPs are declared to be associated, increasing the probability of reporting false-positive results.
To add further evidence for the SNPs detected as associated with AAD, we combined cases with 19,510 control individuals (ESM Table 1) and performed an inverse-variance weighted meta-analysis across cohorts, examining the effect of the SNPs on risk of type 1 diabetes overall and in those who were diagnosed at less than 5 years of age. Cohorts of independent individuals were analysed by fitting a logistic regression model adjusting for the top five principal components and examining the effect of the SNP of interest on risk of type 1 diabetes. Cohorts of related individuals were analysed using a generalised linear mixed model association test [23], implemented in the GMMAT R package (https://www.hsph.harvard.edu/han-chen/software/, version 0.7-1), adjusting for the top five principal components as fixed effects and using a kinship matrix to define the covariance structure of the random effect included in the model. For the SNPs associated with AAD, we present ORs for their association with type 1 diabetes overall and in those diagnosed at less than 5 years of age, to compare the direction of effect between analyses; a consistent direction of effect adds further evidence that the association is genuine.
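The direction-of-effect comparison can be made concrete with a small sketch. It assumes that "consistent" means the allele associated with younger AAD is also associated with higher risk (or the reverse for a protective allele), and it uses a normal-approximation 95% CI; the function names and the numerical estimates are hypothetical.

```python
import math

def odds_ratio_with_ci(log_or, se, z=1.96):
    """Convert a log-odds estimate and its SE to an OR with an approximate 95% CI."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

def directions_consistent(aad_beta, log_or):
    """True when lower AAD pairs with higher risk, or higher AAD with lower risk."""
    return aad_beta * log_or < 0

# Hypothetical estimates for one SNP (illustrative only).
aad_beta = -0.046          # effect on log_e AAD: negative means younger AAD
log_or_under5 = 0.10       # log odds ratio for diagnosis at less than 5 years
log_or_se = 0.03

or_under5, ci_low, ci_high = odds_ratio_with_ci(log_or_under5, log_or_se)
consistent = directions_consistent(aad_beta, log_or_under5)
print(f"OR={or_under5:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f}), "
      f"consistent with AAD effect: {consistent}")
```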
Discussion
In the first ImmunoChip analysis examining the AAD of type 1 diabetes, we found, as expected, that the MHC was the major genetic influence, while the 6q22.33 region was a second associated region. We performed fine-mapping across the 6q22.33 region, which contains the adjacent PTPRK and THEMIS genes and has never previously been associated with AAD [7, 8] or type 1 diabetes, but has been reported to be associated with susceptibility to other autoimmune diseases [33–35].
We identified two haplotypes in the region that were associated with younger AAD. These haplotypes showed evidence of influencing type 1 diabetes susceptibility in those diagnosed at less than 5 years of age but not in other age groups. This implies that the region affects the risk of type 1 diabetes at a young age but not once the immune system is more fully developed. In an era of increasing sample sizes, this study highlights the benefit of refining a phenotype in order to identify SNPs associated with a subset of individuals who develop a disease. In this case, we highlight a region associated with early-diagnosed type 1 diabetes, but this approach can be applied to other heterogeneous diseases to identify more accurately the main genetic determinants in a particular subset. Analyses of the immunology and pancreas histology of type 1 diabetes do reveal distinct autoimmune features in children diagnosed under age 5 years [36]. Genetic findings such as ours will help to identify the key cells and tissues involved, pointing, in this case, to the thymus being particularly important in early, aggressive disease.
PTPRK and THEMIS are both important for the transition of double-positive (CD4+, CD8+) thymocytes to single-positive thymocytes [37, 38], and there is a reduction in the number of mature CD4+ T cells in mice that have both genes knocked out, over and above the effect that each gene has independently, indicating that they are both vital to thymopoiesis [39]. Chromosome conformation capture analyses have identified the PTPRK promoter as a target of disease-associated sequences [40], supporting its candidacy as the causal gene for AAD.
The 6q22.33 region has been associated with other autoimmune diseases: the index SNP for coeliac disease is rs55743914 [33], which is contained in group 2 in our analysis, and the minor allele is associated with increased risk of coeliac disease and also younger AAD for type 1 diabetes. The secondary signal for coeliac disease (rs72975916) is contained in group 3, and the minor allele is associated with reduced risk of coeliac disease and older AAD for type 1 diabetes, so the directions of effect between AAD of type 1 diabetes and coeliac disease are consistent.
However, the lead SNP for multiple sclerosis, rs802734 [35], which is contained in group 2 in this analysis, has the opposite direction of effect (the minor allele being protective against multiple sclerosis). This signal in multiple sclerosis was not replicated in a larger ImmunoChip analysis [41], and hence the risk of multiple sclerosis at this SNP may also depend on another factor, for example age at onset. Just one signal in the region was detected as associated with Crohn’s disease, rs9491891 [42], which is contained in group 3, with a direction of effect also opposite to that for type 1 diabetes and coeliac disease. The age at onset of multiple sclerosis and Crohn’s disease in the cohorts in these analyses is older than the AAD of the individuals with type 1 diabetes in our dataset, while there can be a long delay in coeliac disease between onset and diagnosis [43], so the difference in effect direction may reflect changes in the immune system with age.
Another possibility is that the SNPs affecting early-diagnosed type 1 diabetes act on a different pathway, tissue or cell type from that through which the same SNPs have the opposite effect in multiple sclerosis and Crohn’s disease; that is, an increased level of a protein in one cell type might increase the risk of type 1 diabetes, while an increase in that same protein in a different cell type might protect against multiple sclerosis or Crohn’s disease. There is no evidence that the 6q22.33 region is associated with age at onset of multiple sclerosis or Crohn’s disease, although very few individuals in these analyses were diagnosed in childhood [35, 42], and this is difficult to assess in coeliac disease given the time between onset and diagnosis [43]. We hypothesise that there is co-localisation in the region between AAD of type 1 diabetes and coeliac disease, given the similar genetic risk variants and the fact that individuals diagnosed with type 1 diabetes at a young age are more likely to have coeliac disease [44]. Our analysis offers a genetic explanation for this phenomenon.
There was some evidence (FDR <0.05) of an association with AAD at one other region, 1q24.3. This region contains the FASLG gene and has been shown to be associated with type 1 diabetes itself [5]; it might therefore be involved in a pathway that acts early in the disease course of type 1 diabetes, leading to the anti-islet autoimmunity that we now know is established in most cases by the age of 3 years [9, 45].
A potential limitation of our study is that the majority (92%) of individuals with type 1 diabetes were diagnosed at less than 20 years old, and it is unlikely that we have identified all the variants associated with the AAD of type 1 diabetes. However, there is scope to perform a similar analysis in a population with more individuals diagnosed at over 18 years of age when data are generated in the future. Finally, some caution should be exercised when interpreting the association at 1q24.3 (near FASLG), as the association did not reach stringent genome-wide significance.
In conclusion, we have identified a novel AAD region at 6q22.33 and confirmed the well-established association of the MHC. The two risk haplotypes at 6q22.33 show evidence of association with type 1 diabetes in individuals diagnosed at less than 5 years of age and might thus guide therapeutic strategies in those with early-diagnosed type 1 diabetes.
Acknowledgements
We gratefully acknowledge all individuals who provided biological samples or data for this study. This research uses resources provided by the T1DGC, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), the National Institute of Allergy and Infectious Diseases (NIAID), the National Human Genome Research Institute (NHGRI), the National Institute of Child Health and Human Development (NICHD) and the JDRF, and supported by grant U01 DK062418 from the US National Institutes of Health. Further support was provided by grants from the JDRF (9-2011-253 and 5-SRA-2015-130-A-N) and the Wellcome Trust (091157 and 107212) to the Diabetes and Inflammation Laboratory at Oxford and Cambridge Universities.
We obtained DNA samples from the British 1958 Birth Cohort collection, funded by the UK Medical Research Council and the Wellcome Trust. This study makes use of data generated by the Wellcome Trust Case Control Consortium, funded by Wellcome Trust award 076113; a full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk/. We thank the University of Virginia for their support in genotyping samples from the Northern Irish GRID cohort, the IDDMGEN and T1DGEN cohorts from Finland and the Warren cohort using the T1DGC grant (U01 DK062418) and JDRF grant (JDRF 9-2011-530).
We thank N. J. Cooper for allowing us to use the imputation map generated to examine AAD variants across the entire genome.