Study population
The Scottish Diabetes Research Network Type 1 Bioresource (SDRNT1BIO) is a cohort of people aged 16 years and older at recruitment who had been clinically diagnosed with type 1 diabetes. Questionnaire data and samples obtained on the day of recruitment were linked to clinical data from the Scottish Care Information Diabetes Collaboration electronic health record [7]. The cohort comprises one third of the adult population with type 1 diabetes in Scotland, and its representativeness has been described in detail [8]. In Scotland, most people diagnosed with diabetes do not have autoantibodies measured at diagnosis, and the clinical diagnosis of type 1 is based on age at diagnosis, time to insulin, any history of ketoacidosis, and exclusion of monogenic subtypes of diabetes.
Of 6127 people recruited into the study, there were 6076 with a clinical diagnosis of type 1 diabetes or latent autoimmune diabetes of adulthood after excluding those diagnosed with monogenic subtypes of diabetes (intentionally recruited for the cohort) or diabetes from other causes. Median age at onset was 21 (interquartile range 12 to 31) years, and median duration of diabetes at enrolment was 21 (interquartile range 11 to 31) years. In 120 of these individuals, more than 1 year had elapsed from diagnosis to starting insulin, ascertained from prescription records and questionnaire responses.
Laboratory measurements
Non-fasting serum samples were obtained at the clinic visit from 5928 of those clinically diagnosed with type 1 diabetes. The median time from sampling to freezing at −80 °C was 2 h 15 min (interquartile range 1 h 30 min to 3 h 10 min). Plasma glucose measured in these blood samples was greater than 5 mmol/l in 88% of individuals. Non-fasting random C-peptide levels in people with type 1 diabetes are highly correlated with C-peptide levels after a mixed meal [9]. C-peptide measurements on these samples were undertaken at the Exeter Clinical Laboratory using the Roche electrochemiluminescence assay [10], with a lower limit of detection of 3 pmol/l. Autoantibodies to glutamic acid decarboxylase (GAD65), tyrosine phosphatase-related protein 2 (IA2) and zinc transporter 8 (ZnT8) were measured at the Exeter laboratory, which participates in the Diabetes Antibody Standardisation Programme [11].
Antibody titres exceeding the 97.5th percentile of the reference range were scored as positive. The 97.5th percentiles for GAD and IA2 are 11 and 7.5 World Health Organization (WHO) units/ml respectively. For ZnT8, the 97.5th percentile was 65 WHO units/ml in those aged up to 30 years and 9.1 WHO units/ml in those aged more than 30 years. Those with at least one antibody level above the reference range were classified as autoantibody-positive. These autoantibody measurements were used in combination with C-peptide measurements to identify possible misdiagnosed cases of type 2 diabetes. The rationale was that in those who have residual beta-cell function, as indicated by high C-peptide levels, we would expect autoantibodies to be still present if diabetes were caused by autoimmune beta-cell damage. This classification based on C-peptide levels and autoantibody status was validated by examining genotypic scores as described in the “Results” section.
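The positivity rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the laboratory's own procedure: the function and constant names are hypothetical, and we assume that the age used for the ZnT8 cutoff is age at sampling.

```python
# Cutoffs (WHO units/ml) are the 97.5th percentiles quoted in the text.
# All names below are hypothetical and chosen for illustration only.
GAD_CUTOFF = 11.0
IA2_CUTOFF = 7.5
ZNT8_CUTOFF_UP_TO_30 = 65.0
ZNT8_CUTOFF_OVER_30 = 9.1


def znt8_cutoff(age_years):
    """Return the age-specific ZnT8 cutoff (age at sampling is assumed)."""
    return ZNT8_CUTOFF_UP_TO_30 if age_years <= 30 else ZNT8_CUTOFF_OVER_30


def autoantibody_positive(gad, ia2, znt8, age_years):
    """True if at least one titre exceeds its reference cutoff."""
    return (
        gad > GAD_CUTOFF
        or ia2 > IA2_CUTOFF
        or znt8 > znt8_cutoff(age_years)
    )
```

With these cutoffs, the same ZnT8 titre of 20 WHO units/ml is scored as negative at age 25 but positive at age 45.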
Genotyping
The cohort was typed with the Illumina Human Core Exome 24 1.0 chip at the Center for Public Health Genomics, University of Virginia. After quality checks, genotypes were available on
r
of those clinically diagnosed as type 1 diabetes. Genotypes were phased and imputed to the UK10K reference panel with the EAGLE algorithm [12], and the imputed genotypes were filtered to exclude SNPs with minor allele frequency less than 0.02 or proportion of information extracted less than 0.7.
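As a minimal sketch of the post-imputation filter, the following keeps only SNPs that meet both thresholds. The record layout and SNP identifiers are hypothetical, and `info` stands for the proportion of information extracted as reported by the imputation software.

```python
def passes_filter(maf, info, min_maf=0.02, min_info=0.7):
    """Keep a SNP only if neither exclusion criterion applies."""
    return maf >= min_maf and info >= min_info


# Hypothetical imputed SNP records, for illustration only.
snps = [
    {"id": "snp_a", "maf": 0.15, "info": 0.95},
    {"id": "snp_b", "maf": 0.01, "info": 0.99},  # excluded: rare allele
    {"id": "snp_c", "maf": 0.30, "info": 0.55},  # excluded: poorly imputed
]

kept = [s["id"] for s in snps if passes_filter(s["maf"], s["info"])]
```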
Calculation of genotypic scores from summary statistics
Genotypic risk scores for type 1 diabetes and type 2 diabetes were computed using the GENOSCORES platform described elsewhere [13]. Univariate regression coefficients from publicly available meta-analyses [14, 15] were supplemented with single-SNP scores for additional type 1 diabetes-associated loci reported in a further meta-analysis from which only one SNP per locus was published [16]. Other meta-analyses that did not give the magnitudes and signs of the effects could not be used to calculate scores. Diabetes-associated SNPs were filtered at a threshold p value of 10⁻⁵. Locus-specific scores were generated for regions containing at least one SNP with p value less than 10⁻⁶ and separated from other filtered SNPs by a gap of at least one megabase. All other filtered SNPs were combined into a residual genome-wide score. The threshold p value used to designate a genomic region as a diabetes-associated locus limits the number of regions thus designated but does not make any difference to the genome-wide score.
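One reading of the region-definition rule can be sketched as follows. This is an illustration under stated assumptions, not the GENOSCORES implementation: SNPs are given as hypothetical `(chromosome, position, p value)` tuples, filtered SNPs on the same chromosome are clustered whenever consecutive positions are less than one megabase apart, and a cluster is designated a locus only if its smallest p value is below 10⁻⁶.

```python
FILTER_P = 1e-5      # threshold for retaining a SNP at all
LOCUS_P = 1e-6       # threshold for designating a region as a locus
GAP_BP = 1_000_000   # minimum gap separating regions, in base pairs


def split_into_regions(snps):
    """Split (chrom, pos, p) tuples into locus clusters and residual SNPs."""
    kept = sorted((s for s in snps if s[2] < FILTER_P),
                  key=lambda s: (s[0], s[1]))
    clusters = []
    for snp in kept:
        last = clusters[-1][-1] if clusters else None
        same_region = (last is not None and last[0] == snp[0]
                       and snp[1] - last[1] < GAP_BP)
        if same_region:
            clusters[-1].append(snp)
        else:
            clusters.append([snp])
    loci = [c for c in clusters if min(s[2] for s in c) < LOCUS_P]
    residual = [s for c in clusters
                if min(s[2] for s in c) >= LOCUS_P for s in c]
    return loci, residual


# Hypothetical summary statistics, for illustration only.
example = [
    (1, 100_000, 1e-8),
    (1, 600_000, 5e-6),    # within 1 Mb of the SNP above: same region
    (1, 5_000_000, 3e-6),  # passes the filter but not the locus threshold
    (2, 50_000, 1e-3),     # removed by the filter
]
loci, residual = split_into_regions(example)
```

In this example the first two SNPs form one locus, and the third contributes only to the residual genome-wide score. Lowering `LOCUS_P` moves SNPs between the two sets but leaves their union, and hence the full genome-wide score, unchanged.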
The GENOSCORES platform adjusts the locus-specific scores for linkage disequilibrium between SNP genotypes by premultiplying the vector of univariate SNP coefficients, obtained from summary GWAS results, by the generalized inverse of the correlation matrix between these genotypes. This correlation matrix was estimated from the 1000 Genomes European ancestry reference panel. The relative weights of the SNPs obtained by this procedure approximate the weights that would be obtained by fitting a multivariate regression model to the individual-level data. In principle, this method should capture additive effects across each genomic region, but not interaction effects between alleles at the same locus (dominance) or different loci (epistasis).
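The adjustment described above can be written compactly. The notation here is ours and assumes that genotypes and univariate coefficients are on the standardized genotype scale. For a region with $m$ SNPs, let $\hat{b}$ be the vector of univariate coefficients from the summary statistics, $R$ the $m \times m$ correlation matrix between genotypes estimated from the reference panel, and $R^{+}$ its generalized inverse.

```latex
% Under an additive model with multivariate coefficients \beta,
% the expected univariate coefficients are R\beta, so that
\begin{align}
  \mathrm{E}[\hat{b}] &= R\,\beta, \\
  w &= R^{+}\,\hat{b} \;\approx\; \beta, \\
  s_i &= x_i^{\top} w ,
\end{align}
% where x_i is the vector of standardized genotypes of individual i
% in the region and s_i is that individual's locus-specific score.
```

The generalized inverse is needed because $R$ is typically rank-deficient when SNPs in a region are in strong linkage disequilibrium.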
After restricting to SNPs that were contained in the type 1 Bioresource genotype dataset, this procedure generated 41 locus-specific scores for type 1 diabetes and 60 locus-specific scores for type 2 diabetes. There were five risk loci that were common to both types of diabetes: BMP8A, HLA region, CENPW, ASCC2 and BCAR1/CTRB1/CTRB2. Although the HLA region is not generally considered an established risk locus for type 2, it was included as a locus-specific score based on the criterion of at least one SNP with p value less than 10⁻⁶. For each diabetes type, locus-specific scores and the residual genome-wide score were summed over loci to obtain the full genome-wide score.
For type 1 diabetes, separate scores were constructed for HLA serotypes and for other SNPs in the HLA region. HLA serotypes at the DQB1 and DRB1 loci were imputed from the untyped SNPs using the HIBAG program [17] with reference serotypes based on all European ancestry individuals in the 1000 Genomes panel [18]. Alleles at these loci were grouped as follows: 0301 to 0304 at the DRB1 locus as DR3, 0401 to 0413 at the DRB1 locus as DR4 and 0302 to 0305 at the DQB1 locus as DQ8. Serotypes at these two loci were classified into six groups (DR3/DR4-DQ8, DR3/DR3, DR4-DQ8/DR4-DQ8, DR4-DQ8/X, DR3/X and X/X), to which score weights were assigned as published by Oram et al. [19]. The HLA region-specific polygenic score for type 1 diabetes was regressed on this HLA serotype score, and the residuals from this regression were included in the analysis as the “HLA residual” score. The HLA region was excluded from the genome-wide score for type 2, so that the type 2 score could be used to discriminate between liability to type 2 and liability to type 1.
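The allele grouping and six-group classification can be illustrated with a short sketch. Several assumptions are ours rather than the paper's: alleles are supplied as four-digit strings, the two haplotypes are phased so that each carries one DRB1 and one DQB1 allele, and a DR4 haplotype without DQ8 is treated as X. The score weights from Oram et al. are not reproduced here.

```python
def drb1_group(allele):
    """Map a four-digit DRB1 allele string to DR3, DR4 or X."""
    code = int(allele)
    if 301 <= code <= 304:
        return "DR3"
    if 401 <= code <= 413:
        return "DR4"
    return "X"


def is_dq8(allele):
    """True if a four-digit DQB1 allele string falls in 0302 to 0305."""
    return 302 <= int(allele) <= 305


def haplotype_label(drb1, dqb1):
    """Label one phased haplotype (assumption: DR4 without DQ8 counts as X)."""
    group = drb1_group(drb1)
    if group == "DR3":
        return "DR3"
    if group == "DR4" and is_dq8(dqb1):
        return "DR4-DQ8"
    return "X"


# Fixed ordering so that each pair maps to one of the six group names.
ORDER = {"DR3": 0, "DR4-DQ8": 1, "X": 2}


def diplotype_group(hap1, hap2):
    """Classify a pair of (DRB1, DQB1) haplotypes into one of six groups."""
    labels = sorted((haplotype_label(*hap1), haplotype_label(*hap2)),
                    key=ORDER.get)
    return "/".join(labels)
```

For example, `diplotype_group(("0401", "0302"), ("0301", "0201"))` returns `"DR3/DR4-DQ8"`.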
Each locus-specific score was scaled to unit standard deviation so that effect sizes could be compared. The genome-wide genotypic scores were standardized to have zero mean and unit standard deviation in White British participants without diabetes in UK Biobank.
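Standardization against an external reference group amounts to subtracting the reference mean and dividing by the reference standard deviation. The sketch below uses hypothetical names and the population standard deviation; whether the population or sample form was used is not stated and makes little difference at UK Biobank sample sizes.

```python
from statistics import mean, pstdev


def standardize(scores, reference_scores):
    """Scale scores to zero mean and unit SD in the reference group."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    return [(s - mu) / sd for s in scores]
```

Because the reference group is people without diabetes, a standardized score of 1 in a cohort member means one standard deviation above the mean of that non-diabetic reference population, not of the cohort itself.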
Comparison with genotypic scores in UK Biobank participants
To validate the classification of diabetes type in the SDRNT1BIO cohort, we compared the distributions of genotypic scores in these groups with the genotypic scores in UK Biobank participants with and without diabetes whose self-reported ethnicity was White British. Of these participants, 16,427 reported that they had been diagnosed with diabetes (excluding those diagnosed only with gestational diabetes). Of these, 1413 were categorized as type 1 diabetes based on questionnaire report that they had been diagnosed before age 50 years and started insulin within a year of diagnosis, and the remaining 15,014 were categorized as type 2 diabetes.
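The questionnaire-based rule for UK Biobank participants reporting diabetes can be sketched as below. The function and argument names are hypothetical, and we assume that "within a year" includes an interval of exactly one year and that participants who never started insulin are passed as `None`.

```python
def classify_ukb_diabetes(age_at_diagnosis, years_to_insulin):
    """Return 'type 1' or 'type 2' from self-reported questionnaire items.

    years_to_insulin is None for participants who never started insulin.
    """
    diagnosed_early = age_at_diagnosis < 50
    prompt_insulin = years_to_insulin is not None and years_to_insulin <= 1
    return "type 1" if diagnosed_early and prompt_insulin else "type 2"
```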
Statistical analysis
For modelling associations of clinical covariates, age at onset was transformed by taking the square root. C-peptide levels were transformed by taking the logarithm to base 10, with the log transform of values below the detection threshold set to zero. As preliminary analysis showed that these associations varied with age at onset, interaction terms with age at onset were included in these models.
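The two transformations can be written as a minimal sketch. The names are hypothetical, and we assume the detection threshold is the 3 pmol/l lower limit of detection of the C-peptide assay.

```python
import math

# Assumed to equal the assay's lower limit of detection.
DETECTION_LIMIT_PMOL_L = 3.0


def transform_age_at_onset(age_years):
    """Square-root transform of age at onset."""
    return math.sqrt(age_years)


def transform_c_peptide(c_peptide_pmol_l):
    """Log10 transform, with values below the detection limit set to zero."""
    if c_peptide_pmol_l < DETECTION_LIMIT_PMOL_L:
        return 0.0
    return math.log10(c_peptide_pmol_l)
```

Under this assumption the smallest transformed value for a detectable measurement is about 0.48 (the logarithm of 3), so undetectable values sit at zero, just below the detectable range.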
To allow for relatedness, the relationship matrix was computed from the unimputed genotypes and the R package GMMAT [20] was used to fit linear mixed models for age at onset and for C-peptide. The model for C-peptide was adjusted for sex, age at onset and duration. Fitting this linear mixed model yields an estimate of heritability and allows genome-wide SNP association tests to be computed efficiently from the gradient of the log-likelihood (efficient score) at the null.