Cohorts and clinical risk factors
We obtained data from 437,192 participants of British ancestry (as indicated by the data field “ethnicity background”) in the UK Biobank [12]. Among these participants, 21,102 (4.8%) self-reported a physician-made diagnosis of diabetes at the time of recruitment. Among these individuals with type 2 diabetes, prevalent CHD was determined according to the criteria proposed by Khera et al., based on a medical history of myocardial infarction or coronary revascularization [10]: 1898 (9.0%) had clinical records in at least one data field with International Statistical Classification of Diseases and Related Health Problems (ICD)-9 codes 410, 411.0, 412 or 429.79, or ICD-10 codes I21, I23, I24.1 or I25.2, or procedure records in at least one data field with Office of Population Censuses and Surveys (OPCS)-4 codes K40.1–K40.4, K41.1–K41.4, K45.1–K45.5, K49.1–K49.2, K49.8–K49.9, K50.2, K75.1–K75.4 or K75.8–K75.9.
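As an illustration, the case definition above can be expressed as a small classifier over a participant's recorded codes. This is a minimal sketch, not the authors' pipeline: it assumes codes arrive as strings with or without the decimal point, treats the listed ICD codes as prefixes so that their subcodes are included (an assumption), and the function name `has_prevalent_chd` is hypothetical.

```python
# Hypothetical sketch of the CHD case definition. The code lists come from the
# text; string handling and prefix semantics are assumptions.

def _normalize(code: str) -> str:
    """Strip whitespace and decimal points, e.g. 'I25.2' -> 'I252'."""
    return code.strip().upper().replace(".", "")


# ICD-9 and ICD-10 codes are matched as prefixes so that subcodes are included.
ICD9_PREFIXES = ("410", "4110", "412", "42979")
ICD10_PREFIXES = ("I21", "I23", "I241", "I252")


def _opcs_range(stem: str, lo: int, hi: int) -> set:
    """Expand e.g. K40.1-K40.4 into {'K401', 'K402', 'K403', 'K404'}."""
    return {f"{stem}{d}" for d in range(lo, hi + 1)}


OPCS4_CODES = (
    _opcs_range("K40", 1, 4) | _opcs_range("K41", 1, 4) | _opcs_range("K45", 1, 5)
    | _opcs_range("K49", 1, 2) | _opcs_range("K49", 8, 9) | {"K502"}
    | _opcs_range("K75", 1, 4) | _opcs_range("K75", 8, 9)
)


def has_prevalent_chd(icd9=(), icd10=(), opcs4=()) -> bool:
    """True if any recorded diagnosis or procedure code meets the CHD definition."""
    if any(_normalize(c).startswith(ICD9_PREFIXES) for c in icd9):
        return True
    if any(_normalize(c).startswith(ICD10_PREFIXES) for c in icd10):
        return True
    # OPCS-4 codes are at most four characters once the decimal point is removed.
    return any(_normalize(c)[:4] in OPCS4_CODES for c in opcs4)
```

For example, a record containing ICD-10 code "I24.1" or OPCS-4 code "K49.8" would be classified as prevalent CHD, whereas "I24.0" or "K49.3" would not.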
To test the association of the CHD PRS with CHD risk factors, we created binary variables for hypertension, hyperlipidemia, poor glycemic control, obesity and smoking using clinically relevant thresholds, which enabled comparison of CHD PRS effect sizes across risk factors. Specifically, we first obtained systolic blood pressure (SBP), diastolic blood pressure (DBP), low-density lipoprotein (LDL) and triglyceride (TG) levels for these same individuals. Then, following the 2019 Standards of Medical Care in Diabetes of the American Diabetes Association (ADA), we defined systolic hypertension as SBP ≥ 140 mmHg, diastolic hypertension as DBP ≥ 90 mmHg, high LDL as LDL ≥ 1.8 mmol/L and hypertriglyceridemia as TG ≥ 5.6 mmol/L [13]. Because active use of antihypertensive or lipid-lowering drugs may keep blood pressure or blood lipid levels below these clinical cutoffs, we further defined hypertension as having systolic hypertension, diastolic hypertension or taking antihypertensive drugs, and hyperlipidemia as having high LDL, hypertriglyceridemia or taking lipid-lowering drugs. Active medications were determined from the data field “medication for cholesterol, blood pressure or diabetes”. Poor glycemic control was defined as a hemoglobin A1c (HbA1c) level ≥ 8.0% (64 mmol/mol), corresponding to a less stringent HbA1c goal appropriate for patients with long-standing diabetes [13]. Obesity was defined as a BMI ≥ 30 kg/m². To generate a binary smoking history outcome, we used the data field “ever smoked”. We determined whether a participant had a family history of heart disease based on whether heart disease had been reported in any parent or sibling at the time of recruitment, as indicated by data fields under the category “family history”. Atherosclerotic burden is not quantified in the UK Biobank.
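The binary risk-factor definitions above reduce to threshold comparisons combined with medication flags. The following sketch is hypothetical: the `Measures` record and its field names are not from the study, units are assumed to be mmHg, mmol/L, % (NGSP) and kg/m², and returning `None` when a measurement needed to decide a flag is missing is an assumption about missing-data handling, not the authors' stated approach. The HbA1c conversion helper inverts the standard NGSP/IFCC master equation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measures:
    """Hypothetical per-participant record; units noted per field."""
    sbp: Optional[float] = None        # systolic blood pressure, mmHg
    dbp: Optional[float] = None        # diastolic blood pressure, mmHg
    ldl: Optional[float] = None        # LDL cholesterol, mmol/L
    tg: Optional[float] = None         # triglycerides, mmol/L
    hba1c_pct: Optional[float] = None  # HbA1c, % (NGSP)
    bmi: Optional[float] = None        # kg/m^2
    on_antihypertensive: bool = False
    on_lipid_lowering: bool = False


def hba1c_mmol_mol_to_pct(mmol_mol: float) -> float:
    """Invert the NGSP/IFCC master equation: IFCC = 10.93 * NGSP - 23.50."""
    return (mmol_mol + 23.50) / 10.93


def _at_least(value: Optional[float], cutoff: float) -> Optional[bool]:
    """None (unknown) if the value is missing, else whether it meets the cutoff."""
    return None if value is None else value >= cutoff


def _any_of(*flags: Optional[bool]) -> Optional[bool]:
    """True if any flag is True; None if undecidable because of missing values."""
    if any(f is True for f in flags):
        return True
    if any(f is None for f in flags):
        return None
    return False


def risk_factors(m: Measures) -> dict:
    systolic = _at_least(m.sbp, 140.0)
    diastolic = _at_least(m.dbp, 90.0)
    high_ldl = _at_least(m.ldl, 1.8)
    high_tg = _at_least(m.tg, 5.6)
    return {
        "systolic_hypertension": systolic,
        "diastolic_hypertension": diastolic,
        "high_ldl": high_ldl,
        "hypertriglyceridemia": high_tg,
        # Medication use counts even when measured values are below the cutoffs.
        "hypertension": _any_of(systolic, diastolic, m.on_antihypertensive),
        "hyperlipidemia": _any_of(high_ldl, high_tg, m.on_lipid_lowering),
        "poor_glycemic_control": _at_least(m.hba1c_pct, 8.0),
        "obesity": _at_least(m.bmi, 30.0),
    }
```

A participant on lipid-lowering drugs is flagged as hyperlipidemic even if LDL is unmeasured, which mirrors the rationale given in the text for folding medication use into the composite definitions.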
The MCCD cohort was established between 2013 and 2015 by identifying patients undergoing coronary angiography for clinical diagnosis or treatment who also had a diagnosis of type 2 diabetes by ADA criteria [14]: fasting plasma glucose ≥ 7.0 mmol/L, 2-h plasma glucose ≥ 11.1 mmol/L during a 75-g oral glucose tolerance test, HbA1c ≥ 6.5% (48 mmol/mol), or random plasma glucose ≥ 11.1 mmol/L in patients with classic symptoms of hyperglycemia. Each participant underwent a clinically indicated coronary angiogram at one of two McGill University teaching hospitals: the Jewish General Hospital or the Royal Victoria Hospital. Blood pressure, blood lipid levels and anthropometric indices were measured at recruitment. Systolic hypertension, diastolic hypertension, high LDL, hypertriglyceridemia and obesity were defined using the same criteria described above for the UK Biobank. Hypertension and hyperlipidemia were further defined by combining these criteria with self-reported active use of antihypertensive and lipid-lowering drugs, respectively. Self-reported current or previous smokers were regarded as having a smoking history. Participants with at least one parent, sibling or child who had had a heart attack and/or congestive heart failure at recruitment were regarded as having a family history of heart disease. All participants consented to participate in this study, and the study was approved by the Research Ethics Board of McGill University.
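The ADA diagnostic criteria listed above can be written as a single predicate. This sketch is hypothetical and simplified: glucose values are assumed to be in mmol/L and HbA1c in %, and it does not model the ADA recommendation that, in the absence of unequivocal hyperglycemia, an abnormal result be confirmed by repeat testing.

```python
from typing import Optional


def meets_ada_diabetes_criteria(
    fasting_glucose: Optional[float] = None,   # mmol/L
    ogtt_2h_glucose: Optional[float] = None,   # mmol/L, 75-g OGTT
    hba1c_pct: Optional[float] = None,         # %
    random_glucose: Optional[float] = None,    # mmol/L
    classic_symptoms: bool = False,
) -> bool:
    """Hypothetical single-visit check of the ADA criteria quoted in the text.

    Simplification: confirmation by repeat testing is not modeled.
    """
    checks = [
        fasting_glucose is not None and fasting_glucose >= 7.0,
        ogtt_2h_glucose is not None and ogtt_2h_glucose >= 11.1,
        hba1c_pct is not None and hba1c_pct >= 6.5,
        # Random glucose counts only in the presence of classic symptoms.
        random_glucose is not None and random_glucose >= 11.1 and classic_symptoms,
    ]
    return any(checks)
```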
Definition of atherosclerotic burden in the MCCD cohort
As previously described [15], we defined multivessel stenosis as having at least two lesions, each with ≥ 50% stenosis, affecting at least two of the four major coronary arteries (left main coronary artery, right coronary artery, left circumflex artery and left anterior descending artery). Each stenotic lesion was graded by a board-certified cardiologist with additional training in angiography. Participants with a saphenous vein graft were considered to have multivessel stenosis. We also classified participants by the number of stenotic lesions (defined as lumen occlusion ≥ 50% in one of the four major coronary arteries): 0–1, 2, 3 and ≥ 4 lesions.
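The multivessel-stenosis definition and the lesion-count grading can be sketched as follows. The artery labels (`LM`, `RCA`, `LCX`, `LAD`) and the representation of each lesion as an (artery, percent stenosis) pair are hypothetical; lesions outside the four major arteries are simply ignored here, and the ordinal codes 0 to 3 stand for the 0–1, 2, 3 and ≥ 4 lesion categories.

```python
MAJOR_ARTERIES = {"LM", "RCA", "LCX", "LAD"}  # hypothetical labels


def stenotic_lesions(lesions):
    """Keep lesions with >= 50% stenosis in one of the four major arteries.

    `lesions` is an iterable of (artery_label, percent_stenosis) pairs.
    """
    return [(a, pct) for a, pct in lesions if a in MAJOR_ARTERIES and pct >= 50]


def has_multivessel_stenosis(lesions, has_saphenous_vein_graft=False):
    """At least two >= 50% lesions spanning at least two major arteries."""
    if has_saphenous_vein_graft:
        return True
    arteries = {a for a, _ in stenotic_lesions(lesions)}
    # Two distinct arteries implies at least two qualifying lesions.
    return len(arteries) >= 2


def lesion_grade(lesions):
    """Ordinal code: 0 for 0-1 lesions, 1 for 2, 2 for 3, 3 for >= 4."""
    n = len(stenotic_lesions(lesions))
    return min(max(n - 1, 0), 3)
```

Note that two severe lesions in the same artery yield a lesion grade of 1 but do not by themselves qualify as multivessel stenosis.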
Genotyping, genotype imputation and calculation of CHD PRS
All participants in the UK Biobank were genotyped genome-wide using Affymetrix arrays [16], and their genotypes were imputed using the Haplotype Reference Consortium reference panel [17]. Genotyping details for the UK Biobank have been described previously [16].
In the MCCD cohort, DNA was extracted and genotyped genome-wide using the Axiom Biobank array at the McGill University and Genome Quebec Innovation Centre. We excluded 14 samples with a genotyping call rate below 97.5%. Of the 686,052 genotyped markers, we selected the 541,272 markers that matched the human genome reference GRCh37 (hg19) and used them for genotype imputation. We performed pre-phasing with EAGLE2 [18] and imputation with PBWT [19] on the Sanger Imputation Service online platform (https://imputation.sanger.ac.uk; accessed February 14, 2019). We chose the Haplotype Reference Consortium reference panel r1.1 [17] because it has the largest set of haplotypes available for imputation.
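The sample-level call-rate filter can be sketched with NumPy. The genotype matrix layout (samples by markers, missing calls coded as `NaN`) and the function names are assumptions for illustration; in practice this step is usually run with dedicated genotype QC tools rather than in-memory arrays.

```python
import numpy as np


def sample_call_rates(genotypes: np.ndarray) -> np.ndarray:
    """Fraction of non-missing calls per sample (rows = samples, NaN = missing)."""
    return 1.0 - np.isnan(genotypes).mean(axis=1)


def samples_passing_qc(genotypes: np.ndarray, min_call_rate: float = 0.975) -> np.ndarray:
    """Indices of samples whose call rate is at least `min_call_rate`."""
    return np.flatnonzero(sample_call_rates(genotypes) >= min_call_rate)
```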
We next derived, for each sample in both cohorts, the CHD PRS developed by Khera et al. [10] using LDpred [20]. After imputation, we selected autosomal markers with an imputation information score (INFO) > 0.3. Of the 6,630,150 SNPs in the CHD PRS, 6,012,299 (90.7%) in the diabetic UK Biobank cohort and 6,262,956 (94.5%) in the MCCD cohort were genotyped or imputed, and none of these SNPs had missing genotypes. Because the CHD PRS contains no strand-ambiguous polymorphisms (A/T or C/G), all imputed markers could be used. We standardized the CHD PRS to a mean of zero and a standard deviation of one within each cohort separately.
Ethnicity estimation in the MCCD cohort
Because linkage disequilibrium patterns, allele frequencies and genetic architecture differ between populations, a PRS generally predicts less accurately in populations other than the one in which it was trained [21]. Whereas our UK Biobank diabetic sample was restricted to participants of British ancestry, the MCCD cohort included participants of mixed ancestry. To define the ethnicity of each MCCD participant, we first selected a representative subset of 162,811 approximately independent SNPs from the genotyped and/or imputed autosomal markers, using the linkage disequilibrium (LD)-based pruning implemented in PLINK version 1.9 [22] with the argument --indep-pairwise 50 5 0.5. We next retrieved whole-genome genotype data for 1668 participants of the 1000 Genomes Project with defined ancestry (661 Africans, 504 East Asians and 503 Europeans) [23], restricted to the same 162,811 SNPs. We combined the 352 samples in our study with the 1668 reference samples and performed principal component analysis using the R package SNPRelate version 3.8 [24]. We assigned putative ancestry (European or non-European) to each sample based on its overlap with the corresponding reference population. Our primary analyses included participants of European ancestry, the largest population cluster; all analyses were then repeated including participants of non-European ancestry.
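Ancestry assignment by principal component analysis can be sketched in NumPy. This is an illustration under stated assumptions, not the SNPRelate implementation: genotypes are assumed already LD-pruned, coded 0/1/2 with no missing calls, and standardized by the expected binomial standard deviation before a singular value decomposition. The text assigns ancestry by overlap with reference clusters; the nearest-centroid rule used here is a swapped-in, automated stand-in for that judgment, and the population labels are hypothetical.

```python
import numpy as np


def top_principal_components(genotypes: np.ndarray, k: int = 2) -> np.ndarray:
    """PC scores for (n_samples, n_snps) genotypes coded 0/1/2, no missing calls."""
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    sd = np.sqrt(2.0 * p * (1.0 - p))        # expected SD under Hardy-Weinberg
    keep = sd > 0                            # drop monomorphic SNPs
    z = (g[:, keep] - 2.0 * p[keep]) / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def assign_by_nearest_centroid(pcs, reference_labels):
    """Label study samples by the closest reference-population centroid.

    The first len(reference_labels) rows of `pcs` are reference samples;
    the remaining rows are study samples.
    """
    n_ref = len(reference_labels)
    ref, study = pcs[:n_ref], pcs[n_ref:]
    labels = np.asarray(reference_labels)
    pops = sorted(set(reference_labels))
    centroids = np.array([ref[labels == pop].mean(axis=0) for pop in pops])
    dist = np.linalg.norm(study[:, None, :] - centroids[None, :, :], axis=2)
    return [pops[i] for i in dist.argmin(axis=1)]
```

With 1000 Genomes reference labels, a label such as `EUR` would then be mapped to European and the remaining reference labels to non-European.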
Association study of CHD PRS with CHD and traditional clinical risk factors
In the UK Biobank cohort, we first tested the association between the CHD PRS and CHD among individuals with type 2 diabetes using a logistic regression model adjusted for age and sex as fixed effects. In both the UK Biobank and MCCD cohorts, we then tested for associations between the CHD PRS and traditional clinical risk factors for CHD. We used separate logistic regression models to test for associations between the CHD PRS and each of hypertension, systolic hypertension, diastolic hypertension, hyperlipidemia, high LDL, hypertriglyceridemia, poor glycemic control, obesity, smoking and family history of heart disease. For continuous traits, we also derived standardized beta coefficients using linear regression models. Analyses in the diabetic UK Biobank cohort were adjusted for sex and age, and analyses in the MCCD cohort were adjusted for sex, age and hospital of recruitment. Sex-stratified analyses were also performed.
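The association tests above fit a logistic regression of each binary outcome on the standardized PRS plus covariates, and the effect of interest is the odds ratio per standard deviation of PRS. To stay self-contained, the sketch below fits the model by Newton-Raphson in plain NumPy rather than with a statistics package; the function names are hypothetical, and the Wald 95% interval is an assumption about how uncertainty would be reported. `standardized_beta` shows one way to obtain a standardized coefficient for a continuous trait by standardizing both the trait and the PRS before ordinary least squares.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 100, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns (coefficients, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def prs_odds_ratio(prs, outcome, age, sex):
    """Odds ratio per SD of PRS with a Wald 95% CI, adjusted for age and sex."""
    X = np.column_stack([np.ones(len(prs)), prs, age, sex])
    beta, se = fit_logistic(X, np.asarray(outcome, dtype=float))
    b, s = beta[1], se[1]
    return np.exp(b), np.exp(b - 1.96 * s), np.exp(b + 1.96 * s)


def standardized_beta(prs, trait, age, sex):
    """Change in trait SDs per SD of PRS, from OLS adjusted for age and sex."""
    z_trait = (trait - trait.mean()) / trait.std()
    z_prs = (prs - prs.mean()) / prs.std()
    X = np.column_stack([np.ones(len(prs)), z_prs, age, sex])
    coef, *_ = np.linalg.lstsq(X, z_trait, rcond=None)
    return coef[1]
```

For the MCCD analyses, hospital of recruitment would be added as one more covariate column in `X`.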
Analysis of CHD PRS and atherosclerosis in the MCCD cohort
We performed logistic regression with multivessel stenosis as the outcome, and ordinal logistic regression with graded atherosclerosis severity, based on the number of atherosclerotic lesions described above, as the outcome. Both analyses were adjusted for sex, age and hospital of recruitment, and were performed among participants of putative European ancestry and then repeated in all participants. To assess potential hospital-specific effects, we repeated the analyses separately for each hospital. To combine the effect of the CHD PRS across participants recruited at different hospitals, we meta-analyzed the CHD PRS coefficients from the hospital-specific logistic and ordinal logistic regression models using the linear mixed-effects model implemented in the rma.uni function of the R package metafor version 2.0-0 [25].
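The cross-hospital meta-analysis combines one PRS coefficient and its standard error per hospital. The pure-Python sketch below uses inverse-variance weighting with the DerSimonian-Laird estimate of between-hospital variance (tau²). This is a swapped-in estimator: metafor's `rma.uni` defaults to REML estimation of tau² (DerSimonian-Laird is available via `method = "DL"`), so results may differ slightly. The function name is hypothetical.

```python
import math


def meta_analyze(betas, ses):
    """Inverse-variance random-effects meta-analysis (DerSimonian-Laird tau^2).

    betas, ses: per-hospital PRS coefficients and their standard errors.
    """
    k = len(betas)
    w = [1.0 / s ** 2 for s in ses]
    sum_w = sum(w)
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    # Random-effects weights add the between-hospital variance to each SE^2.
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta_re = sum(wi * bi for wi, bi in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {"beta": beta_re, "se": se_re, "z": beta_re / se_re, "tau2": tau2, "Q": q}
```

With only two hospitals, tau² is estimated from very little information, which is one reason the hospital-stratified estimates are also worth reporting alongside the pooled coefficient.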