Study design
We designed a prospective case-cohort study within the population-based Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA)/Cooperative Health Research in the Augsburg Region (KORA) Augsburg cohort study (1984–2002) [19]. As part of the international WHO MONICA project, three independent cross-sectional population-based studies (surveys) covering the city of Augsburg (Germany) and two adjacent counties were conducted in 1984/85 (S1), 1989/90 (S2) and 1994/95 (S3) to estimate the prevalence and distribution of cardiovascular risk factors among individuals aged 25 to 64 (S1) or 25 to 74 years (S2, S3). The study complies with the Declaration of Helsinki. Approval was obtained from the local ethics committees and informed consent was given by all participants. The total number of participants was 13,427 (6,725 men and 6,702 women). All subjects were prospectively followed within the framework of the MONICA/KORA studies [20]. The present study was restricted to subjects aged 35 to 74 years at baseline, since the incidence of type 2 diabetes is low in younger subjects. Altogether 10,718 persons (5,382 men and 5,336 women) in this age range participated in at least one of the three baseline surveys. We excluded 1,187 subjects with missing blood samples and a further 1,595 participants who had prevalent type 2 diabetes, incident diabetes other than type 2 (e.g. type 1 or secondary diabetes), self-reported but unvalidated incident type 2 diabetes, no follow-up information or a follow-up time of <1 year. The source population for the present study therefore comprised 7,936 subjects (3,894 men and 4,042 women).
From the source population, a random sample stratified by sex and survey was drawn, yielding a subcohort of 1,885 participants. After exclusion of subjects with missing DNA samples or missing values for risk factors, the final subcohort included 1,687 subjects (910 men, 777 women).
Additionally, all incident type 2 diabetes cases in the source population were selected. These comprised subjects for whom the treating physician clearly reported the diagnosis, for whom the diagnosis was mentioned in the medical records, or who were taking antidiabetic medication. The number of incident type 2 diabetes cases up to December 31, 2002 was 555 (329 men, 226 women). After exclusion of subjects with incomplete information on relevant variables, the present study, comprising the subcohort and the incident type 2 diabetes cases, was based on 2,067 participants (307 men, 191 women with incident type 2 diabetes; 835 men, 734 women without incident type 2 diabetes). Mean follow-up time (± SD) was 10.1 (± 4.9) years. The final stratum-specific sample sizes of the subcohort were used together with the stratum-specific sizes of the source population to compute sampling fractions, and the inverse of the sampling fractions yielded survey- and sex-specific sampling weights.
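The weight computation described above can be sketched as follows. This is a minimal illustration only: the stratum counts are hypothetical placeholders (the stratum-specific sizes are not reported in this section), and the function name is an assumption made for the example.

```python
def sampling_weights(source_sizes, subcohort_sizes):
    """Return the inverse sampling fraction for each (survey, sex) stratum."""
    weights = {}
    for stratum, n_source in source_sizes.items():
        n_sub = subcohort_sizes[stratum]
        if n_sub <= 0 or n_sub > n_source:
            raise ValueError(f"implausible subcohort size in stratum {stratum}")
        # sampling fraction = n_sub / n_source, so the weight is its inverse
        weights[stratum] = n_source / n_sub
    return weights


# Hypothetical stratum sizes, chosen only to make the arithmetic easy to follow.
source_sizes = {("S1", "men"): 1200, ("S1", "women"): 1300}
subcohort_sizes = {("S1", "men"): 300, ("S1", "women"): 260}

weights = sampling_weights(source_sizes, subcohort_sizes)
for stratum, w in sorted(weights.items()):
    print(stratum, round(w, 3))
```

Each subcohort member then stands in for `weights[stratum]` members of the source population in the weighted analyses.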
All cross-sectional analyses concerning SNP frequencies and tests for departures from Hardy-Weinberg equilibrium were performed in a random sample of the whole study population with available DNA (i.e. prior to exclusion of subjects without follow-up information, with prevalent diabetes, etc.). This sample included 1,968 subjects (1,069 men, 899 women).
Selection and genotyping of polymorphisms
For SNP selection, the National Center for Biotechnology Information SNP database dbSNP Build 124 was used [21]. SNPs were chosen on the basis of density, frequency and occurrence in or near functional regions such as exons, hypothetical promoter regions and hypothetical transcription factor-binding sites. In addition, all haplotype-tagging SNPs known at that time were taken into account.
PCR primers were designed by Sequenom's MassArrayAssayDesign program. Genotyping analyses were carried out by means of matrix-assisted laser desorption ionization-time of flight analysis of allele-dependent primer extension products as described elsewhere [22]. Genotyping calls were made in real time with MassArray RT software (Sequenom, San Diego, USA). Negative controls were included in all assays. To control for reproducibility of genotyping data, 12.5% of randomly selected samples were genotyped in duplicate. The discordance rate was 0.3%. Each SNP was tested for departures from Hardy-Weinberg equilibrium by means of a chi-square test or Fisher's exact test depending on allele frequency.
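As an illustration of the chi-square variant of the Hardy-Weinberg test mentioned above, the following sketch compares observed genotype counts with those expected from the estimated allele frequency (1 degree of freedom). The genotype counts are hypothetical, the function name is an assumption, and the exact-test alternative used for rarer alleles is not shown.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set matches HWE proportions, the second does not.
chi2_ok, p_ok = hwe_chi_square(360, 480, 160)
chi2_dev, p_dev = hwe_chi_square(400, 400, 200)
print(round(chi2_ok, 4), round(chi2_dev, 4))
```

When expected counts are small, the chi-square approximation becomes unreliable, which is presumably why an exact test was used for some SNPs.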
Assessment of demographic, lifestyle and clinical characteristics
Standardized interviews were conducted by trained medical staff (mainly nurses) to collect information on sociodemographic variables, smoking habits, leisure time physical activity level and alcohol consumption. In addition, participants underwent a standardized medical examination and a non-fasting venous blood sample was obtained. All survey methods have been described in detail elsewhere [23]. Total cholesterol (TC) and high-density lipoprotein cholesterol (HDL-C) were measured by enzymatic methods (CHOD-PAP, Boehringer Mannheim, Germany). HDL-C was precipitated with phosphotungstic acid and magnesium ions.
Statistical analysis
Means or proportions for baseline demographic and clinical characteristics were computed using the SAS procedures SURVEYREG or SURVEYFREQ, which estimate standard errors appropriate to the sampling scheme. Tests of differences between subjects with and without incident type 2 diabetes were based on these procedures. For non-normally distributed variables, tests were carried out on log-transformed values and results are presented as geometric means with antilogs of the standard errors of the log means.
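The presentation of log-transformed variables can be illustrated with a short sketch. It is deliberately unweighted and ignores the sampling scheme, so it does not reproduce the survey procedures named above; the input values and the function name are hypothetical.

```python
import math
import statistics


def geometric_summary(values):
    """Return (geometric mean, antilog of the standard error of the log mean)."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    # Back-transform: the antilog of the SE is a multiplicative factor, not an additive SE
    return math.exp(mean_log), math.exp(se_log)


# Hypothetical right-skewed measurements
gm, se_factor = geometric_summary([1.0, 10.0, 100.0])
print(round(gm, 3), round(se_factor, 3))
```

Because the back-transformed standard error is a ratio, it is read as "geometric mean times or divided by this factor" rather than "plus or minus".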
Cox proportional hazards regression analysis was used to assess the association between polymorphisms within the TLR4 gene and incident type 2 diabetes. Because of the case-cohort design, standard errors were corrected using a "sampling weight" approach developed by Barlow (1994) [24]. Since sex-related differences seem to play a role in the development of diabetes [23, 25], all analyses were done separately for men and women and carried out for each TLR4 SNP with a multivariate-adjusted model including age, body mass index (BMI), systolic blood pressure (SBP) and TC/HDL-C, as well as the categorical variables survey, smoking status (never smoker, former smoker, current smoker), alcohol consumption (men 0, 0.1–39.9, ≥ 40 g/d; women 0, 0.1–19.9, ≥ 20 g/d) and physical activity (inactive vs. active, i.e. regular physical activity of ≥1 hour/week in both summer and winter). This model is referred to as the "main effect model". To assess whether the impact of TLR4 variants on incident type 2 diabetes was modified by cholesterol levels, interaction terms of TLR4 variants and TC/HDL-C were added to the main effect model ("interaction effect model"). Hazard ratios are presented with their 95% confidence intervals. P-values are based on robust variance estimates using the Barlow approach.
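For orientation, the two models can be written in a generic form. The notation is an assumption introduced here, not taken from the source: G denotes the coded TLR4 variant (the genotype coding is not specified in this section), C the TC/HDL-C ratio, and z the remaining covariates.

```latex
% Interaction effect model; the main effect model is the special case \beta_{GC} = 0
h(t \mid G, C, \mathbf{z}) = h_0(t)\,
  \exp\!\left(\beta_G\, G + \beta_C\, C + \beta_{GC}\, G\, C
  + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)

% Hazard ratio per unit increase in G at a fixed value of C
\mathrm{HR}_G(C) = \exp\!\left(\beta_G + \beta_{GC}\, C\right)
```

Under this form, a non-zero interaction coefficient means that the hazard ratio associated with the variant changes with the TC/HDL-C ratio instead of being constant.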
As measures of pairwise linkage disequilibrium (LD) between SNP loci, Lewontin's disequilibrium coefficient D' and the squared correlation coefficient r² were calculated. Haplotype reconstruction was performed within blocks of high D' using the expectation-maximization algorithm haplo.em [26]. To avoid large reconstruction errors resulting from missing data, haplotype estimation was based only on subjects with complete genotype information. Due to the study design, haplotype estimation for analysis of incident type 2 diabetes had to be performed separately for cases and non-cases. For association analysis within the population-based subcohort, no distinction had to be made for haplotype estimation. Haplotypes with frequencies <1% were collected into a separate group of rare haplotypes ("haplo rare"). The most frequent haplotype was used as the reference category. The effect of haplotypes on incident type 2 diabetes was assessed analogously to that of single SNPs. Because haplotypes were coded continuously as the expected number of copies, an additive effect had to be assumed in haplotype association analysis.
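The two LD measures named above can be computed from two-locus haplotype frequencies as in the sketch below. The frequencies are hypothetical, the function name is an assumption, and D' is reported as an absolute value. In practice the haplotype frequencies would themselves have to be estimated from unphased genotypes, which is not shown.

```python
def pairwise_ld(p_ab, p_a_only, p_b_only, p_neither):
    """Return (|D'|, r^2) from the four haplotype frequencies AB, Ab, aB, ab."""
    total = p_ab + p_a_only + p_b_only + p_neither
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = p_ab + p_a_only  # frequency of allele A at locus 1
    p_b = p_ab + p_b_only  # frequency of allele B at locus 2
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical haplotype frequencies
d_prime_full, r2_full = pairwise_ld(0.5, 0.0, 0.0, 0.5)  # complete LD
d_prime_part, r2_part = pairwise_ld(0.4, 0.1, 0.1, 0.4)  # partial LD
print(round(d_prime_part, 3), round(r2_part, 3))
```

D' rescales D by its maximum attainable value given the allele frequencies, whereas r² also depends on how similar the allele frequencies at the two loci are, so the two measures can differ substantially for the same pair of SNPs.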
The global significance level of 5% was corrected for the number of independent tests following the Bonferroni procedure. The number of independent tests was calculated as the number of effective loci obtained through spectral decomposition of the correlation matrix of all SNPs analyzed [27]. Therefore, the significance level for single tests was reduced to α = 0.01, corresponding to an overall significance level of α = 0.05. All statistical analyses were performed using the statistical package SAS Version 9.1 (SAS Institute, Cary, NC) and the statistical analysis software package R, Version 2.4.1 [28].
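The correction can be illustrated with one commonly used estimator of the effective number of loci, based on the variance of the eigenvalues of the SNP correlation matrix (a Nyholt-type estimator). Whether this exact estimator matches the one in [27] is an assumption; the correlation matrices below are toy inputs, not study data. The sketch avoids an explicit eigendecomposition by using two identities for a symmetric correlation matrix: the eigenvalues sum to M, and their squares sum to the sum of squared matrix entries.

```python
def effective_number_of_tests(corr):
    """Eigenvalue-variance estimate: M_eff = 1 + (M - 1) * (1 - Var(lambda) / M)."""
    m = len(corr)
    if any(len(row) != m for row in corr):
        raise ValueError("correlation matrix must be square")
    if m == 1:
        return 1.0
    # Sum of squared eigenvalues equals the sum of squared entries (Frobenius norm squared)
    sum_sq = sum(r * r for row in corr for r in row)
    # Sample variance of the eigenvalues, whose mean is 1 because trace = M
    var_eig = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_eig / m)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for a given overall level."""
    return alpha / n_tests


independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

m_eff_independent = effective_number_of_tests(independent)
m_eff_redundant = effective_number_of_tests(redundant)
per_test_alpha = bonferroni_threshold(0.05, 5)
print(m_eff_independent, m_eff_redundant, round(per_test_alpha, 4))
```

Under a plain Bonferroni division, the reported per-test level of α = 0.01 corresponds to five effective tests at an overall level of 0.05.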