Study design and participants
Details of the study design and some major results of the CHNS have been described elsewhere [14–18]. Briefly, the CHNS is an ongoing, national, multipurpose, longitudinal, open cohort study that was initiated in 1989, with follow-up every 2–4 years. CHNS rounds were completed in 1989, 1991, 1993, 1997, 2000, 2004, 2006, 2009, 2011, and 2015. Response rates at the individual level were estimated at about 80% from 1989 to 2006 [14]. By 2011, the provinces included in the CHNS constituted 47% of China’s population [15].
The present study was based on 7 rounds of CHNS data from 1997 to 2015 (1997, 2000, 2004, 2006, 2009, 2011, and 2015), including a total of 94,532 person-waves (n=32,572). We first excluded participants who were pregnant or <18 years old. Among the remaining participants (n=25,960; 76,500 person-waves), those with a missing diabetes diagnosis (n=146; 1,034 person-waves) or with only one survey wave (n=8,919; 8,919 person-waves) were further excluded. This left a cohort of 16,895 participants (66,547 person-waves) with two or more survey waves, and the first survey round was considered baseline. The characteristics of the included (n=16,895; 66,547 person-waves) and excluded (n=9,065; 9,953 person-waves) populations are shown in Additional file 1: Table S1. Of the 16,895 participants, we further excluded 444 with diabetes at baseline (fasting plasma glucose ≥7.0 mmol/L, glycated hemoglobin (HbA1c) ≥6.5%, or a previous physician diagnosis), 103 with missing dietary protein data, and 88 with extreme dietary energy intake (male: >4200 or <800 kcal/day; female: >3600 or <600 kcal/day) [19]. Finally, 16,260 participants were included in the final analysis (Additional file 1: Fig. S1).
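The exclusion steps above can be checked with a short sketch. The counts and thresholds are taken from the text; the function names, the argument names, and the handling of missing laboratory values (treated here as not meeting the criterion) are assumptions made for illustration.

```python
def has_baseline_diabetes(fpg_mmol_l=None, hba1c_pct=None, physician_diagnosed=False):
    """Baseline diabetes: FPG >= 7.0 mmol/L, HbA1c >= 6.5%, or a prior physician diagnosis.

    Assumption: a missing (None) laboratory value does not meet its criterion.
    """
    if physician_diagnosed:
        return True
    if fpg_mmol_l is not None and fpg_mmol_l >= 7.0:
        return True
    if hba1c_pct is not None and hba1c_pct >= 6.5:
        return True
    return False


def has_extreme_energy(sex, kcal_per_day):
    """Extreme intake: men >4200 or <800 kcal/day; women >3600 or <600 kcal/day."""
    low, high = (800, 4200) if sex == "male" else (600, 3600)
    return kcal_per_day < low or kcal_per_day > high


# Participant flow as reported in the text.
adults_not_pregnant = 25_960
missing_diagnosis = 146
single_wave = 8_919
cohort = adults_not_pregnant - missing_diagnosis - single_wave  # two or more waves
final_sample = cohort - 444 - 103 - 88  # baseline diabetes, missing protein, extreme energy
```

With these inputs, `cohort` and `final_sample` reproduce the 16,895 and 16,260 participants reported above.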
The institutional review boards of the University of North Carolina at Chapel Hill and the National Institute of Nutrition and Food Safety, Chinese Center for Disease Control and Prevention, approved the study. Each participant provided written informed consent. The data and study materials that support the findings of this study are available from the CHNS official website (http://www.cpc.unc.edu/projects/china).
Dietary nutrient intakes
Dietary measurements in the CHNS are described in detail elsewhere [20]. Briefly, both individual- and household-level data were collected in each survey round. Dietary information was collected by 3-day dietary recalls, combined with a 3-day food-weighing method to assess cooking oil and condiment consumption. The 3 consecutive days were randomly allocated from Monday to Sunday and were almost equally balanced across the seven days of the week for each sampling unit. Nutrient intakes were calculated using the China food composition tables (FCTs) [21–23]. The accuracy of the 24-h dietary recall for assessing energy and nutrient intake has been validated [20].
In the analyses, 3-day average intakes of dietary macronutrients and micronutrients in each round were calculated. Repeated 3-day dietary recalls may reduce the day-to-day variation of dietary intake and collect more complete food information. Moreover, all values of each nutrient in the analyses, if not specified, were presented as the cumulative averages, using all results from baseline to the last visit prior to the date of new-onset diabetes, or using all results among participants without new-onset diabetes, to represent long-term dietary intake status and minimize within-person variation.
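As a minimal sketch of the cumulative-average rule described above, the function below averages the per-round 3-day mean intakes up to the last visit before new-onset diabetes, or over all rounds when no diabetes occurred. The function name, the list-based input, and the skipping of missing rounds are illustrative assumptions, not the study's code.

```python
def cumulative_average_intake(wave_intakes, onset_wave_index=None):
    """Cumulative average of per-round intakes.

    wave_intakes: 3-day average intakes by survey round, in chronological
        order; None marks a round without dietary data (assumption: skipped).
    onset_wave_index: index of the round at which diabetes was first
        identified, or None for participants without new-onset diabetes.
    """
    if onset_wave_index is None:
        usable = wave_intakes
    else:
        # Keep only rounds strictly before the round of first diagnosis.
        usable = wave_intakes[:onset_wave_index]
    values = [v for v in usable if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Protein intake (% of energy) over three rounds, diabetes identified at round 3.
with_onset = cumulative_average_intake([12.0, 14.0, 16.0], onset_wave_index=2)
without_onset = cumulative_average_intake([12.0, 14.0, 16.0])
```

In this example the participant with onset at the third round contributes the mean of the first two rounds only, while the participant without onset contributes the mean of all three.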
Furthermore, in the present study, total protein was divided into proteins from specific food sources. The foods constituting each source are shown in Additional file 1: Table S2 [24, 25]. Of those, whole grain, refined grain, processed red meat, unprocessed red meat, poultry, fish, egg, and legumes were the 8 major sources of protein in this population.
The variety score of protein sources was calculated as the number of the 8 major food sources of protein that were consumed in appropriate quantities during the study period. The appropriate quantity for each major food source of protein is the window of consumption (% of energy, shown in the Results section) in which the risk of new-onset diabetes was lowest. In other words, participants received one point for each of the 8 major food sources of protein that they consumed at an appropriate quantity over the entire study period, for a maximum score of 8. The variety score of protein sources may therefore account for both the types and the quantities of protein intake.
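The scoring rule can be sketched as follows. The source-specific windows are reported in the Results section and are not reproduced here, so the single window used below is a placeholder with no empirical meaning. The identifier names and the treatment of window bounds (lower bound inclusive, upper bound exclusive) are also assumptions.

```python
MAJOR_SOURCES = (
    "whole_grain", "refined_grain", "processed_red_meat", "unprocessed_red_meat",
    "poultry", "fish", "egg", "legumes",
)


def variety_score(cumulative_intakes, windows):
    """Count the major protein sources whose intake falls inside its window.

    cumulative_intakes: source -> cumulative average intake (% of energy);
        a source absent from the mapping is treated as 0 (assumption).
    windows: source -> (low, high) window of appropriate intake; the
        interval is taken as low <= intake < high (assumption).
    """
    score = 0
    for source in MAJOR_SOURCES:
        low, high = windows[source]
        intake = cumulative_intakes.get(source, 0.0)
        if low <= intake < high:
            score += 1
    return score


# Placeholder window applied to every source, for illustration only.
placeholder_windows = {source: (1.0, 3.0) for source in MAJOR_SOURCES}
example_intakes = {"whole_grain": 1.5, "fish": 2.9, "egg": 3.0, "poultry": 0.2}
example_score = variety_score(example_intakes, placeholder_windows)
```

Here only whole grain and fish fall inside the placeholder window, so the example participant scores 2 of a possible 8.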
Assessment of blood pressure and other covariates
After the participants had rested for 5 min, seated blood pressure (BP) was measured by trained research staff using a mercury manometer, following the standard method. Triplicate measurements on the same arm were taken in a quiet and bright room. The mean systolic blood pressure (SBP) and diastolic blood pressure (DBP) of the three independent measures were used in the analysis.
Information on age, sex, urban or rural residence, region, education level, occupation, physical activity, smoking, and drinking status was obtained from the questionnaires at each follow-up survey. Height and weight were measured following a standard procedure with calibrated equipment. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²). The level of physical activity was calculated as the self-reported time spent in each activity multiplied by the activity-specific metabolic equivalent (MET) value [26]. For all nondietary covariates, we used the baseline-year measurements [27].
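The two derived covariates can be illustrated with a brief sketch. Summing the time-by-MET products across activities is an assumption (the text states only the product), and the activity names and MET values below are hypothetical rather than those of reference [26].

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) / height squared (m^2)."""
    return weight_kg / height_m ** 2


def physical_activity_level(hours_by_activity, met_by_activity):
    """Sum over activities of (time spent) x (activity-specific MET value).

    Assumption: products are summed across activities, giving MET-hours
    over whatever reporting period the hours refer to.
    """
    return sum(
        hours * met_by_activity[activity]
        for activity, hours in hours_by_activity.items()
    )


# Hypothetical activities and MET values, for illustration only.
example_bmi = body_mass_index(70.0, 1.75)
example_activity = physical_activity_level(
    {"walking": 2.0, "farming": 10.0},
    {"walking": 3.0, "farming": 4.5},
)
```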
Statistical analysis
Population characteristics are presented as means ± standard deviations (SDs) for continuous variables and proportions for categorical variables. Differences in population characteristics across quintiles of dietary total protein intake (% of energy: <10.6, 10.6 to <11.6, 11.6 to <12.6, 12.6 to <14.0, and ≥14.0) were compared using ANOVA, Kruskal-Wallis tests, or chi-square tests, as appropriate.
The year of each participant’s first entry into the survey was considered baseline. Follow-up person-time for each participant was calculated from baseline until the first new-onset diabetes diagnosis (taken as the midpoint between the survey of first diagnosis and the preceding survey), the last survey round before the participant’s departure from the survey, or the end of the latest survey (2015), whichever came first. Participants without new-onset diabetes were censored at the last survey round before their departure from the survey or at the end of the latest survey. Incidence rates of new-onset diabetes were calculated as the number of new-onset cases divided by the total person-years of follow-up.
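A minimal sketch of the follow-up rule, assuming each participant is represented by the chronological dates of the survey rounds attended. The function names, the 365.25-day year, and the per-1000 scaling of the incidence rate are assumptions for illustration and are not stated in the text.

```python
from datetime import date


def follow_up_end(survey_dates, diagnosis_index=None):
    """End of follow-up for one participant.

    survey_dates: chronological dates of the rounds attended.
    diagnosis_index: index of the round at which diabetes was first
        identified, or None if the participant was censored.
    """
    if diagnosis_index is None:
        return survey_dates[-1]  # censored at the last attended round
    previous = survey_dates[diagnosis_index - 1]
    diagnosed = survey_dates[diagnosis_index]
    # Midpoint between the round of first diagnosis and the preceding round.
    return previous + (diagnosed - previous) / 2


def person_years(survey_dates, diagnosis_index=None):
    """Follow-up time in years (assumption: 365.25 days per year)."""
    end = follow_up_end(survey_dates, diagnosis_index)
    return (end - survey_dates[0]).days / 365.25


def incidence_rate(cases, total_person_years, per=1000):
    """Cases per `per` person-years (the scaling is an assumption)."""
    return cases * per / total_person_years


rounds = [date(2000, 6, 1), date(2004, 6, 1), date(2006, 6, 1)]
event_end = follow_up_end(rounds, diagnosis_index=2)
censor_end = follow_up_end(rounds)
```

For a participant first diagnosed at the 2006 round, follow-up ends midway between the 2004 and 2006 rounds; a participant without diabetes is followed to the last attended round.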
Variables that are known or suspected risk factors for diabetes, or that differed significantly across protein intake levels, were chosen as covariates in the adjusted models. The relations of energy from total protein and from proteins of different food sources (whole and refined grain, processed and unprocessed red meat, poultry, fish, egg, and legumes) with new-onset diabetes were estimated using Cox proportional hazards models. We built isocaloric models to estimate the relative risk of new-onset diabetes. The rationale is that, in an isocaloric model, a reduction in one macronutrient as a percentage of total energy intake is replaced by the same proportion of energy from another macronutrient. Model 1 was adjusted for age, sex, BMI, and occupation at baseline, and for cumulative average total energy intake. Model 2 was adjusted for the variables in model 1 plus education level, region, smoking status, SBP, DBP, urban or rural residence, and physical activity (low, moderate, high) at baseline, as well as cumulative average fiber intake, sodium-to-potassium intake ratio, and fat intake (% of energy). For the associations between proteins from different food sources and new-onset diabetes, the models were additionally mutually adjusted for the cumulative average intakes of the other sources of dietary protein (% of energy). To test the proportional hazards assumption, the significance of the interaction between each exposure and log-transformed follow-up time was assessed, and no clear evidence of violation was detected. We also used restricted cubic splines (RCS) with 4 knots (at the 20th, 40th, 60th, and 80th percentiles of protein intake) to characterize potentially non-linear relationships of the variety score of protein sources, energy from total protein, and proteins from different food sources with new-onset diabetes, with the adjustments in model 2.
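One way to write the isocaloric Cox model for total protein is given below. The notation is introduced here for illustration only and is not taken from the source: $P_i$ and $F_i$ are the cumulative average percentages of energy from protein and fat, $E_i$ is cumulative average total energy intake, and $\mathbf{Z}_i$ collects the remaining model 2 covariates.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_P P_i + \beta_F F_i + \beta_E E_i
         + \boldsymbol{\gamma}^{\top}\mathbf{Z}_i\right)
```

Because total energy and the percentage of energy from fat are held constant, $\exp(\beta_P)$ can be read as the hazard ratio for a 1-percentage-point increase in energy from protein at the expense of the energy sources left out of the model. If carbohydrate is the main omitted macronutrient, which is an assumption since the text does not name it, this corresponds to substituting protein for carbohydrate.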
The appropriate quantity for each major food source of protein was determined by modeling intake of each protein source (% of energy) as a categorical variable (quartiles or quintiles) and selecting the categories with the lowest risk of new-onset diabetes. For each protein source with fewer than 20% non-consumers, participants were divided into five groups according to quintiles of intake, with quintile 1 as the reference. For each protein source with more than 20% non-consumers, consumers were divided into four groups according to quartiles, with non-consumers as the reference. Moreover, possible modifiers of the associations of total protein intake and the variety score of protein sources with new-onset diabetes were evaluated by stratified analyses and interaction testing.
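The grouping rule can be sketched with the standard library. The function name, the treatment of a non-consumer share of exactly 20% (handled here like "over 20%"), and the assignment of values equal to a cut point to the lower group are assumptions. The quantile method is Python's default and may differ from the one used in R.

```python
from statistics import quantiles


def categorize_protein_source(intakes, nonconsumer_threshold=0.20):
    """Assign each participant an intake group; group 0 is the reference.

    Fewer than 20% non-consumers: quintiles of all participants
        (0 = quintile 1, ..., 4 = quintile 5).
    Otherwise: non-consumers form group 0 and consumers are split
        into quartiles (groups 1 to 4).
    """
    share_nonconsumers = sum(1 for x in intakes if x == 0) / len(intakes)
    if share_nonconsumers < nonconsumer_threshold:
        cuts = quantiles(intakes, n=5)  # four cut points
        return [sum(x > c for c in cuts) for x in intakes]
    consumers = [x for x in intakes if x > 0]
    cuts = quantiles(consumers, n=4)  # three cut points among consumers
    return [0 if x == 0 else 1 + sum(x > c for c in cuts) for x in intakes]


# No non-consumers: quintiles of everyone.
quintile_groups = categorize_protein_source(list(range(1, 101)))
# 30% non-consumers: reference group plus quartiles of consumers.
quartile_groups = categorize_protein_source([0] * 30 + list(range(1, 71)))
```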
Baseline values were missing for BMI (n=1,522), smoking status (n=49), SBP (n=1,476), DBP (n=1,477), education level (n=344), occupation (n=152), and physical activity (n=180); participants with missing covariate values were excluded from the main analysis. In a sensitivity analysis, multiple imputation was used to handle missing covariates.
A two-sided P value <0.05 was considered statistically significant in all analyses. All statistical analyses were conducted using R version 3.6.1.