Statistical analysis
The characteristics of the participants were presented as mean (SD) for normally distributed continuous variables, median (quartile 1 [Q1], quartile 3 [Q3]) for skewed continuous variables, and frequency (percentage) for categorical variables. Differences in participant characteristics between baseline and follow-up were tested using the paired t test and the Wilcoxon signed-rank test for normally distributed and skewed continuous variables, respectively, and using the chi-squared test for categorical variables.
We used a cross-lagged panel design to investigate the bidirectional relationship between BMI and gut microbial features (Fig. 1B). Cross-lagged path analysis is a form of path analysis that examines reciprocal, longitudinal relationships among a set of inter-correlated variables [23–25]. This method simultaneously estimated two effects, each adjusted for autoregressive effects: the effect of baseline gut microbiota on subsequent BMI (ρ1 in Fig. 1B) and the effect of baseline BMI on subsequent gut microbiota (ρ2 in Fig. 1B). Before performing the cross-lagged path analysis, we regressed the baseline and follow-up values of BMI on potential confounders (age, sex, smoking status, alcohol status, education, income, physical activity, and total energy intake) and standardized the resulting residuals into Z-scores. Gut microbial features were processed in the same manner, with an additional adjustment for Bristol stool score. We then calculated Pearson correlation coefficients among the Z-transformed BMI and gut microbial features at baseline and follow-up, adjusted for the time interval (years) between the two time points. The cross-lagged path coefficients (ρ1 and ρ2) shown in Fig. 1B were estimated simultaneously from this correlation matrix. All parameters in the cross-lagged path analysis were estimated by constructing a structural equation model with the R package lavaan (version 0.6-8) [26]. Model fit was evaluated using the standardized root mean square residual (SRMR) and the comparative fit index (CFI) [27].
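The residualize-then-correlate procedure above can be sketched in Python with NumPy. This is a minimal, hypothetical illustration, not the lavaan model used in the study, and it rests on several assumptions:

- The design has two waves and a single microbial feature.
- The time-interval adjustment is treated as a partial correlation, computed by residualizing each Z-score on the time interval before correlating.
- ρ1 and ρ2 are obtained by solving the standardized regressions of each follow-up variable on both baseline variables.

For this simple two-wave model, those standardized coefficients should match the corresponding path estimates. The fit indices (SRMR, CFI) are not reproduced. The function names are illustrative only.

```python
import numpy as np


def residualize_z(y, covariates):
    """OLS-regress y on covariates (with intercept); return Z-scored residuals."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


def partial_corr_matrix(Z, t):
    """Pearson correlations among the columns of Z after removing the linear effect of t."""
    X = np.column_stack([np.ones(len(t)), t])
    beta, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return np.corrcoef(Z - X @ beta, rowvar=False)


def cross_lagged_paths(R):
    """Cross-lagged coefficients from a 4x4 correlation matrix ordered as
    [BMI_t1, microbe_t1, BMI_t2, microbe_t2].

    Each follow-up variable is regressed on both baseline variables, so each
    cross-lagged path is adjusted for the autoregressive path.
    Returns (rho1, rho2): rho1 = microbe_t1 -> BMI_t2, rho2 = BMI_t1 -> microbe_t2.
    """
    R_base = R[:2, :2]                              # baseline inter-correlation
    beta_bmi2 = np.linalg.solve(R_base, R[:2, 2])   # [BMI autoregression, rho1]
    beta_mic2 = np.linalg.solve(R_base, R[:2, 3])   # [rho2, microbe autoregression]
    return beta_bmi2[1], beta_mic2[0]


# Hypothetical usage with arrays bmi1, mic1, bmi2, mic2 (n,), confounders C (n, p),
# and time interval t (n,):
# Z = np.column_stack([residualize_z(v, C) for v in (bmi1, mic1, bmi2, mic2)])
# rho1, rho2 = cross_lagged_paths(partial_corr_matrix(Z, t))
```

Solving against the baseline correlation block is what "adjusted for autoregressive effects" means here. Each cross-lagged coefficient is a partial standardized slope, holding the other baseline variable fixed.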
We performed a principal coordinates analysis (PCoA) based on Bray-Curtis dissimilarity using species-level metagenomic data, and used the first two principal coordinates (PCo1 and PCo2) to reflect the β-diversity of the gut microbiota. We calculated α-diversity indices (Observed species, Shannon index, Simpson index, and Pielou's evenness) from the relative abundances of the species using the R package vegan (version 2.5-7) [28]. We examined the temporal relationships between BMI and α-diversity, β-diversity, and individual microbes using cross-lagged path analysis. The Benjamini-Hochberg (BH) method was used to control the false discovery rate (FDR). Given the large number of tests, associations with FDR < 0.25 were considered statistically significant in the per-species tests. The identified BMI-associated species were selected for subsequent analyses. We performed a sex-stratified analysis of the association between BMI and the identified species, and tested the heterogeneity in effect sizes between females and males using the Cochran-Q test [29]. As a sensitivity analysis, we further investigated the temporal relationship between WC and gut microbiota. We assessed the prospective associations between dietary factors (vegetable intake, fruit intake, fish intake, red and processed meat intake, and dairy intake) and the identified gut microbes using multivariable linear regression models. These models were adjusted for age, sex, BMI, smoking status, alcohol status, education, income, physical activity, total energy intake, Bristol stool score, time interval, and the corresponding baseline microbe abundance. Each dietary factor was dichotomized into higher and lower groups at its median value.
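The diversity measures named above can be sketched with NumPy alone. This is an illustrative sketch, not the vegan code the study used, and it makes these assumptions:

- The input is a samples-by-species matrix of non-negative relative abundances with no empty samples.
- PCoA uses classical (Torgerson) scaling.
- The Simpson index takes the 1 - sum(p^2) form that vegan's `diversity(x, index = "simpson")` returns.
- Shannon's H and Pielou's evenness (J = H / ln S) use natural logarithms.

```python
import numpy as np


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of X."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den  # assumes no pair of samples is all zeros


def pcoa(D, k=2):
    """Classical PCoA: double-centre -D^2/2, keep the top-k axes (e.g. PCo1, PCo2)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # largest k eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


def alpha_diversity(x):
    """Observed species, Shannon, Simpson (1 - sum p^2), and Pielou's evenness for one sample."""
    x = np.asarray(x, dtype=float)
    p = x[x > 0] / x.sum()
    observed = p.size
    shannon = -np.sum(p * np.log(p))
    simpson = 1.0 - np.sum(p ** 2)
    pielou = shannon / np.log(observed) if observed > 1 else np.nan
    return {"observed": observed, "shannon": shannon,
            "simpson": simpson, "pielou": pielou}
```

Bray-Curtis dissimilarity is non-Euclidean, so B can have negative eigenvalues. The sketch simply clips them when scaling the retained axes, which is harmless for the leading two coordinates in typical data.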
To replicate the results from the GNHS participants, we used repeated fecal metagenome data, collected 1 year apart, from 43 healthy participants aged 18–40 years in the HMP cohort [9, 11]. We obtained the relative abundances of the microbiota through the R package curatedMetagenomicData (version 1.10.2) [30], and the corresponding phenotype data from dbGaP (https://dbgap.ncbi.nlm.nih.gov/; study accession: phs000228.v4.p1) [31, 32]. The relative abundances of the species were log-transformed before formal analysis. Because BMI was available only at baseline, we assessed the prospective association of baseline BMI with follow-up microbes using multivariable linear regression models, adjusted for age, sex, race (white/not white), time interval, and the corresponding baseline microbe abundance. We used a random-effects meta-analysis with inverse-variance weights to integrate the results from the GNHS and HMP cohorts, and assessed the heterogeneity between them using I² and the Cochran-Q test [29]. We considered the association between baseline BMI and follow-up microbes to be replicable if P_meta < 0.05, P_heterogeneity > 0.05, I² < 50%, and the direction of association was the same in the two cohorts. The meta-analysis and Cochran-Q test were conducted using the R package metafor (version 3.0-2) [33].
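For two cohorts (or two sexes), the pooled estimate, Cochran's Q, I², and the replication rule above can be sketched with the Python standard library. The sketch makes these assumptions:

- It swaps in the closed-form DerSimonian-Laird estimator of the between-study variance τ². metafor's `rma()` defaults to REML, so its pooled estimates may differ slightly.
- The chi-square tail probability uses an exact series valid for integer degrees of freedom.
- All function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * standard-normal upper tail at sqrt(x), plus a finite series
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def random_effects_meta(betas, ses):
    """Inverse-variance random-effects pooling with a DerSimonian-Laird tau^2."""
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    df = len(betas) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    est = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "estimate": est,
        "se": se,
        "p_meta": math.erfc(abs(est / se) / math.sqrt(2)),  # two-sided z test
        "q": q,
        "p_het": chi2_sf(q, df),
        "i2": 100 * max(0.0, (q - df) / q) if q > 0 else 0.0,
        "tau2": tau2,
    }


def replicated(meta, beta_gnhs, beta_hmp):
    """Replication rule: P_meta < 0.05, P_heterogeneity > 0.05, I^2 < 50%, same sign."""
    same_direction = (beta_gnhs > 0) == (beta_hmp > 0)
    return (meta["p_meta"] < 0.05 and meta["p_het"] > 0.05
            and meta["i2"] < 50 and same_direction)
```

With two estimates, `q` and `p_het` are the two-group Cochran-Q test (1 degree of freedom). The same call on female and male estimates would give the sex-heterogeneity test described earlier.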
In the GNHS, to explore the impact of long-term weight change on gut microbiota, participants were divided into four weight change patterns: stable normal (n = 205), normal to adiposity (n = 23), adiposity to normal (n = 21), and stable adiposity (n = 156). Participants who were underweight at either time point were excluded (n = 21). Adiposity was defined as overweight or obesity in this study. Following the recommendation of the Working Group on Obesity in China for Chinese populations [34], underweight, normal weight, overweight, and obesity were defined as BMI < 18.5, 18.5 ≤ BMI ≤ 23.9, 24 ≤ BMI ≤ 27.9, and BMI ≥ 28 kg/m², respectively. We assessed the influence of weight change pattern on follow-up microbial features using multivariable linear regression models. The outcomes comprised α-diversity (Observed species, Shannon index, Simpson index, and Pielou's evenness), β-diversity (PCo1 and PCo2), and microbes. The models were adjusted for age, sex, smoking status, alcohol status, education, income, physical activity, total energy intake, Bristol stool score, time interval, and the corresponding baseline microbial features. Only BMI-associated microbes were included in the analysis of the association between weight change pattern and gut microbes. The stable normal group served as the reference group in these analyses. Potential collinearity among covariates was assessed using the variance inflation factor (VIF), computed with the function vif in the R package car (version 3.0-12); a VIF > 10 was taken to indicate collinearity. We fitted 16 multivariable linear regression models for microbial features: the 10 identified BMI-associated microbes plus Observed species, Shannon index, Simpson index, Pielou's evenness, PCo1, and PCo2. This yielded 16 VIFs for each covariate, with the following ranges: age (1.175–1.187), sex (1.486–1.525), smoking status (1.336–1.343), alcohol status (1.167–1.180), education (1.306–1.321), income (1.384–1.408), physical activity (1.068–1.074), total energy intake (1.130–1.137), Bristol stool score (1.086–1.124), time interval (1.064–1.077), and corresponding baseline microbial features (1.029–1.102). These values confirmed that there was no collinearity among the covariates.
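The weight-change grouping and the VIF check can be sketched as follows. The sketch is hypothetical and makes these assumptions:

- BMI values are rounded to one decimal place, so the published bands (18.5–23.9, 24–27.9, and ≥ 28 kg/m²) leave no gaps. The code treats them as half-open intervals.
- The classical VIF, 1 / (1 - R²_j), is computed for numeric covariate columns. For covariates entered as multi-level factors, car's `vif()` reports generalized VIFs instead, so values from this sketch are not directly comparable in that case.

```python
import numpy as np


def bmi_category(bmi):
    """Chinese BMI categories (kg/m^2), assuming BMI rounded to 0.1."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 24:
        return "normal"
    if bmi < 28:
        return "overweight"
    return "obesity"


def weight_change_pattern(bmi_baseline, bmi_followup):
    """Four-level pattern; None means excluded (underweight at either time point)."""
    a, b = bmi_category(bmi_baseline), bmi_category(bmi_followup)
    if "underweight" in (a, b):
        return None
    adipose_a = a in ("overweight", "obesity")
    adipose_b = b in ("overweight", "obesity")
    return {(False, False): "stable normal", (False, True): "normal to adiposity",
            (True, False): "adiposity to normal", (True, True): "stable adiposity"}[
        (adipose_a, adipose_b)]


def vif(X):
    """Classical VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)
```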
We explored the associations between the identified species and HOMA-IR using linear mixed-effect models with the R package lme4 (version 1.1-27.1) [35], adjusted for age, sex, smoking status, alcohol status, education, income, physical activity, and total energy intake. In secondary analyses, we analyzed the associations of the identified species with fasting insulin, fasting glucose, and HbA1c using linear mixed-effect models adjusted for the same covariates. Insulin resistance-related phenotypes with skewed distributions (HOMA-IR, fasting insulin, and fasting glucose) were log-transformed before analysis. All phenotypes and microbes were then standardized into Z-scores. We also assessed whether dietary factors (vegetable intake, fruit intake, fish intake, red and processed meat intake, and dairy intake) contributed to the insulin resistance-related phenotypes using multivariable linear regression models. These models were adjusted for age, sex, BMI, smoking status, alcohol status, education, income, physical activity, total energy intake, time interval, and the corresponding baseline levels of insulin resistance-related phenotypes. Each dietary factor was dichotomized into higher and lower groups at its median value.
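The sketch below illustrates the preprocessing described above: it computes HOMA-IR, then log-transforms and Z-scores a skewed phenotype. It makes these assumptions:

- HOMA-IR uses the conventional definition, fasting insulin (μU/mL) × fasting glucose (mmol/L) / 22.5, because this section does not restate the formula.
- The natural logarithm is used. The base does not affect the resulting Z-scores, because changing it only rescales the log values.

```python
import numpy as np


def homa_ir(insulin_uU_ml, glucose_mmol_l):
    """Conventional HOMA-IR: fasting insulin (uU/mL) x fasting glucose (mmol/L) / 22.5."""
    return np.asarray(insulin_uU_ml, dtype=float) * np.asarray(glucose_mmol_l, dtype=float) / 22.5


def log_z(x):
    """Natural-log transform a positive, right-skewed variable, then standardize to Z-scores."""
    lx = np.log(np.asarray(x, dtype=float))
    return (lx - lx.mean()) / lx.std(ddof=1)
```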
To investigate whether gut microbes potentially mediate the association between adiposity and insulin resistance, we performed a mediation analysis using the R package mediation (version 4.5.0) [36]. Baseline overweight/obesity status served as the exposure, and follow-up gut microbes and insulin resistance-related phenotypes served as the mediators and outcomes, respectively (Fig. 1C). The models were adjusted for the corresponding baseline gut microbes and insulin resistance-related phenotypes. Other covariates were age, sex, smoking status, alcohol status, education, income, physical activity, total energy intake, Bristol stool score, and time interval. Before performing the mediation analysis, we examined the prospective associations between weight group (underweight, normal weight, and adiposity) and insulin resistance-related phenotypes (HOMA-IR, fasting insulin, fasting glucose, and HbA1c) using multivariable linear regression models. These models were adjusted for age, sex, smoking status, alcohol status, education, income, physical activity, total energy intake, time interval, and the corresponding baseline levels of insulin resistance-related phenotypes.
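For linear mediator and outcome models without an exposure-by-mediator interaction, the average causal mediation effect (ACME) and average direct effect (ADE) reduce to the product-of-coefficients form sketched below. The sketch makes these assumptions:

- It gives point estimates only. The R mediation package also derives confidence intervals by simulation or bootstrap, which are omitted here.
- The baseline mediator and outcome values are passed through `covariates` together with the other adjustment variables.
- Function and argument names are illustrative.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients [intercept, X columns...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]


def linear_mediation(exposure, mediator, outcome, covariates):
    """Product-of-coefficients mediation for linear models without interaction."""
    C = np.asarray(covariates, dtype=float).reshape(len(exposure), -1)
    a = ols(mediator, np.column_stack([exposure, C]))[1]               # exposure -> mediator
    coefs = ols(outcome, np.column_stack([exposure, mediator, C]))
    direct, b = coefs[1], coefs[2]                                     # ADE, mediator -> outcome
    acme = a * b
    total = acme + direct
    return {"acme": acme, "ade": direct, "total": total,
            "prop_mediated": acme / total}
```

In OLS with identical covariate sets, ACME + ADE equals the total-effect coefficient from regressing the outcome on the exposure and covariates alone. This identity makes a useful sanity check.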
One identified species, Lachnospiraceae bacterium 3 1 57FAA CT1, belongs to the family Lachnospiraceae, which is involved in the production of short-chain fatty acids (SCFAs) through the fermentation of plant polysaccharides [37, 38]. In addition, Lachnospiraceae bacterium 3 1 57FAA CT1 is phylogenetically similar to known butyrogenic gut bacteria [39]. We extracted the butyrate-producing pathway PWY-5022 (4-aminobutanoate degradation V) from the pathway data obtained by functional profiling of the metagenomic samples. We used a linear mixed-effect model to examine the association between Lachnospiraceae bacterium 3 1 57FAA CT1 and PWY-5022, adjusted for age, sex, smoking status, alcohol status, education, income, physical activity, total energy intake, and Bristol stool score. We also assessed the associations between Lachnospiraceae bacterium 3 1 57FAA CT1 and the other microbial pathways using linear mixed-effect models adjusted for the same covariates. Only pathways with a relative abundance of at least 0.01% in at least 10% of the samples were included in this analysis. Unless otherwise noted, FDR < 0.05 was considered statistically significant in this study.
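The pathway prevalence filter and the Benjamini-Hochberg adjustment used throughout can be sketched as below. The sketch makes these assumptions:

- The input is a samples-by-pathways matrix of relative abundances expressed as fractions, so 0.01% corresponds to 1e-4.
- The BH function reproduces the step-up adjustment that R's `p.adjust(p, method = "BH")` performs.
- Function names are illustrative.

```python
import numpy as np


def prevalent_pathways(abund, min_abund=1e-4, min_prevalence=0.10):
    """Boolean mask of pathways (columns) with relative abundance >= min_abund
    in at least min_prevalence of the samples (rows)."""
    abund = np.asarray(abund, dtype=float)
    return (abund >= min_abund).mean(axis=0) >= min_prevalence


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    q_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted
    return q


# Hypothetical usage: significant = bh_adjust(pvals) < 0.05   (0.25 for per-species tests)
```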