Introduction
Metabolic dysfunction-associated steatotic liver disease (MASLD), previously known as non-alcoholic fatty liver disease (NAFLD) [1], is increasingly recognized as a crucial contributor to cirrhosis, liver failure, and hepatocellular carcinoma (HCC) development [2]. Recent studies have shown a significant increase in MASLD rates, with approximately 59% of HCC cases being associated with MASLD [3]. Furthermore, with the growing incidence of MASLD, modeling studies project a 130% increase in MASLD-HCC cases by 2030 [4]. Notably, 25–50% of MASLD-HCC cases emerge in patients without cirrhosis or at early fibrosis stages [5]. This poses a challenge for HCC surveillance, because abdominal ultrasound is not typically recommended for patients without cirrhosis [2, 6, 7]. Therefore, alternative markers are needed to identify patients with an increased risk of progressing to HCC.
The pathogenesis of MASLD and MASLD-HCC is complex and influenced by multiple factors, including type 2 diabetes mellitus (T2DM), hypertension, dyslipidemia, alterations in host gene expression, and genetic and environmental factors [8]. MASLD is also associated with various gastrointestinal diseases, such as gallstones, gastric cancer, and esophageal cancer, as well as with interactions involving intestinal bacteria [9, 10]. Gut microbiome imbalance, or dysbiosis, is increasingly recognized as a key factor in MASLD. Acting as a ‘virtual metabolic organ,’ the gut microbiome influences liver function via the gut–liver axis, with components and metabolites such as lipopolysaccharides and bile acids directly impacting the liver through the portal vein [11]. A substantial proportion of MASLD-HCC cases can develop even without cirrhosis, underscoring the need to understand these microbial-driven mechanisms that go beyond traditional risk factors [12]. A prior study focusing on the Asian demographic has revealed that gut dysbiosis plays a significant role in MASLD [13]. A metagenome analysis comparing patients without NAFLD to those with NAFLD-cirrhosis found that a combination of gut microbiome data and aspartate aminotransferase (AST) levels accurately distinguished early-stage cirrhosis [14]. Furthermore, the gut microbiota significantly influences metabolic interactions with host cells and their metabolic processes [15]. Previous research investigated the associations between the gut microbiome and liver tumor gene expression through transcriptomic profiling in patients with hepatitis B virus-related HCC. This research revealed that Bacteroides, Lachnospiracea incertae sedis, and Clostridium XIVa were prevalent in HCC patients and correlated with a group of host genes involved in the tumor immune microenvironment [16]. Among host genes, the expression of several, such as CALM1 and RPL7p24, correlated with treatment method, cirrhosis, and tumor (BCLC) classification [17], while expression of GDF15, a member of the transforming growth factor–β (TGF–β) superfamily, was also associated with disease progression [18]. Other studies confirmed transcriptomic profiles from peripheral blood mononuclear cells (PBMCs) by measuring gene expression levels with real-time PCR, showing that DICER1, GMPS, NCOR1, and BHLHE40 could serve as diagnostic and prognostic biomarkers for personalized HCC diagnosis [19, 20]. Recent findings also highlight a bidirectional interplay where microbial metabolites can induce host epigenetic modifications, further influencing MASLD progression to HCC [12]. However, evidence on the relationship between the gut microbiome and host genes in patients with MASLD is scarce, and this relationship is not well understood.
The transition from MASLD to HCC involves complex interactions among multiple biological systems. Advances in high-throughput sequencing have enabled integrated analysis of omics data, such as gut microbiome and transcriptomic profiles, alongside genetic variations [21, 22]. Transcriptome profiling of peripheral blood mononuclear cells (PBMCs) provides a minimally invasive approach to assess systemic immune responses and immunological alterations in liver diseases [17, 20]. Combining PBMC transcriptome and gut microbiome analyses facilitates comprehensive investigation of host–microbe interactions and supports the discovery of biomarkers for early detection and targeted interventions in MASLD-related HCC.
In this study, we investigated the interactions between the gut microbiome and host genes to identify diagnostic biomarkers for early-stage HCC in patients with MASLD using a multi-omics approach. In the discovery phase, gut microbiome profiles were obtained from fecal specimens via 16S rRNA sequencing, and host gene expression was assessed through PBMC transcriptomic profiling using RNA sequencing. Candidate biomarkers were identified based on gut microbe–host gene interactions and validated by real-time PCR.
Methods
Study design and setting
This study was a cross-sectional observational study conducted at King Chulalongkorn Memorial Hospital, Thailand, between 2022 and 2023. The primary objective was to compare gut microbiome composition and host gene expression among healthy controls, patients with MASLD at different fibrosis stages, and patients with MASLD-associated hepatocellular carcinoma (MASLD-HCC). The study protocol was approved by the Institutional Review Board of Chulalongkorn University (IRB Nos. 049/63, 957/64, and 981/64), and all participants provided written informed consent. All procedures complied with the Declaration of Helsinki. Both the discovery and validation phases were within the same recruitment framework, which limits generalizability.
Healthy controls included males and females aged ≥ 18 years without liver steatosis or metabolic and chronic diseases such as diabetes, hypertension, or dyslipidemia. MASLD patients were adults aged ≥ 18 years diagnosed with metabolic dysfunction-associated steatotic liver disease with MRI-PDFF grade ≥ 1 (defined as MRI-PDFF ≥ 5.4%) [23, 24]. Based on these criteria, participants were stratified into two groups: F01 (< 2.6 kPa) and F234 (>3.0 to >4.7 kPa), corresponding to “early or mild fibrosis” (F01) and “late or significant fibrosis” (F234) stage MASLD, respectively. MASLD-associated hepatocellular carcinoma (MASLD-HCC) patients were adults aged ≥ 18 years with HCC diagnosed histologically or radiologically according to American Association for the Study of Liver Diseases (AASLD) criteria, showing arterial phase hyperattenuation and portal venous phase hypoattenuation on CT or MRI.
Exclusion criteria for MASLD and MASLD-HCC groups included chronic viral hepatitis B or C (confirmed by HBsAg and anti-HCV antibodies), autoimmune hepatitis, alcoholic liver disease, HIV infection, cirrhotic complications, other liver diseases, known malignancies (other than HCC in the MASLD-HCC group), and excessive alcohol intake. To reduce confounding influences on gut microbiota and host metabolic profiles, all participants were instructed to discontinue antibiotics, probiotics, prebiotics, fermented foods, and proton pump inhibitors for at least four weeks prior to enrollment. Imaging data were interpreted by a radiologist blinded to clinical and laboratory information.
Specimen collection
Participants were instructed to collect approximately 1 gram of feces using DNA/RNA Shield Fecal Collection Tubes (Zymo Research, US). The collected fecal samples were vigorously shaken and stored at −80 °C until further processing. Blood samples were collected in EDTA tubes from healthy controls, MASLD patients, and MASLD patients with HCC. Peripheral blood mononuclear cells (PBMCs) were isolated using Lymphoprep™ Density Gradient Medium (Serumwerk Bernburg AG, Bernburg, Germany) with centrifugation at 2,800 rpm for 15 min at room temperature. The isolated PBMCs were washed twice, placed in phosphate-buffered saline (PBS), and stored at −80 °C until further processing.
The ZymoBIOMICS DNA Miniprep Kit (Zymo Research Corp, Irvine, CA, USA) was used to extract DNA from the fecal samples according to the manufacturer’s guidelines. The concentration and purity of the total DNA were evaluated using a DeNovix™ UV-Vis spectrophotometer, and the DNA was stored at −20 °C until further experimentation. Subsequently, for amplicon-based 16S rRNA gene sequencing, the V4 hypervariable region was amplified using the forward primer 515F and the reverse primer 806R [25]. This was followed by paired-end sequencing on the Illumina MiSeq platform (Illumina, San Diego, CA, USA), conducted at Mod Gut Co., Ltd. (Bangkok, Thailand).
Data quality was evaluated with FastQC and summarized with MultiQC [26]. All data were analyzed using the amplicon sequence variant (ASV) approach. Reads were pre-processed with DADA2, followed by the QIIME2 pipeline [27]. In total, 3,974 ASVs were identified across all samples. Retention rates per sample ranged from 26.48% to 78.63% (with an average of 54.5%). The ASV count table comprised a total of 2,754,539 counts, with counts per sample ranging from a minimum of 5,970 to a maximum of 202,706 (averaging 43,040). A summary of the 16S rRNA sequencing data is available in Supplementary Table 8. The SILVA 138.1 prokaryotic SSU database was used as the reference taxonomy [28]. Relative abundance and alpha diversity were then computed using the ‘phyloseq’ (v.1.42.2) and ‘microbiome’ (v.1.20.0) R packages. Principal coordinate analysis (PCoA) based on Bray-Curtis distances was conducted using the online tool MicrobiomeAnalyst, available at https://www.microbiomeanalyst.ca/ [29]. Furthermore, permutational multivariate analysis of variance (PERMANOVA) was conducted to test for differences in overall gut microbial community structure (beta diversity) across clinical conditions [30].
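As an illustration of the distance underlying the PCoA and PERMANOVA steps, the following minimal Python sketch computes the Bray-Curtis dissimilarity between two samples. It is not the MicrobiomeAnalyst implementation; the function names and the toy count vectors are hypothetical, and applying the distance to relative abundances (rather than raw or rarefied counts) is an assumption made here for illustration.

```python
def relative_abundance(counts):
    """Convert raw ASV counts for one sample into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same taxa")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical counts for four taxa in two samples (illustrative only).
sample_a = [120, 30, 0, 50]
sample_b = [60, 15, 25, 100]
distance = bray_curtis(relative_abundance(sample_a),
                       relative_abundance(sample_b))
print(round(distance, 3))
```

The value ranges from 0 (identical composition) to 1 (no shared taxa); a matrix of such pairwise distances is what an ordination or PERMANOVA procedure would take as input.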
Predictive pathway analysis was performed with Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2), using the MetaCyc metabolic pathway database, to predict functional profiles of microbial communities based on the marker gene [31].
Differences in estimated abundance of functional profiles between groups were assessed using Statistical Analysis of Metagenomic Profiles (STAMP) [32]. Welch’s t-test, adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR), was applied. Significant findings were defined as corrected p-values < 0.05.
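The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the STAMP code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pathway comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.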
Total RNA was extracted from PBMCs using TRIzol™ LS Reagent (Thermo Fisher Scientific Inc, CA, US) following the manufacturer’s guidelines. Prior to RNA sequencing, RNA quality was assessed by measuring total RNA concentration with a Qubit RNA assay kit (Thermo Fisher Scientific Inc, CA, US) and RNA integrity using TapeStation RNA ScreenTape Analysis (Agilent, CA, US). Samples were considered suitable for RNA sequencing if they had a total RNA concentration of ≥ 50 ng/µl in a volume of more than 10 µl and an RNA integrity value of > 6.5. Libraries were then prepared, multiplexed, and loaded on an Illumina HiSeq sequencer (Illumina, San Diego, CA, USA). Library preparation and sequencing were conducted by Vishuo Biomedical (Singapore).
Following RNA extraction from PBMCs in the validation set, the total RNA was quantified using a DeNovix™ UV-Vis spectrophotometer. The RNA was then promptly used as a template for cDNA synthesis with random primers and the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific Inc, CA, US) following the manufacturer’s guidelines. The first-strand cDNA product was stored at −20 °C until the validation step.
The raw RNA sequencing data were obtained in FastQ format, with high-quality reads accounting for more than 85% of the total reads in each sample. Data processing was performed using the nf-core/rnaseq analysis pipeline (v.3.10) [33]. A summary of the RNA sequencing reads is available in Supplementary Table 9. Initially, raw reads were trimmed for adaptor sequences and quality using Trim Galore. Subsequently, sequencing reads were aligned to the GRCh38 human reference genome using STAR, with alignments projected onto the transcriptome. Downstream quantification at the BAM level was conducted using Salmon. The pipeline further generated a gene expression matrix and an extensive quality control (QC) report.
Differential expression analysis of bacterial and host-genes
Differential expression analysis of bacterial and host genes was conducted using the DESeq2 package (v1.38.3) in R, which is widely used for analyzing RNA-seq and microbiome data. The analysis compared each MASLD subgroup (F01 or F234) with MASLD-HCC. For host transcriptomic data, normalization was performed using the median-of-ratios method implemented in DESeq2, and a threshold of absolute log2 fold change (|lfc|) > 3.0 (up- or downregulated) with FDR-adjusted p-values (padj) < 0.05 was applied to highlight the most robust transcriptomic changes. For microbiome data, differentially abundant genera were selected using FDR padj < 0.05. A Venn diagram was generated separately for microbiome (Fig. 4A) and transcriptomic (Fig. 4B) analyses to visualize overlapping features between the MASLD-F01 and MASLD-F234 groups compared to MASLD-HCC. In total, 84 differentially expressed genes (DEGs) from PBMC transcriptomic data and 7 bacterial genera from gut microbiome data were identified. These features were then subjected to Spearman’s rank correlation analysis to examine host-microbe associations.
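To make the normalization and filtering steps concrete, the sketch below reimplements the median-of-ratios size-factor calculation and the fold-change/FDR filter in plain Python. It is a simplified stand-in for what DESeq2 does internally in R, not the package itself; the function names, gene labels, and numbers are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes containing a zero count are skipped, because their geometric
    mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for s, c in enumerate(gene):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # Raises statistics.StatisticsError if every gene had a zero count.
    return [math.exp(median(r)) for r in log_ratios]


def select_degs(results, lfc_cutoff=3.0, alpha=0.05):
    """Keep genes with |log2 fold change| > cutoff and padj < alpha."""
    return [name for name, lfc, padj in results
            if abs(lfc) > lfc_cutoff and padj < alpha]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [20, 40], [30, 60], [0, 5]]
factors = size_factors(toy_counts)

# Hypothetical (gene, log2 fold change, adjusted p-value) results.
toy_results = [("GENE_A", 3.5, 0.01), ("GENE_B", -4.2, 0.001),
               ("GENE_C", 2.0, 0.001), ("GENE_D", 5.0, 0.2)]
degs = select_degs(toy_results)
```

Dividing each sample's counts by its size factor puts samples on a common scale before testing; the filter then keeps strongly changed genes in either direction.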
Correlation analysis and enrichment analysis
The correlation analysis was conducted between host gene expression data and genus-level taxon abundance data, as defined previously for the gut microbiome. Spearman’s rank correlation was applied to gene expression counts and microbiome abundance. We calculated the Spearman rank correlation coefficients and their respective p-values using the ‘scipy.stats’ module in Python (v.3.9.7). A total of 588 statistical tests were conducted, resulting from the combination of 84 genes and 7 taxa.
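The gene-by-taxon correlation scheme can be sketched as follows. The study used scipy.stats (whose `spearmanr` function also returns a p-value); the dependency-free version below only computes the coefficient, as the Pearson correlation of average ranks, and is intended as an illustration. All names and values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# 84 genes x 7 genera gives the number of tests reported in the text.
n_genes, n_taxa = 84, 7
n_tests = n_genes * n_taxa

# Hypothetical expression and abundance values across five samples.
expression = [5.1, 7.3, 8.8, 9.0, 12.4]
abundance = [0.2, 0.9, 1.5, 1.7, 3.0]
rho = spearman_rho(expression, abundance)
```

In practice the same function would be applied to every gene-genus pair, and the resulting p-values would be corrected for the number of tests.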
Functional enrichment analyses of the differentially expressed genes (DEGs) from the transcriptomic data were performed using Gene Ontology (GO), which characterizes genes in terms of biological processes, molecular functions, and cellular components, and ConsensusPathDB (CPDB), which was used to elucidate the associated functional and signaling pathways.
Validation of bacterial and host-genes by quantitative real‑time PCR (qPCR) analysis
Candidate bacterial genera and host genes were selected from the intersection of the DESeq2 and Spearman’s correlation results for the F01 and F234 groups compared with MASLD-HCC. The candidate host genes, COL10A1, RIMS3, and ENAH, were validated in PBMCs, and Veillonella was validated in fecal samples, from healthy controls, patients with MASLD, and patients with MASLD-HCC by qPCR.
The qPCR reactions were performed using 4X CAPITAL™ qPCR Green Master Mix HROX (Biotechrabbit, Berlin, Germany) following the manufacturer’s instructions. The primers and conditions for bacteria and host genes in the validation set are shown in Supplementary Table 10. The reactions were carried out on a QuantStudio 5 Real-Time PCR System (Applied Biosystems, Carlsbad, CA, USA). Each sample was analyzed in duplicate, with positive controls for each target gene and negative controls included for interpretation.
The Veillonella gene fragment was amplified using Taq DNA polymerase (TIANGEN Biotech Co., Ltd, Beijing, China) and subsequently cloned using the T&A™ cloning vector kit (Yeastern Biotech Co., Ltd., Taipei City, Taiwan) following the manufacturer’s instructions. The recombinant plasmid sequences were confirmed through DNA sequencing using M13 universal primers, conducted by U2Bio (Thailand) Co., Ltd. (Bangkok, Thailand). The plasmid copy number was determined, and a tenfold serial dilution of the positive control plasmid served as a standard curve to quantify the copy number of Veillonella per gram of feces in each sample. Host-gene expression was normalized to the GAPDH endogenous reference gene. The expression data are presented as log2 relative expression.
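The two quantification steps described above can be illustrated with a short Python sketch: a least-squares standard curve of Ct against log10 plasmid copies, and a reference-gene normalization. The exact formulas used in the study are not stated, so the sketch assumes the common delta-Ct convention (log2 relative expression = −(Ct target − Ct GAPDH), which presumes roughly 100% amplification efficiency); the function names and Ct values are hypothetical, and scaling from copies per reaction to copies per gram of feces is omitted because the dilution factors are not given.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def log2_relative_expression(ct_target, ct_reference):
    """Assumed delta-Ct convention: higher value = higher expression."""
    return -(ct_target - ct_reference)


# Hypothetical tenfold plasmid dilution series and measured Ct values.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_ct = [16.9, 20.2, 23.5, 26.8, 30.1]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

estimated_copies = copies_from_ct(23.5, slope, intercept)
relative = log2_relative_expression(ct_target=25.0, ct_reference=20.0)
```

A slope near −3.3 cycles per tenfold dilution corresponds to near-doubling per cycle, which is the condition under which the delta-Ct shortcut is reasonable.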
Statistical analysis
Demographic data were analyzed using SPSS (version 22.0.0, SPSS Inc., Chicago, IL, USA) and GraphPad Prism (version 9.5.0, Boston, MA, USA). Statistical tests were selected based on data distribution and study design. Categorical variables were compared using the Chi-square test. Continuous variables were assessed using one-way ANOVA for parametric data or the Kruskal–Wallis test for nonparametric data. For two-group comparisons, Student’s t-test was applied to parametric values, and the Mann–Whitney U test to nonparametric values. Univariable and multivariable logistic regression analyses were performed. Expression of COL10A1, ENAH, and RIMS3 and Veillonella abundance (log copy/g) were modeled as continuous predictors. For better clinical interpretability, age and BMI were dichotomized based on clinically relevant cutoffs (< 60 vs. ≥ 60 years; < 25 vs. ≥ 25 kg/m²) to reflect age-related and lean-MASLD risk categories. Diabetes mellitus (DM), hypertension (HT), and dyslipidemia (DLP) were included as binary variables (0 = no, 1 = yes). Diagnostic performance of bacterial genera was evaluated by calculating the area under the receiver operating characteristic curve (AUC) with corresponding 95% confidence intervals (CIs) using a logistic regression model in SPSS. Model calibration was evaluated using a decile-based method with locally weighted scatterplot smoothing (LOESS). Predicted probabilities from the multivariable logistic regression were grouped by deciles, and observed versus predicted event rates were plotted. Calibration curves were generated in R (v4.5.1) using the rms, dplyr, and ggplot2 packages, with a LOESS-smoothed line and 95% confidence ribbon; the 45° line indicated perfect calibration. Decision curve analysis (DCA) assessed clinical utility using the rmda package in R. Net benefit was computed across threshold probabilities (0.05–0.80) and compared with “treat-all” and “treat-none” strategies. Thresholds at which the model’s net benefit exceeded both strategies indicated optimal clinical usefulness.
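The net-benefit quantity behind decision curve analysis can be written out explicitly. The Python sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; it is an illustration rather than the rmda implementation, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when everyone is treated; treat-none is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Threshold grid matching the range stated in the text (0.05 to 0.80).
thresholds = [round(0.05 * k, 2) for k in range(1, 17)]

# Hypothetical outcomes (1 = HCC) and model-predicted probabilities.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.1]

curve = [(t, net_benefit(outcomes, predicted, t),
          net_benefit_treat_all(outcomes, t)) for t in thresholds]
nb_model_at_half = net_benefit(outcomes, predicted, 0.5)
nb_all_at_half = net_benefit_treat_all(outcomes, 0.5)
```

A model is considered useful at a given threshold when its net benefit is higher than both the treat-all value and zero (treat-none).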
All p-values are reported as exact values unless < 0.001, in which case they are presented as p < 0.001. Statistical significance cut-off was defined as p < 0.05.
Discussion
MASLD incidence is driven by cardiometabolic risk factors including obesity, insulin resistance, hypertension, and hyperlipidemia [8]. Moreover, host genes, environmental factors, and the status of liver fibrosis and cirrhosis have been recognized as contributing to the pathophysiology of the disease and can lead to HCC [34, 35]. To date, accumulated evidence has shown that the gut microbiota is a critical environmental factor related to health status and chronic diseases [36]. The bidirectional interactions between host and microbiome in MASLD remain incompletely understood. This study provides the first integrated characterization of gut microbiota alterations and host gene expression across MASLD fibrosis stages and MASLD-HCC, offering novel insights into their interplay and highlighting potential clinical applications.
Consistent with previous reports [37, 38], we observed alterations in gut bacterial diversity and specific microbial signatures in MASLD patients with and without HCC. Although alpha diversity tended to decline with fibrosis severity, these differences were not statistically significant between mild (F01) and significant fibrosis (F234) stages. Notably, bacterial diversity was markedly reduced in MASLD-HCC compared to healthy controls, suggesting that microbial dysbiosis may contribute to or reflect malignant transformation rather than merely fibrosis progression. Beta diversity analysis further supported this, revealing distinct microbial community structures between MASLD patients with and without HCC, while no significant differences were observed among fibrosis stages in the absence of malignancy. This indicates that HCC-associated metabolic and immune disturbances, rather than fibrosis progression, predominantly influence gut microbiota composition. For example, the enrichment of taxa such as Veillonella and Streptococcus found in our study may reflect enhanced lactate utilization and altered bile acid metabolism commonly observed in hepatocarcinogenesis [39, 40].
Prior studies have shown that specific microbiota changes, such as increased Bacteroides and Ruminococcaceae members in NAFLD-related HCC patients, may trigger an immunosuppressive response, characterized by increased IL-10+ Tregs, reduced pro-inflammatory cytokine production (IL-2, IL-12), and diminished cytotoxic CD8+ T cell activity, potentially influencing HCC progression [38, 41]. Ruminococcus gnavus has been identified as a characteristic bacterial signature in HCC tissues of patients with hepatitis B or C viruses [38]. Another study also reported the enrichment of B. caecimuris, Veillonella parvula, Clostridium bolteae, Bacteroides xylanisolvens, and Ruminococcus gnavus in patients with NAFLD-HCC [42]. Our DESeq2 analysis compared both mild and significant fibrosis stages with MASLD-HCC. After controlling for multiple testing, Veillonella, Streptococcus, Erysipelatoclostridium (lactate-producing taxa), and Escherichia–Shigella remained significantly enriched in MASLD-HCC, supporting the robustness of our microbiome signature while minimizing false-positive findings. In contrast, butyrate-producing genera such as Lachnospiraceae UCG-001 and [Eubacterium] ventriosum group were depleted. The persistence of these taxa across fibrosis stages suggests that microbiome remodeling is driven primarily by malignant transformation rather than fibrosis progression alone. Collectively, these alterations indicate a shift toward lactic acid- and endotoxin-producing bacteria that may promote hepatic inflammation via the gut–liver axis, reinforcing their potential as microbial markers of hepatocarcinogenesis. Furthermore, based on existing literature, a plausible mechanistic hypothesis is that Veillonella species activate TLR4–p38 MAPK signaling via lipopolysaccharides (LPS). In particular, LPS from Veillonella parvula has been shown to stimulate TNF-α and IL-6 release from human PBMCs in a dose-dependent manner [43]; both cytokines are implicated in chronic liver inflammation and carcinogenesis. From a clinical perspective, these findings underscore the promise of gut microbiota–based biomarkers for noninvasive assessment of MASLD progression. The consistent enrichment of Veillonella and Streptococcus across fibrosis stages suggests their potential as early indicators of malignant transformation. Integrating microbial taxa with circulating host gene expression markers or AFP could enhance diagnostic accuracy beyond conventional imaging.
Functional microbiome analysis revealed significant enrichment of bacterial metabolic pathways, including de novo biosynthesis of adenosine and guanosine deoxyribonucleotides, taxadiene biosynthesis, the urea cycle, and the aspartate superpathway, in MASLD versus MASLD-HCC groups. The de novo deoxyribonucleotide biosynthesis axis has been linked to essential metabolic alterations in small-cell lung carcinoma, suggesting a potential role in tumor proliferation; thus, modulation of this pathway may influence MASLD-HCC progression. Notably, the urea cycle, primarily occurring in the liver, is frequently dysregulated in various cancers, conferring metabolic advantages for tumor survival and growth. Metabolic labeling studies show that breast cancer cells can efficiently incorporate ammonia into amino acids such as glutamate, proline, aspartate, and alanine [44]. In nonalcoholic steatohepatitis (NASH), impaired urea synthesis is associated with hyperammonemia, contributing to liver injury and fibrosis [45]. These altered pathways may interact with host transcriptional programs, consistent with our findings of PBMC gene expression changes across fibrosis and HCC stages, and offer insights into disease pathophysiology.
Accumulating evidence indicates that alterations in gene expression within PBMCs may serve as indicators of disease-specific pathophysiological changes, highlighting their potential as diagnostic and monitoring tools for a range of chronic disorders, particularly when tissue biopsy is impractical or invasive [46], and for distinguishing malignant from non-malignant states, including HCC [17]. In this study, we performed transcriptome profiling of PBMCs and integrated the results with candidate bacterial taxa from gut microbiome analyses. We identified host genes, including COL10A1, ENAH, and RIMS3, that showed notable correlations with the signature bacterium Veillonella. Among these, ENAH, the human ortholog of mammalian enabled (hMENA) and a member of the Ena/vasodilator-stimulated phosphoprotein (VASP) family, encodes an actin regulatory protein essential for cell motility, adhesion, and invasion, thereby facilitating metastasis [47]. ENAH, regulated by the splicing factor 3b subunit 4 (SF3B4), has been shown to promote HCC development via activation of Notch signaling pathways. Overexpression of ENAH has been reported in multiple cancers, including gastric, colorectal, breast, and hepatocellular carcinomas, with implications for tumor differentiation and prognosis [48]. In our study, ENAH expression increased with fibrosis severity and in MASLD-HCC cases. However, ENAH did not emerge as a prognostic indicator for overall survival, aligning with prior research that identified invasion depth, rather than ENAH expression, as an independent survival predictor in gastric cancer [49]. A similar pattern was observed in the expression of the RIMS3 gene, which increased progressively with fibrosis stage and was highest in MASLD-HCC patients. However, its biological role remains largely unexplored and warrants further investigation.
COL10A1, a collagen family member and structural extracellular matrix (ECM) protein [50], has been implicated in tumor angiogenesis and fibrogenesis [51] via activation of the Transforming Growth Factor Beta 1 (TGF-β1) and Sex determining region Y (SRY) Box 9 (SOX9) transcription factor axis [50]. The TGF-β signaling pathway is a key driver of fibrogenesis, activating hepatic stellate cells, stimulating ECM production, and promoting hepatocarcinogenesis [52]. Consistent with prior findings, elevated COL10A1 expression has been reported in gastric and colorectal cancers and is associated with poor survival [53]. Moreover, increased COL10A1 expression has been strongly linked to tumor angiogenesis [47]. In this study, we observed a marked upregulation of COL10A1 in MASLD-HCC, which achieved high diagnostic accuracy for distinguishing mild fibrosis (F01) from MASLD-HCC, outperforming serum AFP. In univariate analysis, both COL10A1 expression and Veillonella abundance were predictors of MASLD-HCC. Although COL10A1 showed borderline significance, its consistent upregulation and high discriminative value support its biological importance. Increased Veillonella levels were also strongly associated with MASLD-HCC after adjustment for age, BMI, diabetes, hypertension, and dyslipidemia. The inverse association between BMI and HCC risk supports the lean-MASLD phenotype [54], suggesting that gut dysbiosis and host–microbiota interactions may drive hepatocarcinogenesis in non-obese individuals.
Together, the combined COL10A1 + Veillonella + AFP model provides a practical, non-invasive tool for early detection of HCC among patients with mild fibrosis in primary care settings. Its integration of microbial (Veillonella) and host-gene (COL10A1) signatures offers improved diagnostic accuracy over AFP alone. Because the required assays (qPCR or ELISA) are compatible with routine laboratory infrastructure, this model could be incorporated into community-level MASLD screening programs as a pre-imaging triage test. Such molecular pre-screening could prioritize high-risk individuals for ultrasound or MRI, thereby optimizing resource allocation and lowering surveillance costs, especially in regions with limited imaging capacity.
Although these results provide novel insights into the interplay between gut microbiota and host transcriptional profiles in MASLD-HCC, several considerations should be acknowledged when interpreting the findings. First, the sample size was modest, with a particularly small single validation cohort, underscoring the need for larger, multi-center studies to confirm and expand our findings. Where possible, future studies should compare baseline characteristics between enrolled and non-enrolled but eligible patients to assess potential selection differences. Additionally, external validation in independent, ethnically diverse cohorts will be needed to confirm the generalizability of our findings. Second, while we observed correlations between specific microbial genera and host gene expression, the functional roles and underlying mechanisms remain unclear. Due to the cross-sectional design, causality cannot be inferred. Longitudinal studies and functional experiments in vitro or in vivo are warranted to clarify these pathways. Third, microbiome profiling relied on 16S rRNA sequencing, which limits taxonomic resolution and functional insights; future work incorporating shotgun metagenomics could provide more precise classification and functional characterization. Finally, numerous host and environmental factors that could influence the microbiome or host gene expression, such as age, dietary habits, and genetic background, were not controlled across the cohorts. These variables may act as confounders and potentially affect the outcomes of the study. Despite these constraints, integrating microbiome and transcriptome data in this study represents a meaningful step toward developing clinically relevant, non-invasive biomarkers for early MASLD-HCC detection.