Background
Sepsis is defined as an organ dysfunction syndrome caused by an uncontrolled inflammatory response to infection. Sepsis is a leading cause of mortality in hospitalized patients [1, 2] and accounts for 30% of in-hospital deaths [3]. Despite its high mortality and morbidity, few agents have proven effective for the treatment of sepsis. Thus, more regulatory factors need to be identified to provide potential targets for the design of effective therapeutic agents.
Several studies have used transcriptome analysis to investigate potential biological pathways involved in the pathogenesis of sepsis [4-8]. These studies performed differential gene expression analysis followed by enrichment analysis against established functional pathways. In such analyses, genes are tested individually, and the adjustment for multiple testing can make the sensitivity for detecting biologically meaningful genes low. Sepsis is a heterogeneous syndrome whose pathogenesis involves hundreds of genes; in this situation, the individual contribution of a single gene can be too small to be detected with a univariate test. In most diseases, genes function via networks of co-expressed genes with similar biological functions. Thus, identification of co-expression patterns could provide further insights into sepsis-associated biological pathways. Weighted gene co-expression network analysis (WGCNA) is a systems biology approach for finding clusters of genes with highly correlated expression levels and relating them to phenotypic traits [9]. Rather than relating thousands of genes to a clinical trait, WGCNA focuses on the relationship between a few modules and the trait [10, 11]. To the best of our knowledge, WGCNA has been used to explore co-expression patterns in a mouse model of sepsis [12], HIV infection [13], in vitro inflammatory cells [14], and pediatric sepsis [15]. However, regulatory factors, including transcription factors and miRNAs, have not been systematically explored in adult sepsis.
The present study aimed to identify gene co-expression modules in sepsis using consensus WGCNA, that is, modules shared across different causes of sepsis, including pneumonia and abdominal sepsis. These consensus modules were related to clinical traits and tested for enrichment in functional biological pathways. Potential regulators of these modules were explored using well-curated databases.
Materials and methods
The GEO dataset and data preprocessing
The study used the publicly available dataset GSE65682 from the Gene Expression Omnibus (GEO) database. The dataset contained 802 samples, including healthy controls, non-septic critically ill patients and patients with sepsis. The sepsis patients could be further categorized by infection site into pneumonia sepsis (n = 192), abdominal sepsis (n = 51) and other sepsis (n = 443). PAXgene blood RNA was isolated at intensive care unit (ICU) admission, and whole-blood leukocyte transcriptome profiling was performed on the Affymetrix Human Genome U219 Array platform. Key benefits of this gene chip include increased productivity and efficiency through parallel processing, excellent gene expression accuracy and reproducibility, and complete coverage of the annotated genome. Further details of the dataset can be found in previous publications [16, 17]. Raw intensity data were preprocessed with the Robust Multi-array Average (RMA) method [18]. An advantage of this method is that normalization occurs at the probe level (rather than at the probe-set level) across all of the selected hybridizations. When multiple probe sets mapped to the same gene symbol, the maximum expression intensity was used. The quality of the processed data was checked using MA plots (Additional file 1: Figure S1).
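The probe-set-to-gene collapsing step can be illustrated with a small sketch. The text does not specify whether the "maximum" is taken per sample or by keeping the single probe set with the highest average intensity, so both readings are shown; the function names, probe IDs and gene symbols are hypothetical, and the input is assumed to be RMA-normalized (log2) values.

```python
from collections import defaultdict
from statistics import mean


def collapse_by_max_mean(probe_expr, probe_to_gene):
    """Keep, for each gene symbol, the probe set with the highest mean expression.

    probe_expr: dict probe_id -> list of log2 expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol
    """
    best = {}  # gene -> (mean expression, probe_id)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # drop probe sets without a gene annotation
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, probe)
    return {gene: probe_expr[probe] for gene, (_, probe) in best.items()}


def collapse_by_sample_max(probe_expr, probe_to_gene):
    """Alternative reading: take the per-sample maximum over a gene's probe sets."""
    rows = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            rows[gene].append(values)
    return {gene: [max(col) for col in zip(*vals)] for gene, vals in rows.items()}


# Hypothetical toy data: two probe sets map to GENE_A, one to GENE_B.
probe_expr = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 5.0, 6.0],
    "p3": [3.0, 3.5, 4.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
```

The first function keeps one measured probe set per gene, so each gene retains a real expression profile; the second mixes probe sets sample by sample, which can inflate values slightly but never discards the strongest signal in any sample.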
Consensus weighted gene co-expression network analysis
The first step in constructing the consensus WGCNA was to choose the soft-thresholding power to which co-expression similarity is raised to calculate adjacency. The power was chosen from candidate values between 4 and 20 based on the criterion of approximate scale-free topology [10]. Because the topological overlap matrices (TOMs) of the different sepsis types may have different statistical properties, we performed quantile normalization of the TOMs across the three types of sepsis (i.e., pneumonia, abdominal and other sepsis). The consensus TOM was calculated by taking the component-wise (“parallel”) minimum of the TOMs of the individual datasets and was then used as input to hierarchical clustering. Finally, modules were identified in the resulting dendrogram using the Dynamic Tree Cut algorithm [19]. Advantages of this algorithm include its flexibility and its ability to identify nested clusters. Modules with similar expression profiles were merged at a threshold of 0.25.
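A compact NumPy sketch of the network-construction steps described above follows: unsigned soft-thresholded adjacency, a simplified scale-free fit index for choosing the power, the topological overlap, full quantile normalization of the TOMs across sets, and the component-wise minimum. It is not WGCNA's implementation; the unsigned network type, the binning used for the fit index and the choice of full (rather than single-quantile) normalization are assumptions for illustration, and hierarchical clustering and Dynamic Tree Cut are not reproduced.

```python
import numpy as np


def soft_adjacency(expr, power):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|**power for a samples x genes matrix."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** power
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj


def scale_free_r2(adj, n_bins=10):
    """Simplified signed R^2 of log10 p(k) versus log10 k (k = connectivity)."""
    k = adj.sum(axis=1)
    counts, edges = np.histogram(k, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2
    keep = counts > 0
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2)  # a good fit needs a negative slope


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = sum_u a_iu a_uj."""
    k = adj.sum(axis=1)
    shared = adj @ adj
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


def quantile_normalize(toms):
    """Give every TOM the same distribution of off-diagonal values (full quantile)."""
    n = toms[0].shape[0]
    iu = np.triu_indices(n, k=1)
    vals = np.stack([t[iu] for t in toms])           # sets x gene pairs
    reference = np.sort(vals, axis=1).mean(axis=0)   # average sorted profile
    normalized = []
    for row in vals:
        v = np.empty_like(row)
        v[np.argsort(row)] = reference               # i-th smallest -> i-th reference value
        m = np.ones((n, n))
        m[iu] = v
        m[iu[1], iu[0]] = v                          # keep the matrix symmetric
        normalized.append(m)
    return normalized


def consensus_tom(expr_sets, power):
    """Component-wise ("parallel") minimum of quantile-normalized TOMs."""
    toms = [topological_overlap(soft_adjacency(x, power)) for x in expr_sets]
    return np.minimum.reduce(quantile_normalize(toms))


# Hypothetical data: three sets (e.g. pneumonia, abdominal, other) sharing 8 genes.
rng = np.random.default_rng(0)
expr_sets = [rng.normal(size=(30, 8)) for _ in range(3)]
cons = consensus_tom(expr_sets, power=6)
dissimilarity = 1.0 - cons  # would be passed to hierarchical clustering
```

In practice the fit index would be computed for each candidate power from 4 to 20 on the full gene set, and the lowest power reaching a high signed R^2 would be retained before building the TOMs.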
Gene significance was defined as the Student's t-test statistic for differential expression between sepsis patients and healthy controls, and significance levels were adjusted for multiple testing with the Bonferroni correction. The dataset also contained critically ill patients without infection, such as those who had undergone major abdominal surgery. Because critical illness and severe infection share common pathways, a comparison of sepsis versus non-infectious critical illness would miss some important genes. Therefore, differential expression was tested between sepsis patients and healthy controls.
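The gene-significance statistic and the Bonferroni step can be sketched in plain Python. The two-sided p-value below uses a normal approximation to the t distribution, which is an assumption that is reasonable only when the degrees of freedom are large (as with hundreds of samples); an exact t-distribution p-value would normally come from a statistics library. Function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sepsis, control):
    """Two-sample Student t statistic with pooled variance (positive = higher in sepsis)."""
    n1, n2 = len(sepsis), len(control)
    pooled = ((n1 - 1) * variance(sepsis) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(sepsis) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal; an approximation valid for large df."""
    return math.erfc(abs(t) / math.sqrt(2))


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical expression values for one gene.
sepsis = [2.0, 4.0, 6.0]
control = [1.0, 2.0, 3.0]
t_stat = student_t(sepsis, control)
```

With roughly 20,000 genes tested, the Bonferroni correction implies a per-gene threshold of about 0.05 / 20,000 = 2.5e-6, which illustrates why univariate testing can miss genes with modest individual effects.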
Relating consensus module to pneumonia-specific sepsis module
Modules specific to pneumonia sepsis were identified using the method described above and were then related to the consensus modules. We calculated the overlap of each pair of pneumonia and consensus modules and used the hypergeometric test to assign a p-value to each pairwise overlap; this is also known as cross-tabulation-based comparison of modules. The rationale is that a module that is well preserved and reproducible across all types of sepsis (pneumonia, abdominal and other sepsis) is likely to represent common pathways involved in the pathogenesis of sepsis [20].
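The cross-tabulation step can be illustrated as follows: for each pair of modules, count the shared genes and compute a one-sided hypergeometric p-value for observing at least that many shared genes by chance, given the total number of genes in the network. This is a plain-Python sketch with hypothetical names and toy inputs, not the WGCNA implementation.

```python
from math import comb


def overlap_pvalue(n_total, module_a, module_b):
    """P(X >= observed overlap) under the hypergeometric distribution.

    n_total: number of genes in the network (the population)
    module_a, module_b: gene lists of the two modules being compared
    """
    a, b = set(module_a), set(module_b)
    k, size_a, size_b = len(a & b), len(a), len(b)
    upper = sum(
        comb(size_a, i) * comb(n_total - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return upper / comb(n_total, size_b)


def cross_tabulate(n_total, pneumonia_modules, consensus_modules):
    """Overlap count and p-value for every (pneumonia, consensus) module pair."""
    table = {}
    for p_name, p_genes in pneumonia_modules.items():
        for c_name, c_genes in consensus_modules.items():
            shared = len(set(p_genes) & set(c_genes))
            table[(p_name, c_name)] = (shared, overlap_pvalue(n_total, p_genes, c_genes))
    return table


# Hypothetical toy modules drawn from a 10-gene network.
pneumonia = {"blue": ["g1", "g2", "g3"], "red": ["g7", "g8"]}
consensus = {"black": ["g1", "g2", "g3", "g4"], "green": ["g8", "g9", "g10"]}
table = cross_tabulate(10, pneumonia, consensus)
```

Python's exact integer arithmetic keeps the binomial coefficients exact even for genome-scale networks; the p-values in the resulting table would then be adjusted for the number of module pairs compared.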
Module preservation across all three datasets was explored by pairwise comparison of the eigengene networks in pneumonia, abdominal and other sepsis, with mortality added as an additional “eigengene”. Network preservation is based on the difference between the adjacencies in the two compared sets [20]: a small difference in the adjacency matrix indicates that the modules are well preserved between the two sets.
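A minimal sketch of the eigengene-network comparison follows. It assumes that the eigengene adjacency is (1 + correlation) / 2 and that the preservation of each pair is scored as one minus the absolute difference of adjacencies between the two sets; this scoring is our assumption for illustration. Mortality is appended as an extra column, mirroring the additional "eigengene" described above; the data and function names are hypothetical.

```python
import numpy as np


def eigengene_adjacency(eigengenes):
    """Signed adjacency (1 + cor) / 2 between columns of a samples x eigengenes matrix."""
    return (1.0 + np.corrcoef(eigengenes, rowvar=False)) / 2.0


def network_preservation(eig_set1, eig_set2):
    """Pairwise preservation 1 - |A1 - A2| and its mean over off-diagonal pairs.

    Both inputs must list the same modules (columns) in the same order;
    the number of samples (rows) may differ between the two sets.
    """
    pres = 1.0 - np.abs(eigengene_adjacency(eig_set1) - eigengene_adjacency(eig_set2))
    off_diagonal = ~np.eye(pres.shape[0], dtype=bool)
    return pres, float(pres[off_diagonal].mean())


# Hypothetical data: 4 module eigengenes plus 28-day mortality (0/1) as a 5th column.
rng = np.random.default_rng(1)
pneumonia = np.column_stack([rng.normal(size=(60, 4)), rng.integers(0, 2, 60)])
abdominal = np.column_stack([rng.normal(size=(40, 4)), rng.integers(0, 2, 40)])
pres_matrix, overall = network_preservation(pneumonia, abdominal)
```

Because only correlations between eigengenes are compared, the two sets can contain different patients and different sample sizes, which is what allows pneumonia, abdominal and other sepsis to be compared pairwise.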
Relating consensus modules to clinical traits
The module eigengene was calculated for each module as the first principal component of the expression of the genes in that module. Correlation analysis was performed to relate module eigengenes to external traits, including age, gender, mortality and survival time (survivors were censored at 28 days). For this analysis, the three datasets were combined into one.
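Computing an eigengene and relating it to a trait can be sketched with NumPy: standardize each gene, take the first left singular vector of the samples x genes matrix (the first principal component scores), and orient its sign to agree with the module's average expression. For a binary trait such as 28-day mortality, the Pearson correlation computed here is the point-biserial correlation. The data are simulated and the function name is hypothetical.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component (sample scores) of a samples x genes module matrix."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary; align it with average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene / eigengene.std(ddof=1)


# Simulated module: 5 genes driven by one latent signal across 100 patients.
rng = np.random.default_rng(2)
signal = rng.normal(size=100)
expr = signal[:, None] + 0.3 * rng.normal(size=(100, 5))
mortality = (signal + rng.normal(size=100) > 1.0).astype(float)  # hypothetical trait

me = module_eigengene(expr)
trait_cor = np.corrcoef(me, mortality)[0, 1]
```

Without the sign alignment, an eigengene could correlate negatively with a trait purely because of the arbitrary orientation returned by the SVD.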
To investigate whether genes significantly associated with mortality were also important members of a module, gene significance for mortality was correlated with module membership. Module membership (eigengene-based connectivity) of each gene was calculated by correlating its expression profile with the eigengene of a given module. A module membership value near 0 indicates that the gene is not part of the module, whereas a value near -1 or 1 indicates that the gene is highly connected to the module.
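Module membership and its relationship with gene significance can be sketched as follows. Here, gene significance for mortality is taken to be the absolute correlation between each gene and the mortality indicator, and module membership is related to it through its absolute value |kME|; both choices are assumptions for illustration, as is the simulated data.

```python
import numpy as np


def column_correlations(expr, vector):
    """Pearson correlation of every column of a samples x genes matrix with a vector."""
    xc = expr - expr.mean(axis=0)
    vc = vector - vector.mean()
    return (xc * vc[:, None]).sum(axis=0) / (
        np.sqrt((xc ** 2).sum(axis=0)) * np.sqrt((vc ** 2).sum())
    )


def membership_vs_significance(expr, eigengene, trait):
    """Return kME per gene, GS per gene, and cor(|kME|, GS) across genes."""
    kme = column_correlations(expr, eigengene)     # module membership, in [-1, 1]
    gs = np.abs(column_correlations(expr, trait))  # assumed GS: |cor(gene, trait)|
    return kme, gs, float(np.corrcoef(np.abs(kme), gs)[0, 1])


# Simulated data: genes load on the eigengene with decreasing strength.
rng = np.random.default_rng(3)
eigengene = rng.normal(size=200)
loadings = np.linspace(1.0, 0.0, 10)
expr = eigengene[:, None] * loadings + 0.5 * rng.normal(size=(200, 10))
mortality = (eigengene + rng.normal(size=200) > 0.5).astype(float)  # hypothetical trait
kme, gs, mm_gs_cor = membership_vs_significance(expr, eigengene, mortality)
```

A strongly positive correlation between |kME| and GS indicates that the genes most central to the module are also those most associated with the outcome.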
Enrichment analysis for biological function and transcription factors
Modules associated with important clinical traits such as mortality were further analyzed for enrichment in Gene Ontology (GO) pathways [21]. Specifically, the genes in each module were tested for over-representation of functional GO terms, based on the annotations for that gene set. An UpSet plot was used to display genes shared among different GO terms, and a dot plot shows the gene ratio and adjusted p-value for each enriched GO term. Enriched terms were organized into a network, with edges connecting overlapping gene sets; in this way, mutually overlapping gene sets tend to cluster together, making functional modules easy to identify. The category netplot depicts the links between genes and GO terms as a network, which helps to show which genes are involved in the enriched pathways and which genes belong to multiple annotation categories.
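Two quantities behind the enrichment plots can be sketched in plain Python: the Benjamini-Hochberg adjusted p-values shown on the dot plot and the gene-set similarity that defines edges in the enrichment map. The source does not state which adjustment method or similarity measure was used, so BH (a common default in GO enrichment tools) and the Jaccard index are assumptions here; the gene ratio is taken as the fraction of module genes annotated to the term, and the similarity cutoff is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gene_ratio(module_genes, term_genes):
    """Fraction of module genes annotated to a GO term."""
    module = set(module_genes)
    return len(module & set(term_genes)) / len(module)


def jaccard(genes_a, genes_b):
    """Overlap between two enriched terms' gene sets."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def enrichment_map_edges(term_genes, min_similarity=0.2):
    """Edges between terms whose gene sets overlap; the 0.2 cutoff is hypothetical."""
    terms = sorted(term_genes)
    return [
        (s, t, jaccard(term_genes[s], term_genes[t]))
        for idx, s in enumerate(terms)
        for t in terms[idx + 1:]
        if jaccard(term_genes[s], term_genes[t]) >= min_similarity
    ]


# Hypothetical enriched terms and their module genes.
term_genes = {"termA": ["g1", "g2", "g3"], "termB": ["g2", "g3", "g4"], "termC": ["g9"]}
edges = enrichment_map_edges(term_genes)
```

Terms joined by high-similarity edges share most of their genes, which is why they cluster together in the enrichment map.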
Modules (gene lists) significantly correlated with mortality were tested for over-representation of transcription factor (TF) binding motifs using RcisTarget [22]. Two types of databases were used in the analysis: gene-motif rankings, which provide the ranking of all genes for each motif, and the annotation of motifs to transcription factors. Parameter settings for scoring each gene-motif pair were: species = Homo sapiens, scoring/search space = 500 bp upstream of the transcription start site (TSS), and number of orthologous species = 10. Motifs were annotated to transcription factors using motifAnnotations_hgnc ('mc9nr', 24,453 motifs).
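Motif-enrichment scoring in tools such as RcisTarget rests on recovery curves: all genes are ranked by motif score, the area under the curve of module genes recovered near the top of each ranking is computed, and these areas are standardized across motifs into a normalized enrichment score (NES). A simplified sketch follows; it is not RcisTarget's code, and the area normalization, toy rankings, and motif and TF names are hypothetical.

```python
import numpy as np


def recovery_auc(ranking, module_genes, max_rank):
    """Area under the recovery curve within the top `max_rank` genes of one motif ranking.

    ranking: gene symbols ordered from best to worst motif score.
    Scaled here by the best achievable area, so 1.0 means the module genes rank first.
    """
    genes = set(module_genes)
    hits = np.array([g in genes for g in ranking[:max_rank]], dtype=float)
    recovered = np.cumsum(hits)  # number of module genes recovered at each rank
    best = np.minimum(np.arange(1, max_rank + 1), len(genes)).sum()
    return float(recovered.sum() / best)


def normalized_enrichment(aucs):
    """NES = (AUC - mean AUC) / SD of AUC across all motifs tested."""
    a = np.asarray(aucs, dtype=float)
    return (a - a.mean()) / a.std(ddof=1)


# Hypothetical rankings for three motifs over the same eight genes.
module = ["G1", "G2", "G3"]
rankings = {
    "motif_1": ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"],
    "motif_2": ["G4", "G1", "G5", "G2", "G6", "G3", "G7", "G8"],
    "motif_3": ["G4", "G5", "G6", "G7", "G8", "G1", "G2", "G3"],
}
motif_to_tf = {"motif_1": "TF_X", "motif_2": "TF_Y", "motif_3": "TF_Z"}  # hypothetical

aucs = {m: recovery_auc(r, module, max_rank=4) for m, r in rankings.items()}
nes = dict(zip(aucs, normalized_enrichment(list(aucs.values()))))
top_motif = max(nes, key=nes.get)
```

Restricting the area to the top of each ranking rewards motifs whose strongest predicted targets are module genes, and the motif-to-TF table then translates the top motifs into candidate regulators.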
Identification of miRNA-target interactions
The multiMiR package in R was used to retrieve miRNA-target interactions from 14 external databases. These databases are comprehensive collections of predicted and validated miRNA-target interactions and their associations with diseases and drugs [23]. The modules of interest were those associated with mortality, and we examined whether some or all of the genes within such a module were targeted by the same miRNA(s). The search was restricted to the “mirtarbase” table because it includes only experimentally validated miRNA-target interactions.
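Checking whether many genes of a module are hit by the same miRNA amounts to grouping validated interactions by miRNA and counting distinct module targets. The sketch below assumes the interactions have already been exported as (miRNA, target gene symbol) pairs, for example from a multiMiR query of the validated table; the input format, function name and toy data are hypothetical.

```python
from collections import defaultdict


def rank_mirnas(interactions, module_genes):
    """Rank miRNAs by the number of distinct module genes they target.

    interactions: iterable of (mirna, target_symbol) pairs (hypothetical export format).
    Returns (mirna, n_targets, sorted target list), most targets first.
    """
    module = set(module_genes)
    targets = defaultdict(set)
    for mirna, gene in interactions:
        if gene in module:  # keep only interactions with genes in this module
            targets[mirna].add(gene)
    ranked = [(m, len(genes), sorted(genes)) for m, genes in targets.items()]
    return sorted(ranked, key=lambda row: (-row[1], row[0]))


# Hypothetical validated interactions and module genes.
interactions = [
    ("mirA", "G1"), ("mirA", "G2"), ("mirA", "G2"),  # duplicate rows count once
    ("mirB", "G1"),
    ("mirC", "X9"),                                   # target outside the module
]
ranking = rank_mirnas(interactions, ["G1", "G2", "G3"])
```

Counting distinct targets matters because validated-interaction tables often list the same miRNA-gene pair several times, once per supporting experiment.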
Survival analysis
The association of each module with survival was determined using the Cox proportional hazards model, with the eigengene of each module entered as the single covariate in a univariate model. The genes in the module most strongly associated with survival were used to cluster patients into two groups using reversed graph embedding (DDRTree), which projects the data into a reduced-dimensional space while simultaneously constructing a principal tree that passes through the middle of the data [24]. Samples were clustered into two groups using k-means clustering, and the survival probabilities of the two groups were compared using the log-rank test.
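The log-rank comparison of the two k-means clusters can be written out directly. This plain-Python sketch implements the standard two-group log-rank statistic with the hypergeometric variance and a chi-square (1 df) p-value; survivors are assumed to be coded as censored at day 28, and the function name and toy data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up in days; events: 1 = death, 0 = censored (e.g. alive at day 28);
    groups: 0 or 1 (e.g. k-means cluster label). Returns (chi-square, p-value), 1 df.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)        # at risk overall and in group 1
        deaths = [g for tt, e, g in data if tt == t and e == 1]
        d, d1 = len(deaths), sum(deaths)          # deaths overall and in group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical toy data: four patients, all dying, two per cluster.
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
```

The p-value uses the identity that a chi-square variable with 1 df is the square of a standard normal, so its upper tail equals erfc(sqrt(x / 2)).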
Discussion
The study employed consensus network analysis to identify gene co-expression modules. These modules were involved in distinct biological functions and were associated with clinical traits. The regulatory mechanisms of some important modules, involving transcription regulators and miRNAs, were explored using well-curated databases. Consistent with previously published studies that used high-throughput data to examine sepsis, these modules were enriched for pathways related to the immune response [28, 29]. The black module was significantly associated with survival, and the transcription factor CEBPB was identified as the master regulator of this module. Several miRNAs, including hsa-miR-335-5p, hsa-miR-26b-5p, hsa-miR-16-5p, hsa-miR-17-5p and hsa-miR-124-3p, were identified as important regulators of gene expression in the module.
Our analysis has several implications for research and clinical practice. First, sepsis is shown to involve a dysregulated immune response, highlighted by the upregulation of the black module, which is involved in biological functions such as myeloid leukocyte-mediated immunity, neutrophil-mediated immunity, leukocyte degranulation and myeloid cell activation involved in the immune response. Immune dysfunction has long been recognized as an important mediator of sepsis [4, 30, 31]. However, previous studies mostly analyzed high-throughput data at the individual gene level, using differential expression analysis followed by functional pathway enrichment [4, 15, 31, 32]. In neonatal sepsis, Meng and colleagues identified 7 hub genes in key pathways; however, that study did not relate these gene expression profiles to clinical outcomes, and its results cannot be extrapolated to adult sepsis [33]. The present study employed WGCNA to identify modules associated with mortality by treating co-expressed genes as a module, based on the idea that genes work together to exert their functions in disease processes. Second, by focusing on the modules most significantly associated with mortality, we identified several important regulators, including transcription factors and miRNAs. The results support the hypothesis that co-expressed genes are likely to be regulated by common factors. These transcription factors (such as CEBPB and ETV6) and miRNAs are potential targets for the treatment of sepsis; since upregulation of the black module is significantly associated with mortality, drugs targeting these regulators may improve survival. Third, although sepsis is a heterogeneous syndrome [34-38], different types of sepsis share some important common immune regulatory pathways. Our study employed consensus WGCNA and examined whether the modules identified in one cause of sepsis could also be found in other causes of sepsis. The results show that most modules are well preserved across the various causes of sepsis, indicating common pathways associated with sepsis. Furthermore, the module network constructed by TOM is also well preserved across the various causes of sepsis. Collectively, these findings support the notion that sepsis can be considered a clinical syndrome, because its various causes lead to common pathways. Drugs designed to target these common pathways could help improve clinical outcomes. For example, our study showed that cluster 2, identified by the black module, was associated with a lower survival probability, and this subtype of sepsis was characterized by leukocyte activation. Such over-activation of the inflammatory response may indicate that immunoregulatory agents such as arachidonic acid and eicosapentaenoic acid can help to improve survival [39].
Mortality is an important clinical trait, and thus we sought to identify the modules most significantly associated with mortality. Two methods were employed to validate the association of the black module with mortality. First, we simply correlated the module eigengene with mortality. The module eigengene is computed as the first component of a principal component analysis (PCA); however, PCA is a linear transformation of the high-dimensional space, and it is well recognized that linear transformations cannot fully recover the intrinsic structure of a high-dimensional space [40-42]. Thus, we also employed manifold learning to better capture the structure of the black module [24]. The result was consistent with the above observation that the black module separates survivors from non-survivors well. Most clinical trials in sepsis focus on reducing mortality, and the black module identified in our study may help in designing drugs that are potentially useful for reducing the mortality rate. For example, our analysis shows that hsa-miR-335-5p is the top-ranked miRNA in the regulation of the black module. Since the black module consists mostly of genes involved in the inflammatory response, upregulation of hsa-miR-335-5p is proposed to have anti-inflammatory effects [43-45]. More recently, hsa-miR-335-5p has been shown to reduce inflammation via negative regulation of the TPX2-mediated AKT/GSK3β signaling pathway in a chronic rhinosinusitis mouse model [46]. Collectively, these observations support our finding that hsa-miR-335-5p may be a candidate therapeutic target.
CEBPB was annotated to the most significantly enriched motifs for genes in the black module. Because the black module eigengene was the most significantly associated with mortality, its master regulator CEBPB may also be an important risk factor for mortality via immunomodulation [47, 48]. Our finding is also consistent with a recent study comparing sepsis with and without shock, in which CEBPB was significantly enriched in septic shock versus non-shock patients [49]. Since the presence of shock is a significant risk factor for mortality, this result supports the notion that CEBPB could be a potential target for improving survival.
A strength of the study is its large sample size. To the best of our knowledge, the MARS consortium is the largest sepsis cohort with genome-wide blood transcriptional profiling [16]. Furthermore, the included sepsis subjects were classified by cause of sepsis, allowing for the consensus network analysis, which identified common functional modules involved in sepsis. However, the study has limitations. First, the severity of illness was not available in the cohort, precluding correlation analysis between modules and severity scores. However, since most severity scores were developed with mortality as the end point, mortality can be used as a surrogate for the severity of illness. Second, the transcription regulators were predicted by bioinformatic analysis, which may have a high false-positive rate. The in vivo functions of these transcription factors and miRNAs should be validated in experimental studies; in this regard, the results of the current analysis should be considered hypothesis-generating. Third, some causes of sepsis were not categorized in the study. For example, urinary tract infection is an important cause of sepsis in clinical practice, but the dataset did not contain this subclass of sepsis.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.