Background
Renal cell carcinoma (RCC) is a heterogenous disease, which is composed of ccRCC and nccRCC [
1]. Over the past few years, targeted therapies have significantly improved overall survival (OS) and relapse free survival (RFS) of patients with ccRCC [
2]. However, ascribing to relatively low incidence (25–30%) and rare clinical trails of nccRCC, the optimal targeted therapies for nccRCC patients still remain uncertain [
3]. ChRCC, taking up 4–5% of RCC, is the second common subtype of nccRCC. Although the tumor grade or stage of ChRCC is relatively low, there is no significant difference between patients with localized ChRCC and ccRCC in 5-year cancer-specific survival rates (P = 0.98) [
4]. Due to the poor outcomes of ChRCC, it’s urgent to identify novel molecular biomarkers to evaluate the prognosis of ChRCC patients, which might help to assess the malignancy and provide therapeutic potential for this disease.
WGCNA is a method commonly used to explore the complex relationships between genes and phenotypes. This method is able to transform gene expression data into co-expression modules and provide insights into signaling networks that may be responsible for phenotypic traits of interests [
5‐
7]. WGCNA is widely used in various biological processes, such as cancer, neuroscience and genetic data analysis, which is quite helpful for the identification of potential biomarkers or therapeutic targets [
8‐
11]. Not only can it analyses mRNA level of tumor samples, but also work on microRNA or lncRNA datasets of neoplasms to find candidate biomarkers for prognosis and treatment [
12,
13].
In this study, WGCNA method was firstly used to analyze clinic traits and gene expression data of ChRCC samples provided by TCGA database to identify key genes associated with tumor prognosis and progression. Our findings may be very beneficial to assess malignant potential of ChRCC and offer therapeutic methods to this neoplasm.
Methods
Data sources and data preprocessing
Gene expression data and patient clinic traits of ChRCC were downloaded from The Cancer Genome Atlas (TCGA) database (
https://cancergenome.nih.gov/). Annotation information of microarray was used to match probes with corresponding genes. The average value was calculated out for those genes corresponding to more than one probes, while probes matched with more than one gene were eliminated.
Screening for differentially expressed genes
The “DEseq2” R package was used to screen DEGs between ChRCC samples and paired normal tissues in the expression data. The DEGs threshold was set at a |log2FoldChange| > 0.585 and adj.P.value < 0.05. After DEGs were screen out, flashClust tool package in R language was used to perform cluster analysis of ChRCC samples.
Co-expression network construction and module analysis
Firstly, expression data profile of DEGs was tested by to check if they were the good samples and good genes. Afterwards, power value was screened out by WGCNA [
14] algorithm which is implemented in the R software package (
http://www.r-project.org/). Scale independence and average connectivity degree of modules with different power value was tested by gradient method (the power value ranging from 1 to 20). The appropriate power value was determined when the degree of independence was above 0.85 and average connectivity degree is relatively higher [
8]. Once the power value was determined, the scale-free gene co-expression networks were constructed by WGCNA algorithm. Besides, the corresponding genes information of each module was extracted out. The minimum number of genes in each module was set as 50 for the high reliability of the results.
Interaction analysis of co-expression modules of ChRCC
After co-expression modules were identified by WGCNA algorithm. Heatmap was painted to describe the strength of the interactions between different modules by heatmap function embedded in R language.
Construct module-trait relationships of ChRCC and key module identification
The correlation between module eigengenes (MEs) and phenotype (clinic traits) was used to evaluated module-trait associations. MEs were considered as the major component in the principal component analysis for each gene module. We calculated the correlation between MEs and clinical trait to identify the relevant module. Gene significance (GS), which was defined as the log10 transformation of the P value (GS = lgP) in the linear regression between gene expression and clinical information, was calculated to evaluate correlation strength. Modules with the highest correlation coefficients among all modules were usually considered as the key module and selected for further analysis.
Functional enrichment analysis of the key module
The information of genes in key modules was upload to Enrichr online database to perform GO and KEGG pathway analysis [
15,
16]. Analysis results were extracted out under the condition of P < 0.05 after correction [
17]. The top 10 GO terms were visualized if there were more than ten terms, so as to KEGG pathways.
Selection of candidate hub genes
Hub genes, with highly intramodular connectivity, have been shown to be functionally significant. In this study, candidate hub genes were defined by module connectivity, measured by module membership (MM) > 0.8 and clinical trait relationship, measured by significance of the Pearson’s correlation (GS. Pathologic stage > 0.2) [
18]. Furthermore, we uploaded all genes in the hub module to the STRING database, choosing confidence > 0.4 to construct protein–protein interaction (PPI). After setting degree cut-off = 5, node score cut-off = 0.2, k-core = 2, and max. depth = 100, the most significant sub-module was selected by using plug-in MCODE [
9]. Genes both in co-expression network and MCODE sub-module were regarded as candidate hub genes for further analysis.
Hub genes identification and validation
Another independent dataset GSE15641 containing six ChRCC samples and 23 normal tissues was extracted and performed on GEO2R analysis (
https://www.ncbi.nlm.nih.gov/geo/). |log2FoldChange| > 1 and P.Value < 0.05 as criterion, DEGs were identified. Candidate hub genes were mapped to DEGs in GSE15641 to find the real hub genes. For further validation, real hub genes were upload to online TCGA database to perform survival analysis (
http://gepia.cancer-pku.cn/). ChRCC patients were divided into two groups according to median expression of each hub gene (high vs. low) and Kaplan–Meier survival analysis was conducted.
Statistical analyses
Two-tailed Student’s t-test was used for significance of differences between groups. Statistical analyses were performed with GraphPad Prism 6.02. Statistical significance was set at probability values of P < 0.05.
Discussion
ChRCC is an uncommon, but not rare, malignant disease representing ~ 5% of histologic spectrum of cancer arising from kidney [
19]. However, patients with ChRCC have a relatively low risk of tumor progression and cancer-specific death [
20], the clinical outcomes of target therapies between ChRCC and ccRCC were not significantly different in metastatic disease [
21]. To the best of our knowledge, there are no effective treatments for patients with metastatic ChRCC or any preventions for tumor recurrence. Thus, it’s important to get a better understanding of molecular mechanism of ChRCC and identify potential biomarkers to evaluate the biological behavior of this malignancy.
WGCNA has many prominent advantages over other methods since the analysis explores associations between co-expression modules and clinic traits and the results had much higher reliability and biological significance [
22]. In this study, a total of eight co-expression modules were constructed by the 2215 DEGs from the 66 human ChRCC samples by WGCNA method, which was used to detect the relationship between ChRCC transcriptome and clinic traits. We calculated the correlations between co-expression modules and pathologic stage, neoplasm status and living status. Brown module with the highest correlation with clinic traits were regarded as the key module to explore primary cause for disease progression.
For further analysis, genes in brown module with high module membership (> 0.8) and significance of correlations with pathologic stage (> 0.2) were filtered out. Meanwhile, MCODE analysis was performed on brown module to find a panel of genes with high connective degrees [
9]. 29 candidate hub genes were identified by overlapping results of co-expression analysis and MCODE analysis. GO and KEGG analysis showed that dysregulation of cell cycle may be the underlying mechanism of tumorigeneses of ChRCC.
29 candidate hub genes were subsequently validated in GSE15641 dataset, four candidate genes (SKA1, ERCC6L, POLQ, GTSE1) differentially expressed both in TCGA and GSE15641 database were considered as real hub genes. After excluding POLQ for conflicting expression level in two databases, two of three hub genes (SKA1, ERCC6L) were found to be significantly correlated with shorter overall survival time under their over expressions.
SKA1 was a microtubule-binding subcomplex of the outer kinetochore which is essential for proper chromosome segregation. It played an important role on tumorigenesis in multiple malignancies [
23,
24]. SKA1 over-expression led to cancerization in human prostate epithelial cells via the induction of centriole over-duplication [
25]. Depletion of SKA1 inhibited cell proliferation in gastric cancer by blocking cell cycle in S phase [
26]. What’s more, SKA1 was proved to be an oncogene in ccRCC which could be down-regulated by antitumor miR-10a-5p transfection [
27]. ERCC6L gene, also named PLK1-interacting checkpoint helicase (PICH), was a member of the SNF2 protein family (SWI/SNF catalytic subunit SNF2). It was a mitotic target and substrate of polo-like kinases (PLKs) which regulates multiple processes in mammalian cell mitosis [
28]. Downregulation of ERCC6L decreased cell viability in RCC cell lines by blocking mitogen-activated protein kinase (MAPK) signaling pathway and interactions with protein PLK1 [
29]. Protein encoded by GTSE-1 modulated cell migrations in an EB1-dependent manner. Up-regulation of GTSE1 expression could be associated with increased invasive potential in breast cancer [
30]. Depletion of GTSE-1 enhanced mitotic centromere-associated kinesin (MCAK) activity in mitotic cells, leading to chromosomal instability (CIN) which was presented in most solid tumors [
31]. Silencing GTSE-1 expression inhibits proliferation and invasion of hepatocellular carcinoma cells [
32]. Based on many studies about three hub genes, we could find their important roles in tumorigenesis and metastasis. Targeting these biomarkers and other candidate hub genes participating in cell cycles could provide therapeutic potentials for ChRCC.
Some limitations exiting in our studies should be mentioned. The most vital genes out of 29 candidate hub genes couldn’t been filtered out under restrictions of the bioinformatics methods. Multi-central and large sample studies on ChRCC expression profiles were needed for validation of our findings. However, this couldn’t be fulfilled due to limited online databases for ChRCC.
Conclusions
Our study used weighted gene co-expression analysis to construct a gene co-expression network, identify and validate the key module and hub genes associated with the progression and prognosis of ChRCC. Functional analysis of hub genes reveal mitotic cell cycle dysregulation could be the underlying mechanism of ChRCC. Three real hub genes including SKA1, ERCC6L and GTSE-1 were identified. SKA1 and ERCC6L were validated in association with poor prognosis of ChRCC. However, further molecular biological experiments are needed to confirm the function of these biomarkers in ChRCC.
Authors’ contributions
JZ designed the project for this study. XY analyzed datasets and wrote the initial edition of this manuscript. JW revised the manuscript and assisted with formatting and language editing. All authors read and approved the final manuscript.