Background
Thyroid cancer is a common malignant neoplasm in worldwide. Recently, the incidence rate of thyroid cancer is rapid raising in the world and becoming the potential threat for public health [
1,
2]. It is important to develop early precise diagnostics method and interfere the thyroid neoplasm progress into malignant carcinoma. Up to now, the Fine-needle aspiration (FNA) biopsy is the most accurate and cost-effective tool for thyroid nodules clinical evaluating. It has been strongly recommended by the American Thyroid Association as standardized clinical operation [
3‐
5]. However, about 10~30% cases’ cytological results are indeterminate, and being labelled as indeterminate or suspicious for malignancy. Among these cytological indeterminate cases, majority of patients underwent partial or complete thyroidectomy and checked by histological evaluation. Although the subsequent postsurgical evaluation results reveal only 6~30% cytological indeterminate cases identified as malignant, it made this clinical operation extremely low efficiency and non-specificity while with higher costs [
6,
7].
Molecular biomarkers analysis is a powerful adjunct approach to traditional carcinomas pathological evaluation. Multiple molecular markers have been discovered and employed in developing precise diagnostics methods and novel strategies to properly treatment. These biomarkers are generated from gene sequence for gene mutations, gene rearrangements, RNA-based assays, gene expression profiling and immune-histochemistry [
8,
9]. As endocrine neoplasm deriving from follicular or para-follicular thyroid cells, thyroid cancer has been reported associated with higher frequency (about 70%) somatic alternation or deletion of genes involving key signaling pathways, such as the mutation of
BRAF and
RAS [
6,
10],
NTRK1 tyrosine kinases and key effectors mitogen-activated protein kinase (MAPK) signaling pathway [
11]. With advanced understanding of thyroid tumor formation, the researches of generic mutation-based biomarkers discovery shifted from single mutation to molecular signatures genes or panels of multiple mutations [
12]. According to the previously American Thyroid Association Management Guidelines, a 7-gene molecular biomarkers panel of genetic mutation and rearrangement (7-gene MT), including BRAFV600E, three isoforms of RAS point mutations and translocations of
PAX8/PPARc and
RET/PTC genes [
6,
13], was recommended to evaluate the residual FNA sample with cytological indeterminate and estimate with high specificity (~ about 90%) [
14‐
16]. Especially, the mutational testing of biomarker genes has been proposed to be a rule-in test with reported higher specificity in clinical practice. Recently, it was reported that the sensitivity of seven-genes mutational panel testing showed huge variation, from 44 to 100% [
6,
17]. It is strong suggested that traditional gene mutation panels analysis may not reliably rule out nodules malignancy in some case population. In current, there is no definitively single optimal molecular test that 100% promised to rule-in or rule–out the malignancy in cytology-indeterminate cases [
18]. It is necessary to discover novel potential molecular biomarkers to enhance sensitivity and specificity of mutational analysis and precise to rule-in the malignance for cytology indeterminate nodules.
Recently, lacking of long term clinical outcome tracking recording of using molecular markers, there are some controversies over the benefit and limitation of existing molecular markers testing [
18]. To enhance the efficiency of thyroid carcinomas patient’s diagnostic, treatment and health management, it is the trend to develop systemic diagnostic strategy and discover novel applicable and specific molecular biomarkers for early diagnostics through analyzing the genetic and expression profiling of thyroid nodules from FNA biopsy [
19]. Among multiple computerization methodologies, the Weighted Gene Co-Expression Network Analysis (WGCNA) is considered as one of the most useful approaches to discover gene co-expression network based functional feature through gene expression profiling analysis [
20]. Recently, WGCNA is be widely applied to screening the signature genes significantly associated with clinical feature. It is powerful to discover candidate biomarkers for cancer early diagnostics, cancer-associated pathways or therapeutic targets for precise treatment in hepatocellular carcinoma [
21], lung cancer [
22], endometrial cancer [
23] and melanoma [
24].
In this study, we employed the WGCNA to analyze gene expression profile of thyroid nodules with cytological indeterminate and aimed to identify the highly connected hub and modules that genes significantly associated with histological malignant thyroid nodule. In addition, we will explore other independently clinical cases through the genetic database to verify the significantly signature genes with genetic changes and discover the key biological pathways significantly associated with malignant thyroid cancer by Genclip pathfinder.
Discussion
In this study, to discover novel biomarkers to accelerate the precise clinical diagnostics for thyroid nodule cases with cytology indeterminate, we designed the whole project workflow, selected specific dataset (GSE34289) and applied the WGCNA approach to analyze the gene expression profiling of thyroid nodule that generated from FNA clinical samples (Fig.
1). The gene expression profile contained 172 specific genes designed for promise diagnostic assessment [
8], which are mainly involving in variously biological and cellular processes that related to energy metabolism, cell differentiation, and cellular development. and aimed to discover some novel biomarkers to accelerate the precise clinical diagnostics for thyroid cancer. It is representative to discover the signature genes significantly associated with thyroid nodules malignancy through gene expression profiling analysis of these cases. Furthermore, this dataset was generated from 49 national widely clinical sites, collected from 3789 patients and evaluated 4812 thyroid nodules samples (size > 1 cm) in United States and well characterized with higher standard. It obtained 577 cytological indeterminate aspirates and finally selected 265 indeterminate nodules for further analysis through blinded histopathological review [
25]. In addition, this dataset also contained two groups of cases labelled as cytology-benign/histopathology-benign (44 cases) and cytology-malignant /histopathology-malignant (55) with validated cytopathology and histopathological features (Additional file Table
S1). Based on this dataset characters, we utilized totally 364 samples and split into 4 groups in this study. The expression level of identified signature genes will be explored in cytology-benign/histopathology-benign and cytology-malignant/histopathology-malignant groups as negative and positive control. Moreover, as designed in the workflow, the discovered signature genes would be validated through another independently investigations of thyroid carcinomas patient’s cases in TCGA via GEIPA (Fig.
3c) and indicated these genes sensitiveness with statistical analysis.
Compared with other computational methodology, the WGCNA have unique merits, which could be robust and sensitive detection of the subset of genes co-expression as functional modules from the entire transcriptome and without pre-filtering to cause selective bias or losing useful information [
20]. It was designed to discover the networks and genes associated with phenotypes of target by using unsupervised clustering and constructing gene module. The constructed gene co-expression module consists of a group of genes that maintain a consistent expression relationship and share a common biological regulation function that independent of a priori defined gene sets or pathways [
31]. Previously, WGCNA has been successfully applied to biomarker discovery for cancer diagnostics, such as discovered microRNA expression network in prostate cancer [
32] and identified ASPM as a potential biomarker in glioblastoma [
33].
In our study, our results firstly demonstrated that it is reasonable to build on the co-expression networks with clinical traits (histopathology, age, gender) using Pearson correlation analysis (Additional file Figure
S1). To discover the related modules to histopathological phenotype, we calculated the modules statistic significantly with the linear mixed effects model for testing the association of the node to the histological phenotype. We analyzed the gene expression profile data and identified three module eigengenes (ME), blue, Green and Turquoise module, are significantly associated with histological feature of malignant thyroid cancer (Fig.
2, Additional file Figure
S4). Through the Eigengene dendrogram analysis, we discovered the most significantly hub, the blue module that contained 24 genes, related to histological feature (Fig.
2d&e).
To validate the WGCNA analysis results, we took two approaches to test signature genes positive expression and correlation. Firstly, we compared the signature genes expression level between sample group 2/3/4 and group one separately, which represent as test, negative and positive control. Gone through the results, we setup the fold change > 1.05 or fold change < 0.98 (plus
P value < 0.001) as the cutoff for statistics significant standard. As results indicated, all 24 signature genes do not significant expression difference although some genes expression with lower or higher level in the double cytology/histopathology benign case group, which defined as negative control. While 22 signature genes were significantly higher expression (FC > 1.0,
P value < 0.001), except
PPP2R2B with lower expression (FC = 0.9227,
P value = 0.0042) and
CC2D2B with higher expression (FC = 1.0725,
P value = 0.002), in the double cytology/histopathology malignant cases group, which works as positive control. These results indicated that 24 signature genes could significantly differentiate the malignancy and benign cases (positive rate = 91.67%, 22/24). For the cytology-Indeterminate/histopathology-malignant group, the 23 signature genes show significant higher expression (Fold change > 1.0,
P value < 0.001) than benchmark (positive rate = 95.83%, 23/24), while only
PPP2R2B with lower expression (
P value = 0.0039) (Table
1). Furthermore, identified 5 potential biomarker genes were all significantly higher expression (FC > 1.15,
P value < 0.0001) in both cytology-indeterminate/histopathology-malignant group and double cytology/histopathology malignant group (Fig.
3a&b). These results suggest these 5 genes have potential to be the biomarker candidate for differentiation the malignancy among the indeterminate cases. In the secondary approach, we explored these 24 genes through the cBioPortal Cancer Genomic database, which is containing many published cancer studies datasets from CCLE and TCGA [
28], and verified through 3 independent thyroid cancer investigations that contained 915 patient’s datasets. The 16 genes were matched in 37 (4.0%) of 915 patient’s cases with genetic alternation of missense mutation, amplification, deep deletion and copy- number alterations, and listed as
CC2D2B, CFH, CITED1, FN1, GOS2, GABRB2, KRT19, TENM1, PPP2R2B, PROS1, RXRG, SCEL, SERGEF, SLC34A2 and
STK32A. Some of these generic alternations were associated with papillary thyroid carcinoma metastasis to brain [
34] and could be useful as histopathological biomarkers for papillary thyroid carcinoma [
25,
35]. In addition, the queried results also discovered 5 genes with missense mutation significantly (
P value < 0.01) associated thyroid cancer cases, listed as
FN1 (R534P),
PROS1 ((K200I), (Q571K)),
SCEL (T320S),
SLC34A2(T688M) and
TENM1 (S1131F) (Fig.
4b). We compared these genes expression level between groups, and our results indicated these genes could significantly differentiate the cases of benign and malignant among cytological indeterminate cases (Fig.
3a). In addition, 5 potential biomarkers significantly higher expression in malignant cases (P value < 0.0001, Fig.
3b). Furthermore, to validate these signature genes, we blast the 5 potential biomarkers through TGCA database with other independent clinical cases (about 849 cases, Fig.
3c). These 5 genes are also significantly higher expression in thyroid cancer cases (
P value < 0.01) explored in GEPIA with thyroid cancer clinical cases data. These two validation approach made results more convincible.
Through the Genclip analysis, we found that these genes are mainly concentrate on the GO pathways that involving in physiological response of hormone and steroid hormone, and regulation of cell migration and adhesion, cell junction interaction, etc. (Table
1, Additional file Figure
S6A
). The involving pathways are significantly concentrated in subgroups of thyroid cancer, non-small cell lung cancer and cancer, process, signaling, extracellar region, and transporter activity (Additional file Figure
S6B, and Table
1). It indicates that these functions may be associated with metabolism and accelerated growth and development of obesity individuals. Notably, the results of GO enrichment analysis also provide more significant pathways with biological annotations (Table
1).
Checked with published literature, these 5 genes were reported associated with the transform progress of multiple carcinomas. Fibronectin 1 (
FN1) is a glycoprotein existing with soluble dimer or multimeric form in different conditions.
FN1 is involved in multiple cell adhesion and migration processes, including embryogenesis, wound healing, blood coagulation, host defense, and found with higher expression in metastasis [
36]. It was reported that
FN1 is over expression in the Papillary Thyroid Carcinoma [
37] and listed as potential biomarker for diagnostics.
PROS1(Protein S 1) is a vitamin K-dependent plasma protein that works as a cofactor of the anticoagulant protease. It could activate protein C (APC) and inhibit blood coagulation [
38]. The genetic mutation of this gene will result in autosomal dominant hereditary thrombophilia [
39] and malignant glioma [
40]. The
PROS1 and
FN1 others 12 genes alternations were identified as important diagnostic biomarkers for thyroid cancer through the meta-analysis the gene expression profiling of clinical thyroid nodules [
41].
SCEL (Sciellin) is the precursor to the cornified envelope of terminally differentiated keratinocytes.
SCEL is overexpressed in the papillary thyroid carcinoma and worked as key regulator in mesenchymal-to-epithelial transition and dynamically regulated through the metastasis process [
36].
SCEL was high expression in thyroid tumor tissue and significantly associated with I-131 [
42].
TENM1 (Teneurin transmembrane protein 1) is involving in pattern formation and morphogenesis [
43].
TENM1 was overexpression in thyroid cancer and associated with thyroidal invasion [
44] and identified as potential marker of papillary thyroid carcinoma progress [
36,
45].
SLC34A2 (solute carrier family 34 member 2) is a member of the
SLC34 solute carrier protein family and coded for pH-sensitive sodium-dependent phosphate transporter (NaPi2b), which is a multi-transmembrane [
46] and The physiological function of
SLC34A2 is transcellular inorganic phosphate absorption and maintenance of phosphate homeostasis [
47] and cell differentiation.
SLC34A2 is overexpressed in multiple cancer types, including lung, ovarian, and thyroid cancers [
48] and identified as potential therapeutic target for non-small cell lung and Ovarian cancer [
48]. Combined with these independent research results, it is strong suggest that
FN1 (R534P),
PROS1 ((K200I), (Q571K)),
SCEL (T320S),
SLC34A2 (T688M) and
TENM1 (S1131F) are potential novel biomarker candidates significantly associated with thyroid carcinomas and could differentiate the malignant thyroid nodule among cytological indeterminate cases. As mentioned previously, the 7-gene MT biomarkers panel was broadly recommended to evaluate the residual cytological indeterminate thyroid nodules and estimate with high specificity (~ about 90%) [
14‐
16]. However, the sensitivity of 7-gene MT biomarkers panel testing showed huge variation (from 44 to 100%) in clinical practice [
6,
17]. It suggests that there are some unknown biomarkers existing in these indeterminate cases. It is possible that our identified 5 novel biomarkers genes could contribute to enhance specificity of previously 7-gene MT biomarkers panel. The combined application of these two panel biomarkers would get more promise and precise clinical diagnostic results for nodules malignancy in some cases population.
There are some limitations and several novelties in our study. The FNA yield cytology indeterminate cases are including subtype of follicular lesion, follicular neoplasm and suspicious or malignancy. For the first limitation from this dataset, lacking of these cases histopathological information about thyroid cancer subtype sorting, such as follicular adenoma (FA), follicular carcinoma (FC) and papillary thyroid carcinoma (PTC), we could not track these potential biomarkers back to original patient’s pathological status and dig deeper insight. Secondly, the gene expression profile was limited to 172 genes for promise diagnostic assessment [
8]. It will cause to miss other genes significantly associated with malignant thyroid carcinomas by this pre-filter selection. Thirdly, due to the bioinformatics analysis nature, the discovered specific GO pathways and KEGG pathways were referred from previously literatures and did not be further investigated. Although we explored these significant genes associated with histological feature through TCGA database via cBioPortal and compared with the other two groups case in the same dataset, these potential biomarkers will be required to verify with storing patient’s cases according to subtype of thyroid cancer by immunohistochemistry (IHC) or other genetic detection method, like qPCR or sequencing in coming research work. Therefore, more number and sorting subtype patient’s cases are mandatory to verify these potential biomarkers for thyroid cancer precise diagnostics in the future cohort study. On the other side, our study has several novelties. Firstly, we applied reverse strategy by using WGCNA approach to discover the genes significantly associated with malignant histopathological feature in clinical FNA samples with cytological indeterminate feature. In parallels, compared these signature genes expression level among histopathological benign and malignant groups, the results indicated that signature genes have significant positive overexpression in Malignant groups and negative overexpression in benign group (Table
1). Secondly, we inquired the key functional & GO pathways that genes significance in module involving in the progress of thyroid carcinomas by Genclip enrichment analysis (Table
1). The results will be a clause for the next step research. Thirdly, exploring through TCGA database, we discovered 5 novel potential biomarkers to differentiate the malignant and benign thyroid nodules, which were identified as potential biomarkers of malignant thyroid cancer in previously independent researches. Furthermore, these 5 genes were validated with significant higher expression level in the TCGA thyroid cancer cases (Fig.
3c). These results are partially as evidences to support our results and research strategy.