26.08.2022 | Research
Classification prediction of early pulmonary nodes based on weighted gene correlation network analysis and machine learning
Written by:
Guang Li, Meng Yang, Longke Ran, Fu Jin
Published in:
Journal of Cancer Research and Clinical Oncology | Issue 7/2023
Abstract
Objective
To use weighted gene correlation network analysis (WGCNA) and machine learning algorithms to predict the classification of early pulmonary nodules from public databases.
Methods
Expression and clinical data of lung cancer patients were first extracted from public databases (GTEx and TCGA) to identify the differentially expressed genes (DEGs) of lung adenocarcinoma (LUAD). The intersection of the results from three R packages (DESeq2, limma, edgeR) was selected as the set of candidate DEGs for further study. WGCNA was used to obtain the modules and key genes relevant to lung cancer classification, and GO and KEGG enrichment analyses were performed. Two machine learning methods were used to build the model and predict tumor classification: Least Absolute Shrinkage and Selection Operator (LASSO) regression and the eXtreme Gradient Boosting (XGBoost) algorithm.
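The consensus step described above, keeping only genes reported by every method, can be illustrated with a minimal Python sketch. The function name `consensus_genes` and the gene identifiers below are hypothetical placeholders introduced for illustration. They are not the study's data, and the authors' own pipeline was run in R.

```python
def consensus_genes(*gene_lists):
    """Return the sorted genes that appear in every input list."""
    if not gene_lists:
        return []
    # Convert each list to a set, then intersect all of them at once.
    return sorted(set.intersection(*(set(genes) for genes in gene_lists)))


# Toy placeholder results, NOT taken from the study.
deseq2_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
limma_hits = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
edger_hits = ["GENE_C", "GENE_D", "GENE_F"]

candidate_degs = consensus_genes(deseq2_hits, limma_hits, edger_hits)
print(candidate_degs)
```

The same helper would apply to any later step that intersects gene lists from several methods, since it accepts any number of lists.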
Results
DEG analysis identified 1306 differentially expressed genes in LUAD. WGCNA module analysis showed that 116 genes were significantly related to classification, and the module genes were mainly associated with 14 KEGG pathways. LASSO regression analysis of the differential genes identified 10 target genes, and the XGBoost model identified 18 genes. The intersection of the above methods yielded 6 genes as classification signatures of early pulmonary nodules: “HMGB3”, “ARHGAP6”, “TCF21”, “FCN3”, “COL6A6”, and “GOLM1”.
Conclusion
Using DEG analysis, WGCNA, and machine learning algorithms, six gene signatures related to early-stage LUAD were identified, which can assist clinicians in disease classification prediction.