Introduction
Lung cancer has one of the highest mortality rates among cancers worldwide [
1], and the 5-year survival rate is less than 50% for locally advanced lung cancer with invasion of other organs or mediastinal lymph node metastasis. In contrast, the 5-year survival rate is more than 80% for stage I lung cancer, for which surgery is the initial therapy, and a long-term survival can be expected with appropriate staging and treatment [
2]. However, even stage I lung cancer is not uniformly pathologically invasive, and there are some high-grade lung cancers. Small-cell lung cancer [
3] and large-cell neuroendocrine carcinoma [
4] have a poor prognosis, as does adenocarcinoma with a micropapillary component [
5]. Recurrence rates and prognoses vary, so treatment should be tailored to each patient rather than applying uniform local therapy and postoperative follow-up. In terms of surgery, lobectomy is the standard procedure for primary lung cancer, but the indication for sublobar resection in small lung cancers has been examined in the large-scale clinical trials JCOG0802/WJOG4607L [
6] and CALGB140503 [
7]. The underlying reason is that sublobar resection is expected to be non-inferior to lobectomy with respect to overall survival in small lung cancers. JCOG0802/WJOG4607L showed that local recurrence was more likely with sublobar resection than with lobectomy, but that overall survival was superior, owing to fewer deaths from other diseases [
6].
However, radical treatment for recurrence may require a pneumonectomy, and the burden of radiation therapy and chemotherapy is large. Therefore, the selection of procedure should depend on the risk of recurrence. Lung cancers prone to recurrence tend to have lymphovascular invasion or pleural invasion, and an increase in local recurrence has been reported in cases in which sublobar resection was performed for highly invasive cancer [
8]. Koike et al. reported that lymphatic invasion and pleural invasion were independent predictors of local recurrence in sublobar resection, with hazard ratios of 3.824 and 2.272, respectively. However, the preoperative diagnosis of invasiveness is still difficult in the clinical setting since the pathological invasiveness of lung cancer is evaluable only after scrutinizing the pathological specimen. Therefore, we explored methods for estimating the presence of invasion based solely on preoperative radiomic features.
The invasiveness of lung cancer has been conventionally evaluated based on tumour diameter and consolidation tumour ratio (CTR) on computed tomography (CT) images [
9]. Although higher CTR tends to indicate higher malignancy, CT findings do not always match the pathologic invasive findings in individual cases [
10]. In addition, many lung cancers are solid nodules without ground-glass opacity, and false-positive results occur frequently with CTR [
11]. Besides CT, fluorodeoxyglucose-positron emission tomography (FDG-PET) is recommended for evaluating the presence of lymph node metastases and inspecting distant metastases [
12]. FDG-PET is also useful for a qualitative evaluation reflecting the tumour metabolism. Not only the maximum standardised uptake value (SUVmax) but also the metabolic tumour volume (MTV) and total lesion glycolysis (TLG) are useful for making a diagnosis, evaluating the curative effect, and determining the prognosis [
13,
14]. Furthermore, with the recent development of radiomics, which aims to extract quantitative radiological features from medical images in a high-throughput manner for diagnostic and therapeutic applications, medical imaging modalities including FDG-PET are expected to fill new clinical roles.
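As an illustration of how the three PET indices relate to one another, the following minimal sketch computes SUVmax, MTV, and TLG from the voxel-wise SUVs of a single lesion, using the standard relation TLG = MTV × SUVmean. The function name `pet_uptake_metrics`, the fixed relative threshold of 40% of SUVmax, and the toy input values are assumptions made for this example only; they are not the segmentation settings or data of the present study.

```python
import numpy as np


def pet_uptake_metrics(suv_values, voxel_volume_ml, relative_threshold=0.4):
    """Compute SUVmax, MTV (mL), and TLG for one lesion.

    relative_threshold is an ASSUMED segmentation rule (fraction of SUVmax),
    not a value taken from the study.
    """
    suv = np.asarray(suv_values, dtype=float)
    if suv.size == 0:
        raise ValueError("suv_values must not be empty")
    suv_max = float(suv.max())
    # Voxels at or above the threshold are counted as metabolically active.
    active = suv >= relative_threshold * suv_max
    mtv_ml = float(active.sum()) * voxel_volume_ml
    suv_mean = float(suv[active].mean())
    tlg = mtv_ml * suv_mean  # TLG = MTV x SUVmean
    return suv_max, mtv_ml, tlg


# Toy lesion with four voxels of 0.5 mL each (hypothetical values).
suv_max, mtv_ml, tlg = pet_uptake_metrics([1.0, 2.0, 5.0, 10.0], voxel_volume_ml=0.5)
print(suv_max, mtv_ml, tlg)
```

SUVmax depends on a single voxel, whereas MTV and TLG summarise the whole segmented volume, which is why the three indices can carry complementary information.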
The application of machine learning to radiomics has been shown to improve predictive performance in diagnosis and prognosis [
15]. Many machine learning models with CT images have been reported in lung cancer [
16,
17], but few analyses have been performed on FDG-PET images, and no studies have predicted highly invasive lung cancer restricted to patients who had undergone surgery. Even in the reported PET image studies, the small sample size has limited the evaluation of the prediction models [
18].
In the present study, we constructed and validated a PET/CT radiomics-based machine learning model to predict pathological highly invasive lung cancer in a large cohort of patients who had undergone surgery for lung cancer. We further analysed the predictive performance and its application to clinical practice, and compared the machine learning model with the CTR on CT images. We also evaluated the performance when the histological type was limited to adenocarcinoma (Adc) or squamous cell carcinoma (Sqc) and when the tumour diameter was limited to ≤ 3 cm or ≤ 2 cm.
Discussion
To our knowledge, this is the first report on developing machine learning models for predicting highly invasive lung cancer using preoperative PET/CT. A machine learning model for predicting pathological highly invasive lung cancer was established based on radiomics features extracted from PET/CT in a large cohort. The best predictive performance was achieved by combining PET and CT features and ensembling multiple machine learning models.
Seven machine learning models were applied in this study. Gradient boosting is generally shown to perform well on tabular data [
27], but the excellent performance of deep models has been reported in recent years. Not only DNN with a few hidden layers [
28] but also models designed for tabular data have been proposed [
25]. Judged by AUC alone, ENS showed the best results, with an AUC of 0.856 (SD 0.0183) for CT, 0.872 (SD 0.0177) for PET, and 0.880 (SD 0.0165) for PET/CT. Ensembles with unweighted simple averaging have been reported to show superior results, especially when the models are of similarly high performance and low correlation [
29]. All models in this study had excellent prediction performance, and DNN in particular had low correlation coefficients with the other models. When combining multiple models, selecting a deep model with a low correlation coefficient may lead to better results. On the other hand, LR showed high prediction performance in PET/CT alongside ENS, indicating that classical models could be effective even with high-throughput omics data. When the risk of comparison bias was low, LR has been reported to show no marked difference in AUC from other machine learning models [
30], so it is important to construct and combine multiple machine learning models for each analysis.
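The effect of unweighted simple averaging can be sketched as follows. The example averages the predicted probabilities of several models, computes the AUC with the rank-based (Mann-Whitney) formulation, and reports the between-model correlation. The function names and the two toy "models" are hypothetical and chosen only to illustrate the mechanism; they do not reproduce the seven models or the data of this study.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half, which matches the Mann-Whitney U formulation.
    """
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def simple_average_ensemble(prob_matrix):
    """Unweighted mean over models; rows are models, columns are patients."""
    return np.asarray(prob_matrix, dtype=float).mean(axis=0)


def model_correlations(prob_matrix):
    """Pearson correlation matrix between the models' predicted probabilities."""
    return np.corrcoef(np.asarray(prob_matrix, dtype=float))


# Hypothetical predictions from two models for four patients.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.3, 0.8, 0.2, 0.6]
ensemble = simple_average_ensemble([model_a, model_b])
print(auc_score(labels, model_a), auc_score(labels, model_b), auc_score(labels, ensemble))
print(model_correlations([model_a, model_b]))
```

In this toy case each model misranks a different patient pair, so their errors cancel when averaged and the ensemble separates the classes better than either model alone. This is the situation in which a weakly correlated model, such as the DNN here, is expected to contribute most.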
Lobectomy has long been the standard procedure for managing lung cancer. In 1995, a randomised trial compared lobectomy and sublobar resection for lung cancers with a diameter ≤ 3 cm. The results showed that sublobar resection was associated with a threefold risk of local recurrence and an increased mortality rate [
31]. However, recently, small lung cancers with a high rate of ground-glass opacity found on thin-slice CT were shown to be primarily pathological non-highly invasive lung cancers [
11]. In surgery, lung cancer with a low CTR showed a good recurrence-free survival, even with sublobar resection [
32]. For patients whose cancer was inoperable due to their age or presence of major comorbidities, stereotactic body radiotherapy (SBRT) has also been shown to be effective and to have a low recurrence rate in cases of lung cancer with a low CTR, similar to surgery [
33,
34]. However, as reported in JCOG0802/WJOG4607L, although sublobar resection was superior to lobectomy in terms of the overall survival, local recurrence was increased in the patients with a high CTR. SBRT has also been shown to increase local recurrence when the CTR is high [
34]. Regarding surgery, the risk of local recurrence increases with proximity to the surgical margins [
35,
36]. As for tumour-related factors, local recurrence is more common in highly invasive lung cancer than in other lung cancers [
8,
37], and even in Adc, the presence of micropapillary patterns [
38] or spread through air space (STAS) [
39] is associated with an increased risk of recurrence. However, because the evaluation of invasiveness requires assessing the extent of invasion into anatomic structures, it is difficult in practice to diagnose pathological highly invasive lung cancer preoperatively in cases without lymph node metastasis. CT effectively predicts invasiveness, provided that the findings reflect the pathologic invasion findings [
11]. As shown by Aokage et al. [
10], although there is a strong correlation between CT findings and invasiveness, CT findings do not always match the pathologic invasive findings in individual cases.
Several reports have shown that PET findings also effectively predict pathological highly invasive lung cancer. Li et al. showed that the MTV obtained from preoperative PET/CT was an independent predictor of lymphatic invasion, with an AUC of 0.854 when multiple factors were combined in the same cohort (not validation data) [
40]. Despite the good statistical performance, however, no machine learning model has been built to predict highly invasive lung cancer based on PET/CT findings. Regarding machine learning models in lung cancer, Zhou et al. analysed PET images using the gradient boosting decision tree (GBDT) as a feature selector and classifier to differentiate between primary lung cancer and metastatic lung tumours, achieving an AUC of 0.983 [
41]. In the same study, the GBDT was similarly used as a feature selector and classifier to differentiate lung Adc from lung Sqc, achieving an AUC of 0.839. Zhang et al. predicted the presence of an EGFR mutation on pretreatment PET/CT in non-small-cell lung cancer [
42]. The least absolute shrinkage and selection operator (LASSO) algorithm was used to achieve an AUC of 0.87. A nomogram including the radiomics signature score by LASSO was also created within the same study, and the calibration plot showed consistency. Both studies had high predictive performance, and these findings along with the good results in the present study support the potential utility of machine learning models based on PET/CT images.
In our study, the ENS model was superior to the CTR both for all cases and when restricted to cStage IA cases. The accuracy of the ENS model was 9.9% higher for all patients and 16.6% higher for cStage IA patients than that of the CTR with a cut-off of 0.5. When the cut-off was set at 1.0, the accuracy was 8.5% higher for all patients and 8.8% higher for cStage IA patients. The CTR is prone to producing false positives, whereas machine learning models based on PET/CT showed good overall performance. However, simply predicting whether or not a patient has highly invasive lung cancer is not sufficient for clinical use. Because lobectomy is the standard resection approach, an essential requirement when choosing sublobar resection is to minimise false-negative predictions of highly invasive lung cancer. In contrast, when a patient undergoes passive sublobar resection due to respiratory deterioration or a comorbidity, some risk of recurrence is assumed to be acceptable when the risk–benefit balance is considered.
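The comparison with the CTR can be made concrete with a minimal sketch that turns a CTR cut-off into a binary prediction and summarises it with accuracy, sensitivity, and specificity. The rule "CTR at or above the cut-off is predicted as highly invasive", the function names, and the five toy patients are assumptions for illustration; the exact decision rule and the data of the present study are not reproduced here.

```python
def ctr_rule(ctr_values, cutoff):
    """Predict 1 (highly invasive) when CTR >= cutoff. ASSUMED rule for illustration."""
    return [1 if ctr >= cutoff else 0 for ctr in ctr_values]


def summarise(y_true, y_pred):
    """Return confusion counts with accuracy, sensitivity, and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        # A false negative is a highly invasive cancer that the rule misses.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical cohort: 1 = pathologically highly invasive.
truth = [1, 1, 0, 0, 0]
ctr = [1.0, 0.8, 1.0, 0.6, 0.2]
for cutoff in (0.5, 1.0):
    print(cutoff, summarise(truth, ctr_rule(ctr, cutoff)))
```

In this toy cohort the lower cut-off catches every invasive case at the cost of more false positives, whereas the higher cut-off trades false positives for a missed invasive case. Accuracy alone hides that trade-off, which is why the false-negative count matters when sublobar resection is being considered.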
It is important to note that the predicted probability of highly invasive lung cancer and the actual percentage of highly invasive lung cancer should approximate each other. Each patient has a different threshold for the predicted probability of highly invasive lung cancer in surgical selection. The treatment plan should be based on the individual risk–benefit balance, and a DCA that provides quantitative value criteria would be beneficial. The present study presented calibration plots and the results of a DCA for highly invasive and less-highly invasive lung cancer. The calibration plot approximated the actual percentage of highly invasive lung cancer when all cases were analysed and when cases were limited. The DCA indicated that the model was valuable over a wide range of threshold values. If the predicted probability of pathological highly invasive lung cancer could be presented for each patient, it might be useful for a more detailed consideration of therapeutic strategy.
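For readers unfamiliar with decision curve analysis, the quantity plotted in a DCA is the net benefit at a threshold probability pt, conventionally defined as TP/n − (FP/n) × pt/(1 − pt), compared against the "treat all" strategy. The sketch below computes this together with a simple binned calibration table. The function names, the equal-width binning, and the toy predictions are assumptions for illustration and are not the implementation used in this study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated as highly invasive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def calibration_table(y_true, probs, n_bins=5):
    """Rows of (mean predicted probability, observed fraction, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical predictions for four patients.
labels = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(labels, predicted, pt), net_benefit_treat_all(labels, pt))
print(calibration_table(labels, predicted, n_bins=2))
```

A model is useful at a given threshold when its net benefit exceeds both zero (treat none) and the treat-all value. A well-calibrated model shows mean predicted probabilities close to the observed fractions in each bin.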
Several limitations associated with the present study warrant mention. First, this study was a single-centre retrospective study, and the influence of bias cannot be excluded. In PET/CT analysis in particular, it is technically complicated to analyse multiple scanners because the reference value differs for each scanner. Therefore, we limited our analysis to a single scanner. Consequently, whether or not the same level of prediction performance can be achieved for other PET images is unclear. Prediction models should be developed for PET images from multiple scanners at multiple facilities using harmonization in the future to generalise these results. Second, this study was conducted with sublobar resection in mind. Although an increased risk of recurrence has been shown for highly invasive lung cancer, the comparison should be made with actual recurrence and overall survival rates. This study only concerns the prediction of highly invasive lung cancer and does not analyse the risk of recurrence or survival. Whether or not these models are clinically useful needs to be determined based on the postoperative course, so further analyses are warranted. A multicentre prospective analysis of machine learning model applications for the discrimination of invasiveness in preoperative lung cancer might be able to resolve this issue. Third, segmentation was performed using a method that eliminated manual segmentation as much as possible, but complete automatic segmentation was not achieved. To ensure reproducibility, it will be necessary to perform analyses based on a uniform procedure or automatic segmentation.
In conclusion, the machine learning models based on preoperative PET/CT findings accurately predicted pathological highly invasive lung cancer in the present study. With ENS, the accuracy was 81.1% for all cases and 73.6% for cStage IA, an improvement of more than 8% over the CTR. This model predicts, with high accuracy, invasiveness that cannot be evaluated by CT alone, so it may be useful for determining treatment indication. The predicted probability and actual percentage of pathological highly invasive lung cancer agreed well, and a DCA indicated that the models were valuable across a wide range of thresholds. Machine learning models of FDG-PET findings provided accurate information for predicting highly invasive lung cancer and may aid in the selection of surgical procedures.