Introduction
Gastric gastrointestinal stromal tumors (GISTs) account for 60–65% of all GISTs, followed by GISTs of the small intestine (25–30%) and colorectal region (5%). GISTs derive from interstitial cells of Cajal (ICC) and have a potential for malignancy [
1,
2]. Independent prognostic factors for GISTs based on the National Institutes of Health (NIH) risk category criteria include tumor size and site, mitotic count, and tumor rupture [
3]. Risk stratification is essential to identify and better define those patients with GISTs who are most likely to benefit from adjuvant imatinib therapy [
4]. High-risk GISTs are considered to require a multidisciplinary approach to improve the prognostic outcome, such as one including adjuvant therapy and surgery [
5,
6]. Therefore, it would be helpful to determine a precise preoperative risk rating to ensure appropriate adjuvant therapy and treatment for individual patients.
Abdominal contrast-enhanced computed tomography (CT) is the most commonly applied method for determining the signs of GISTs, such as calcification, hemorrhage, growth pattern, degree of enhancement, necrosis, and lymph node involvement [
7,
8]. However, the resulting subjective interpretations have inevitable limitations because of differences in reader experience and in the understanding of how imaging features are defined, thereby motivating researchers to seek more objective and reliable predictive approaches [
9].
Recently, convolutional neural networks (CNNs) have become a typical deep learning algorithm, and they are now widely used in diagnostic imaging, classification, and prediction for various diseases, including gastric cancer, breast cancer, and lung cancer [
10‐
12]. With advantages in accuracy, objectivity, and reproducibility, CNN models applied to imaging data can discern important predictive features that may not be detected by the naked eye [
13,
14]. Although several CNN image data models have been applied to endoscopic ultrasonography (EUS) imaging of gastrointestinal diseases, there is still a lack of research on their application to contrast-enhanced CT images of gastric GISTs [
11,
15,
16]. We therefore investigated whether a CNN-based model applied to venous-phase contrast-enhanced CT could predict the risk rating of gastric GISTs, and adopted a newly developed CNN called EfficientNet to build and validate predictive models for this purpose [
17].
Discussion
In this study, we present the results of a newly developed CNN model called EfficientNet_b1 that uses preoperative venous-phase CT images to predict the risk category of gastric GISTs. Our findings showed that the CNN_layer3/9/15 models could accurately predict the risk classification of gastric GISTs in both the training dataset (all with AUROCs > 0.7) and the validation dataset (all with AUROCs > 0.8), indicating that the CNN extracted suitable features for evaluating the risk in patients with gastric GISTs. To the best of our knowledge, this is the first study to report using a CNN applied to preoperative venous-phase CT images to predict the risk category of gastric GISTs, and the only study to compare the diagnostic efficacy of CNN models built from the upper and lower 3/9/15 layers around the maximum tumor mask slice.
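The layer selection compared above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names (slice_areas, select_slices), the nested-list mask layout, and the reading of "3/9/15 layers" as the total number of slices centered on the maximum tumor mask slice are all assumptions made for illustration.

```python
from typing import List

# Hypothetical layout: mask[slice][row][col], 1 = tumor voxel, 0 = background.
Mask = List[List[List[int]]]


def slice_areas(mask: Mask) -> List[int]:
    """Count tumor voxels in each axial slice of a binary mask."""
    return [sum(sum(row) for row in axial_slice) for axial_slice in mask]


def select_slices(mask: Mask, n_layers: int) -> List[int]:
    """Return indices of up to n_layers slices centered on the largest-area slice.

    Assumption: n_layers (e.g. 3, 9, 15) is the total window size, so
    (n_layers - 1) / 2 slices are taken on each side of the maximum slice.
    The window is clipped at the volume boundaries.
    """
    if n_layers < 1 or n_layers % 2 == 0:
        raise ValueError("n_layers must be a positive odd number")
    areas = slice_areas(mask)
    # Index of the slice with the largest tumor area (first one on ties).
    center = max(range(len(areas)), key=areas.__getitem__)
    half = n_layers // 2
    lo = max(0, center - half)
    hi = min(len(areas), center + half + 1)
    return list(range(lo, hi))
```

Under these assumptions, calling select_slices(mask, 3), select_slices(mask, 9), and select_slices(mask, 15) on the same segmentation would yield the three nested inputs for models comparable to CNN_layer3, CNN_layer9, and CNN_layer15; small tumors near the edge of the volume simply produce a shorter, clipped window.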
In China, the prognoses for GISTs are commonly stratified according to the modified National Institutes of Health (NIH) criteria, which include size (2, 5, or 10 cm), mitotic index (< 5, 5–10, or > 10 mitoses per 50 HPFs), tumor site (gastric, small intestine, or other), and tumor rupture, because of their simplicity in clinical practice [
19]. Once GISTs show intermediate- or high-risk CT features, surgery rather than endoscopy is the preferred treatment regardless of tumor size, and the risk grade is closely related to the choice of surgical plan, surgical method, and patient prognosis [
20]. Therefore, accurate stratified risk assessment has important clinical reference value for the diagnosis, treatment and prognosis of patients [
21]. Precise risk rating has become crucial owing to emerging adjuvant systemic treatments. Recent guidelines state that only high-risk patients should be considered for adjuvant treatment, with the suggestion for intermediate-risk patients being ‘space for shared decision-making’ [
22]. In the abovementioned risk classification, high-risk GISTs are followed up by CT every 4–6 months, whereas GISTs with very low, low, or moderate risks are followed up by CT every 6–12 months [
23]. Previous studies reported on the characteristic CT features of GISTs such as tumor size, calcification, ulcer, hemorrhage, intratumoral vessels, growth pattern, degree of enhancement, necrosis, and lymph node involvement, which may provide valuable information for predicting the risk rating of GISTs [
7,
24‐
25]. However, the interpretations of CT findings were subjective and relied on radiologists. Our results showed that tumor contour, growth pattern, necrosis, surface ulceration, LN involvement, hemorrhage, intratumoral vessels, peritumoral exudation, necrosis under the tumor wall, LD, SD, and LD/SD differed significantly between the very low/low-risk, intermediate-risk, and high-risk groups in both the training dataset and validation dataset. However, it remains difficult for radiologists to predict the risk rating of gastric GISTs using these CT features because of their low occurrence rates and non-specificity.
The convolutional neural network (CNN), an advanced machine learning method, is a neural network able to learn complicated functions mapping an input to an output with no need for manually extracted characteristics [
26‐
27]. In the field of gastrointestinal diseases, CNNs have begun to show promise for tumor detection, differential diagnosis, and risk assessment. Zhang et al. found that a CNN system based on endoscopic images showed better diagnostic performance in the detection of early gastric cancer than endoscopists with higher accuracy (85.1–91.2%) and stability [
11]. Oh et al. and Liu et al. developed CNN systems using endoscopic ultrasound images that demonstrated higher diagnostic ability for GISTs than human assessments, including higher accuracy, sensitivity, and negative predictive value [
27,
28]. A recent study reported that a deep learning machine for differentiating three risk levels of GISTs (high-risk, intermediate-risk, and low-risk GISTs) demonstrated an AUROC of 0.89 in the training dataset and 0.85 in the external validation dataset, showing better performance than a subjective model [
14]. In this study, we developed CNN models using preoperative venous-phase CT images that achieved AUROCs above 0.7 for differentiating high-risk gastric GISTs from intermediate-risk and very low/low-risk gastric GISTs in the training dataset, and above 0.8 in the validation dataset. Furthermore, diagnostic performance was also evaluated using the micro-average ROC and macro-average ROC, which are more credible because of the class imbalance in this multi-class task. The areas under the micro-average and macro-average ROC curves of the CNN models for differentiating the three risk categories of gastric GISTs were above 0.8 in both the training and validation datasets, showing high accuracy for the risk rating on venous-phase CT images. Previous studies using radiomics models confirmed that analyses using 3D or 2D-3D hybrid CNN models could supply more relevant information on lesions than 2D images, which may enhance the accuracy of discrimination [
29‐
32]. In this study, we hypothesized that the diagnostic performance of CNN models could be affected by the tumor volume, which comprises different numbers of layers around the maximum tumor mask slice and can influence the accuracy of image segmentation. In the training dataset, the Obuchowski index was significantly higher with the CNN_layer9 and CNN_layer15 models than with the CNN_layer3 model (P < .05), providing preliminary evidence that including more layers around the maximum slice may improve the diagnostic performance of CNN models for predicting the risk category of gastric GISTs. However, this difference was not confirmed in the validation dataset. Further research with a larger sample size is required to confirm this preliminary evidence. In our analysis, we showed the detailed probability distribution for every subject in the validation dataset across the three risk classifications. For all three CNN models, subjects in the high-risk group were assigned high predicted probabilities (all > 0.51), which were higher than those of the very low/low-risk and intermediate-risk groups. These results also indicate the stability of the CNN models.
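The difference between the micro-average and macro-average ROC summaries mentioned above can be made concrete with a minimal Python sketch. It is not the authors' evaluation code: the function names and the toy labels and probabilities are hypothetical, and the area under the ROC curve is computed here through the rank-based (Mann-Whitney) formulation, in which ties count as one half.

```python
from typing import List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def one_vs_rest(y_true: Sequence[int], n_classes: int) -> List[List[int]]:
    """One-hot encode class indices so each class becomes a binary task."""
    return [[1 if y == k else 0 for k in range(n_classes)] for y in y_true]


def macro_auroc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-class one-vs-rest AUROCs (each class counts equally)."""
    n_classes = len(probs[0])
    onehot = one_vs_rest(y_true, n_classes)
    per_class = [
        binary_auroc([row[k] for row in onehot], [p[k] for p in probs])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


def micro_auroc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """AUROC over all (sample, class) decisions pooled into one binary task."""
    n_classes = len(probs[0])
    onehot = one_vs_rest(y_true, n_classes)
    flat_labels = [v for row in onehot for v in row]
    flat_scores = [v for p in probs for v in p]
    return binary_auroc(flat_labels, flat_scores)


# Hypothetical toy data: 0 = very low/low, 1 = intermediate, 2 = high risk.
y_true = [0, 0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.2, 0.6],
]
print(macro_auroc(y_true, probs))  # 1.0
print(micro_auroc(y_true, probs))  # 0.96875
```

In this toy example every class is perfectly separated within its own column, so the macro-average is 1.0, whereas pooling all decisions exposes one mis-ranked pair and lowers the micro-average. Reporting both therefore gives complementary views when the risk groups differ in size.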
Our study is subject to several limitations. First, the numbers of patients in the intermediate- and high-risk groups in both the training and validation datasets were lower than in the very low/low-risk groups, and all venous-phase CT images were retrospectively obtained from only four centers. Because of the small number of included patients in the intermediate- and high-risk groups, larger multicenter trials are required to confirm these results. Second, selection bias exists because the analysis was conducted retrospectively. Third, tumor segmentation was performed manually rather than being fully automated. The stability of our diagnostic model needs to be confirmed when using automatic segmentation. Finally, the venous-phase CT images were obtained from a variety of CT scanners, which may have introduced potential confounding factors.
In conclusion, we developed and validated CNN models using preoperative venous-phase CT images to predict the risk categories of gastric GISTs with high accuracy and specificity, and these models have potential for assisting clinical work in the imaging diagnosis of gastric GISTs. Although the volume of the lesions differs between the CNN_layer3/9/15 models, there was no difference in their identification of the risk category of gastric GISTs.