Introduction
In February 2022, the National Cancer Center of China released the latest national cancer statistics [1]: lung cancer ranks first among malignant tumors in China in both incidence and cancer mortality. According to the International Agency for Research on Cancer (IARC) of the World Health Organization (WHO) [2], there were 2.2 million new lung cancer cases in 2021, second only to breast cancer with 2.26 million cases, and 1.8 million lung cancer deaths out of the 9.96 million cancer deaths worldwide, far exceeding any other cancer and making lung cancer the deadliest cancer type. It is clear that lung cancer poses a great threat to human health.
Radiation therapy is one of the main treatments for lung cancer, and about 60%-70% of lung cancer patients need to receive radiation therapy [3]. In recent years, with the rapid development of medical imaging and computer technology, tumor radiotherapy has entered the era of image-guided high-precision radiotherapy. Precision radiotherapy begins with the manual delineation of the radiotherapy target volume and the organs at risk by medical professionals [4]. Accurate delineation of target volumes and organs-at-risk (OARs) is a key step in tumor radiotherapy planning. To reduce the complications of radiotherapy and the risk of radiation-induced secondary malignant tumors, the target volume and OARs must be delineated accurately. Even though unified principles and consensus guidelines exist for reference, manual delineation of radiotherapy targets still depends largely on the experience of the practitioner [4]. It is therefore highly variable and time-consuming, which affects the efficacy of radiation therapy. Artificial intelligence is increasingly used in the medical field [5], and AI techniques can provide optimized and effective decisions with minimal error, offering unparalleled advantages in improving the efficiency and consistency of target delineation in radiotherapy. Convolutional neural networks (CNNs) are a class of deep learning models that perform well in medical image segmentation because they are insensitive to image noise, blur, and contrast [6]; they are currently among the most successful algorithms for image segmentation. In the field of tumor radiology, a trained CNN model, accelerated by a graphics processing unit (GPU), can rapidly segment the Gross Target Volume (GTV) as well as normal tissues and organs. Rhee et al. [7] used a CNN for automatic segmentation of the clinical target volume (CTV) of pelvic tumors on radiotherapy planning CT and achieved a DSC of 0.86. Men et al. [8] used an end-to-end Deep Deconvolutional Neural Network to segment the primary lesion of nasopharyngeal carcinoma with a DSC of 0.809. Wang et al. [9] proposed a patient-specific adaptive convolutional neural network (A-net), trained on weekly MRI images and the corresponding GTV segmentations, with a DSC of 0.82 ± 0.10. Zhang et al. [10] introduced a modified ResNet to segment the GTV of non-small cell lung cancer patients on CT images, with an average DSC of 0.73. Although deep-learning-based automatic segmentation has been rapidly applied to delineating OARs and the GTV in lung cancer radiotherapy, studies on automatic GTV segmentation in lung cancer radiotherapy planning remain few and shallow, and their segmentation performance is still limited. There is therefore an urgent need for a method that automatically segments the lung cancer GTV to improve the efficiency and accuracy of GTV delineation.
In this paper, we propose a TransResSEUnet2.5D network that accurately segments the GTV in radiotherapy for lung cancer and greatly reduces segmentation time.
Discussion
During radiotherapy planning, radiotherapists need to outline the GTV on CT images layer by layer. The quality of GTV delineation determines 60% of the overall radiotherapy effectiveness [31]. Manual delineation by physicians is prone to subjective error and is poorly traceable. A rapid and automated GTV delineation method is therefore important for improving the overall efficiency and performance stability of radiotherapy in clinical practice. Researchers have used machine learning methods to achieve automatic GTV segmentation during radiation treatment for nasopharyngeal carcinoma [32, 33], brain tumors [34], and breast cancer [35]. Li et al. [32] used a U-net network to automatically segment the primary lesion of nasopharyngeal carcinoma with a DSC of 0.659; Cardenas et al. [33] used a two-channel 3D convolutional neural network for automatic segmentation of the GTV of nasopharyngeal carcinoma with a DSC of 0.75, and later studies on automatic segmentation in nasopharyngeal carcinoma achieved a DSC of up to 0.835 [36]. Yang et al. [34] proposed a DCU-Net model with a DSC of 0.91 for automatic segmentation of intracranial tumors. Another study, using a DD-ResNet network for postoperative breast cancer, also achieved a DSC of 0.91 for automatic CTV segmentation [35]. In lung cancer, some progress has been made. For example, Jiang et al. [37] proposed a multi-resolution residual connection network for lung tumor volume segmentation and reported a DSC of 0.74; Zhang et al. [38] improved the ResNet network and applied it to GTV segmentation of non-small cell lung cancer, reaching a DSC of 0.73. These studies show that accurate automatic segmentation of the GTV for lung cancer radiation therapy is achievable with the right method.
In this study, we proposed a TransResSEUnet2.5D network for accurate segmentation of the GTV in radiation treatment of lung cancer patients. According to the segmentation results, the proposed network is relatively effective, especially at spiculated (burr-like) tumor margins, where automatic segmentation achieved a DSC of (84.08 ± 0.04)%. This is due to the special 2.5D architecture of the TransResSEUnet2.5D network: it uses 2D convolutional layers to extract in-plane features from the CT images, which restores edge details in the segmentation results more accurately, and 3D convolutional layers to extract inter-slice information from the abstract semantic features. This architecture thus combines the advantages of both 2D and 3D convolutional layers. Compared with the simplest Unet2D, whose DSC was (77.07 ± 0.09)%, ours is about 7 percentage points higher, which demonstrates the advance made by our work. The lack of statistically significant differences may be due to the small sample size; in later experiments the sample size can be expanded to explore statistical significance.
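The 2.5D idea described above, slice-wise 2D filtering followed by 3D mixing across neighbouring slices, can be illustrated with plain array convolutions. This is a toy sketch with hypothetical hand-picked kernels, not the network's actual learned layers:

```python
import numpy as np
from scipy.ndimage import convolve

# Toy CT volume: 8 slices of 16x16 voxels (depth, height, width)
volume = np.random.default_rng(0).random((8, 16, 16))

# "2D stage": an in-plane 3x3 kernel applied slice by slice,
# capturing edge detail within each CT layer (hypothetical kernel).
k2d = np.array([[0, -1, 0],
                [-1, 4, -1],
                [0, -1, 0]], dtype=float)  # Laplacian-like edge filter
feat_2d = np.stack([convolve(s, k2d, mode="nearest") for s in volume])

# "3D stage": a 3x3x3 kernel over the stacked 2D feature maps,
# mixing information across neighbouring slices.
k3d = np.ones((3, 3, 3)) / 27.0  # simple inter-slice averaging
feat_25d = convolve(feat_2d, k3d, mode="nearest")

print(feat_25d.shape)  # (8, 16, 16): same grid, now with inter-slice context
```

In the actual network the kernels are learned, and the 3D stage operates on abstract semantic feature maps rather than raw intensities; the sketch only shows how the two convolution types divide the work.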
In the test set of 20 lung cancer patients, the DSC of the TransResSEUnet2.5D network was higher than those of the other five network models, with a smaller variance, indicating that its automatic segmentation is more stable and that the model generalizes better. HD95 measures the degree of distortion of the segmentation results, and its magnitude is influenced by the number of outlier points [30]. Statistical analysis showed that TransResSEUnet2.5D segmented images with greater continuity and produced fewer outliers in the 20 test patients, and its HD95 was superior to those of the other network models. Currently, HD95 is in the range of 7.19–9.35 mm in most studies [39]. In our study, the HD95 of TransResSEUnet2.5D was (8.11 ± 3.43) mm, better than the other models to some extent, although the difference was not statistically significant, possibly because of the small sample size. Cui et al. [40] used a DVNs network to automatically segment lung tumors with a DSC of 83.2% and an HD95 of 4.57 mm; their HD95 was thus superior to ours. One possible reason lies in the difference in CT slice thickness between the studies: all patients in our study were treated with IMRT and their CT slice thickness was 5 mm, whereas Cui et al. [40] studied non-small cell lung cancer patients treated with SBRT whose CT slice thickness was 2 mm or 3.3 mm. In addition, lung cancer patients treated with SBRT had smaller tumors (in the study by Cui et al. [40], the mean effective diameter of the GTV was 11.039 mm). Nonetheless, the differences between the studies suggest that the segmentation accuracy of our proposed network should be further improved by tuning the parameters and iteration depth during training. Moreover, the amount of data in this study is still relatively small, especially multicenter data, which affects the robustness of the segmentation model. These issues require further attention in our follow-up research.
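The two metrics can be computed directly from binary masks. Below is a minimal sketch; for brevity it measures distances between all foreground voxels rather than extracted surface points (HD95 is normally defined on contour surfaces), so it is an approximation, not the study's evaluation code:

```python
import numpy as np
from scipy.spatial.distance import cdist

def dsc(a, b):
    """Dice similarity coefficient of two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(a, b, spacing=1.0):
    """95th-percentile symmetric Hausdorff distance, scaled by voxel spacing.
    Simplified: uses every foreground voxel, not just the surface."""
    pa = np.argwhere(a) * spacing
    pb = np.argwhere(b) * spacing
    d = cdist(pa, pb)  # all pairwise distances
    return max(np.percentile(d.min(axis=1), 95),
               np.percentile(d.min(axis=0), 95))

# Toy example: a "prediction" shifted one voxel from the "ground truth"
gt = np.zeros((10, 10), dtype=bool)
pred = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True
pred[2:6, 3:7] = True
print(dsc(gt, pred), hd95(gt, pred))  # 0.75 1.0
```

Multiplying voxel indices by the scan's physical spacing is what puts HD95 into millimetres; with anisotropic 5 mm slices the per-axis spacing would be passed as a vector instead of a scalar.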
The TransResSEUnet2.5D network proposed in this research meets clinical applicability requirements on the DSC and HD95 metrics for automatic GTV segmentation in lung cancer radiotherapy patients, and it also greatly improves the efficiency of radiotherapy delineation. It has been reported that manual delineation of lung cancer radiotherapy targets by experienced radiotherapy physicians takes nearly 32 min [41]. Ermiş et al. [42] automatically segmented the target area of one glioma patient with deep learning methods in about 10 s. In our study, the automatic GTV segmentation time per lung cancer patient was shortened to less than 8 s, about (6.50 ± 1.31) s: great progress while ensuring accuracy. The prediction time of the TransResSEUnet2.5D network is longer than those of Unet2.5D (p = 0.000), Unet3D (p = 0.000), and ResSEUnet2.5D (p = 0.001). This may be because the TransResSEUnet2.5D network adds Transformer modules to ResSEUnet2.5D, making the model more complex, with more parameters and thus naturally longer prediction times. However, a short prediction time is not clinically meaningful if segmentation accuracy is poor; therefore, the TransResSEUnet2.5D network we propose is of clinical significance.
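The paired comparisons of prediction times reported above can be made with a paired t-test over per-patient timings. A minimal sketch on simulated data (the timing values and two-model setup are hypothetical, not the study's measurements):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Hypothetical prediction times (seconds) for the same 20 test patients,
# paired by patient: a heavier model vs. a lighter baseline.
t_heavy = rng.normal(6.5, 1.3, size=20)
t_light = rng.normal(4.0, 1.0, size=20)

stat, p = ttest_rel(t_heavy, t_light)  # paired-sample t-test
print(f"t = {stat:.2f}, p = {p:.4g}")
```

A paired test is appropriate here because the two models are timed on the same patients, so per-patient variation cancels out of the comparison.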
In summary, on the automatic GTV segmentation task for radiation treatment of lung cancer patients, the proposed TransResSEUnet2.5D network effectively prevents overfitting even when the training set is not large, and it mitigates the vanishing gradient problem by reusing the feature maps of different layers during training, providing a new method for medical image segmentation.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.