MSFCN-multiple supervised fully convolutional networks for the osteosarcoma segmentation of CT images

https://doi.org/10.1016/j.cmpb.2017.02.013

Highlights

  • It is a deep end-to-end network for medical image segmentation.

  • Multiple supervision side output layers were introduced to the network for guiding the multi-scale feature learning.

  • A large number of feature channels were used in the up-sampling portion in order to capture more context information.

  • The segmentation method achieved an average DSC of 87.80%, an average sensitivity of 86.88%, an average HM of 19.81%, and an F1-measure of 0.9080; these results are better than those of some existing studies.

Abstract

Background and objective

Automatic osteosarcoma tumor segmentation on computed tomography (CT) images is a challenging problem, as tumors show large spatial and structural variability. In this study, an automatic tumor segmentation method based on a fully convolutional network with multiple supervised side output layers (MSFCN) is presented.

Methods

Image normalization was applied as a pre-processing step to reduce the differences among images. Within the fully convolutional network framework, supervised side output layers were added to three layers to guide the multi-scale feature learning in the contracting structure, which was then able to capture both local and global image features. Multiple feature channels were used in the up-sampling portion to capture more context information, helping to ensure accurate segmentation of tumors that have low contrast with the surrounding soft tissue. The results of all the side outputs were fused to determine the final boundaries of the tumors.
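The inference-time flow described above (normalize the image, bring each side output to the input resolution, fuse) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the zero-mean, unit-variance normalization, the nearest-neighbour up-sampling, the equal fusion weights, the 0.5 threshold, and all function names are assumptions made only for illustration, since this summary does not specify them.

```python
import numpy as np

def normalize(image):
    """Zero-mean, unit-variance normalization (an assumed choice)."""
    image = np.asarray(image, dtype=np.float64)
    centered = image - image.mean()
    std = image.std()
    return centered / std if std > 0 else centered

def upsample_nearest(prob_map, factor):
    """Repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(prob_map, factor, axis=0), factor, axis=1)

def fuse_side_outputs(side_probs, factors, weights=None, threshold=0.5):
    """Up-sample each side output to full resolution and take a weighted average.

    Equal weights are an assumption; a trained network could learn them instead.
    """
    upsampled = [upsample_nearest(p, f) for p, f in zip(side_probs, factors)]
    stacked = np.stack(upsampled)                    # shape (K, H, W)
    if weights is None:
        weights = np.full(len(upsampled), 1.0 / len(upsampled))
    fused = np.tensordot(weights, stacked, axes=1)   # shape (H, W)
    return fused, fused >= threshold

# Three hypothetical side outputs at 1x, 1/2 and 1/4 of an 8x8 input.
side_probs = [np.full((8, 8), 0.9), np.full((4, 4), 0.6), np.full((2, 2), 0.3)]
fused, mask = fuse_side_outputs(side_probs, factors=[1, 2, 4])
```

With equal weights, a pixel is labelled as tumor only when the side outputs agree on average, so a coarse, low-confidence map cannot override two confident fine-scale maps on its own.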

Results

A quantitative comparison against the manual segmentation results for the 405 osteosarcoma CT test images showed that the average Dice similarity coefficient (DSC), average sensitivity, average Hammoude distance (HM) and F1-measure were 87.80%, 86.88%, 19.81% and 0.908, respectively. Compared with other learning-based algorithms (for example, the fully convolutional network (FCN), U-Net, and holistically-nested edge detection (HED) methods), the MSFCN had the best performance in terms of DSC, sensitivity, HM and F1-measure.
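For reference, the four reported metrics can be computed from a predicted mask and a manual mask as in the sketch below. The formulas are the commonly used definitions, with the Hammoude distance taken as (|A ∪ B| − |A ∩ B|) / |A ∪ B|; whether the paper used exactly these forms, and how it averaged them over images, is an assumption. The function name is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Return DSC, sensitivity, Hammoude distance and F1 for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.logical_and(pred, truth).sum())    # tumor pixels found
    fp = int(np.logical_and(pred, ~truth).sum())   # false alarms
    fn = int(np.logical_and(~pred, truth).sum())   # missed tumor pixels
    union = tp + fp + fn

    def safe_div(num, den):
        # Avoid division by zero when a mask is empty.
        return num / den if den > 0 else 0.0

    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    return {
        "dsc": safe_div(2 * tp, 2 * tp + fp + fn),
        "sensitivity": sensitivity,
        "hm": safe_div(union - tp, union),
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }

# Toy example: the manual mask is a 2x2 block, the prediction a 2x3 block covering it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
metrics = segmentation_metrics(pred, truth)
```

In this toy case the prediction covers every tumor pixel (sensitivity 1.0) but adds two false-positive pixels, giving a DSC of 0.8 and an HM of about 0.33. For a single pair of binary masks this F1 coincides with the DSC, so the separate values reported above presumably reflect a different aggregation that this summary does not specify.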

Conclusion

The results indicated that the proposed algorithm contributed to the fast and accurate delineation of tumor boundaries, which could potentially assist doctors in making more precise treatment plans.

Introduction

Osteosarcoma is one of the most prevalent types of bone tumor, and occurs most often in children and adolescents [1]. The present standard practices for the management of osteosarcoma are a combination of neoadjuvant chemotherapy, and surgical intervention of the primary tumor [2].

The accuracy of tumor segmentation from osteosarcoma CT images is crucial not only to treatment planning before neoadjuvant chemotherapy, but also to the subsequent evaluation of therapeutic efficacy. The manual delineation of tumor tissue on each slice by an experienced radiologist is time-consuming and laborious, and the results are subjective and non-reproducible. For these reasons, an accurate automatic or semi-automatic tumor segmentation method is required. Tumor tissue segmentation from osteosarcoma CT images presents many challenges. The main difficulties can be divided into the following three aspects: (1) The specificity of the osteosarcoma. Osteosarcoma arises from bones, as well as the soft tissues of the extremities [3], which makes it difficult to identify the boundaries of tumors. Furthermore, the tumors of different patients may vary greatly in size and position, and may also have a variety of shapes and appearance properties; (2) The heterogeneity of the tumors. The gray-scale and texture features are not uniform inside a tumor, and the distributions of tumor tissue necrosis are diverse. In addition, the gray-level differences between the tumor tissue and the surrounding normal tissues on osteosarcoma CT images are usually very small; and (3) The diversity of the CT imaging equipment and protocols. The osteosarcoma CT images are acquired with different imaging equipment, which may use different imaging protocols. These differing acquisition parameters may introduce non-negligible differences among the images. Together, these factors make osteosarcoma CT image segmentation a challenging task.

During the past decade, a number of methods have been proposed to segment tumor regions from osteosarcoma images. Generally speaking, these can be divided into two categories: (1) Cluster-based methods [4], [5], [6]. These methods interactively chose object and background seed points, and described the properties of each class. They had high computational efficiency. However, they were found to be sensitive to initialization and noise. In addition, because they lacked an object prior, they were only able to process images with simple structures and orderly textures. (2) Traditional learning-based methods [7], [8], [9]. These methods treated segmentation as a per-pixel classification task and learned a pixel classification model based on handcrafted features. However, they had some limitations. In order to improve the accuracy of the classifier, a large number of features had to be calculated, which made the computations slow and costly in terms of memory. In order to make the algorithms more efficient, many techniques, such as dimensionality reduction [10] or feature selection [11], were employed to reduce the number of features. However, the reduction in the number of features was often found to come at the cost of reduced accuracy [12]. Therefore, limited by the handcrafted features, these methods did not work well when applied to large numbers of osteosarcoma CT images, as they failed to extract osteosarcoma tumor regions characterized by complex structures and disorderly textures.

Recently, new learning-based segmentation methods based on convolutional neural networks (CNN) have been introduced [13], [14], [15], [16], [17], [18]. These CNN-based methods are able to learn a hierarchy of increasingly complex features directly from patches (a local region around a pixel), and predict the class label of each pixel according to the learned features [19]. Because a CNN operates over patches using learned kernels, there is no need to extract handcrafted features, and the segmentation accuracy can be significantly improved. However, these patch-based CNNs consume considerable time and memory, and a large amount of redundancy exists due to overlapping patches [20]. Furthermore, the receptive field sizes are limited by the patch sizes, so only local features can be extracted. More elegant networks, referred to as fully convolutional networks (FCN) [21], have been proposed to overcome these limitations. An FCN uses a CNN model pre-trained on ImageNet to make accurate image segmentations. Some FCN-based segmentation methods have achieved good results [20], [22]. However, these methods were found to be unable to identify some smaller object regions.

Therefore, the prime motivation behind this study was the development of a fast, fully automatic, and more accurate segmentation method for osteosarcoma CT images. In this study, a multiple supervised fully convolutional network (MSFCN) for the segmentation of tumor areas on osteosarcoma CT images is presented. In the MSFCN, supervised side output layers were added to the middle hidden layers, which enabled the network to learn rich hierarchical features directly from the images and accurately identify the tumor regions in the osteosarcoma CT images.
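One common way to supervise side outputs attached to hidden layers is to add a loss term for each side output to the loss of the fused prediction, so that every scale receives its own training signal. The NumPy sketch below illustrates that idea only; the use of binary cross-entropy, the unit side weights, and the function names are assumptions, not details confirmed by this summary. The side outputs are assumed to be already up-sampled to the resolution of the manual mask.

```python
import numpy as np

def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a probability map and a 0/1 mask."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))

def deeply_supervised_loss(side_probs, fused_prob, target, side_weights=None):
    """Sum of weighted side-output losses plus the loss of the fused prediction."""
    if side_weights is None:
        side_weights = [1.0] * len(side_probs)   # assumed: equal weighting
    side_term = sum(
        w * binary_cross_entropy(p, target) for w, p in zip(side_weights, side_probs)
    )
    return side_term + binary_cross_entropy(fused_prob, target)

# Toy check: three uninformative side outputs (0.5 everywhere) and an equally
# uninformative fused map give four times the chance-level loss ln(2).
target = np.array([[0.0, 1.0], [1.0, 0.0]])
uninformative = np.full((2, 2), 0.5)
loss = deeply_supervised_loss([uninformative] * 3, uninformative, target)
```

Because each side output contributes its own term, a hidden layer that produces a poor intermediate segmentation is penalized directly rather than only through the final prediction.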

Section snippets

Data acquisitions

The dataset used in this study consisted of 2305 osteosarcoma CT images from 23 patients aged between 8 and 30 years. This dataset was split into 1900 training images and 405 testing images. The testing images were divided into two groups, based on their lesion locations. There were 109 bone lesion images, in which the tumor was located in bone, and 296 mixed lesion images, in which the tumors were located in both bone and soft tissues. All of the osteosarcoma CT images were obtained

Model initialization parameters

In this study, the first nine groups of parameters of the pre-trained VGG-16 model were adopted to initialize the filters of the convolutional part of the MSFCN. The remaining hyperparameters of the MSFCN are shown in Table 1:

Data augmentation

When medical data are not abundant enough to train a deep network, artificial data augmentation is a common way to generate sufficient training data. It can also teach the network the desired invariance and robustness properties when the data set is relatively small. In this

Discussion and conclusion

In this study, a novel multiple supervised fully convolutional network (MSFCN) method for the segmentation of osteosarcoma in CT images was presented. The MSFCN displayed two advantages in identifying the boundaries of the osteosarcoma: (1) A large number of feature channels (128) were used in the up-sampling, which preserved more context information; and (2) Multiple supervision layers were introduced to guide the multi-scale feature learning, which was found to be helpful in capturing the

Acknowledgment

This work was supported by the National Natural Science Foundation of China [81571772].

References (35)

  • J. Ma et al.

    Segmentation of multimodality osteosarcoma MRI with vectorial fuzzy-connectedness theory

  • C.-x. Chen et al.

    Osteosarcoma Segmentation in MRI Based on Zernike Moment and SVM

    Chin. J. Biomed. Eng.

    (2013)
  • M. Havaei et al.

    Brain tumor segmentation with deep neural networks

    Med. Image Anal.

    (2016)
  • R. Girshick et al.

    Region-based convolutional networks for accurate object detection and segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • O.Z. Kraus et al.

    Classifying and segmenting microscopy images with deep multiple instance learning

    Bioinformatics

    (2016)
  • P. Moeskops et al.

    Automatic segmentation of MR brain images with a convolutional neural network

    IEEE Trans. Med. Imaging

    (2016)
  • G. Ertas et al.

    Computerized detection of breast lesions in multi-centre and multi-instrument DCE-MR data using 3D principal component maps and template matching

    Phys. Med. Biol.

    (2011)