Automatic 3D liver segmentation based on deep learning and globally optimized surface evolution

Peijun Hu; Fa Wu; Jialin Peng; Ping Liang; Dexing Kong

doi:10.1088/1361-6560/61/24/8676

1. Introduction

Liver cancer is one of the most frequently diagnosed cancers worldwide, with high mortality and poor prognosis (Torre et al 2015). Accurate detection and delineation of the liver from 3D computed tomography (CT) images are fundamental steps in many clinical treatments, such as liver resection, transplantation and radiotherapy treatment planning. Although manual slice-by-slice delineation of the liver boundaries by experienced radiologic technologists gives accurate segmentation results, it is time-consuming and subjective, with high intra- and inter-observer variability. Therefore, semiautomatic or automatic liver segmentation is highly desirable in clinical applications.

However, there are several challenges in accurate computer-aided liver segmentation. Firstly, low contrast between liver and surrounding tissues makes the liver boundaries fuzzy and difficult to detect. Secondly, liver pathologies (e.g. liver tumors) and high-intensity intrahepatic veins usually lead to complicated intensity distributions and heterogeneous appearances. In these cases, leakage to surrounding tissues and under-segmentation of abnormal liver regions often occur in approaches depending on only intensity-based information. Although shape priors can be utilized to separate adjacent organs and preserve intrahepatic tissues, the highly varied liver shapes and sizes among different individuals also make it challenging for shape-prior-based methods to address the task. Figure 1 shows why liver segmentation is a challenging task through several typical cases.

**Figure 1.** Examples illustrating challenges for accurate liver segmentation from CT images: (a) the presence of pathological abnormalities and fuzzy boundary between the liver and the stomach; (b) low-density liver tumor and fuzzy boundary between the liver and adjacent tissues; (c) large variations in liver shapes.

In the last few decades, a variety of approaches have been proposed to segment the liver from CT images. A comprehensive review of different techniques with their advantages and disadvantages can be found in Campadelli et al (2009). Heimann et al (2009) presented a comparison study of different kinds of methods based on the results for a public database from the 'MICCAI 2007 Grand Challenge' workshop. Generally, current liver segmentation methods can be classified into two categories, i.e. image-based and prior-model-based approaches, according to whether prior knowledge is used.

Image-based methods are mainly based on low-level image information, such as intensity, gradient and other low-level features. Typical methods in this category include region growing (Rusko et al 2007), thresholding (Seo et al 2005), level-set-based methods (Oliveira et al 2011, Li et al 2015a), graph-cut-based methods (Afifi and Nakaguchi 2012, Peng et al 2015) and so on. The major challenges for these gray-level information-based methods are to prevent the segmentation from leaking to organs/tissues with similar intensities or appearances around the liver and avoid under-segmentation of inhomogeneous liver regions. Automatic methods (Song et al 2013) usually initialize the liver contour through thresholding and morphological operators. But the thresholds will affect the result directly and are hard to determine. In contrast, semiautomatic methods with user interaction or initialization often achieve better performance. Maklad et al (2013) proposed a novel semiautomatic method that used abdominal blood vessels to segment the liver from portal phase CT images. Peng et al (2014a) introduced a variational energy method that combined intensity, regional appearance, and surface smoothness to deal with fuzzy boundaries and heterogeneous backgrounds. Seed constraints, both in the foreground and background, were used in the constrained convex variational model (Peng et al 2014b). Although some of the aforementioned semiautomatic image-based approaches have achieved promising performance, a drawback of these methods is that they need user initialization or further interactive refinement and may be sensitive to initial contours/surfaces.

Model-based methods are more robust because they use the typical shape of a liver to constrain the segmentation process. Currently, models with prior information are usually based on statistical shape models (SSMs) (Heimann and Meinzer 2009, Zhang et al 2010, Li et al 2015b) and atlases (Juan Eugenio Iglesias 2015). A challenge for SSM-based methods is addressing the large shape variations with limited training data. Therefore, SSMs and their extensions used for automatic liver segmentation are often combined with a deformable model (Ling et al 2008), constrained free-form (Kainmüller et al 2007), template-matching algorithm (Saddi et al 2007) or incorporated into a level-set framework (Wimmer et al 2009) to capture shape variations that are not present in the training set. To better model the complex shape variations, sparse shape composition (SSC) (Shi et al 2015, Wang et al 2015) and dictionary learning (Al-Shaikhli et al 2015) were introduced to represent liver shapes. Atlas-based methods (van Rikxoort et al 2007, Linguraru et al 2010) start by registering a single atlas or multiple atlases to the target image and then deform the atlas-labeled image by label propagation or label fusion. However, the segmentation results depend heavily on the registration process, which is difficult due to the variability in liver shapes. In addition, computational complexity, atlas selection and fusion are challenges that need to be overcome. Recent works (Dong et al 2015, Platero and Tobar 2014) attempt to address some of these problems.

Recently, deep convolutional neural networks (CNNs) (LeCun et al 1998, Krizhevsky et al 2012, Long et al 2015), one type of deep learning model, have shown promising results in medical image segmentation. As a type of multi-layer, fully trainable model, CNNs can capture a hierarchy of features, from low-level to high-level, directly from the raw image. Moreover, the spatial information is encoded in the extracted features. Several works have applied CNNs to knee cartilage segmentation (Prasoon et al 2013), infant brain image segmentation (Zhang et al 2015) and pancreas segmentation (Roth et al 2015). For liver segmentation, Lu et al (2015) proposed a method (called '3D CNN-GC') that combined 3D fully convolutional neural networks and graph cuts to achieve automatic segmentation in CT images. The trained CNN generated a probability map of the liver and then the learned information was integrated into the image data penalty term of graph cuts. Compared to model-based methods, this method is advantageous as it can automatically produce a subject-specific prior without complex shape position initialization, registration or shape deformation. However, the prior probability map was only partly utilized. Thus, the method may still suffer from the limitations of image-based methods, especially for cases with appearance heterogeneity due to the presence of intrahepatic veins and liver pathologies. In addition, the results of 3D CNN-GC may be over-constrained by the prior probability map.

To address the aforementioned problems, we propose a novel automatic liver segmentation method based on a deep 3D convolutional neural network and globally optimized surface evolution. In our method, a CNN is first trained to learn a subject-specific probability map of the liver, which automatically detects the liver. The probability map is thresholded to provide both an initial segmentation and a shape prior for the following segmentation step. Then, the prior is incorporated into a novel energy-functional model to accurately delineate the liver surface. The key contributions are twofold. First, a new data term in the model is proposed to adaptively incorporate both global and local statistical information from the initial segmentation. In the healthy liver region, a global estimation is used to learn the intensity distribution and region appearance, whereas in the abnormal liver region, a local nonparametric estimation (Brox and Cremers 2009) helps capture the abnormal liver information. With this strategy, the model can handle not only healthy livers but also unhealthy livers with heterogeneous appearances and large shape variations. Second, a global optimization-based method (Yuan et al 2012) is employed to minimize the proposed energy function, which is able to propagate the surface to its optimal position in each iteration. We validated the proposed method on 42 CT images from two separate clinical databases. The results show that the method is accurate and effective for clinical usage.

2. Materials and methods

In this section, we present the imaging data used in this study and the proposed automatic liver segmentation framework, which consists of offline training and runtime testing stages. In the training stage, an eleven-layer 3D CNN is trained using labeled CT images. In the testing stage, given a testing image, a probability map of the liver is predicted by the trained CNN. Then, the probability map is thresholded to provide both an initial segmentation and a shape prior for the subsequent fine segmentation step. The proposed model precisely segments the liver based on a set of prior information, including the spatial location of the shape prior, the liver probability map, the intensity distribution and region appearances. Considering the appearance inhomogeneity and large shape variations of the liver, global and local statistical information from the initial segmentation are adaptively incorporated into the energy function. Finally, the energy function is minimized using a global optimization-based approach to propagate the initial surface to the optimal position. Figure 2 illustrates an example of testing a CT image with the proposed method.

**Figure 2.** Testing stage of the proposed liver segmentation framework. From left to right, for an input CT image (a), a probability map (b) of the liver is predicted by the 3D CNN. Then, an initial segmentation (yellow) (c) is obtained through thresholding. Finally, the initial surface propagates to the optimal surface (green) (d). Manual segmentation (red) in (c) and (d) acts as a reference for evaluating the segmentation performance.

2.1. Clinical datasets

In our experiments, 151 abdominal CT volumes from two databases were used for model training and testing. The first database is a public one from the 'MICCAI 2007 grand challenge workshop' (Sliver07) (Heimann et al 2009). It includes 20 volumes (Sliver07-I) with ground truth and 10 volumes (Sliver07-II) without available ground truth for participants. The evaluations on Sliver07-II data were performed online by the organizers of the Sliver07 website (http://sliver07.org). Most subjects in Sliver07 are pathologic, including metastases, tumors and cysts of different sizes. All of the images are contrast-enhanced in the central venous phase using a variety of different CT scanners. The images have axial dimensions of 512 by 512 with slice numbers varying from 64 to 502. The pixel spacing varies from 0.55 to 0.80 mm, and the slice distance varies from 1.00 to 3.00 mm. The second dataset consists of 121 volumes from local hospitals, including (late) arterial phase and portal-venous images. The dimensions for each axial slice of the images are $512\times 512$ , and slice numbers are from 79 to 304, with pixel spacing varying from 0.50 to 1.00 mm, and slice distance from 1.00 to 2.50 mm.

In this study, 109 CT volumes from local hospitals were randomly selected for training the CNN. The corresponding segmentation labels were obtained by trained technicians with the semiautomatic liver segmentation tool (Peng et al 2014b), and then the results were approved and revised by experienced radiologists. The testing data consisted of the remaining 12 volumes from local hospitals and all the 30 volumes in the Sliver07 database. In particular, there were three (late) arterial-phase and nine portal-venous images in the 12 non-public testing images, of which the reference segmentations were delineated by radiologists in a transversal slice-by-slice fashion.

2.2. Deep 3D CNN training

In this study, a deep 3D convolutional neural network is designed and trained to automatically detect the liver. The network predicts a probability map as a subject-specific prior, which assigns each voxel the likelihood of being the liver for the target image. The primary benefit of a deep CNN is its powerful feature-learning ability. As the main building blocks of CNNs, convolutional layers and pooling layers are applied alternately to the input image. Each layer takes as input the output of the preceding layer and thus builds a hierarchy of increasingly complex features.

Inspired by the work of Lu et al (2015), we adopt a similar CNN architecture with several improvements for the liver detection task. Specifically, we enlarge the network in the intermediate layers, use graphics processing unit (GPU) support and adopt a recent development in activation functions. The detailed architecture of the CNN is described in table 1. As can be seen, the network takes a cropped target image of size $496\times 496\times 279$ as input and outputs a probability map of size $496\times 496\times 256$ , with values outside this block in the original image set to 0. Components of the network include eleven convolutional layers, two average pooling layers and three Doublesize layers (Lu et al 2015). More specifically, a recently proposed nonlinear activation function, the parametric rectified linear unit (PReLU) (He et al 2015), is applied after the convolution operation in each convolutional layer. The PReLU can improve model fitting with nearly zero extra computational cost and little overfitting risk. To speed up computation and satisfy the memory limit, the network is spread across four GPUs with a parallelization scheme (Krizhevsky et al 2012) from layer Conv₃ to layer Conv₇. Finally, a logistic regression is performed for each voxel, which predicts the probabilities of 'liver' and 'non-liver'.

Table 1. Detailed architecture of the 3D CNN. 'Conv', 'Norm', 'Sum' denote convolutional layer, normalization and summation, respectively.

| Layer | Input | Filter | Padding | Stride | Output |
| --- | --- | --- | --- | --- | --- |
| Conv₁ → Norm | $496\times 496\times 279,1$ | $7\times 7\times 9,96$ | $3\times 3\times 0$ | $2\times 2\times 2$ | $248\times 248\times 136,96$ |
| Pooling₁ | $248\times 248\times 136,96$ | $2\times 2\times 2$ | $0\times 0\times 0$ | $2\times 2\times 2$ | $124\times 124\times 68,96$ |
| Conv₂ | $124\times 124\times 68,96$ | $5\times 5\times 5,256$ | $2\times 2\times 0$ | $2\times 2\times 1$ | $62\times 62\times 64,256$ |
| Pooling₂ | $62\times 62\times 64,256$ | $2\times 2\times 2$ | $0\times 0\times 0$ | $2\times 2\times 2$ | $31\times 31\times 32,256$ |
| Conv₃ | $31\times 31\times 32,256$ | $3\times 3\times 3,2048/4$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $31\times 31\times 32,2048/4$ |
| Conv₄ | $31\times 31\times 32,2048/4$ | $3\times 3\times 3,2048/4$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $31\times 31\times 32,2048/4$ |
| Conv₅ | $31\times 31\times 32,2048/4$ | $3\times 3\times 3,2048/4$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $31\times 31\times 32,2048/4$ |
| Conv₆ | $31\times 31\times 32,2048/4$ | $3\times 3\times 3,2048/4$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $31\times 31\times 32,2048/4$ |
| Conv₇ → Sum | $31\times 31\times 32,2048/4$ | $3\times 3\times 3,2048/4$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $31\times 31\times 32,512$ |
| Doublesize₁ | $31\times 31\times 32,512$ | - | - | - | $62\times 62\times 64,64$ |
| Conv₈ | $62\times 62\times 64,64$ | $3\times 3\times 3,512$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $62\times 62\times 64,512$ |
| Doublesize₂ | $62\times 62\times 64,512$ | - | - | - | $124\times 124\times 128,64$ |
| Conv₉ | $124\times 124\times 128,64$ | $3\times 3\times 3,128$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $124\times 124\times 128,128$ |
| Doublesize₃ | $124\times 124\times 128,128$ | - | - | - | $248\times 248\times 256,16$ |
| Conv₁₀ | $248\times 248\times 256,16$ | $3\times 3\times 3,16$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $248\times 248\times 256,16$ |
| Conv₁₁ → Logistic | $248\times 248\times 256,16$ | $3\times 3\times 3,1$ | $1\times 1\times 1$ | $1\times 1\times 1$ | $248\times 248\times 256,1$ |
| Upsampling | $248\times 248\times 256,1$ | - | - | - | $496\times 496\times 256,1$ |
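
The spatial sizes in the first rows of table 1 can be checked with the usual output-size rule for strided convolution and pooling. The sketch below is only an illustration: the floor-division rule and the helper names `out_size` and `layer_shape` are assumptions, not taken from the paper, although the rule reproduces the tabulated sizes.

```python
def out_size(n, k, p, s):
    """Output length along one axis for input length n, kernel k,
    zero padding p on each side and stride s (floor convention, assumed)."""
    return (n + 2 * p - k) // s + 1


def layer_shape(shape, kernel, padding, stride):
    """Apply out_size independently to each of the three spatial axes."""
    return tuple(out_size(n, k, p, s)
                 for n, k, p, s in zip(shape, kernel, padding, stride))


# Spatial shapes of the first four layers, with parameters copied from table 1.
input_shape = (496, 496, 279)
conv1 = layer_shape(input_shape, (7, 7, 9), (3, 3, 0), (2, 2, 2))
pool1 = layer_shape(conv1, (2, 2, 2), (0, 0, 0), (2, 2, 2))
conv2 = layer_shape(pool1, (5, 5, 5), (2, 2, 0), (2, 2, 1))
pool2 = layer_shape(conv2, (2, 2, 2), (0, 0, 0), (2, 2, 2))

for name, shape in [("Conv1", conv1), ("Pooling1", pool1),
                    ("Conv2", conv2), ("Pooling2", pool2)]:
    print(name, shape)
```

The unpadded depth axis in Conv₁ and Conv₂ is what reduces 279 slices to the 32 slices seen by Conv₃, so that three doublings give the 256-slice output.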

In the training phase, the network used prepared training CT images with corresponding segmentation labels. During training, learnable weights including convolutional filters, biases and PReLU parameters were updated by minimizing a cross-entropy loss function with weight decay, which was added to avoid overfitting. The loss function was minimized using the backpropagation algorithm.
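
As a rough illustration of the training objective, the following sketch evaluates a voxel-wise binary cross-entropy with an L2 weight-decay penalty. The exact form of the penalty, the `decay` value and the function name are assumptions made for this example; the paper does not specify them.

```python
import math


def cross_entropy_with_weight_decay(probs, labels, weights, decay=1e-4, eps=1e-12):
    """Mean binary cross-entropy over voxels plus 0.5 * decay * sum(w^2).

    probs   : predicted liver probabilities, one per voxel
    labels  : reference labels (1 = liver, 0 = non-liver)
    weights : flat list of learnable parameters (hypothetical layout)
    """
    ce = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        ce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    ce /= len(probs)
    penalty = 0.5 * decay * sum(w * w for w in weights)
    return ce + penalty


# An uninformative prediction of 0.5 everywhere costs log(2) per voxel.
print(cross_entropy_with_weight_decay([0.5, 0.5], [1, 0], []))
```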

2.3. Initial liver segmentation by the 3D CNN

Given a volume image $I:\mathbf{x}\in \Omega \to R$ defined on the domain $\Omega \subset {{R}^{3}}$ , we denote $u\left(\mathbf{x}\right)\in \left\{0,1\right\}$ as the indicator function of the estimated region, where 1 and 0 stand for the foreground (liver region) and background (non-liver region), respectively.

During testing, the trained CNN performs classification for each voxel and generates a probability map denoted as $L\left(\mathbf{x}\right),\mathbf{x}\in \Omega$ . Subsequently, a rough segmentation is obtained through thresholding as follows:

$\begin{eqnarray}{{u}_{\text{ref}}}\left(\mathbf{x}\right):=\left\{\begin{array}{*{35}{l}} 1, & L\left(\mathbf{x}\right)>t \\ 0, & \text{otherwise} \end{array},\mathbf{x}\in \Omega,\right. \end{eqnarray} \tag{ 1 }$

where t > 0 is a constant taken as the threshold. Although the initial segmentation can locate the liver accurately and predict rough liver shapes, part of the liver detection results may contain severe over- or under-segmentation (see figure 3). In addition, the initial segmentation is not accurate around the liver surface. Therefore, a novel segmentation model is proposed to accurately delineate the liver surface.

**Figure 3.** Examples showing bad and good initial segmentations by the 3D CNN (first row and second row, respectively). From left to right in each row: (a) one CT slice image; (b) the probability map of the liver; (c) initial segmentation (green) based on the probability map with threshold of 0.5 and manual segmentation (red); and (d) initial segmentation (green) and manual segmentation (red) in 3D view.

2.4. Energy model for the liver segmentation using shape prior

In this section, a novel energy function based on global and local statistics from prior information is proposed to refine the initial segmentation. The energy function is formulated as a hybrid model that integrates region statistics, a shape prior constraint and a gradient-edge map. The indicator function $u\left(\mathbf{x}\right)$ of the estimated liver region is obtained by minimizing the following energy functional:

$\begin{eqnarray}&&\underset{u\left(\mathbf{x}\right)\in \left\{0,1\right\}}{{\min}}\,E(u)={{\lambda}_{1}}{{E}_{\text{data}}}(u)+{{\lambda}_{2}}{{E}_{\text{prior}}}(u)+\lambda {\int}_{ \Omega }g\left(\mathbf{x}\right)|\nabla u|\text{d}\mathbf{x},\end{eqnarray} \tag{ 2 }$

where ${{E}_{\text{data}}}(u)$ formulates the image statistics inside and outside the liver region, the second term encodes the shape prior, and the last weighted total variation term acts as a boundary regularization term. Here, the weight function $g\left(\mathbf{x}\right)$ is given by $g\left(\mathbf{x}\right)=1/\left(1+\beta |\nabla I\left(\mathbf{x}\right){{|}^{2}}\right)$ , where β is a positive constant and fixed to 0.2 in our experiments. Note the values of $g\left(\mathbf{x}\right)$ fall within the range [0,1] and $g\left(\mathbf{x}\right)$ is an edge indicator that vanishes at sharp object boundaries. Spatially varying weights ${{\lambda}_{1}}\left(\mathbf{x}\right)$ , ${{\lambda}_{2}}\left(\mathbf{x}\right)$ and constant $\lambda >0$ are used to adaptively balance the three terms. Specifically, we set ${{\lambda}_{1}}\left(\mathbf{x}\right)={{\alpha}_{1}}g\left(\mathbf{x}\right)$ and ${{\lambda}_{2}}\left(\mathbf{x}\right)={{\alpha}_{2}}g\left(\mathbf{x}\right)$ , where ${{\alpha}_{1,2}}>0$ are constants. This makes the model selectively act as edge-based or region-based model in different regions.
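
To make the behaviour of the edge indicator concrete, the sketch below evaluates $g$ along a 1D intensity profile. It is a simplified illustration only: the 1D setting, the finite-difference scheme and the function names are assumptions, while β = 0.2 is the value stated above.

```python
def central_gradient(profile):
    """Central differences in the interior, one-sided differences at the ends.
    The profile must contain at least two samples."""
    n = len(profile)
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append((profile[hi] - profile[lo]) / (hi - lo))
    return grad


def edge_indicator(profile, beta=0.2):
    """g = 1 / (1 + beta * |grad I|^2), evaluated sample by sample."""
    return [1.0 / (1.0 + beta * d * d) for d in central_gradient(profile)]


# A step edge of 40 intensity units between two flat regions.
profile = [100.0, 100.0, 100.0, 140.0, 140.0, 140.0]
g = edge_indicator(profile)
print(g)
```

In flat regions $g$ equals 1, so the region and prior terms act with full weight, while across the step $g$ drops to about 0.012, which both down-weights those terms and makes it cheap for the surface to settle on the edge.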

To constrain the model with the shape prior, we represent the prior probability map by negative log-likelihood in equation (3), where $L\left(\mathbf{x}\right)$ and $1-L\left(\mathbf{x}\right)$ denote the probability of voxel $\mathbf{x}$ belonging to the liver and background, respectively. The negative log-likelihood map describes the confidence that a certain voxel belongs to the liver or to the background. The lower the value of $-\log L\left(\mathbf{x}\right)$ is, the more likely $\mathbf{x}$ belongs to the liver, and vice versa. With the adaptive weight, the shape prior term is formed as

$\begin{eqnarray}&&{{\lambda}_{2}}{{E}_{\text{prior}}}(u)=-{{\alpha}_{2}}{\int}_{ \Omega }g\left(\mathbf{x}\right)\left[u\log L\left(\mathbf{x}\right)+(1-u)\log \left(1-L\left(\mathbf{x}\right)\right)\right]\text{d}\mathbf{x}.\end{eqnarray} \tag{ 3 }$

The data term ${{E}_{\text{data}}}(u)$ incorporates two kinds of image statistics, i.e. intensity distribution and region appearance. We define ${{p}_{\text{in}}}\left(\mathbf{x}\right):=p\left(I\left(\mathbf{x}\right)|u\left(\mathbf{x}\right)=1\right)$ and ${{p}_{\text{out}}}\left(\mathbf{x}\right):=p\left(I\left(\mathbf{x}\right)|u\left(\mathbf{x}\right)=0\right)$ as the probability density functions (PDFs) of a voxel $\mathbf{x}$ with intensity value $I\left(\mathbf{x}\right)$ in the regions inside and outside the liver, respectively. In the framework of Bayesian inference, one seeks a segmentation u by maximizing the a posteriori probability given the image I, as equation (4) shows. Thus, the negative logarithm of the estimated PDFs can be used as a powerful data term. In addition, a region appearance distance potential $\mathcal{P}\left(\mathbf{x}\right)$ (Peng et al 2014a) is introduced to capture edges with weak gradients. The appearance distance reflects how similar the local appearance at a certain voxel is to the liver region appearance. The smaller $\mathcal{P}\left(\mathbf{x}\right)$ is, the more likely $\mathbf{x}$ is to be in the liver. Combining the intensity distribution and region appearance with a balance weight γ, the data term is formulated as in equation (5).

$\begin{eqnarray}&&\underset{u}{{\arg \max}}\,p(u|I)=\underset{u}{{\arg \max}}\,p(I|u)p(u).\end{eqnarray} \tag{ 4 }$

$\begin{eqnarray}&&{{\lambda}_{1}}{{E}_{\text{data}}}(u)=-{{\alpha}_{1}}{{{\int}^{}}_{ \Omega }}g\left(\mathbf{x}\right)\left[u\log {{p}_{\text{in}}}+(1-u)\log {{p}_{\text{out}}}-\gamma \mathcal{P}\,u\right]\text{d}\mathbf{x}.\end{eqnarray} \tag{ 5 }$

From the initial segmentation u_ref, we can estimate p_in, p_out and calculate the region appearance distance $\mathcal{P}\left(\mathbf{x}\right)$ . However, pathologic livers often exhibit multiple intensity distributions and region appearances, for which global estimation may result in inaccuracy. In abnormal liver regions, such as low-intensity tumors, the intensity distribution and region appearance are often different from global statistics of healthy liver tissues. To address this issue, spatial location is taken into consideration. In sections 2.4.1 and 2.4.2, we first introduce details about data term formulation using global prior information. Then, we describe estimation using a local prior that considers the spatial location. In our work, the data term adaptively utilizes global and local prior without separating the liver region.

2.4.1. Global-prior-based data term.

We first estimate the probability density functions of the foreground and background globally from the shape prior u_ref. Since the shape prior covers most of the liver region but is not accurate around the liver surface, it is shrunk by 5 voxels based on its signed distance function (SDF). We denote the inside region of the shrunken shape prior as the initial liver region ${{ \Omega }_{\text{ref}}}$ . The intensity histogram of ${{ \Omega }_{\text{ref}}}$ is used to calculate the global probability density function $p_{\text{in}}^{\text{global}}\left(\mathbf{x}\right)$ for the foreground. Similarly, the shape prior is dilated by 5 voxels using its SDF, and the voxels outside the dilated shape prior are used to estimate ${{p}_{\text{out}}}\left(\mathbf{x}\right)$ . Note that only a narrow band outside the dilated shape prior is used for computation, since distant regions of the complex background provide little information about the tissues neighboring the liver.

Three features $f\left(\mathbf{x}\right)=\left(I\left(\mathbf{x}\right),\text{LBP}\left(\mathbf{x}\right),\text{VAR}\left(\mathbf{x}\right)\right)$ , i.e. intensity, local binary pattern (LBP) (Ojala et al 2002) and local variance (VAR) are combined for region appearance description as defined in Peng et al (2014a). Define $P_{\mathbf{x}}^{i}\left(i=1,2,3\right)$ as the histogram of ith feature ${{f}^{i}}\left(\mathbf{x}\right)$ inside the 3D local window $O\left(\mathbf{x}\right)$ and ${{P}_{\mathbf{x}}}=\left(P_{\mathbf{x}}^{1},P_{\mathbf{x}}^{2},P_{\mathbf{x}}^{3}\right)$ as the local region appearance of $\mathbf{x}$ . Given the initial liver region ${{ \Omega }_{\text{ref}}}$ , the liver region appearance ${{P}_{\text{ref}}}=\left(P_{\text{ref}}^{1},P_{\text{ref}}^{2},P_{\text{ref}}^{3}\right)$ is estimated. Then, the global estimation of region appearance distance potential $\mathcal{P}\left(\centerdot \right)$ of $\mathbf{x}$ is calculated as follows, where W¹ denotes the L₁ Wasserstein distance:

$\begin{eqnarray}&&{{\mathcal{P}}^{\text{global}}}\left(\mathbf{x}\right)=\underset{i=1}{\overset{3}{\sum}}\,{{W}^{1}}\left(P_{\mathbf{x}}^{i},P_{\text{ref}}^{i}\right),\mathbf{x}\in \Omega.\end{eqnarray} \tag{ 6 }$
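
For 1D histograms defined on the same bins, the L₁ Wasserstein distance reduces to the summed absolute difference of the two cumulative distributions. The sketch below uses that identity to evaluate equation (6); the unit bin spacing, the normalization of each histogram to unit mass and the function names are assumptions for illustration, and the exact construction in Peng et al (2014a) may differ.

```python
def wasserstein_1(hist_p, hist_q):
    """L1 Wasserstein distance between two histograms on identical bins
    (unit bin spacing assumed); each histogram is normalized to sum to 1."""
    total_p, total_q = float(sum(hist_p)), float(sum(hist_q))
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for hp, hq in zip(hist_p, hist_q):
        cdf_p += hp / total_p
        cdf_q += hq / total_q
        dist += abs(cdf_p - cdf_q)
    return dist


def appearance_distance(local_hists, ref_hists):
    """Equation (6): sum of per-feature distances (intensity, LBP, VAR)."""
    return sum(wasserstein_1(p, r) for p, r in zip(local_hists, ref_hists))


# Moving all mass from the first bin to the third bin costs two bin widths.
print(wasserstein_1([1, 0, 0], [0, 0, 1]))
```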

2.4.2. Local-prior-based data term.

Global statistics for intensity and appearance distance estimation are not accurate in abnormal liver tissues. For example, in liver tumors or intrahepatic veins, the probability density values of the foreground will be very low and the region appearance distance potential will be large. Thus, under-segmentations will occur in these regions (see figures 4(a)–(d)).

**Figure 4.** An example illustrating the effect of a local prior. (a) A 2D transversal slice with initial contour (yellow) and one selected voxel (cyan point) with a window centered in it (cyan box); (b) intensity histogram of the initial liver region; (c) the probability map p_in estimated from (b) globally; (d) the final segmentation with global prior; (e) zoom in of (a); (f) intensity histogram of the local region of the selected voxel; (g) the probability map p_in estimated locally; (h) the final segmentation with both global and local prior.

In this section, we introduce the data term estimation using local prior information. The idea behind our model is to incorporate the spatial location of the shape prior under the assumption that there is generally a different probability density at each voxel in the image. For homogeneous healthy liver regions, the intensity distribution can be well estimated from the global prior. In abnormal liver regions, however, a local intensity distribution estimation is more suitable. The task is therefore to identify the abnormal liver region. Since the region ${{ \Omega }_{\text{ref}}}$ gives a good estimation of the liver, voxels with low values of p_in inside or around the initial liver surface are candidates for the abnormal liver region. Suppose the abnormal liver region candidate is S_c and the local window centered at voxel $\mathbf{x}$ is $W\left(\mathbf{x}\right)$ . S_c can be defined as ${{S}_{c}}:=\left\{\mathbf{x}\in \Omega |\,p_{\text{in}}^{\text{global}}\left(\mathbf{x}\right)<m-\text{svar},W\left(\mathbf{x}\right)\cap {{ \Omega }_{\text{ref}}}\ne \varnothing \right\}$ , where m and svar are the mean and standard deviation of $p_{\text{in}}^{\text{global}}$ over the region ${{ \Omega }_{\text{ref}}}$ , respectively.
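
The definition of S_c can be sketched on a 1D row of voxels. This is a toy illustration only: the 1D layout, the window half-width, the use of the population standard deviation and the function name are assumptions, not details given in the paper.

```python
import statistics


def candidate_region(p_global, in_ref, half_width):
    """Indices x with p_global[x] < m - svar whose window W(x) overlaps
    the initial liver region (in_ref[x] is True inside Omega_ref)."""
    ref_values = [p for p, inside in zip(p_global, in_ref) if inside]
    m = statistics.mean(ref_values)
    svar = statistics.pstdev(ref_values)  # population std, an assumption
    n = len(p_global)
    candidates = []
    for i in range(n):
        window = range(max(i - half_width, 0), min(i + half_width + 1, n))
        touches_ref = any(in_ref[j] for j in window)
        if p_global[i] < m - svar and touches_ref:
            candidates.append(i)
    return candidates


# Voxel 4 mimics a low-density lesion inside the liver; voxels 0, 1, 7, 8 are background.
p_global = [0.01, 0.02, 0.30, 0.32, 0.05, 0.31, 0.29, 0.01, 0.01]
in_ref = [False, False, True, True, True, True, True, False, False]
candidates = candidate_region(p_global, in_ref, half_width=1)
print(candidates)
```

Voxels 1 and 7 are selected together with the lesion voxel 4 because their windows touch the initial liver region; whether a candidate is finally treated as abnormal liver is decided by the comparison of local and global densities described below.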

To locally estimate the intensity distribution, we employ a local nonparametric model (Brox and Cremers 2009) via a kernel density estimator based on the Parzen method (Parzen 1962). Specifically, for a voxel candidate $\mathbf{x}\in {{S}_{c}}$ , the probability density function is estimated in the local region $R\left(\mathbf{x}\right):=W\left(\mathbf{x}\right)\cap {{ \Omega }_{\text{ref}}}$ with an adaptive region-based Parzen density model:

$\begin{eqnarray}&&p_{\text{in}}^{\text{local}}\left(\mathbf{x}\right):=\underset{\zeta ={{I}_{\text{min}}}}{\overset{{{I}_{\text{max}}}}{\sum}}\,\frac{h\left(\zeta \right)}{H}{{K}_{\eta}}\left(I\left(\mathbf{x}\right)-\zeta \right),\quad \text{where }H=\underset{\zeta ={{I}_{\text{min}}}}{\overset{{{I}_{\text{max}}}}{\sum}}\,h\left(\zeta \right),\ \mathbf{x}\in {{S}_{c}},\end{eqnarray} \tag{ 7 }$

where $h\left(\zeta \right)$ denotes the intensity histogram of $R\left(\mathbf{x}\right)$ , $\zeta \in \left[{{I}_{\text{min}}},{{I}_{\text{max}}}\right]$ is the observed intensity and ${{K}_{\eta}}$ is the Gaussian kernel with width η.
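
Equation (7) can be evaluated directly from a local histogram. In the sketch below the kernel is taken to be a normalized Gaussian with standard deviation η, which is one common reading of 'width'; this choice, the dictionary representation of the histogram and the function names are assumptions.

```python
import math


def gaussian_kernel(z, eta):
    """Normalized Gaussian with standard deviation eta (assumed form of K_eta)."""
    return math.exp(-0.5 * (z / eta) ** 2) / (math.sqrt(2.0 * math.pi) * eta)


def parzen_density(intensity, local_hist, eta):
    """Equation (7): local_hist maps each observed intensity zeta to its
    count h(zeta) inside R(x); the counts are normalized by their sum H."""
    total = sum(local_hist.values())
    return sum(count / total * gaussian_kernel(intensity - zeta, eta)
               for zeta, count in local_hist.items())


# Hypothetical local histogram around a low-density lesion voxel.
local_hist = {38: 3, 40: 5, 43: 2}
print(parzen_density(40.0, local_hist, eta=5.0))   # intensity typical of the lesion
print(parzen_density(120.0, local_hist, eta=5.0))  # intensity far from the local mode
```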

In abnormal liver regions, the estimated local probability density value is higher than the global one. Thus, we denote $S:=\left\{\mathbf{x}\in {{S}_{c}}|\,p_{\text{in}}^{\text{local}}\left(\mathbf{x}\right)>p_{\text{in}}^{\text{global}}\left(\mathbf{x}\right)\right\}$ as the abnormal liver region. The local prior is only employed in region S to estimate the probability density function of the foreground and region appearance distance. Since region appearance distance is used for separating the liver from surrounding tissues with similar intensity, we set its value to 0 in region S. Thus, the data term is estimated selectively using a global or local prior as follows:

$\begin{eqnarray}{{p}_{\text{in}}}\left(\mathbf{x}\right)=\left\{\begin{array}{*{35}{l}} p_{\text{in}}^{\text{local}}\left(\mathbf{x}\right), & \mathbf{x}\in S \\ p_{\text{in}}^{\text{global}}\left(\mathbf{x}\right), & \text{otherwise} \end{array}.\right. \end{eqnarray} \tag{ 8 }$

$\begin{eqnarray}\mathcal{P}\left(\mathbf{x}\right)=\left\{\begin{array}{*{35}{l}} 0, & \mathbf{x}\in S \\ {{\mathcal{P}}^{\text{global}}}\left(\mathbf{x}\right), & \text{otherwise} \end{array}.\right. \end{eqnarray} \tag{ 9 }$

To normalize, $\mathcal{P}\left(\mathbf{x}\right)$ is adjusted to $|\mathcal{P}\left(\mathbf{x}\right)-{{\mu}_{\text{ref}}}|/{{\tau}_{\text{ref}}}$ , where ${{\mu}_{\text{ref}}}$ and ${{\tau}_{\text{ref}}}$ are the mean and standard deviation of $\mathcal{P}$ over ${{ \Omega }_{\text{ref}}}$ , respectively. Figure 4 illustrates an example that compares the estimation using global and local prior information.
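
The selection rule of equations (8) and (9) can be sketched as follows, again on a flat list of voxels. The data layout and the function name are assumptions for illustration, and the normalization step is left out.

```python
def select_data_term(p_global, p_local, dist_global, candidates):
    """Apply equations (8) and (9).

    p_local maps each candidate index to its locally estimated density.
    A candidate joins the abnormal region S only if its local density
    exceeds the global one; there p_in is replaced by the local estimate
    and the appearance distance is set to 0.
    """
    p_in = list(p_global)
    dist = list(dist_global)
    abnormal = []
    for i in candidates:
        if p_local[i] > p_global[i]:
            p_in[i] = p_local[i]
            dist[i] = 0.0
            abnormal.append(i)
    return p_in, dist, abnormal


p_in, dist, abnormal = select_data_term(
    p_global=[0.30, 0.02, 0.01],
    p_local={1: 0.25, 2: 0.005},
    dist_global=[0.1, 0.9, 0.8],
    candidates=[1, 2],
)
print(p_in, dist, abnormal)
```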

2.5. Global optimization-based surface evolution

Substituting equations (3) and (5) into (2) and rearranging, the proposed model can be rewritten as

$\begin{eqnarray}\begin{array}{*{35}{l}} \underset{u\in \left\{0,1\right\}}{{\min}}\,E(u)= & -{{{\int}^{}}_{ \Omega }}g\left(\mathbf{x}\right)\left\{{{\alpha}_{1}}\left(\log {{p}_{\text{in}}}-\log {{p}_{\text{out}}}-\gamma \mathcal{P}\right)+{{\alpha}_{2}}\left[\log L-\log (1-L)\right]\right\}u\text{d}\mathbf{x} \\ {} & +\,\lambda {{{\int}^{}}_{ \Omega }}g\left(\mathbf{x}\right)|\nabla u|\text{d}\mathbf{x}. \end{array}\end{eqnarray} \tag{ 10 }$
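
The rearrangement behind equation (10) can be spelled out in one step. Using the identity $u\log a+(1-u)\log b=u\left(\log a-\log b\right)+\log b$ in equations (5) and (3), each term splits into a part that is linear in u and a part that does not depend on u:

$\begin{eqnarray}{{\lambda}_{1}}{{E}_{\text{data}}}(u)&=&-{{\alpha}_{1}}{\int}_{ \Omega }g\left(\mathbf{x}\right)\left(\log {{p}_{\text{in}}}-\log {{p}_{\text{out}}}-\gamma \mathcal{P}\right)u\,\text{d}\mathbf{x}-{{\alpha}_{1}}{\int}_{ \Omega }g\left(\mathbf{x}\right)\log {{p}_{\text{out}}}\,\text{d}\mathbf{x},\\ {{\lambda}_{2}}{{E}_{\text{prior}}}(u)&=&-{{\alpha}_{2}}{\int}_{ \Omega }g\left(\mathbf{x}\right)\left[\log L-\log (1-L)\right]u\,\text{d}\mathbf{x}-{{\alpha}_{2}}{\int}_{ \Omega }g\left(\mathbf{x}\right)\log (1-L)\,\text{d}\mathbf{x}.\end{eqnarray}$

The second integral on each line is a constant with respect to u, so it shifts the energy without changing the minimizer and can be dropped. Adding the weighted total variation term then gives equation (10).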

For model (10), the level-set approach can readily be used to gradually propagate the initial surface toward a minimizer of the energy function. However, level-set methods are based on a local optimization technique, and thus the surface may be trapped in a locally optimal position in each iteration. Moreover, the discrete time step of level-set methods must be kept small for numerical stability, which results in slow convergence. In this study, we use a fast global optimization-based approach proposed by Yuan et al (2012) to solve model (10). This method propagates a contour/surface to its globally optimal position in each iteration by solving a sequence of convex optimization problems, for which an efficient continuous max-flow algorithm (Yuan et al 2010) is available. In addition, the new contour/surface position at each evolution step is computed in a fully time-implicit manner, which allows a large time step and substantially speeds up contour/surface propagation (Yuan et al 2012). We describe the algorithm only briefly here and refer readers to Yuan et al (2010, 2012) for details and proofs.

For the given surface ${{\mathcal{C}}_{t}}$ , its new position ${{\mathcal{C}}_{t+1}}$ can be achieved by solving the following optimization problem:

$\begin{eqnarray}&&\underset{{{\mathcal{C}}_{t+1}}}{{\min}}\,{\int}_{{{\mathcal{C}}^{+}}}{{e}^{+}}\left(\mathbf{x}\right)\text{d}\mathbf{x}+{\int}_{{{\mathcal{C}}^{-}}}{{e}^{-}}\left(\mathbf{x}\right)\text{d}\mathbf{x}+\lambda {\int}_{\partial \mathcal{C}}g(s)\text{d}s,\end{eqnarray} \tag{ 11 }$

where ${{\mathcal{C}}^{+}}$ and ${{\mathcal{C}}^{-}}$ are the expansion and shrinkage regions with respect to ${{\mathcal{C}}_{t}}$ , and the functions ${{e}^{+}}\left(\mathbf{x}\right)$ and ${{e}^{-}}\left(\mathbf{x}\right)$ define the cost of the voxel $\mathbf{x}$ in ${{\mathcal{C}}^{+}}$ and ${{\mathcal{C}}^{-}}$ , respectively. The third term is the weighted smoothness term.

The optimization problem (11) can be equivalently formulated as a spatially continuous min-cut problem:

$\begin{eqnarray}&&\underset{u\left(\mathbf{x}\right)\in \left\{0,1\right\}}{{\min}}\,\langle 1-u,{{C}_{s}}\rangle +\langle u,{{C}_{t}}\rangle +\,\lambda {{{\int}^{}}_{ \Omega }}g\left(\mathbf{x}\right)|\nabla u|\text{d}\mathbf{x},\end{eqnarray} \tag{ 12 }$

where $u\left(\mathbf{x}\right)\in \left\{0,1\right\}$ is the indicator of surface ${{\mathcal{C}}_{t}}$ , and the two cost functions ${{C}_{s}}$ and ${{C}_{t}}$ are defined as

$\begin{eqnarray}{{C}_{s}}:=\left\{\begin{array}{*{35}{l}} {{e}^{-}}\left(\mathbf{x}\right), & \text{where }\mathbf{x}\in {{\mathcal{C}}_{t}} \\ 0, & \text{otherwise} \end{array},\right. \end{eqnarray} \tag{ 13 }$

$\begin{eqnarray}{{C}_{t}}:=\left\{\begin{array}{*{35}{l}} {{e}^{+}}\left(\mathbf{x}\right), & \text{where }\mathbf{x}\notin {{\mathcal{C}}_{t}} \\ 0, & \text{otherwise} \end{array},\right. \end{eqnarray} \tag{ 14 }$

where ${{C}_{s}}$ and ${{C}_{t}}$ are the cost functions for the foreground and background with respect to the current surface ${{\mathcal{C}}_{t}}$ , respectively. In our model, we first define

$\begin{eqnarray}&&y\left(\mathbf{x}\right):=-g\left(\mathbf{x}\right)\left\{{{\alpha}_{1}}\left(\log {{p}_{\text{in}}}-\log {{p}_{\text{out}}}-\gamma \mathcal{P}\right)+{{\alpha}_{2}}\left[\log L-\log (1-L)\right]\right\}.\end{eqnarray} \tag{ 15 }$

The costs ${{C}_{s}}$ and ${{C}_{t}}$ are related to ${{E}_{\text{data}}}$ and ${{E}_{\text{prior}}}$ through

$\begin{eqnarray}&&\begin{array}{*{35}{l}} {{e}^{+}}\left(\mathbf{x}\right)=\left(\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)+y\left(\mathbf{x}\right)\right)/h, \end{array}\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&\begin{array}{*{35}{l}} {{e}^{-}}\left(\mathbf{x}\right)=\left(\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)-y\left(\mathbf{x}\right)\right)/h, \end{array}\end{eqnarray} \tag{ 17 }$

where $\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ is the distance from $\mathbf{x}$ to the current surface and h is the discrete time step. Correspondingly, ${{e}^{-}}\left(\mathbf{x}\right)$ and ${{e}^{+}}\left(\mathbf{x}\right)$ define the min-cut costs ${{C}_{s}}\left(\mathbf{x}\right)$ and ${{C}_{t}}\left(\mathbf{x}\right)$ through (13) and (14), respectively.

The optimization problem (12) can be solved efficiently by a continuous max-flow algorithm (Yuan et al 2010). In this way, the current surface propagates to its next position in each iteration and finally reaches the optimal position ${{u}^{\ast}}$ of model (10).
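As a rough illustration of equations (13)–(17), the following NumPy sketch assembles the two min-cut cost maps from a precomputed force $y\left(\mathbf{x}\right)$ , a binary mask of the current region and a distance map. The function name `min_cut_costs` and the toy arrays are assumptions made for illustration only. The distance map is taken as an input, and the continuous max-flow solver of Yuan et al (2010) is not reproduced here.

```python
import numpy as np

def min_cut_costs(y, inside, dist, h=10.0):
    """Build C_s and C_t of equations (13)-(14) from e^+ and e^- of (16)-(17).

    y      : array, the force y(x) of equation (15)
    inside : boolean array, True where x lies inside the current surface C_t
    dist   : array, distance from x to the current surface
    h      : discrete time step
    """
    e_plus = (dist + y) / h    # cost of an outside voxel joining the region, eq. (16)
    e_minus = (dist - y) / h   # cost of an inside voxel leaving the region, eq. (17)
    c_s = np.where(inside, e_minus, 0.0)   # eq. (13): nonzero only inside C_t
    c_t = np.where(inside, 0.0, e_plus)    # eq. (14): nonzero only outside C_t
    return c_s, c_t

# Toy 1D example with hypothetical values
inside = np.array([False, True, True, False])
dist = np.array([1.0, 0.0, 0.0, 1.0])
y = np.array([-3.0, -2.0, 4.0, 2.0])
c_s, c_t = min_cut_costs(y, inside, dist, h=10.0)
```

In this toy example the first voxel lies outside the region but has a strongly negative $y$ , so its expansion cost is negative and the minimization favours absorbing it, whereas the third voxel lies inside with a positive $y$ and is cheap to release.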

3. Experimental settings

3.1. Evaluation metrics

To quantitatively evaluate the performance of the proposed method, we compared the algorithm segmentation with the manual segmentation, which was taken as the ground truth. As in Sliver07 (Heimann et al 2009), given the algorithm segmentation A and manual segmentation B, five metrics were calculated: volumetric overlap error (VOE), relative volume difference (RVD), average symmetric surface distance (ASD), root mean square symmetric surface distance (RMSD) and maximum symmetric surface distance (MSD). For all of these metrics, a smaller (absolute) value indicates a better segmentation. To combine the metrics and assess the overall quality, a score is calculated for each metric. In particular, a perfect result (zero for all five metrics) is worth 100 per metric, while a manual segmentation of average quality by a trained non-expert ( $6.40 \%,4.70 \%,1.00\text{mm},1.80\text{mm},19.00\text{mm}$ ) is worth 75 per metric. The final score is the average of the five scores.
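The scoring rule can be sketched as below. The text fixes only the two anchor points (100 for zero error, 75 for the reference error); the linear interpolation between them and the floor at zero are assumptions here, following the Sliver07 convention (Heimann et al 2009). The names `REFERENCE`, `metric_score` and `total_score` are illustrative.

```python
# Reference errors of the trained non-expert quoted in the text
# (VOE %, RVD %, ASD mm, RMSD mm, MSD mm); each is worth 75 points.
REFERENCE = {"VOE": 6.40, "RVD": 4.70, "ASD": 1.00, "RMSD": 1.80, "MSD": 19.00}

def metric_score(error, reference):
    """Assumed linear score: 100 at zero error, 75 at the reference, floored at 0."""
    return max(0.0, 100.0 - 25.0 * abs(error) / reference)

def total_score(errors):
    """Average of the five per-metric scores."""
    scores = [metric_score(errors[name], ref) for name, ref in REFERENCE.items()]
    return sum(scores) / len(scores)

# Errors of case #1 in table 2
case1 = {"VOE": 5.13, "RVD": 0.64, "ASD": 0.84, "RMSD": 1.87, "MSD": 19.84}
print(round(total_score(case1), 1))
```

Under these assumptions the errors of case #1 in table 2 reproduce its tabulated score of 80.7, which suggests that the assumed rule matches the one used by the evaluation.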

In addition to the Sliver07 metrics, the Dice similarity coefficient (DSC) was also used. It is defined as $\text{DSC}=100\times 2|A\cap B|/\left(|A|+|B|\right)$ , which measures the volume overlap on a scale from 0 (no overlap) to 100 (perfect segmentation).
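A minimal sketch of the volume-based metrics on binary masks is given below. DSC follows the definition above; the formulas for VOE and RVD are not spelled out in this text and are taken here from the usual Sliver07 definitions (Heimann et al 2009), so they should be read as assumptions. The surface-distance metrics are omitted, and the function name `overlap_metrics` is illustrative.

```python
import numpy as np

def overlap_metrics(a, b):
    """Return (VOE, RVD, DSC) in percent for masks a (algorithm) and b (manual)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    voe = 100.0 * (1.0 - inter / union)            # assumed: 1 - Jaccard index
    rvd = 100.0 * (a.sum() - b.sum()) / b.sum()    # assumed: signed volume difference
    dsc = 100.0 * 2.0 * inter / (a.sum() + b.sum())
    return voe, rvd, dsc

# Two toy 12-voxel boxes that share 8 voxels
a = np.zeros((4, 4, 4), dtype=bool)
a[1:3, 1:3, 1:4] = True
b = np.zeros((4, 4, 4), dtype=bool)
b[1:3, 1:3, 0:3] = True
voe, rvd, dsc = overlap_metrics(a, b)
```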

3.2. Implementation details and parameter settings

Our code for the 3D CNN was based on the cuda-convnet package⁵. For data preparation in both the training and testing stages of the CNN, all images were adjusted to 279 slices by appending or deleting slices without liver tissue. The input of the network was image blocks of size $496\times 496\times 279$ that were cropped from CT images of size $512\times 512\times 279$ . The output of the network was probability blocks of size $496\times 496\times 256$ with values in [0.1,0.9]. To generate probability maps for the original CT images, the probability values of voxels outside this block were set to 0. Training the network for 70 epochs took about 17 hours, and the network converged at around the 40th epoch on the training dataset.

In the segmentation refinement step, each slice of the input image and probability map was downsampled to $256\times 256$ . Parameters were set empirically as follows. The threshold t in equation (1) for the probability map was set to 0.5. The window $O\left(\mathbf{x}\right)$ for computing the local region appearance was set to a $7\times 7\times 7$ cube, and the window $W\left(\mathbf{x}\right)$ for computing the local intensity distribution was set to a $17\times 17\times 5$ cuboid. The width η of the Gaussian kernel in the Parzen density model was set to 4. The discrete time step in surface propagation was set to h = 10. In model (10), the parameters ${{\alpha}_{1}}$ , ${{\alpha}_{2}}$ and λ are scalar weights balancing the regional and boundary terms, so only two of them need to be changed to adjust the relative weights. Here, we fix $\lambda =10$ and adjust the other parameters according to magnitude analysis and the terms' effect on the model. Generally, small values of ${{\alpha}_{1,2}}$ lead to smooth boundaries and large values to less smooth ones. In addition, ${{\alpha}_{1}}$ and ${{\alpha}_{2}}$ weight the effect of the data term and prior term, respectively. It is noteworthy that a high value of ${{\alpha}_{2}}$ leads to over-constraint by the prior probability map. The parameter γ balances the intensity distribution and region appearance distance in the data term. The parameters were first chosen empirically and then optimized sequentially by changing a single parameter at a time while holding the others fixed. In this study, the parameters were set as ${{\alpha}_{1}}=40$ , ${{\alpha}_{2}}=32$ , $\lambda =10$ and $\gamma =0.01\times \sigma$ , where σ was the standard deviation of intensities over the initial liver region ${{ \Omega }_{\text{ref}}}$ . Although these weights were set experimentally, the result was not very sensitive to their exact values within a range (see section 5.3).

For postprocessing, the segmented liver was processed with the morphological closing operator and cavity filling. This process was consistent with the standard manual segmentation of the Sliver07 database, which treated any tissue surrounded by liver tissue as part of the liver. In addition, the parts of vessels enclosed by liver tissue were included in the liver too. Note that the clefts are not included in the liver in clinical applications.

The experiments were conducted on a desktop computer with an Intel Xeon E5-2680 CPU (2.70 GHz) and four graphics cards (NVIDIA GeForce GTX 980). The CNN was trained in parallel on the four graphics cards. In the surface evolution, the convex max-flow algorithm was based on the (GPU version) code of Yuan et al (2007, 2010). Other computations were programmed in MATLAB 2014b. The average computational time per volume was about 135 s, which consisted of 3 s for the CNN testing, 105 s for the data-term computation in MATLAB, 25 s for the continuous max-flow algorithm and 2 s for other computations.

4. Experimental results

4.1. Segmentation results

The testing dataset of 42 volumes was used to evaluate the final segmentation of the proposed method. The segmentations of the Sliver07-II dataset, a well-organized public dataset that is widely used for evaluating and comparing liver segmentation methods, were evaluated by the organizers of the Sliver07 website. The results are presented in table 2, with a total score of $80.3\pm 4.5$ . The mean values of VOE, RVD, ASD, RMSD and MSD are $5.35\pm 1.23 \%$ , $-0.17\pm 1.34 \%$ , $0.84\pm 0.25$ mm, $1.78\pm 0.56$ mm and $19.58\pm 3.07$ mm, respectively. In addition, the metric DSC was computed from VOE and reached an average value of $97.25\pm 0.65 \%$ .

Table 2. Evaluation on the Sliver07-II dataset (10 volumes). Note: This evaluation was performed by the Sliver07 website organizers and the metric DSC was computed from VOE (SD: standard deviation).

Metric case	VOE (%)	RVD (%)	ASD (mm)	RMSD (mm)	MSD (mm)	Score (—)	DSC (%)
#1	5.13	0.64	0.84	1.87	19.84	80.7	97.37
#2	4.98	1.69	0.71	1.63	20.86	80.8	97.45
#3	4.6	1.16	0.89	1.72	19.04	80.9	97.65
#4	7.19	−0.98	1.3	2.86	25.04	72.3	96.27
#5	5.98	−1.57	1.03	2.23	23.06	76.3	96.92
#6	5.79	0.8	0.85	1.86	19.49	80.1	97.02
#7	3.87	−1.62	0.56	1.32	18.65	83.8	98.03
#8	5.07	−0.21	0.8	1.64	18.32	82.4	97.40
#9	3.66	0.58	0.41	0.73	13.58	88.9	98.14
#10	7.27	−2.21	1.02	1.95	17.95	76.7	96.23

Average	5.35	−0.17	0.84	1.78	19.58	80.3	97.25
SD	1.23	1.34	0.25	0.56	3.07	4.5	0.65
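The text does not state how DSC was derived from VOE. A plausible reading, assuming the standard identity between the Dice and Jaccard coefficients and $\text{VOE}=100\times \left(1-J\right)$ with $J=|A\cap B|/|A\cup B|$ , is

$\begin{eqnarray}\text{DSC}=100\times \frac{2J}{1+J}=100\times \frac{2\left(100-\text{VOE}\right)}{200-\text{VOE}}.\end{eqnarray}$

For case #1 of table 2, $\text{VOE}=5.13$ gives $\text{DSC}=100\times 189.74/194.87\approx 97.37$ , which agrees with the tabulated value.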

Table 3 reports the evaluation of our method on the Sliver07-I dataset. The mean values of VOE, RVD, ASD, RMSD and MSD are $5.36\pm 1.35 \%$ , $0.03\pm 1.99 \%$ , $0.96\pm 0.24$ mm, $1.84\pm 0.54$ mm and $19.20\pm 5.97$ mm, respectively. Moreover, the mean DSC is $97.24\pm 0.71 \%$ . Figure 5 shows several typical results on the Sliver07-I dataset. The visual comparison of the algorithm segmentation with the manual segmentation illustrates the effectiveness of our method on this dataset.

**Figure 5.** Visual examples of final segmentations on the Sliver07-I dataset. From top to bottom are the results of cases 04, 07, 16 and 19. From left to right are (a) transversal 2D slice, (b) transversal 2D slice, (c) coronal 2D slice and (d) 3D view of each case. The outline of the manual segmentation is in red, and the algorithm segmentation is in green. The fourth column (d) visualizes the surface distance (mm) from the algorithm segmentation to the manual segmentation.

Table 3. Evaluation on the Sliver07-I dataset (20 volumes) (SD: standard deviation).

Metric	VOE (%)	RVD (%)	ASD (mm)	RMSD (mm)	MSD (mm)	Score (—)	DSC (%)
Average	5.36	0.03	0.96	1.84	19.20	79.3	97.24
SD	1.35	1.99	0.24	0.54	5.97	5.6	0.71

Table 4 summarizes segmentation results on data from local hospitals. The average performances with respect to VOE, RVD, ASD, RMSD, MSD and DSC are $5.29\pm 0.75 \%$ , $-0.37\pm 1.89 \%$ , $0.87\pm 0.14$ mm, $1.50\pm 0.31$ mm, $16.45\pm 6.47$ mm and $97.28\pm 0.39 \%$ , respectively. Figure 6 shows a perceptual comparison of the algorithm segmentation with manual segmentation on several typical cases with tumors or large shape variations. As can be seen, livers with tumors near the boundary and inhomogeneous appearances can be segmented accurately. In addition, the method can successfully deal with the shape variations of the liver.

**Figure 6.** Typical results for the local hospitals dataset. The top row shows the original images and the bottom row shows the algorithm segmentations (green) with manual segmentations (red). (a) and (b) show portal-phase livers with big tumors near the boundary that result in heterogeneous appearances. (c) shows a healthy artery-phase case of largely varied shape and (d) shows a healthy portal-phase liver.

Table 4. Evaluation on the local hospitals dataset (12 volumes) (SD: standard deviation).

Metric	VOE (%)	RVD (%)	ASD (mm)	RMSD (mm)	MSD (mm)	DSC (%)
Average	5.29	−0.37	0.87	1.50	16.45	97.28
SD	0.75	1.89	0.14	0.31	6.47	0.39

In table 5, we assess the statistical significance of differences in the method's performance between data groups, i.e. Sliver07 and non-Sliver07 data, and healthy and non-healthy data. The p-values were calculated by the Mann–Whitney U-test (Hollander et al 2013) based on the Sliver07 metrics. Except for RMSD between Sliver07 and non-Sliver07 data (p = 0.042), no significant difference is observed, which indicates that our method performs consistently across the different data groups.

Table 5. Comparison of performances on different datasets. The p-values are calculated by the Mann–Whitney U-test, where p > 0.05 means no significant difference of medians between the two segmentation results.

p-value	Sliver07 and non-Sliver07	Healthy and non-Healthy
VOE	0.902	0.635
RVD	0.670	0.708
ASD	0.500	0.131
RMSD	0.042^a	0.619
MSD	0.082	0.960

^a p < 0.05.
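For readers unfamiliar with the test, the sketch below computes the Mann–Whitney U statistic and a two-sided p-value by the normal approximation. It is not the implementation used in the study, which is not specified; it applies no tie or continuity correction, and the two samples are hypothetical values chosen for illustration, not data from this work.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test using the normal approximation.

    Returns (U, p). No tie or continuity correction is applied, so the
    p-value is only approximate for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x exceeds y; ties count one half.
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail probability
    return u, p

# Hypothetical VOE values (%) for two data groups, for illustration only
group_a = [5.1, 4.9, 6.0, 5.4, 5.8]
group_b = [5.3, 5.0, 5.6, 6.1, 4.7]
u, p = mann_whitney_u(group_a, group_b)
print(u, p > 0.05)
```

With samples as small as the groups in this study, an exact test from a statistics package would normally be preferred over the normal approximation.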

4.2. Comparison with state-of-the-art automatic methods

To place the performance of the proposed method in the context of the existing literature, we compared it with automatic methods from the Sliver07 competition (Heimann et al 2009) on the Sliver07-II dataset. The comparison was restricted to the top methods with available references describing their algorithms. Results for each metric are reported as the mean over the whole dataset. Table 6 shows the quantitative comparison with seven state-of-the-art automatic methods, i.e. Kainmüller et al (2007), Wimmer et al (2009), Linguraru et al (2012), Al-Shaikhli et al (2015), Dong et al (2015), Lu et al (2015) and Gauriau et al (2013). Figure 7 depicts the comparative results as box plots. As can be seen, the proposed method achieves a higher total score than the other methods. In terms of DSC, our method achieves the best value of 97.25%. Three methods, those of Kainmüller et al (2007), Al-Shaikhli et al (2015) and Wimmer et al (2009), use an SSM or an extended version of it to conduct automatic segmentation, for which complex shape position initialization or liver detection is a necessary step. The method employed by Dong et al (2015) utilizes a probabilistic atlas as prior information, which also needs to detect the liver's bounding box. By contrast, our method takes advantage of the CNN to predict a subject-specific prior and needs no initialization. Table 6 also shows the running time for testing. The computation time of our method is about 135 s per CT volume, which is among the lower running times reported in table 6 and can meet clinical requirements.

**Figure 7.** Box plots of segmentation metrics for liver segmentations from eight ranking methods on the Sliver07 website, i.e. our own, Kainmüller *et al* (2007), Wimmer *et al* (2009), Linguraru *et al* (2012), Al-Shaikhli *et al* (2015), Dong *et al* (2015), Lu *et al* (2015) and Gauriau *et al* (2013). (a) VOE, (b) RVD, (c) ASD, (d) RMSD, (e) MSD, (f) DSC.

Table 6. Comparison with state-of-the-art automatic methods based on the Sliver07-II dataset (n/a: not available).

Metric method	VOE (%)	RVD (%)	ASD (mm)	RMSD (mm)	MSD (mm)	Score (—)	DSC (%)	Time (—)
Al-Shaikhli et al (2015)	6.44	1.53	0.95	1.58	15.92	$79.6\pm 3.1$	96.67	n/a
Lu et al (2015)	5.90	2.70	0.91	1.88	18.94	$77.8\pm 6.1$	96.96	(24, 184) s
Kainmüller et al (2007)	6.09	−2.86	0.95	1.87	18.69	$77.3\pm 9.4$	96.85	15 min
Dong et al (2015)	6.44	0.01	0.98	1.87	18.14	$77.1\pm 4.3$	96.67	143 s
Wimmer et al (2009)	6.47	1.04	1.02	2.00	18.32	$76.8\pm 3.8$	96.65	3 min
Linguraru et al (2012)	6.37	2.26	1.00	1.92	20.75	$76.2\pm 5.9$	96.70	n/a
Gauriau et al (2013)	7.24	2.58	1.32	2.58	23.12	$71.5\pm 10.1$	96.23	46 s
Ours	5.35	−0.17	0.84	1.78	19.58	$80.3\pm 4.5$	97.25	135 s

4.3. Comparison with the CNN-based method

We compared the proposed model with the 3D CNN-based work of Lu et al (2015) (3D CNN-GC). On the ten cases from Sliver07-II (see table 6), the proposed method achieved a total score of 80.3, surpassing the 77.8 score of 3D CNN-GC. Table 7 shows the quantitative comparison of our method and 3D CNN-GC on the ten odd-numbered cases from the Sliver07-I dataset. Average errors and scores were computed over the nine cases other than case 05, which was a complete failure for 3D CNN-GC. The average total score of our method is 81.07, which is 4.40 points higher than the 76.67 score of 3D CNN-GC. In addition, we used the paired t-test to test for significant differences in the accuracies of the two methods. The results show that, on the 19 cases from the Sliver07 dataset, the proposed method achieved a score 3.4 points higher than that of 3D CNN-GC, a significant improvement in accuracy (p < 0.05).

Table 7. Comparison with Lu et al (2015) (3D CNN-GC) based on ten cases from the Sliver07-I dataset. Note that the average errors and scores are calculated over the nine cases other than #05 (SD: standard deviation).

Metric	VOE (%)		RVD (%)		ASD (mm)		RMSD (mm)		MSD (mm)		Score
method	Lu	Our	Lu	Our	Lu	Our	Lu	Our	Lu	Our	Lu	Our
#01	7.74	7.08	4.14	0.39	1.49	1.21	2.79	1.79	32.72	14.69	65.69	79.15
#03	5.48	4.91	2.57	1.31	1.00	0.89	2.03	1.93	17.35	19.79	77.78	79.74
#05	—	4.70	—	0.09	—	0.76	—	2.00	—	28.64	Failure	79.31
#07	5.29	5.10	−0.65	−0.34	0.96	0.86	1.66	1.38	19.58	11.58	80.64	84.46
#09	5.34	5.29	1.94	−1.97	1.00	0.94	2.07	1.78	22.06	16.16	77.22	79.87
#11	6.22	5.37	0.57	−0.04	1.22	1.01	3.04	2.32	21.82	19.92	74.24	79.01
#13	9.45	7.71	6.64	3.22	1.57	1.27	2.50	2.16	17.04	21.96	66.31	72.45
#15	4.21	3.62	0.65	1.05	0.71	0.60	1.43	1.04	19.51	10.54	83.37	87.39
#17	5.51	5.86	0.70	0.06	0.93	1.04	2.03	2.21	19.96	19.43	79.41	78.91
#19	4.46	3.15	0.26	−0.19	0.75	0.54	1.26	0.90	13.87	13.24	85.35	88.62

Average	5.97	5.34	1.87	0.38	1.07	0.93	2.09	1.72	20.43	16.36	76.67	81.07
SD	1.65	1.45	2.29	1.41	0.30	0.24	0.60	0.51	5.26	4.10	6.88	4.98
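The paired t-test mentioned in the text can be illustrated on the score columns of table 7. The sketch below covers only the nine non-failed Sliver07-I cases, not the 19-case test reported above, because the per-case scores of 3D CNN-GC on Sliver07-II are not listed here. The function name `paired_t` is illustrative, and the critical value is the standard two-sided 5% value of Student's t for 8 degrees of freedom.

```python
import math

# Total scores of the nine non-failed cases in table 7 (case 05 excluded)
lu = [65.69, 77.78, 80.64, 77.22, 74.24, 66.31, 83.37, 79.41, 85.35]
ours = [79.15, 79.74, 84.46, 79.87, 79.01, 72.45, 87.39, 78.91, 88.62]

def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)   # sample variance
    t = mean / math.sqrt(var / n)
    return mean, t, n - 1

mean_diff, t_stat, dof = paired_t(ours, lu)
T_CRIT_8DOF = 2.306   # two-sided 5% critical value, Student's t, 8 d.o.f.
print(round(mean_diff, 2), round(t_stat, 2), t_stat > T_CRIT_8DOF)
```

On these nine cases the mean difference is about 4.4 points, matching the gap between the two average scores in table 7, and the t statistic exceeds the critical value.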

Furthermore, we assess the effectiveness of our segmentation refinement method by comparing segmentation results on the Sliver07-I dataset (20 volumes). Both methods utilized the same prior probability map learned by the CNN proposed in this study. Average errors and scores were computed over the 19 cases other than case 05, which was a failed case for 3D CNN-GC (see figure 8). The average performances of 3D CNN-GC with respect to VOE, RVD, ASD, RMSD and MSD are $5.83\pm 1.62 \%$ , $0.53\pm 2.35 \%$ , $1.11\pm 0.36$ mm, $2.33\pm 1.02$ mm and $23.87\pm 8.86$ mm, while the average errors of the proposed method are $5.39\pm 1.34 \%$ , $0.02\pm 1.99 \%$ , $0.97\pm 0.23$ mm, $1.84\pm 0.54$ mm and $18.70\pm 5.54$ mm, respectively. The average total score of $79.3\pm 5.6$ for our method is significantly better (p < 0.05) than the $75.4\pm 8.7$ score of 3D CNN-GC. Figure 8 shows a visual comparison on two typical cases from the Sliver07-I dataset, i.e. case 05 and case 07. Case 05 in the top row was a rotated liver, for which the initial segmentation differed greatly from the manual segmentation. Restricted by the inaccurate initialization, 3D CNN-GC failed to segment the liver. By contrast, the proposed method successfully dealt with this case. Case 07 in the bottom row was a liver containing a tumor near the boundary. Due to the inhomogeneity in intensity and appearance, 3D CNN-GC under-segmented the tumor, whereas the proposed method agreed better with the liver boundary.

**Figure 8.** Visual comparison of the proposed method with Lu *et al* (2015) (3D CNN-GC). The first row and the second row show a transversal slice of case 05 and case 07 in the Sliver07-I dataset, respectively. Manual segmentation is shown in red contour. From left to right column: (a) initial segmentation (yellow); (b) final segmentation (green) of 3D CNN-GC; (c) final segmentation (green) of the proposed method.

5. Discussion

5.1. Summary of the proposed method

In this paper, we proposed a fast automatic liver segmentation method based on a 3D CNN and globally optimized surface evolution. This method can tackle the problems caused by heterogeneous appearances and large shape variations of the liver. To automatically detect the liver, a 3D CNN was trained to efficiently learn a subject-specific probability map of the liver. Then, the probability map was thresholded to provide both the initial segmentation and the shape prior for the subsequent segmentation step. Compared to model-based methods (Kainmüller et al 2007, Wimmer et al 2009, Al-Shaikhli et al 2015, Dong et al 2015), our liver detection method required no shape model/atlas construction, registration, complicated initial position searching or shape deformation. In addition, our initial segmentation not only identified the rough position of the liver, but also successfully delineated most parts of the liver surface. Given the initial segmentation, one of the biggest challenges for accurate liver segmentation is the appearance inhomogeneity due to the presence of intrahepatic veins and liver pathologies. To address this issue, we proposed to learn the multiple intensity distributions and appearances in different liver sub-regions by considering both global and local statistical information from initial segmentations. The proposed energy function was globally optimized by surface evolution. Compared with the CNN-based method (3D CNN-GC (Lu et al 2015)), our method was more capable of dealing with difficult cases with tumors or large shape variations. In contrast, 3D CNN-GC, which uses only global prior information, tended to under-segment the liver.

A thorough validation on two independent databases showed that the proposed method is accurate and robust for detecting and segmenting the liver. Online evaluations on the Sliver07-II dataset showed that the proposed method yielded a mean VOE of $5.35\pm 1.23 \%$ , an RVD of $-0.17\pm 1.34 \%$ , an ASD of $0.84\pm 0.25$ mm, an RMSD of $1.78\pm 0.56$ mm and an MSD of $19.58\pm 3.07$ mm, and achieved an improved total score compared to the state-of-the-art automatic methods (Kainmüller et al 2007, Wimmer et al 2009, Linguraru et al 2012, Gauriau et al 2013, Al-Shaikhli et al 2015, Dong et al 2015, Lu et al 2015) on the Sliver07 website. Although some semiautomatic methods (Maklad et al 2013, Peng et al 2014b, 2015) achieved higher scores, they required user interaction or complicated initialization, which is a very challenging issue. When applied to the Sliver07-I dataset and the local hospitals dataset, our method yielded a mean DSC of $97.24\pm 0.71 \%$ and $97.28\pm 0.39 \%$ with an ASD of $0.96\pm 0.24$ mm and $0.87\pm 0.14$ mm, respectively. With the implementation in MATLAB and a GPU-based max-flow algorithm, the average computational time was about 135 s per volume.

5.2. Effect of different terms in the proposed model

To better understand the effect of the different terms in model (10), the accuracies of variants of the segmentation model that omit different terms are shown in figure 9. Average total scores of four outcomes on ten cases, i.e. five healthy cases and five unhealthy cases with heterogeneous appearances and ambiguous boundaries, are compared. The first outcome is the initial segmentation produced by the CNN. The second outcome corresponds to the segmentation model without the prior constraint ( ${{\alpha}_{2}}=0$ ) and with a global-prior-based data term, i.e. $p_{\text{in}}^{\text{global}}$ and ${{\mathcal{P}}^{\text{global}}}$ (see section 2.4.1). The third outcome is from the segmentation model using a global-prior-based data term with the prior constraint. The last outcome is for the complete proposed model, which uses both the prior constraint and a local-prior-based data term. We adjusted the parameters to make the second and third models work well. It is noteworthy that the segmentation model using the global-prior-based data term with the prior constraint is intrinsically the same as that of 3D CNN-GC (Lu et al 2015). The validation results show that the average total score increases as the prior constraint and the local-prior-based data term are added. The prior constraint is effective in preventing leakage to adjacent tissues/organs with similar appearances. On the unhealthy livers, the local-prior-based data term yields a significant improvement (p < 0.05) over the first three outcomes.

**Figure 9.** Accuracy improvement with different terms in the proposed model. From left to right, the bars correspond to outcomes of the CNN, the fine segmentation without prior constraint and using a global-prior-based data term, the fine segmentation with a global-prior-based data term, and the complete proposed model, respectively.

5.3. Parameter analysis

In this section, we conduct a detailed analysis of the effect of model parameters on segmentation results. An evaluation of initial segmentations with different thresholds for binarizing the prior probability map was conducted to test the effect of the threshold. Figure 10 shows the plots of average DSCs with different thresholds on the 109 training volumes and 32 testing volumes (20 volumes from Sliver07-I and 12 from the local hospitals database). The optimal threshold was observed at t = 0.5 with respect to DSC for both training data and testing data. In addition, the average DSCs are relatively stable and achieve above $93 \%$ in testing with a threshold of t = 0.4,0.5,0.6. More specifically, the average DSC of initial segmentation was $93.45\pm 3.52 \%$ at a threshold of 0.5 in testing.

**Figure 10.** Average DSCs of initial segmentation based on the 3D CNN probability with different thresholds in training (a) and testing (b).

To further analyze the choice of the model trade-off parameters, i.e. ${{\alpha}_{1}}$ , ${{\alpha}_{2}}$ and γ, we conducted a sensitivity analysis of these parameters on a dataset consisting of five healthy livers and five unhealthy livers. Figure 11 shows the performance of our model with different values of ${{\alpha}_{1}}$ (from 0 to 100), ${{\alpha}_{2}}$ (from 0 to 100) and $\gamma /\sigma$ (from 0 to 0.04), where σ is the standard deviation of intensity values in the initial liver region. When testing one parameter, the others were kept at their default values. The result in figure 11 shows that the average score is relatively stable for ${{\alpha}_{1}}$ in the range [36,80], ${{\alpha}_{2}}$ in the range [20,40] and $\gamma /\sigma$ in the range [0.005,0.020], which means that our method is not sensitive to the exact values of the parameters within these ranges. The parameter setting of ${{\alpha}_{1}}=40,{{\alpha}_{2}}=32$ and $\gamma /\sigma =0.01$ in our implementation is therefore a reasonable choice for balancing the terms of the model and achieving a high accuracy.

**Figure 11.** Illustration of parameter analysis of the proposed method. The average scores on ten typical cases were evaluated with different values of ${{\alpha}_{1}},{{\alpha}_{2}}$ and $\gamma /\sigma$ . When testing on one parameter, the others were kept as the default value, i.e. 40, 32 and 0.01 for ${{\alpha}_{1}},{{\alpha}_{2}}$ and $\gamma /\sigma$ , respectively.
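The one-parameter-at-a-time procedure can be sketched as follows. The helper `one_at_a_time_sweep`, the grid values and the placeholder `toy_score` are all assumptions made for illustration: the text gives only the tested ranges, and the real score would come from running the segmentation pipeline on the ten cases.

```python
def one_at_a_time_sweep(defaults, grids, evaluate):
    """Vary each parameter over its grid while the others stay at their defaults.

    defaults : dict of parameter name -> default value
    grids    : dict of parameter name -> list of values to try
    evaluate : callable taking a parameter dict and returning a score
    """
    results = {}
    for name, values in grids.items():
        results[name] = []
        for value in values:
            params = dict(defaults)   # copy so the defaults are never modified
            params[name] = value
            results[name].append((value, evaluate(params)))
    return results

# Defaults as reported in the text; gamma is expressed as the ratio gamma / sigma.
defaults = {"alpha1": 40, "alpha2": 32, "gamma_over_sigma": 0.01}
# Hypothetical grids spanning the ranges tested in figure 11
grids = {
    "alpha1": [0, 20, 40, 60, 80, 100],
    "alpha2": [0, 20, 40, 60, 80, 100],
    "gamma_over_sigma": [0.0, 0.005, 0.01, 0.02, 0.04],
}

def toy_score(p):
    """Placeholder score, NOT the segmentation pipeline: it peaks at the defaults."""
    return (80.0
            - 0.001 * (p["alpha1"] - 40) ** 2
            - 0.002 * (p["alpha2"] - 32) ** 2
            - 1e4 * (p["gamma_over_sigma"] - 0.01) ** 2)

curves = one_at_a_time_sweep(defaults, grids, toy_score)
```

Each entry of `curves` is a list of (value, score) pairs that could be plotted as one panel of a sensitivity figure. Because only one parameter changes at a time, interactions between parameters are not explored.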

5.4. Effect of distance function in surface evolution

In the surface evolution process, the surface is driven by a region-based force and a curvature function, which are related to the definitions of ${{e}^{+}}\left(\mathbf{x}\right)$ and ${{e}^{-}}\left(\mathbf{x}\right)$ . Figures 12(a)–(c) show the iterations of surface evolution with distance functions of different magnitudes, i.e. $0.1\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ , $\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ and $10\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ . The initial surface propagated at different speeds according to the distance function. As can be seen, a large weight on the distance function made the evolution slow, while a small weight sped up the evolution. The evolution stopped at the 8th, 15th and 40th iteration for $0.1\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ , $\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ and $10\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ , respectively. The final results in figure 12(d) show differences at the concave position of the surface. The detected surface is less smooth with a smaller distance function. This is because, with a small distance function, the surface evolution is dominated more by the region-based force and driven less by curvature.

**Figure 12.** Surface evolution with the distance function of different magnitudes. The manual segmentation is in red contour. (a)–(c) The 1st, 3rd and 7th iterations of the surface evolution with $0.1\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ (yellow contour), $\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ (green contour) and $10\text{dist}\left(\mathbf{x},\partial {{\mathcal{C}}_{t}}\right)$ (blue contour); (d) The final results.

6. Conclusion

A fully automatic method was proposed to detect and delineate the liver surface from 3D CT images. Quantitative validations and comparisons showed that the method is accurate and efficient, and that it may be suitable for clinical practice. An advantage of the proposed method is that it can overcome the challenge of appearance heterogeneity in unhealthy livers. Since no organ-specific assumption, such as location, shape or appearance, was adopted, the approach can be extended to other organs and to multi-organ segmentation, which is our future work. Despite the promising results, the segmentation framework can be further improved in the following way. In the testing stage, most of the computation time (almost $80 \%$ ) is spent on the data-term calculation in MATLAB. The computational efficiency could be further improved by fully implementing the algorithm in C++ with parallelization.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 11271323, 91330105, 11401231) and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ13A010002). JP was also supported by Natural Science Foundation of Fujian Province (No. 2015J01254), Science–Technology Foundation for Middle-aged and Young Teacher of Fujian Province (No. JA14021), and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University.
