Abstract

Early detection of pulmonary nodules is extremely important for the diagnosis and treatment of lung cancer. In this study, a new classification approach for pulmonary nodules from CT imagery is presented by using hybrid features. Four different methods are introduced for the proposed system. The overall detection performance is evaluated using various classifiers. The results are compared to similar techniques in the literature by using standard measures. The proposed approach with the hybrid features results in 90.7% classification accuracy (89.6% sensitivity and 87.5% specificity).

1. Introduction

A computer-aided detection (CAD) system is an important tool for detecting pulmonary nodules in medical images. To attain a more reliable and accurate diagnosis, CAD systems have recently been developed to assist in the interpretation of medical images. Systems that identify true positive findings are especially important because they can help radiologists identify early-stage pulmonary nodules. Interpreting the information in the images requires experienced physicians; however, such experts may reach different diagnoses for the same set of images. A CAD system can therefore provide radiologists with a second opinion that improves the sensitivity of their diagnostic decision-making [1]. The aim of a CAD system is to provide diagnostic information that improves clinical decision-making; therefore, its success is directly related to its disease detection accuracy [2]. Today, CAD systems are frequently used to detect and diagnose numerous abnormalities in routine clinical work. They are usually specialized for anatomical regions such as the thorax, breast, or colon and for particular imaging technologies such as radiography, computed tomography (CT), or magnetic resonance imaging (MRI) [3].

Lung cancer remains a major cause of cancer deaths worldwide. In particular, it is one of the main public health issues in developed industrial countries [4, 5]. This makes the treatment of lung cancer a very important task in the fight against cancer. Early detection of potentially cancerous pulmonary nodules is highly important for improving the patient’s chance of survival. Multidetector computed tomography is a very sensitive imaging modality for detecting small pulmonary nodules.

In previous studies, classification systems were developed by using the features of nodule candidate patterns with image-processing techniques [6–8], by classifying the shape of pulmonary nodule patterns [9, 10], and by using morphological features [11, 12]. To classify lung nodules, neural network approaches [13, 14] and the Fisher linear discriminant classifier [15, 16] were proposed. In addition, several approaches have been proposed to detect pulmonary nodules in thin-slice helical computed tomography images [17, 18]. Similar techniques use a genetic algorithm with the random subspace method [19, 20], a single support vector machine [21], and random forest classifiers [22, 23]. Recently, ensemble learning methods have been applied to classification problems [24, 25]. In particular, ensemble learning algorithms such as bagging and adaboost have been shown to be superior to a single classifier [26, 27].

In this study, a combination of four different methods was proposed for feature extraction from CT images.

Method 1. Two-dimensional principal component analysis (2D-PCA) applied to dataset.

Method 2. Statistical features obtained from 2D-PCA values.

Method 3. Geometric features obtained by using the regional descriptors of the 2D patterns based on the basic morphological shape information.

Method 4. Hybrid features obtained by selecting the best features of the above three methods with the mRMR (minimum Redundancy Maximum Relevance) method and combining them.

To perform a rigorous validation of the proposed system, completely independent training and testing datasets are utilized. The proposed system is first tuned and trained using a dataset provided as a courtesy of the University of Istanbul, Cerrahpasa Faculty of Medicine.

A classification task forms the backbone of a computer-aided detection system. In this paper, we propose a new classification approach for pulmonary nodules using hybrid features to be used in such a CAD system. The objective of the proposed study is to analyze the effect of the hybrid features on the classification of pulmonary nodules. The proposed classification approach has two potential roles: (i) to serve as an effective filtering method that reduces the number of false positives in a CAD system and (ii) to increase the diagnostic accuracy of the detection system.

The rest of this paper is organized as follows. The proposed classification approach for a CAD system and the methods used in the algorithm are described in Section 2, which covers the database information, feature extraction, feature selection, and classifier algorithms. The overall performance of the proposed system and comparisons with six previously presented CAD systems are presented in Section 3. Conclusions are given in Section 4.

2. Materials and Methods

2.1. Pulmonary Nodule Database and Imaging Protocol

In this study, a dataset containing 95 pulmonary nodule and 75 nonnodule patterns obtained from two-dimensional (2D) CT images of 63 patients was utilized. The 2D pulmonary nodule patterns were manually marked on the CT images by radiologists. Each nodule pattern was then extracted from the CT image as illustrated in Figure 1. Other patterns in the lung parenchyma that resemble nodules but were not marked as “nodule” by the radiologists were selected as members of the nonnodule class. Images were collected from 39 male and 24 female patients aged 25 to 78 years [mean = years]. As illustrated in Figure 2, 67 pulmonary nodules were detected in the right lung parenchyma (20 in the upper part, 20 in the lower part, and 27 pleural cases) and 28 in the left lung parenchyma (12 in the upper part, 8 in the lower part, and 8 pleural cases).

The average nodule diameter is  mm. The diameter distribution of the nodules used in the database is shown in Figure 3. Also nodule and nonnodule pattern samples used in dataset are given in Figure 4. The age distribution of the patients is illustrated in Figure 5.

The dataset was obtained from chest CT images of patients scanned by using “Sensation 16” CT scanner (Siemens Medical Systems) between 2010 and 2012 at Radiology Department, Cerrahpasa Medicine Faculty, Istanbul University. CT scans were acquired at a tube potential voltage of 120 kVp. All CT images are in size of pixels and stored as DICOM (Digital Imaging and Communications in Medicine) format files, directly from the CT modality.

2.2. Feature Extraction
2.2.1. Two-Dimensional Principal Component Analysis (2D-PCA)

Principal component analysis (PCA) is a classical dimension reduction method for feature extraction and data representation, widely used in pattern recognition, computer vision, and signal processing [28]. The eigenvalues and eigenvectors are ranked by the variance they capture along the principal axes, from the largest contribution to the smallest. The reduced dimension is chosen so that the summed contribution of the retained eigenvalues exceeds 99%. PCA thus provides dimensionality reduction as an unsupervised learning algorithm [29]. The two-dimensional variant (2D-PCA) is formulated as follows.

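Let $X$ be an $n$-dimensional column vector. An image $A$, which is an $m \times n$ matrix, is projected onto $X$ by $Y = AX$. In order to determine the optimal projection vector $X$, the total scatter of the projected samples is utilized to measure the optimality of $X$:

\[ J(X) = X^{T} G_t X, \tag{1} \]

where $G_t$ depicts the image covariance matrix.

Suppose that there are $M$ training samples $A_j$ ($j = 1, \ldots, M$) and $\bar{A}$ is the average image; then

\[ G_t = \frac{1}{M} \sum_{j=1}^{M} (A_j - \bar{A})^{T} (A_j - \bar{A}). \tag{2} \]

The optimal projection direction $X_{\mathrm{opt}}$ denotes the eigenvector of $G_t$ corresponding to the largest eigenvalue. Usually a set of orthonormal projection directions, $X_1, \ldots, X_d$, is chosen. These projection directions are the orthonormal eigenvectors of $G_t$ corresponding to the first $d$ largest eigenvalues.

For a given $A$, let $Y_k = A X_k$, $k = 1, \ldots, d$. A set of projected feature vectors $Y_1, \ldots, Y_d$, the principal components of $A$, is found. The feature matrix of $A$ is obtained as $B = [Y_1, \ldots, Y_d]$. The nearest neighbor classifier is adopted for classification. The distance between two arbitrary feature matrices, $B_i = [Y_1^{(i)}, \ldots, Y_d^{(i)}]$ and $B_j = [Y_1^{(j)}, \ldots, Y_d^{(j)}]$, is given by

\[ d(B_i, B_j) = \sum_{k=1}^{d} \left\| Y_k^{(i)} - Y_k^{(j)} \right\|_2, \tag{3} \]

where $\| Y_k^{(i)} - Y_k^{(j)} \|_2$ depicts the Euclidean distance between $Y_k^{(i)}$ and $Y_k^{(j)}$ [30].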
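The 2D-PCA steps described above (image covariance matrix, leading eigenvectors, projection, and column-wise Euclidean distance) can be sketched in a few lines. This is a minimal illustration assuming NumPy and equally sized image patterns; the function names and the `variance_kept` parameter are hypothetical and are not taken from the authors' Matlab code.

```python
import numpy as np

def fit_2dpca(images, variance_kept=0.99):
    """Learn an (n x d) projection matrix from a stack of m x n images."""
    A = np.asarray(images, dtype=float)            # shape (M, m, n)
    centered = A - A.mean(axis=0)                  # subtract the average image
    # Image covariance matrix: (1/M) * sum_j (A_j - mean)^T (A_j - mean)
    G = np.einsum("jki,jkl->il", centered, centered) / A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)           # G is symmetric
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()     # cumulative contribution
    d = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(eigvals))
    return eigvecs[:, :d]

def project(image, X):
    """Feature matrix B = [A X_1, ..., A X_d] of one image."""
    return np.asarray(image, dtype=float) @ X

def feature_distance(B_i, B_j):
    """Sum of Euclidean distances between corresponding columns."""
    return float(np.linalg.norm(B_i - B_j, axis=0).sum())
```

A nearest neighbor decision would then assign a test pattern the label of the training pattern whose feature matrix has the smallest `feature_distance`.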

A classification process is the basis of a computer aided detection system. The classification scheme proposed for a computer-aided detection algorithm used in this work is shown in Figure 6.

2.2.2. Morphological Image Processing

Morphology is a cornerstone of the mathematical set of tools underlying the development of techniques that extract meaningful features from an image [31]. To extract the features of pulmonary nodules, geometric features were obtained by using the regional descriptors of the 2D patterns based on the basic morphological shape information. In this study, the geometric features consist of the area, perimeter, diameter, solidity, eccentricity, aspect ratio, compactness, roundness, circularity, and ellipticity of the patterns.

The definitions of these features are given in Table 1. A total of 10 features are evaluated for each pattern. Among them, solidity denotes the proportion of the pixels in the convex hull that are also in the region. Eccentricity is the eccentricity of the ellipse that has the same second moments as the region, that is, the ratio of the distance between the foci of the ellipse to its major axis length; its value lies between 0 and 1. Compactness, roundness, circularity, and ellipticity are computed by the definitions given in Table 1 [37].
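As an illustration of regional descriptors, the sketch below computes a few of them from a binary pattern mask using only the Python standard library. It is a hedged example, not the definitions of Table 1: the function name is hypothetical, the diameter is taken as the diameter of a circle with the same area, and the aspect ratio is taken as bounding-box width over height, both of which are assumptions. Eccentricity follows the second-moment ellipse description given above.

```python
import math

def shape_descriptors(mask):
    """mask: list of rows of 0/1 values marking the pixels of one pattern."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pts)
    if area == 0:
        raise ValueError("empty mask")
    mean_r = sum(r for r, _ in pts) / area
    mean_c = sum(c for _, c in pts) / area
    # Second central moments of the pixel coordinates
    s_rr = sum((r - mean_r) ** 2 for r, _ in pts) / area
    s_cc = sum((c - mean_c) ** 2 for _, c in pts) / area
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in pts) / area
    # Eigenvalues of the 2 x 2 moment matrix in closed form
    half_trace = (s_rr + s_cc) / 2
    root = math.sqrt(((s_rr - s_cc) / 2) ** 2 + s_rc ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root
    ecc = math.sqrt(max(0.0, 1 - lam_min / lam_max)) if lam_max > 0 else 0.0
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "area": area,
        "equivalent_diameter": math.sqrt(4 * area / math.pi),  # assumed definition
        "eccentricity": ecc,
        "aspect_ratio": width / height,                        # assumed definition
    }
```

A filled square yields an eccentricity of 0, while a one-pixel-wide line yields 1, matching the stated range of the measure.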

2.3. Feature Selection
2.3.1. The mRMR Method

The mRMR (minimum Redundancy Maximum Relevance) feature selection method, proposed by Peng et al. [38], shortens calculation time and improves the accuracy of the classifier. The mRMR method uses the mutual information between a feature and a class or between a feature and another feature. The relevance of a feature set $S$ for the class $c$ is defined by the average of all mutual information values between the individual features $f_i$ and the class $c$:

\[ D(S, c) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c), \tag{4} \]

where $I(f_i; c)$ denotes the mutual information between feature $f_i$ and class $c$. The redundancy of all features in the set $S$ is defined by the average of all mutual information values between the feature $f_i$ and the feature $f_j$:

\[ R(S) = \frac{1}{|S|^{2}} \sum_{f_i, f_j \in S} I(f_i; f_j), \tag{5} \]

where $I(f_i; f_j)$ is the mutual information between features $f_i$ and $f_j$. The mRMR criteria, that is, the combinations of the two measures given in (4) and (5), are given by the following terms:

\[ \max_{S} \left[ D(S, c) - R(S) \right], \tag{6} \]

\[ \max_{S} \left[ \frac{D(S, c)}{R(S)} \right]. \tag{7} \]

As a result, the best feature set is obtained by optimizing (4) and (5) simultaneously according to (6) or (7).
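The selection procedure can be sketched as a greedy search that, at each step, adds the feature with the largest relevance minus average redundancy to the features already chosen. The example below is a simplified illustration for discrete-valued features using only the standard library; the function names are hypothetical, the difference form of the criterion is assumed, and continuous features such as 2D-PCA values would first need to be discretized.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mrmr_select(features, labels, k):
    """features: dict name -> list of discrete values. Returns k names."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]   # most relevant first
    while len(selected) < min(k, len(features)):
        def score(f):
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy         # difference criterion
        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In this form a duplicated feature is penalized by its full entropy, so a weaker but complementary feature is preferred over a copy of one already selected.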

2.4. Nodule Classification
2.4.1. Artificial Neural Network

An artificial neural network (ANN) is an artificial intelligence tool intended to imitate the way neurons in the human brain organize and process information. Through a learning process, an ANN can recognize patterns that correlate strongly with a set of data corresponding to a class; the interneuron connection weights store the knowledge about the specific features identified within the data [39]. ANNs are used to reduce experimental work and time losses. A common ANN is the multilayer perceptron (MLP), which is made up of three layers as shown in Figure 7. The ANN is trained by propagating information from the input layer through the hidden and output layers of the network [40]. Training is performed by using the back-propagation algorithm based on the Levenberg-Marquardt rule [41].

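The output signal for the $l$th neuron in the $n$th layer is given by the following expression:

\[ y_l^{(n)}(t) = \varphi\left( u_l^{(n)}(t) \right), \qquad u_l^{(n)}(t) = \sum_{j} w_{lj}^{(n)}(t)\, y_j^{(n-1)}(t) + \theta_l^{(n)}, \tag{8} \]

where $\varphi(\cdot)$ denotes the activation function, $w_{lj}^{(n)}$ depicts the connection weight, $t$ denotes the time index, and $\theta_l^{(n)}$ depicts the bias weight. The change in the synaptic weight is defined by the following expression:

\[ \Delta w_{lj}^{(n)}(t) = w_{lj}^{(n)}(t+1) - w_{lj}^{(n)}(t). \tag{9} \]

And it is revised as the following:

\[ \Delta w_{lj}^{(n)}(t) = \eta\, \lambda_l^{(n)}(t)\, y_j^{(n-1)}(t), \tag{10} \]

where $\eta$ depicts the learning rate ($0 < \eta < 1$). Also the local error gradient is given by

\[ \lambda_l^{(n)}(t) = -\frac{\partial E(t)}{\partial u_l^{(n)}(t)}, \tag{11} \]

where $E(t) = \tfrac{1}{2} \sum_{l} \left[ d_l(t) - y_l^{(N)}(t) \right]^{2}$ is the squared error at the output layer $N$. To improve the performance of the back-propagation algorithm, a momentum term is added as the following:

\[ \Delta w_{lj}^{(n)}(t) = \eta\, \lambda_l^{(n)}(t)\, y_j^{(n-1)}(t) + \alpha\, \Delta w_{lj}^{(n)}(t-1), \tag{12} \]

where $\alpha$ is between 0 and 1. For the output layer, the local error gradient is defined by

\[ \lambda_l^{(N)}(t) = \left[ d_l(t) - y_l^{(N)}(t) \right] \varphi'\left( u_l^{(N)}(t) \right), \tag{13} \]

where $d_l(t)$ and $\varphi$ depict the goal output signal and the activation function, respectively, and $\varphi'$ is the derivative of $\varphi$.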
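The weight update with a learning rate and a momentum term can be made concrete with a single sigmoid output neuron. This is a minimal sketch of plain gradient-descent back-propagation with momentum, not the Levenberg-Marquardt variant used in the study; the function names and the default values of `eta` and `alpha` are illustrative assumptions.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train_output_neuron(samples, eta=0.5, alpha=0.5, epochs=2000):
    """samples: list of (inputs, target). Returns weights, bias weight last."""
    n_in = len(samples[0][0])
    w = [0.0] * (n_in + 1)
    prev_dw = [0.0] * (n_in + 1)
    for _ in range(epochs):
        for x, d in samples:
            xin = list(x) + [1.0]                 # constant input for the bias
            u = sum(wi * xi for wi, xi in zip(w, xin))
            y = sigmoid(u)
            # Local error gradient of the output neuron: (d - y) * phi'(u),
            # where phi'(u) = y * (1 - y) for the sigmoid
            local_grad = (d - y) * y * (1.0 - y)
            # Learning-rate term plus momentum term
            dw = [eta * local_grad * xi + alpha * p
                  for xi, p in zip(xin, prev_dw)]
            w = [wi + di for wi, di in zip(w, dw)]
            prev_dw = dw
    return w
```

Trained on a linearly separable toy problem such as logical OR, the neuron's output crosses 0.5 on the correct side for every input.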

2.4.2. Random Forest

Random forest was proposed by Breiman in 1999 [42]. It is a relatively recent development in tree-based classifiers and has quickly proven to be one of the most important algorithms in machine learning. It is defined as a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. Random forest has given robust and improved classification results on standard datasets and competes well with neural networks and ensemble techniques on different classification problems. It is a special type of ensemble that uses bagging and random splitting to grow multiple trees [42, 43].

Random forest has several advantages. In particular, it can estimate which features are important for the classification, it processes large datasets efficiently, and it can be utilized as an effective method to estimate missing data.

2.4.3. Bagging

Bagging is effective for unstable learning algorithms, that is, algorithms for which small changes in the training data lead to relatively large changes in accuracy and therefore generate very diverse classifiers, as often happens with small datasets. The use of bagging to improve performance by taking advantage of this effect was proposed by Breiman [44]. A single classifier may have a high test error, whereas a combination of classifiers can produce a lower test error because the diversity of the classifiers usually compensates for the errors of any single classifier [45].
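The idea of training each member on a bootstrap replicate and combining them by majority vote can be sketched as follows. The base learner here is a one-feature threshold stump chosen only to keep the example short; the study itself bags random forest and ANN classifiers in Weka, and all names below are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Threshold classifier on a 1-D feature with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in samples}):
        for sign in (1, -1):
            errors = sum((1 if sign * (x - t) >= 0 else 0) != y
                         for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def bagging(samples, n_models=25, seed=0):
    """Train n_models stumps on bootstrap replicates; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(samples) for _ in samples]  # with replacement
        models.append(fit_stump(replicate))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Because each replicate omits some training patterns, the stumps place their thresholds differently, and the vote averages out these differences.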

2.4.4. Adaboost

Adaboost is one of the powerful methods for pattern recognition [46]. The Adaboost classifier, first introduced by Freund and Schapire [47, 48], is an ensemble classifier composed of many weak classifiers for the two-class classification problem; it builds a strong classifier from weak ones. Adaboost forms a committee of weak classifiers by adaptively adjusting the weights of the training patterns at each iteration: the weights of the patterns classified correctly by a weak classifier are decreased, whereas the weights of the patterns it misclassifies are increased.

The Adaboost algorithm performs well because of its ability to generate diversity: to improve the performance of the final ensemble, it combines diverse weak classifiers. In particular, the boosting algorithm adaboost.M1 directly extends the original adaboost algorithm to the multiclass case without reducing it to multiple two-class problems.
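The reweighting step can be illustrated with one boosting round. The sketch below uses the common exponential form of the two-class update, which after normalization is equivalent to multiplying the weights of correctly classified patterns by err / (1 - err); the function name is hypothetical and the weak classifier itself is left out.

```python
import math

def adaboost_round(weights, correct):
    """One reweighting step.

    weights: pattern weights summing to 1.
    correct: correct[i] is True if the weak classifier got pattern i right.
    Returns the vote of the weak classifier and the updated weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0 < err < 0.5:
        raise ValueError("weighted error must lie in (0, 0.5)")
    alpha = 0.5 * math.log((1 - err) / err)
    # Decrease weights of correct patterns, increase weights of mistakes
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return alpha, [w / total for w in new]
```

After the update the misclassified patterns carry half of the total weight, which forces the next weak classifier to concentrate on them.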

The principal component analysis, mRMR, and morphological image processing algorithms were implemented in Matlab. Classification was performed with the Weka data mining tool, version 3.7.7, which is available from http://www.cs.waikato.ac.nz/~ml/weka/. Tests were run on a PC with an Intel Core i7 1.90 GHz CPU and 4.00 GB RAM. The classifiers were evaluated with 5-fold cross-validation.
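For readers unfamiliar with k-fold cross-validation, the partitioning it implies can be sketched as below. This is a generic illustration using the standard library, not Weka's implementation (which also stratifies by class); the function name and the sample count used in the usage note are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f_no, fold in enumerate(folds) if f_no != i
                 for j in fold]
        yield train, test
```

With, say, 84 patterns and k = 5, four folds hold 17 patterns and one holds 16, and every pattern is used for testing exactly once.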

3. Results

Various methods are utilized for feature extraction and selection in medical pattern recognition. In this study, two-dimensional principal component analysis and geometric feature values were used for feature extraction, and the mRMR method was applied for feature selection. The entire dataset is randomly partitioned into approximately 50% training and 50% test data. The training dataset consists of 47 pulmonary nodules and 37 nonnodule patterns (a total of 84 patterns). The test dataset consists of 48 pulmonary nodules and 38 nonnodule patterns (a total of 86 patterns). The best features for each method are determined by applying mRMR feature selection to the training dataset only. Then, the classification accuracies of the methods are calculated using these features on the test dataset.
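The class counts quoted above are what a per-class random split into halves produces when the training half is rounded down. The sketch below shows this with the standard library; the function name and the rounding rule are assumptions, since the partitioning procedure is not described beyond being random.

```python
import random

def stratified_half_split(labels, seed=0):
    """Split indices roughly 50/50 within each class (training half rounded down)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return sorted(train), sorted(test)

# 95 nodule patterns (label 1) and 75 nonnodule patterns (label 0)
labels = [1] * 95 + [0] * 75
train_idx, test_idx = stratified_half_split(labels)
```

For the 95 nodule and 75 nonnodule patterns this gives 47 + 37 = 84 training patterns and 48 + 38 = 86 test patterns.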

In the study, four different methods were proposed. For principal component analysis on method 1, the largest first seven values were selected for the first seven principal components because of highest variance value. So that, a -dimensional matrix was formed for each pattern. Then, -dimensional feature vector was obtained. In this way, at least 99% value of the total variance for each pattern was taken into account. To select the best features that contribute to the performance of classification system in the training set, the mRMR method was utilized. The number of best features performed with the mRMR method was determined as 20.

In method 2, the statistical features, minimum (min), maximum (max), mean, standard deviation (std), variance (var), and 3rd moment values, are calculated in the training dataset. Thus, a -dimensional feature vector was obtained. The best feature ranking that performed with the mRMR method is 3rd moment, min, mean, std, max, and var. The number of best features performed with the mRMR method was the first 5 features which are 3rd moment, min, mean, std, and max.

In method 3, geometric features based on the basic morphological shape information were utilized for the 2D patterns in the training dataset. The geometric features include the area, perimeter, diameter, solidity, eccentricity, aspect ratio, compactness, roundness, circularity, and ellipticity of the patterns. The mRMR method selected 5 best features: compactness, aspect ratio, area, solidity, and ellipticity.

A new hybrid approach for classification was introduced in method 4. A new feature vector was created by combining the best features of the above three methods, with the aim of increasing the sensitivity of the proposed classification approach. The total of 30 features selected by the three methods was then applied to the test dataset.

Random forest, artificial neural networks, ensemble bagging with RF, ensemble bagging with ANN, ensemble adaboost with RF, and ensemble adaboost with ANN classifiers were separately applied in all of the methods.

The classifiers were compared, and the overall performance results of the proposed classification approach are given in Table 2. The performance measurements are given by

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \tag{14} \]

\[ \text{Specificity} = \frac{TN}{TN + FP}, \tag{15} \]

\[ \text{TCA} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{16} \]

\[ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (a_i - p_i)^{2}}, \tag{17} \]

\[ \text{FPR} = \frac{FP}{FP + TN}, \tag{18} \]

where TP, TN, FP, and FN denote the number of nodules classified as true positive, true negative, false positive, and false negative, respectively. FPR is the false positive rate per image.

Sensitivity is the number of correctly predicted positives divided by the total number of positive cases. Specificity is the number of correctly predicted negatives divided by the total number of negative cases. TCA (total classification accuracy) represents the probability of correctly classified patterns. For RMSE (root mean squared error), $a_i$, $p_i$, and $N$ depict the actual value, the predicted value, and the number of data patterns, respectively. In order to measure the performance of the classification system, AUROC is often used as well as sensitivity and specificity [49]. AUROC represents the area under the receiver operating characteristic curve. The Kappa statistic is a chance-corrected measure of agreement between the classifications and the true classes. If Kappa is equal to 1, it indicates perfect agreement. If Kappa is equal to 0, it represents chance agreement.
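These measures follow directly from the four confusion-matrix counts, as the short sketch below shows. The function names are hypothetical, the false positive rate is computed as FP / (FP + TN), Kappa is computed in Cohen's two-class form, and the counts in the usage note are made up for illustration rather than taken from Table 3.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, FPR, and Cohen's kappa."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance from the row and column totals
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "fpr": fp / (fp + tn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))
```

For hypothetical counts TP = 40, TN = 45, FP = 5, and FN = 10, the sensitivity is 0.8, the specificity is 0.9, the accuracy is 0.85, and Kappa is 0.7.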

The confusion matrices of the classifiers for the proposed methods are shown in Table 3.

A ROC curve is usually used as a technique to visualize the performance of classifiers and is extremely useful to compare the performance of different classifiers in medical decision-making systems. The curve indicates the tradeoff between the true positive and false positive rates.

The area under ROC (AUROC) used here is largely adopted to represent the expected performance of a classifier. The AUROC of a classifier is equivalent to the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance [50]. For our proposed methods, ROC curves are illustrated in Figure 8.
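The probabilistic interpretation of the AUROC can be computed directly by comparing every positive score with every negative score. The sketch below is a simple quadratic-time illustration with a hypothetical function name; ties are counted as one half, which is a common convention.

```python
def auroc(scores_pos, scores_neg):
    """Probability that a random positive is scored above a random negative."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5          # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))
```

A classifier that ranks every nodule above every nonnodule obtains 1.0, and one whose scores carry no information obtains about 0.5.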

3.1. Performance Comparison

To evaluate the performance of the classification approach, the results of this study were compared with previously reported CAD systems. Comparing previously published CAD systems is highly difficult because they differ in datasets, nodule size or type, and the nodule and nonnodule patterns used, and the performance of a CAD system can differ significantly depending on these variables. It is nevertheless important to make a relative comparison.

A single 2D slice is selected for each 3D object as seen in Figure 1. Pulmonary nodules are visible across a range of several slices of the whole CT scan. Radiologists inspect these slices for the 2D patterns and then select and label the pulmonary nodule pattern that has the largest dimension (i.e., area, diameter). Thus, when a physician detects a pulmonary nodule on the CT slices, he/she chooses the largest 2D pattern, which is labeled and used in the dataset.

For comparative analysis, recently reported CAD systems that utilized the LIDC (Lung Image Database Consortium) database to evaluate detection systems were examined [32–34]. Opfer and Wiemker [32] utilized a dataset comprising 93 cases (2-3 mm slice thickness) with 127 nodules. Sahiner et al. used a dataset with a total of 73 nodules by combining 28 CT scans from the LIDC and 20 scans from another database [34]. Rubin et al. used a total of 84 CT scans with a total of 143 nodules in the range of 3–30 mm in nodule size [33]. Other papers utilized their own databases for the performance analysis of CAD systems [35, 36, 51]. Suzuki et al. used a dataset of 20 CT scans (1.25 mm slice thickness and 0.6 mm pixel interval) containing 195 noncalcified nodule patterns (3 mm) [35]. Tartar et al. utilized low-dose CT images scanned from 71 different patients with a total of 121 nodules (8–20 mm nodule size interval), totaling 101 CT scans (10 mm slice thickness and 0.586–0.684 pixel interval) [36]. Shiraishi et al. used the dataset containing 67 pulmonary nodules and 67 nonnodules obtained from 46 patients in our previous study [51].

In this study, a dataset containing 95 pulmonary nodule and 75 nonnodule patterns obtained from two-dimensional CT images of 63 patients is used. All of our CT scans were acquired using the standard imaging protocol. A comparison of the performance of the reported CAD systems is shown in Table 4. As seen from the table, the proposed classification approach achieved a sensitivity of 89.6% and an accuracy of 90.7% in the range of 2–20 mm nodule size. All other CAD systems have reasonable sensitivity values in the classification of pulmonary nodules. It is extremely important for a CAD system to handle small nodule sizes, because this increases the probability of early detection of nodules. Considering these results, the proposed study achieves a relatively high sensitivity. In addition, the overall false positive rate per image is calculated as 0.079 by using (18) for the hybrid approach.

4. Conclusions

In this paper, a new classification approach of pulmonary nodules for a CAD system from CT imagery is presented. An important feature of a CAD system desired by radiologists is that it is able to detect and classify small nodule patterns. The dataset in our study is composed of nodules with relatively smaller diameters (2 mm), as shown in Figure 3 and Table 4.

In the literature, various classification algorithms for CAD systems have been extensively studied. In order to reduce the complexity of the algorithm and the computational load, it is extremely important to use fewer features while maintaining an acceptable detection performance. For example, the CAD system in Messay et al. [15] uses 40 features selected from a set of 245 features with a sensitivity of 82.66%, Hardie et al. [16] use a subset of 46 features selected from a set of 114 features with a sensitivity of 78.1%, and Shiraishi et al. [51] utilize 71 features with a sensitivity of 70.4%. In this study, in order to choose the best set of image features characterizing the patterns, various feature extraction and selection methods were implemented: 2D-PCA, statistical features of 2D-PCA, morphological image processing based on geometric features, and the mRMR feature selection method.

The performances of the proposed approaches are evaluated by using different classifiers and performance metrics such as accuracy, sensitivity, specificity, AUROC, Kappa statistic, and RMSE. The proposed classification approach utilizes 30 features combined by the hybrid approach and achieves a sensitivity of 89.6%, an accuracy of 90.7%, and a specificity of 87.5%.

Considering the test results in Table 2, ensemble learning algorithms yield the best performances on the features suggested in methods 1 and 3. However, especially, in the hybrid approach (method 4) combining the best features of the three methods, nonlinear multilayered ANN is shown to be superior to the other classifiers. Our approach uses ANN classifier with fewer features to avoid generalization problems, high complexity, and computational burden that can be caused by using an ANN with very large number of (potentially irrelevant) features. In addition, as shown in Table 3, false positive (FP) rate is shown to decline in the hybrid approach which provides higher detection performance by using fewer features.

Conflict of Interests

The authors have no conflict of interests with the trademarks included in the paper.

Acknowledgment

This work was supported by Scientific Research Projects Coordination Unit of Istanbul University, Project Numbers: 24014, 14381, 31474, and 35119.