
Magnetic resonance brain classification by a novel binary particle swarm optimization with mutation and time-varying acceleration coefficients

  • Shuihua Wang, Preetha Phillips, Jianfei Yang, Ping Sun and Yudong Zhang

Abstract

Aim:

To develop an automatic magnetic resonance (MR) brain classification method that can assist physicians in making diagnoses and reduce incorrect decisions.

Method:

This article investigated the binary particle swarm optimization (BPSO) approach and proposed its three new variants: BPSO with mutation and time-varying acceleration coefficients (BPSO-MT), BPSO with mutation (BPSO-M), and BPSO with time-varying acceleration coefficients (BPSO-T). We first extracted wavelet entropy (WE) features from both approximation and detail sub-bands of eight-level decomposition. Afterwards, we used the proposed BPSO-M, BPSO-T, and BPSO-MT to select features. Finally, the selected features were fed into a probabilistic neural network (PNN).

Results:

The proposed BPSO-MT performed better than BPSO-T and BPSO-M. It finally selected two features: the entropies of the sub-bands V1 and D1. The proposed system “WE + BPSO-MT + PNN” yielded perfect classification on Data160 and Data66. In addition, it yielded 99.53% average accuracy on Data255 over 10 repetitions of k-fold stratified cross validation (SCV), higher than state-of-the-art approaches.

Conclusions:

The proposed method is effective for MR brain classification.

Background

Magnetic resonance imaging (MRI) is a popular tool for human body imaging. It is a noninvasive technique that provides superior soft-tissue resolution within the brain [52] compared with traditional computed tomography (CT) [5], ultrasound [1], positron emission tomography (PET) [16], etc.

Developing an easy and swift diagnosis based on brain magnetic resonance (MR) images [14] remains a hot topic in both academic and industrial fields [32, 35, 44]. Traditional manual methods are tedious, time-consuming, expensive, and often irreproducible, because the MR data are commonly enormous. This motivates scholars to develop automatic computer-aided diagnosis (CAD) methods [2, 39, 62].

The literature shows that a wide variety of automatic methods have been proposed for brain MR image classification. Chaplot et al. [4] were the forerunners in using the discrete wavelet transform (DWT) to extract the approximation coefficients, and they employed both a self-organizing map (SOM) neural network and a support vector machine (SVM) for classification. Wang and Wu [41] proposed using a feed-forward neural network (FNN) to decide whether a given MR brain image is healthy or pathological. El-Dahshan et al. [10] employed three-level DWT coefficients and then reduced them by the classical feature reduction technique, principal component analysis (PCA); they utilized two classifiers, an artificial neural network (ANN) and K-nearest neighbors (KNN). Dong et al. [9] further suggested a relatively new method, the scaled conjugate gradient (SCG). Das et al. [6] combined the ripplet transform (RT) and PCA; they combined the least-squares method with a support vector machine, termed LS-SVM, and performed a 5×5 cross validation test, which offered excellent accuracies. Zhang and Wu [50] utilized a kernel support vector machine (KSVM) with three new kernels: a radial basis function, a homogeneous polynomial, and an inhomogeneous polynomial. Zhang et al. [56] suggested using particle swarm optimization (PSO) to train the KSVM. El-Dahshan et al. [11] used the feedback pulse-coupled neural network to preprocess the MR images, DWT and PCA to extract and reduce features, and an ANN to detect pathological brains from normal brains. Wang et al. [42] distinguished Alzheimer’s disease (AD) from healthy controls using structural MR images and a KSVM decision tree; a five-fold cross validation showed their method yielded 80% accuracy. Zhou et al. [63] utilized wavelet entropy (WE) from MR brain images together with a naive Bayes classifier (NBC). Wang et al. [43] utilized the stationary wavelet transform (SWT) to replace the traditional DWT and presented a hybridization of PSO and the artificial bee colony (HPA) algorithm. Nayak et al. [29] proposed using two-dimensional DWT and AdaBoost with random forests. Zhang et al. [54] employed a discrete wavelet packet transform (DWPT) together with Tsallis entropy (TE) and Shannon entropy (SE) to extract features; finally, a nonparallel SVM, viz. the generalized eigenvalue proximal support vector machine (GEPSVM), was used as the classifier. Yang et al. [47] used wavelet energy as the features and introduced biogeography-based optimization (BBO) to train the SVM. Zhang et al. [53] presented a novel 3D eigenbrain analysis method to detect AD subjects and achieved 92.36% accuracy. Jayachandran and Sundararaj [18] proposed a multiclass brain tumor classification system using a fuzzy logic-based hybrid KSVM. Zhang et al. [60] introduced the three-dimensional discrete wavelet transform (3D-DWT) to extract features from volumetric brain MR images. Moeskops et al. [26] used supervised classification to perform automatic segmentation of MR brain images of preterm infants. Zhang et al. [55] presented a hybridization of BBO and PSO (abbreviated as HBP). Munteanu et al. [27] used proton magnetic resonance spectroscopy (MRS) data to detect mild cognitive impairment (MCI) and AD.

After analyzing the above methods, we found that most of them perform DWT or its variants on brain MR images. This leads to a problem: the wavelet coefficients consume a large amount of computer memory. Saritha et al. [34] were the forerunners in introducing the wavelet-entropy (WE) feature for abnormal brain detection. They then utilized spider-web plots (SWP) to discard redundant features, which greatly reduced the feature number to only three. They finally employed a probabilistic neural network (PNN) for classification and achieved 100% accuracy, better than existing methods.

However, three problems arise in their work. (1) How do SWP influence the classification results? Our past work [51] suggested that removing SWP yields the same classification performance. (2) Can the features be reduced further? We found that they considered only the approximation coefficients; hence, in this study, we consider both approximation and detail coefficients and propose a novel feature selection (FS) method to determine the best feature subset. In our experiments, we demonstrate that we use only two features (the fewest among all publications) while obtaining 100% classification accuracy. (3) Their work was tested on a 75-image dataset, so how does the algorithm perform on larger datasets? To answer this, we used three different datasets in this study.

The remainder of the article is organized as follows: the next section describes the materials used in this study, followed by the methodology. The subsequent section presents the experimental results and discussion. The final section concludes the article with future directions. An explanation of the acronyms used is appended at the end of the article.

Materials

Three commonly used benchmark datasets contain 66, 160, and 255 images, respectively. We abbreviate them as Data66, Data160, and Data255. These data can be downloaded from the website of the Medical School of Harvard University. Figure 1 offers samples of brain MR images.

Figure 1: Sample of MR brains. MS, multiple sclerosis; HD, Huntington’s disease.

The cost of misclassifying an abnormal brain as normal is heavy, because the treatment of patients may be delayed. In contrast, misclassifying a normal brain as abnormal can be remedied by other diagnostic means. We address this cost sensitivity (CS) problem by adjusting the class distribution, intentionally including more abnormal brains than normal ones.

Methodology

Saritha et al. [34] proposed a method that used only three features while yielding 100% classification accuracy. It is one of the best algorithms for MR brain classification. Their method “WE + SWP + PNN” is listed in Table 1. For details, the readers can refer to their work.

Table 1

Pseudocode of Saritha’s work.

Algorithm 1: Saritha’s method (WE + SWP + PNN)
Step 1: Acquire the image or data
Step 2: Choose the proper wavelet for analysis
Step 3: Obtain the entropy of the wavelet decompositions
Step 4: Construct the spider-web plots (SWP)
Step 5: Calculate areas of SWP
Step 6: Perform statistical analysis of the areas
Step 7: Classify using PNN with suitable areas as feature set

The differences between our work and Saritha’s work lie in two major points. First, we consider not only the detail coefficients but also the approximation coefficients, while Saritha et al. [34] considered only the approximation coefficients. Second, we propose a novel advanced FS method to select the optimal features, while Saritha et al. [34] used SWP.

Wavelet entropy

As is known, the DWT is a famous signal-processing tool that uses dyadic scales and positions for multilevel and multiresolution analysis [12]. In addition, entropy is traditionally a statistical measure of randomness, later redefined as an uncertainty measure of the information content of a system, with the definition $S = -\sum_j p_j \log_2(p_j)$, where j represents the gray value of the reconstructed coefficients and pj the corresponding probability.

In this study, we performed an eight-level db4 wavelet decomposition and thus obtained 25 WE features [54] for each MR brain image. Table 2 shows the decomposition components of both Saritha’s method and our method. Note that there are 2^N feasible combinations for a total set of N features; hence, we need to pick an optimal solution from 2^25 = 33,554,432 possible combinations. A feasible solving technique is proposed below.

Table 2

Decomposition components extracted from an eight-level decomposition.

Level   Saritha’s method   Our method
1       A1                 (H1, D1, V1)
2       A2                 (H2, D2, V2)
…       …                  …
7       A7                 (H7, D7, V7)
8       A8                 (H8, D8, V8, A8)
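
To make the feature-extraction step concrete, the sketch below (our illustration, not the authors’ code) computes the 25 WE values with the PyWavelets package; the histogram-based estimate of the probabilities pj and the bin count are our assumptions, as the paper does not specify them.

```python
# A minimal sketch (ours, not the authors' code) of the 25 wavelet-entropy features.
# Assumptions: 2-D grayscale input, PyWavelets for the db4 decomposition, and a
# histogram-based estimate of the probabilities p_j (bin count chosen arbitrarily).
import numpy as np
import pywt

def shannon_entropy(coeffs, bins=256):
    """Estimate S = -sum_j p_j * log2(p_j) from a histogram of sub-band coefficients."""
    hist, _ = np.histogram(coeffs.ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()
    p = p[p > 0]                                   # drop empty bins so log2 is defined
    return float(-np.sum(p * np.log2(p)))

def wavelet_entropy_features(image, wavelet="db4", level=8):
    """Return the 25 WE features: A8 plus (H, V, D) of levels 8 down to 1."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # coeffs = [A8, (H8, V8, D8), ..., (H1, V1, D1)]
    features = [shannon_entropy(coeffs[0])]        # approximation sub-band A8
    for details in coeffs[1:]:                     # detail sub-bands, level 8 -> 1
        features.extend(shannon_entropy(c) for c in details)
    return np.asarray(features)                    # length 1 + 3 * 8 = 25
```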

Feature selection

FS explores the combination space to find the optimal feature combination. Generally speaking, there are three categories of FS methods (see Table 3). The first category comprises the traditional exponential algorithms, which evaluate an enormous number of subsets that grows exponentially with the dimension; the typical algorithm is exhaustive search. The second category comprises the sequential methods, which remove or add features one by one; however, these algorithms tend to become trapped in local minima. The third category comprises the metaheuristic algorithms, which incorporate randomness into the search so as to escape from local minima. Two typical algorithms are simulated annealing (SA) and the genetic algorithm (GA).

Table 3

Summary of state-of-the-art algorithms of feature selection (FS).

Category        Typical methods
Exponential     Exhaustive search, beam search, branch and bound
Sequential      Plus-l minus-r selection, sequential backward selection, sequential forward selection, bidirectional search, sequential floating selection
Metaheuristic   Random generation, simulated annealing, particle swarm optimization, genetic algorithm, ant colony optimization

Nevertheless, both GA and SA have two problems: (1) they are sensitive to the initial population, and (2) the search procedure takes a long time [19, 21]. To alleviate these two problems, PSO was proposed as a simple metaheuristic, biologically inspired by the swarming behavior of ants, fish, birds, bees, etc. PSO was initially proposed to solve continuous problems. Recently, its variant, binary PSO (BPSO), was presented by scholars to extend its ability to solve discrete problems.

Encoding

Binary encoding is used: every candidate solution is represented by a particle associated with two characteristics, a velocity v and a position x. For the ith particle, the two characteristics take the form

$$x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iN}), \quad x_{ij} \in \{0, 1\}, \tag{1}$$
$$v_i = (v_{i1}, v_{i2}, v_{i3}, \ldots, v_{iN}). \tag{2}$$

Here, N denotes the problem dimension and i the particle index. For the FS problem, xij describes the status of the corresponding jth feature: selected or not. Figure 2 illustrates a six-dimensional FS problem, where a particle xi = [0, 1, 0, 0, 1, 1] indicates that the 2nd, 5th, and 6th features are selected.

Figure 2: A six-dimensional feature selection (FS) problem.
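
As a small illustration of this encoding (our sketch, not part of the original work), a particle’s binary position can serve directly as a column mask over a feature matrix; the array sizes below are arbitrary.

```python
import numpy as np

# Toy feature matrix with N = 6 features (matching Figure 2) and 10 samples.
features = np.random.rand(10, 6)
x_i = np.array([0, 1, 0, 0, 1, 1], dtype=bool)   # particle position of Figure 2
selected = features[:, x_i]                      # keeps the 2nd, 5th, and 6th columns
```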

Particle swarm optimization

PSO evaluates the fitness function of the whole swarm at each step. The velocity vi of the ith particle is updated by the positions of two best particles [13, 20, 58]: (1) the best position the particle itself has visited so far (pB) and (2) the best position visited so far by the neighbors of the ith particle (nB). When the whole swarm is treated as the neighborhood, the neighborhood best becomes the global best, accordingly called “gB”. From the above we can deduce the update equations as

$$v = c \times v + a_1 \times \mathrm{rand}() \times (pB - x) + a_2 \times \mathrm{rand}() \times (nB - x), \tag{3}$$
$$x = x + v. \tag{4}$$

Here, rand() is a random number generator whose values fall within the range [0, 1]; each occurrence of rand() is evaluated independently. c is termed the “inertia weight” [30, 38, 61]. If c is less than 1, the particle favors exploitation over exploration; if c is larger than 1, the particle favors exploration over exploitation.

The parameters a1 and a2 are non-negative constants called “acceleration coefficients” [3, 37, 40]. The particle positions are updated according to formulas (3) and (4), and the particles approach one another from various directions. PSO runs through these processes iteratively until the termination criterion is met [15, 48]. Note that vmax (the maximum velocity) should be determined beforehand, with the aim of keeping the optimizers within a reasonable range [33].
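
The following minimal sketch shows one way to implement the update of Equations (3) and (4) in Python with NumPy; the inertia weight, acceleration coefficients, and the clipping of velocities to [-vmax, vmax] are illustrative choices, not the authors’ published settings.

```python
import numpy as np

def pso_step(x, v, pB, nB, c=0.7, a1=2.0, a2=2.0, v_max=4.0):
    """One PSO update following Equations (3) and (4).
    x, v, pB have shape (n_particles, n_dims); nB is the neighborhood/global best.
    c, a1, a2 and the clipping to [-v_max, v_max] are illustrative choices."""
    r1 = np.random.rand(*x.shape)          # each rand() occurrence drawn independently
    r2 = np.random.rand(*x.shape)
    v = c * v + a1 * r1 * (pB - x) + a2 * r2 * (nB - x)
    v = np.clip(v, -v_max, v_max)          # keep velocities within a reasonable range
    return x + v, v
```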

Binary particle swarm optimization

The BPSO initializes the velocities and positions of the swarm randomly [22],

$$x_{ij} = \begin{cases} 1, & \text{if } \mathrm{rand}() > 0.5 \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$
$$v_{ij} = -v_{\max} + 2 \times \mathrm{rand}() \times v_{\max}. \tag{6}$$

The position xij of each variable is calculated by

$$x_{ij} = \begin{cases} 1, & \text{if } S(v_{ij}) > \mathrm{rand}() \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

Here, S(·) denotes the logistic function, which serves as the probability that the position xij takes the value 1,

$$S(v_{ij}) = \frac{1}{1 + \exp(-v_{ij})}. \tag{8}$$

The velocities vij are iteratively updated by Equation (3). From an empirical point of view, the value S(vmax) should be less than 1, which increases the chance of producing new solutions [25, 57].

Unlike traditional PSO, the positions of BPSO lie within the Hamming space (the set of all 2^N binary strings of length N) [31]. Therefore, divergence will not occur, owing to the limited string length. Nevertheless, premature convergence may still occur, so the maximum velocity vmax remains important.
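
A minimal sketch of the BPSO initialization and position update of Equations (5)–(8) might look as follows (our illustration; the value of vmax is an assumption):

```python
import numpy as np

def bpso_init(n_particles, n_dims, v_max=4.0):
    """Random initialization of Equations (5) and (6)."""
    x = (np.random.rand(n_particles, n_dims) > 0.5).astype(int)
    v = -v_max + 2.0 * np.random.rand(n_particles, n_dims) * v_max
    return x, v

def bpso_positions(v):
    """Binary position update of Equation (7): x_ij = 1 if S(v_ij) > rand(), else 0,
    with S the logistic function of Equation (8)."""
    s = 1.0 / (1.0 + np.exp(-v))
    return (s > np.random.rand(*v.shape)).astype(int)
```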

Two improvements

With the aim of improving the diversity of BPSO, we proposed a mutation operator as

$$x_{ij} = \begin{cases} \sim x_{ij}, & \text{if } \mathrm{rand}() \le r_{\mathrm{mut}} \\ x_{ij}, & \text{otherwise}, \end{cases} \tag{9}$$

where rmut represents the probability of random mutation. Each bit is mutated with probability rmut after the positions are updated by (7). rmut is commonly assigned a value of 1/N, which means that, on average, one bit in each candidate is flipped.
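
A possible implementation of the mutation operator of Equation (9) is sketched below (ours, assuming the default rmut = 1/N mentioned above):

```python
import numpy as np

def mutate(x, r_mut=None):
    """Mutation operator of Equation (9): flip each bit with probability r_mut.
    The default r_mut = 1/N flips roughly one bit per candidate on average."""
    if r_mut is None:
        r_mut = 1.0 / x.shape[-1]
    flip = np.random.rand(*x.shape) <= r_mut
    return np.where(flip, 1 - x, x)
```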

The second improvement introduces the time-varying acceleration coefficients (TVAC) technique [36], which augments the global search ability in the initial stage and encourages local search at the end of the search. To achieve this, TVAC gives more weight to the cognitive component and less to the social component in the early stage, and less weight to the cognitive component and more to the social component in the later stage. Mathematically, TVAC tunes the acceleration coefficients a1 and a2 as

$$a_1 = (a_{1f} - a_{1i}) \, t / t_{\max} + a_{1i}, \tag{10}$$
$$a_2 = (a_{2f} - a_{2i}) \, t / t_{\max} + a_{2i}, \tag{11}$$

where a1i and a1f denote the initial and final values of a1, respectively, and a2i and a2f denote the initial and final values of a2, respectively. t denotes the current iteration and tmax the maximum number of iterations.
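
The TVAC schedule of Equations (10) and (11) is a simple linear interpolation; the sketch below uses commonly cited initial and final values (2.5 to 0.5 for the cognitive coefficient and 0.5 to 2.5 for the social one), which are our assumption rather than the paper’s exact settings.

```python
def tvac(t, t_max, a1i=2.5, a1f=0.5, a2i=0.5, a2f=2.5):
    """Time-varying acceleration coefficients of Equations (10) and (11).
    The initial/final values are commonly used defaults, assumed here for illustration."""
    a1 = (a1f - a1i) * t / t_max + a1i     # cognitive weight decreases over time
    a2 = (a2f - a2i) * t / t_max + a2i     # social weight increases over time
    return a1, a2
```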

After embedding these two improvements into conventional BPSO, we named the variant binary particle swarm optimization with mutation and TVAC (BPSO-MT). For a fair comparison, we also combined only the mutation operator with BPSO and only TVAC with BPSO, and named these two variants binary particle swarm optimization with mutation (BPSO-M) and binary particle swarm optimization with TVAC (BPSO-T), respectively.

Probabilistic neural network

For classification, we used a PNN, which has gained interest because it yields a probabilistic score for each input. Suppose there are two classes (A and B) and we must decide which class the sample x = [x1, …, xN] belongs to [17]. Let hA and hB represent the a priori probabilities of instances in classes A and B, respectively; the Bayes decision rule then becomes

$$\mathrm{Class}(x) = \begin{cases} A, & \text{if } l_A h_A f_A(x) > l_B h_B f_B(x) \\ B, & \text{otherwise}, \end{cases} \tag{12}$$

where lA represents the loss incurred by the wrong decision that x is in class B when Class(x) = A, and similarly for lB. The losses are equal to zero for correct decisions [55]. fA and fB are the probability density functions (PDFs) of classes A and B, respectively.

In the simple case where lA = lB and hA = hB, the classifier assigns a new instance to the class with the higher PDF. With a Gaussian kernel, the PDF of class A can be expressed as

$$f_A(x) = \frac{1}{(2\pi)^{N/2} s^N} \frac{1}{T_A} \sum_{j=1}^{T_A} \exp\!\left[ -\frac{(x - x_{Aj})^T (x - x_{Aj})}{2 s^2} \right]. \tag{13}$$

Here, s is the smoothing factor, TA the number of training samples in class A, and xAj the jth sample in class A.

Figure 3 illustrates the structure of a PNN. Its mathematical expressions are

Figure 3: Structure of the probabilistic neural network (PNN).

$$a = f_r(b \, | I - x |), \tag{14}$$
$$y = f_c(a L). \tag{15}$$

Here, I denotes the input weight, L the layer weight, b the bias, fr the radial basis function, and fc the compete function,

$$f_r(x) = \exp(-x^2), \qquad f_c(x) = e_x = [\,0 \;\cdots\; 0 \;\underset{x}{1}\; 0 \;\cdots\; 0\,]. \tag{16}$$

The parameter setting of PNN in this article is the same as in the work of Saritha et al. [34].
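
A minimal two-class PNN sketch following Equations (12) and (13), assuming equal priors and losses and an illustrative smoothing factor s, is given below (our code, not the authors’ implementation).

```python
import numpy as np

def pnn_predict(x, train_A, train_B, s=0.1):
    """Two-class PNN decision under equal priors and losses (Equation (12)),
    using the Gaussian-kernel PDF estimate of Equation (13).
    train_A, train_B have shape (T, N); the smoothing factor s is illustrative."""
    def pdf(sample, train):
        T, N = train.shape
        d2 = np.sum((train - sample) ** 2, axis=1)          # squared distances to samples
        k = np.exp(-d2 / (2.0 * s ** 2))
        return k.sum() / (T * (2.0 * np.pi) ** (N / 2) * s ** N)
    return "A" if pdf(x, train_A) > pdf(x, train_B) else "B"
```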

Statistical setting

Table 4 shows the stratified cross validation (SCV) setting. Following common convention, six-fold SCV was used for Data66 and five-fold SCV for the other two datasets.

Table 4

Statistical setting.

Data       Training        Validation      Total           No. of folds
           A       N       A       N       A       N
Data66     40      15      8       3       48      18      6
Data160    112     16      28      4       140     20      5
Data255    176     28      44      7       220     35      5

A, Abnormal; N, normal.
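
A sketch of the repeated k-fold SCV protocol is shown below, assuming scikit-learn’s StratifiedKFold and a user-supplied train_and_score routine (hypothetical); six folds would be used for Data66 and five for Data160 and Data255.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def run_scv(features, labels, n_folds, train_and_score, n_repeats=10):
    """Repeat k-fold stratified cross validation and return the mean accuracy.
    `train_and_score` is a placeholder that fits the classifier on the training split
    and returns the accuracy on the validation split."""
    accs = []
    for r in range(n_repeats):
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=r)
        for tr_idx, va_idx in skf.split(features, labels):
            accs.append(train_and_score(features[tr_idx], labels[tr_idx],
                                        features[va_idx], labels[va_idx]))
    return float(np.mean(accs))   # e.g. n_folds=6 for Data66, 5 for Data160/Data255
```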

Implementation

The pseudocode of the proposed methodology is given in Table 5. Compared to Saritha’s method in Table 1, the proposed method uses fewer steps and saves computation time. Figure 4 shows the diagram of the proposed system.

Table 5

Pseudocode of the proposed method.

Algorithm 2: The proposed method (WE + BPSO-MT + PNN)
Step 1: Acquire the MR brain image
Step 2: Calculate the eight-level wavelet decomposition
Step 3: Obtain the wavelet entropy values on 25 sub-band coefficients
Step 4: Feature selection by BPSO-MT
Step 5: Output the final classifier constructed by the optimal feature subset and PNN
Figure 4: Diagram of the proposed system.
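
Putting the pieces together, the following end-to-end sketch mirrors Algorithm 2, reusing the hypothetical helpers from the earlier sketches (bpso_init, bpso_positions, mutate, tvac, pso_step, pnn_predict); the feature matrices are assumed to hold the 25 WE values per image (e.g. from wavelet_entropy_features), the labels convention (1 = abnormal), the fitness definition, and all parameter values are our assumptions rather than the authors’ published settings.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_va, y_va, s=0.1):
    """Validation accuracy of a PNN restricted to the features where mask == 1.
    Labels are assumed to be 1 for abnormal (class A) and 0 for normal (class B)."""
    mask = mask.astype(bool)
    if not mask.any():
        return 0.0                                           # empty subsets are useless
    A = X_tr[y_tr == 1][:, mask]
    B = X_tr[y_tr == 0][:, mask]
    preds = np.array([pnn_predict(x[mask], A, B, s) == "A" for x in X_va])
    return float(np.mean(preds == (y_va == 1)))

def bpso_mt_select(X_tr, y_tr, X_va, y_va, n_particles=20, t_max=100, v_max=4.0):
    """Feature selection with BPSO-MT: velocity update (3), sigmoid sampling (7),
    mutation (9), and the TVAC schedule (10)-(11)."""
    n_dims = X_tr.shape[1]
    x, v = bpso_init(n_particles, n_dims, v_max)
    pB = x.copy()
    pB_fit = np.array([fitness(p, X_tr, y_tr, X_va, y_va) for p in x])
    gB = pB[pB_fit.argmax()]
    for t in range(t_max):
        a1, a2 = tvac(t, t_max)
        _, v = pso_step(x, v, pB, gB, a1=a1, a2=a2, v_max=v_max)   # keep velocities only
        x = mutate(bpso_positions(v))
        fit = np.array([fitness(p, X_tr, y_tr, X_va, y_va) for p in x])
        better = fit > pB_fit
        pB[better], pB_fit[better] = x[better], fit[better]
        gB = pB[pB_fit.argmax()]
    return pB[pB_fit.argmax()].astype(bool)    # in the paper this ends up as (V1, D1)
```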

Experiments, results, and discussions

Wavelet decomposition

A normal MR brain image is shown in Figure 5A. The one-level and two-level decomposition results are shown in Figure 5B and C, respectively. The higher-level decomposition results are not shown, as the approximation coefficients are too small to display. After this step, we obtain a total of 25 sub-bands and the corresponding WE values.

Figure 5: Illustration of the discrete wavelet transform (DWT) decomposition result.

BPSO-MT vs. BPSO-M and BPSO-T

We compared the proposed BPSO-MT with the other two proposed variants (BPSO-M and BPSO-T). Each algorithm was run 10 times. The maximum, minimum, and mean numbers of features finally selected by the different algorithms are listed in Table 6.

Table 6

BPSO-MT vs. other two proposed methods.

Algorithm               Maximum   Minimum   Mean
No-FS^a                 25        25        25
BPSO-M (proposed)       7         2         4
BPSO-T (proposed)       5         3         4
BPSO-MT (proposed)      4         2         3

^a No-FS represents not implementing any feature selection method.

The results in Table 6 show that, over 10 runs, BPSO-M obtains a maximum feature number of 7, a minimum of 2, and a mean of 4; BPSO-T obtains a maximum of 5, a minimum of 3, and a mean of 4; and BPSO-MT obtains a maximum of 4, a minimum of 2, and a mean of 3. Therefore, we can conclude that the mutation operator adds variability to the population, so BPSO-M can find the globally minimal value but lacks robustness. BPSO-T is more robust than BPSO-M; however, it cannot reach the globally minimal value. BPSO-MT combines the advantages of both, so it not only attains the global optimum but also has the smallest mean value (i.e., it is more robust than BPSO-M and BPSO-T).

BPSO-MT vs. other heuristics methods

In the third experiment, we compared the best proposed method, BPSO-MT, with five other heuristic methods: GA [8], PSO [59], restarted SA (RSA) [49], ant colony optimization (ACO) [46], and BPSO [24]. The parameter settings are the same as above, and the results are listed in Table 7.

Table 7

Number of selected features (10 runs).

Algorithm               Maximum   Minimum   Mean
No-FS^a                 25        25        25
GA [8]                  10        4         7
RSA [49]                14        6         9
ACO [46]                8         3         6
PSO [59]                5         3         4
BPSO [24]               5         3         4
BPSO-MT (proposed)      4         2         3

^a No-FS represents not implementing any feature selection method.

Table 7 shows that the proposed BPSO-MT method yields the fewest features among all algorithms, with a maximum of 4, a minimum of 2, and a mean of 3. BPSO and PSO yield the same results, with a maximum feature number of 5, a minimum of 3, and a mean of 4. ACO selects six features on average, with a maximum of 8 and a minimum of 3. GA selects seven features on average, with a maximum of 10 and a minimum of 4. RSA performs worst, with a maximum feature number of 14, a minimum of 6, and a mean of 9.

Selected features

The two most important features obtained by BPSO-MT are the entropies over (V1, D1). The selected features of all algorithms are listed in Table 8. Here, GA, RSA, PSO, and BPSO-MT each find a single least-feature combination across the 10 runs; the proposed BPSO-MT finds the two-feature combination (V1, D1). In contrast, ACO and BPSO find different combinations among the 10 runs: ACO selects at least three features, either (H1, V1, D1) or (H2, V1, D1), and BPSO selects two different three-feature combinations, (H1, V1, D1) and (V1, D1, A1).

Table 8

Selected features (the least of 10 runs).

Algorithm               Least-feature combination(s)         No. of least features
No-FS                   H1–H7, V1–V7, D1–D7, A1–A8           25
GA [8]                  (H2, V2, H1, A1)                     4
RSA [49]                (H2, V2, H1, V1, D1, A1)             6
ACO [46]                (H1, V1, D1); (H2, V1, D1)           3
PSO [59]                (H1, V1, D2)                         3
BPSO [24]               (H1, V1, D1); (V1, D1, A1)           3
BPSO-MT (proposed)      (V1, D1)                             2

Comparing the BPSO and PSO results in Table 8, we find that BPSO finds two best solutions, while PSO finds only one, suggesting that BPSO has better exploration ability than PSO. The reason is that the initialized and updated positions of the former are coherent with the binary nature of the problem, whereas the latter does not consider the binary characteristics of the positions. The most important result in Table 8 is that the proposed BPSO-MT yields the two-feature combination (V1, D1). The number of features selected by BPSO-MT is only two, fewer than for all existing methods, which again demonstrates the superiority of the proposed BPSO-MT. Next, we test the performance of these two features selected by BPSO-MT.

We can conclude that BPSO-MT performs better than GA, RSA, ACO, PSO, and BPSO with respect to the selected features, as it obtains only two (the fewest) features. The reason may lie in the two improvements suggested in the “Two improvements” subsection above, where we embedded the mutation operator and TVAC into BPSO. Traditional GA, RSA, and ACO do not obtain good results, as they are not intentionally designed for binary problems such as the FS problem.

Comparison with Saritha’s method

Ten runs of k-fold SCV were carried out on the three datasets. Note that the true class means abnormal brains and the false class means normal brains. The average accuracy over 10 runs is recorded in Table 9. Here, we can see that the proposed “WE + BPSO-MT + PNN” yields higher classification accuracy (100.00% for Data66, 100.00% for Data160, and 99.53% for Data255) than “WE + SWP + PNN” (100.00% for Data66, 99.94% for Data160, and 98.86% for Data255). In terms of feature number, the proposed “WE + BPSO-MT + PNN” uses only two features, (V1, D1), fewer than the three used by “WE + SWP + PNN” [34].

Table 9

Comparison with Saritha’s method (10 runs).

Algorithm                          Feature number   Data66    Data160   Data255
WE + SWP + PNN [34]                3                100.00    99.94     98.86
WE + BPSO-MT + PNN (proposed)      2                100.00    100.00    99.53

Comparison with other MR brain classification methods

To further demonstrate the effectiveness of the proposed “WE + BPSO-MT + PNN”, we compared it with various existing algorithms: DWT + SVM [4], DWT + SOM [4], DWT + RBF-SVM [4], DWT + PCA + ANN [10], DWT + PCA + KNN [10], DWT + PCA + SCG [9], RT + PCA + LS-SVM [6], DWT + PCA + SVM [50], DWT + PCA + HPOL-SVM [50], DWT + PCA + IPOL-SVM [50], DWT + PCA + RBF-SVM [50], PCNN + DWT + PCA + ANN [11], WE + NBC [63], SWT + PCA + IABAP-FNN [43], SWT + PCA + ABC-SPSO-FNN [43], SWT + PCA + HPA-FNN [43], DWPT + SE + GEPSVM [54], DWPT + TE + GEPSVM [54], and DWPT + SE + GEPSVM + RBF [54]. The meanings of these abbreviations can be found in Table 11.

From Table 10, we can see that Data66 is too small, which leads many algorithms to obtain 100% accuracy. For Data160, SWT + PCA + HPA-FNN [43], DWPT + TE + GEPSVM [54], and the proposed WE + BPSO-MT + PNN achieved perfect classification. For Data255, the proposed “WE + BPSO-MT + PNN” yields the highest accuracy of 99.53% while using the fewest features (two). The second best algorithm is “SWT + PCA + HPA-FNN” [43], with an average accuracy of 99.45%, and the third is “RT + PCA + LS-SVM” [6], with an average accuracy of 99.39%.

Table 10

Comparison with other MR brain classification methods.

Algorithm                          Feature number   Run number   Data66    Data160   Data255
DWT + SVM [4]                      4761             5            96.15     95.38     94.05
DWT + SOM [4]                      4761             5            94.00     93.17     91.65
DWT + RBF-SVM [4]                  4761             5            98.00     97.33     96.18
DWT + PCA + ANN [10]               7                5            97.00     96.98     95.29
DWT + PCA + KNN [10]               7                5            98.00     97.54     96.79
DWT + PCA + SCG [9]                19               5            100.00    99.27     98.82
RT + PCA + LS-SVM [6]              9                5            100.00    100.00    99.39
DWT + PCA + SVM [50]               19               5            96.01     95.00     94.29
DWT + PCA + HPOL-SVM [50]          19               5            98.34     96.88     95.61
DWT + PCA + IPOL-SVM [50]          19               5            100.00    98.12     97.73
DWT + PCA + RBF-SVM [50]           19               5            100.00    99.38     98.82
PCNN + DWT + PCA + ANN [11]        7                10           100.00    98.88     98.24
WE + NBC [63]                      7                10           92.58     91.87     90.51
SWT + PCA + IABAP-FNN [43]         7                10           100.00    99.44     99.18
SWT + PCA + ABC-SPSO-FNN [43]      7                10           100.00    99.75     99.02
SWT + PCA + HPA-FNN [43]           7                10           100.00    100.00    99.45
DWPT + SE + GEPSVM [54]            16               10           99.85     99.62     98.78
DWPT + TE + GEPSVM [54]            16               10           100.00    100.00    99.33
DWPT + SE + GEPSVM + RBF [54]      16               10           100.00    99.88     99.33
WE + BPSO-MT + PNN (proposed)      2                10           100.00    100.00    99.53
Table 11

Acronym list.

Abbreviation              Definition
(B)PSO(-M)(-T)(-MT)       (Binary) particle swarm optimization (-mutation) (-TVAC) (-mutation and TVAC)
(H)(I)POL                 (Homogeneous) (inhomogeneous) polynomial
(S)(T)E                   (Shannon) (Tsallis) entropy
ABC(-SPSO)                Artificial bee colony (-standard PSO)
ANN                       Artificial neural network
CAD                       Computer-aided diagnosis
CS                        Cost sensitivity
DW(P)T                    Discrete wavelet (packet) transform
FNN                       Feed-forward neural network
FS                        Feature selection
GEPSVM                    Generalized eigenvalue proximal SVM
HPA                       Hybridization of PSO and ABC
IABAP                     Integrated algorithm based on ABC and PSO
LS-SVM                    Least-squares SVM
KNN                       K-nearest neighbors
MR(I)                     Magnetic resonance (imaging)
NBC                       Naive Bayes classifier
PCA                       Principal component analysis
PCNN                      Pulse-coupled neural network
PNN                       Probabilistic neural network
RBF                       Radial basis function
RT                        Ripplet transform
SCG                       Scaled conjugate gradient
SCV                       Stratified cross validation
SOM                       Self-organizing map
SWT                       Stationary wavelet transform
SVM                       Support vector machine
TVAC                      Time-varying acceleration coefficients
WE                        Wavelet entropy

Conclusions and future research

In this work, we proposed a new approach, based on Saritha’s method, for MR brain image classification. We proposed a novel FS method, named BPSO-MT, to find the optimal feature combination from the entropies of both the approximation and detail sub-bands of an eight-level DWT decomposition. The results show that BPSO-MT yields better results than existing FS methods. Meanwhile, the proposed “WE + BPSO-MT + PNN” method outperforms existing MR brain classifiers.

Our contributions are the following: (1) we proposed three novel FS methods, BPSO-T, BPSO-M, and BPSO-MT, and showed that BPSO-MT is the best of the three; (2) the proposed system “WE + BPSO-MT + PNN” is superior to state-of-the-art approaches in terms of both classification accuracy and feature number.

In the future, we will test other advanced feature extraction methods, such as fractional wavelet, fractional Fourier entropy [45], and dual-tree complex wavelet transform. Besides, features may be extracted from the reconstruction procedure [7]. To generalize our method to 3D, we may need the help of 3D printing [23].


Corresponding author: Yudong Zhang, School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu 210023, China; Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing, Nanjing, Jiangsu 210042, China; and Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, Guilin, Guangxi 541004, China, Phone: +86-15905183664, E-mail:

Acknowledgments

This article was supported by NSFC (51407095, 61503188), Natural Science Foundation of Jiangsu Province (BK20150983, BK20150982), Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing (BM2013006), Key Supporting Science and Technology Program (Industry) of Jiangsu Province (BE2012201, BE2013012-2, BE2014009-3), Program of Natural Science Research of Jiangsu Higher Education Institutions (15KJB470010, 13KJB460011, 14KJB480004, 14KJB520021), Special Funds for Scientific and Technological Achievement Transformation Project in Jiangsu Province (BA2013058), Nanjing Normal University Research Foundation for Talented Scholars (2013119XGQ0061, 2014119XGQ0080), and Open Fund of Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology (15-140-30-008K).

Conflict of interest: We have no conflicts of interest to disclose with regard to the subject matter of this article.

References

[1] Babu JJJ, Sudha GF. Adaptive speckle reduction in ultrasound images using fuzzy logic on coefficient of variation. Biomed Signal Proces 2016; 23: 93–103. doi: 10.1016/j.bspc.2015.08.001
[2] Belei P, Schkommodau E, Frenkel A, et al. Computer-assisted single- or double-cut oblique osteotomies for the correction of lower limb deformities. P I Mech Eng H 2007; 221: 787–800. doi: 10.1243/09544119JEIM276
[3] Carneiro TC, Melo SP, Carvalho PCM, Braga APdS. Particle swarm optimization method for estimation of Weibull parameters: a case study for the Brazilian northeast region. Renew Energ 2016; 86: 751–759. doi: 10.1016/j.renene.2015.08.060
[4] Chaplot S, Patnaik LM, Jagannathan NR. Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed Signal Proces 2006; 1: 86–92. doi: 10.1016/j.bspc.2006.05.002
[5] Chen Y, Zhang Y, Yang J, et al. Curve-like structure extraction using minimal path propagation with back-tracing. IEEE T Image Process 2015; 99: 1–16.
[6] Das S, Chowdhury M, Kundu MK. Brain MR image classification using multiscale geometric analysis of Ripplet. Prog Electromagn Res 2013; 137: 1–17. doi: 10.2528/PIER13010105
[7] de la Fuente M, Ohnsorge JAK, Schkommodau E, Jetzki S, Wirtz DC, Radermacher K. Fluoroscopy-based 3-D reconstruction of femoral bone cement: a new approach for revision total hip replacement. IEEE T Bio-Med Eng 2005; 52: 664–675. doi: 10.1109/TBME.2005.844032
[8] De Stefano C, Fontanella F, Marrocco C, di Freca AS. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recogn Lett 2014; 35: 130–141. doi: 10.1016/j.patrec.2013.01.026
[9] Dong Z, Wu L, Wang S. A hybrid method for MRI brain image classification. Expert Systems with Applications 2011; 38: 10049–10053. doi: 10.1016/j.eswa.2011.02.012
[10] El-Dahshan ESA, Hosny T, Salem ABM. Hybrid intelligent techniques for MRI brain images classification. Digit Signal Process 2010; 20: 433–441. doi: 10.1016/j.dsp.2009.07.002
[11] El-Dahshan E-SA, Mohsen HM, Revett K, Salem A-BM. Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Expert Systems with Applications 2014; 41: 5526–5545. doi: 10.1016/j.eswa.2014.01.021
[12] Fang L, Wu L, Zhang Y. A novel demodulation system based on continuous wavelet transform. Math Probl Eng 2015; 2015: 9. doi: 10.1155/2015/513849
[13] Ghovvati M, Khayati G, Attara H, Vaziria A. Comparison across growth kinetic models of alkaline protease production in batch and fed-batch fermentation using hybrid genetic algorithm and particle swarm optimization. Biotechnol Biotec Eq 2015; 29: 1216–1225. doi: 10.1080/13102818.2015.1077686
[14] Goh S, Dong Z, Zhang Y, DiMauro S, Peterson BS. Mitochondrial dysfunction as a neurobiological subtype of autism spectrum disorder: evidence from brain imaging. JAMA Psychiatry 2014; 71: 665–671. doi: 10.1001/jamapsychiatry.2014.179
[15] Gong MG, Wu Y, Cai Q, et al. Discrete particle swarm optimization for high-order graph matching. Inform Sciences 2016; 328: 158–171. doi: 10.1016/j.ins.2015.08.038
[16] Henriksen OM, Larsen VA, Muhic A, et al. Simultaneous evaluation of brain tumour metabolism, structure and blood volume using F-18-fluoroethyltyrosine (FET) PET/MRI: feasibility, agreement and initial experience. Eur J Nucl Med Mol I 2016; 43: 103–112. doi: 10.1007/s00259-015-3183-6
[17] Hirschauer TJ, Adeli H, Buford JA. Computer-aided diagnosis of Parkinson’s disease using enhanced probabilistic neural network. J Med Syst 2015; 39: 12. doi: 10.1007/s10916-015-0353-9
[18] Jayachandran A, Sundararaj GK. Abnormality segmentation and classification of multi-class brain tumor in MR images using fuzzy logic-based hybrid kernel SVM. Int J Fuzzy Syst 2015; 17: 434–443. doi: 10.1007/s40815-015-0064-x
[19] Kanan HR, Faez K. GA-based optimal selection of PZMI features for face recognition. Appl Math Comput 2008; 205: 706–715. doi: 10.1016/j.amc.2008.05.114
[20] Lahmiri S. Interest rate next-day variation prediction based on hybrid feedforward neural network, particle swarm optimization, and multiresolution techniques. Physica A 2016; 444: 388–396. doi: 10.1016/j.physa.2015.09.061
[21] Lin S-W, Lee Z-J, Chen S-C, Tseng T-Y. Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl Soft Comput 2008; 8: 1505–1512. doi: 10.1016/j.asoc.2007.10.012
[22] Liu D, Jiang QL, Chen JX. Binary inheritance learning particle swarm optimisation and its application in thinned antenna array synthesis with the minimum sidelobe level. IET Microwaves Antennas Propag 2015; 9: 1386–1391. doi: 10.1049/iet-map.2015.0071
[23] Lukic M, Clarke J, Tuck C, Whittow W, Wells G. Printability of elastomer latex for additive manufacturing or 3D printing. J Appl Polym Sci 2016; 133: 7. doi: 10.1002/app.42931
[24] Menhas MI, Wang L, Fei M, Pan H. Comparative performance analysis of various binary coded PSO algorithms in multivariable PID controller design. Expert Syst Appl 2012; 39: 4390–4401. doi: 10.1016/j.eswa.2011.09.152
[25] Mirhadi S, Soleimani M. Topology design of dual-band antennas using binary particle swarm optimization and discrete Green’s functions. Electromagnetics 2015; 35: 393–403. doi: 10.1080/02726343.2015.1053351
[26] Moeskops P, Benders MJ, Chiţ SM, et al. Automatic segmentation of MR brain images of preterm infants using supervised classification. Neuroimage 2015; 118: 628–641. doi: 10.1016/j.neuroimage.2015.06.007
[27] Munteanu CR, Fernandez-Lozano C, Abad VM, et al. Classification of mild cognitive impairment and Alzheimer’s disease with machine-learning techniques using H-1 magnetic resonance spectroscopy data. Expert Syst Appl 2015; 42: 6205–6214. doi: 10.1016/j.eswa.2015.03.011
[28] Nagamani G, Radhika T. Dissipativity and passivity analysis of T-S fuzzy neural networks with probabilistic time-varying delays: a quadratic convex combination approach. Nonlinear Dynam 2015; 82: 1325–1341. doi: 10.1007/s11071-015-2241-8
[29] Nayak DR, Dash R, Majhi B. Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing 2016; 177: 188–197. doi: 10.1016/j.neucom.2015.11.034
[30] Nilakantan JM, Ponnambalam SG. Robotic U-shaped assembly line balancing using particle swarm optimization. Eng Optimiz 2016; 48: 231–252. doi: 10.1080/0305215X.2014.998664
[31] Nojavan S, Mehdinejad M, Zare K, Mohammadi-Ivatloo B. Energy procurement management for electricity retailer using new hybrid approach based on combined BICA-BPSO. Int J Elec Power 2015; 73: 411–419. doi: 10.1016/j.ijepes.2015.05.033
[32] Prabin A, Veerappan J. Modified micro structure descriptors and hybrid-RBF kernel SVM based diagnosis of brain tumor in MRI images. J Med Imaging Health Inf 2015; 5: 1194–1200. doi: 10.1166/jmihi.2015.1515
[33] Saad NH, El-Sattar AA, Mansour AM. Improved particle swarm optimization for photovoltaic system connected to the grid with low voltage ride through capability. Renew Energ 2016; 85: 181–194. doi: 10.1016/j.renene.2015.06.029
[34] Saritha M, Joseph KP, Mathew AT. Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network. Pattern Recogn Lett 2013; 34: 2151–2156. doi: 10.1016/j.patrec.2013.08.017
[35] Shiroishi MS, Panigrahy A, Moore KR, et al. Combined MRI and MRS improves pre-therapeutic diagnoses of pediatric brain tumors over MRI alone. Neuroradiology 2015; 57: 951–956. doi: 10.1007/s00234-015-1553-1
[36] Solanki R, Chaturvedi KT, Patidar NP. Different penalty handling based economic dispatch using time varying acceleration coefficients. In: Panigrahi BK, Suganthan PN, Das S, editors. Swarm, evolutionary, and memetic computing, SEMCCO 2014. Vol. 8947. Berlin: Springer-Verlag, 2015: 750–764. doi: 10.1007/978-3-319-20294-5_64
[37] Sung YC, Wang CY, Teo EH. Application of particle swarm optimisation to construction planning for cable-stayed bridges by the cantilever erection method. Struct Infrastruct E 2016; 12: 208–222. doi: 10.1080/15732479.2015.1008521
[38] Tanweer MR, Suresh S, Sundararajan N. Dynamic mentoring and self-regulation based particle swarm optimization algorithm for solving complex real-world optimization problems. Inform Sciences 2016; 326: 1–24. doi: 10.1016/j.ins.2015.07.035
[39] Thorsen F, Fite B, Mahakian LM, et al. Multimodal imaging enables early detection and characterization of changes in tumor permeability of brain metastases. J Control Release 2013; 172: 812–822. doi: 10.1016/j.jconrel.2013.10.019
[40] Wang S, Ji G. A comprehensive survey on particle swarm optimization algorithm and its applications. Math Probl Eng 2015; 2015: 38.
[41] Wang S, Wu L. A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Prog Electromagn Res 2010; 109: 325–343. doi: 10.2528/PIER10090105
[42] Wang S, Dong Z, Ji GL, et al. Classification of Alzheimer disease based on structural magnetic resonance imaging by kernel support vector machine decision tree. Prog Electromagn Res 2014; 144: 171–184. doi: 10.2528/PIER13121310
[43] Wang S, Zhang Y, Dong Z, et al. Feed-forward neural network optimized by hybridization of PSO and ABC for abnormal brain detection. Int J Imag Syst Tech 2015; 25: 153–164. doi: 10.1002/ima.22132
[44] Wang S, Zhang Y, Liu G, Phillips P, Yuan TF. Detection of Alzheimer’s disease by three-dimensional displacement field estimation in structural magnetic resonance imaging. J Alzheimers Dis 2016; 50: 233–248. doi: 10.3233/JAD-150848
[45] Wang S, Zhang Y, Yang X, et al. Pathological brain detection by a novel image feature – fractional Fourier entropy. Entropy 2015; 17: 8278–8296. doi: 10.3390/e17127877
[46] Yang JH, Shi XH, Marchese M. An ant colony optimization method for generalized TSP problem. Prog Nat Sci 2008; 18: 1417–1422. doi: 10.1016/j.pnsc.2008.03.028
[47] Yang G, Zhang Y, Yang J, et al. Automated classification of brain images using wavelet-energy and biogeography-based optimization. Multimed Tools Appl 2015; 1–17. doi: 10.1007/s11042-015-2649-7 (online published).
[48] Zhang Y, Wu L. Crop classification by forward neural network with adaptive chaotic particle swarm optimization. Sensors 2011; 11: 4721–4743. doi: 10.3390/s110504721
[49] Zhang Y, Wu L. A robust hybrid restarted simulated annealing particle swarm optimization technique. Advances in Computer Science and its Applications 2012; 1: 5–8.
[50] Zhang Y, Wu L. An MR brain images classifier via principal component analysis and kernel support vector machine. Prog Electromagn Res 2012; 130: 369–388. doi: 10.2528/PIER12061410
[51] Zhang Y, Dong Z, Ji G, Wang S. Effect of spider-web-plot in MR brain image classification. Pattern Recogn Lett 2015; 62: 14–16. doi: 10.1016/j.patrec.2015.04.016
[52] Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J. Exponential wavelet iterative shrinkage thresholding algorithm for compressed sensing magnetic resonance imaging. Inform Sciences 2015; 322: 115–132. doi: 10.1016/j.ins.2015.06.017
[53] Zhang Y, Dong Z, Phillips P, et al. Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci 2015; 66: 1–15. doi: 10.3389/fncom.2015.00066
[54] Zhang Y, Dong Z, Wang S, Ji G, Yang J. Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM). Entropy 2015; 17: 1795–1813. doi: 10.3390/e17041795
[55] Zhang Y, Wang S, Dong Z, Phillip P, Ji G, Yang J. Pathological brain detection in magnetic resonance imaging scanning by wavelet entropy and hybridization of biogeography-based optimization and particle swarm optimization. Prog Electromagn Res 2015; 152: 41–58. doi: 10.2528/PIER15040602
[56] Zhang Y, Wang S, Ji G, Dong Z. An MR brain images classifier system via particle swarm optimization and kernel support vector machine. Scientific World Journal 2013; 2013: 9. doi: 10.1155/2013/130134
[57] Zhang Y, Wang S, Phillips P, Ji G. Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowl-Based Syst 2014; 64: 22–31. doi: 10.1016/j.knosys.2014.03.015
[58] Zhang Y, Wang S, Sun Y, et al. Binary structuring elements decomposition based on an improved recursive dilation-union model and RSAPSO method. Math Probl Eng 2014; 2014: 12. doi: 10.1155/2014/272496
[59] Zhang Y, Wang S, Wu L. Spam detection via feature selection and decision tree. Adv Sci Lett 2012; 5: 726–730. doi: 10.1166/asl.2012.1768
[60] Zhang Y, Wang S, Phillips P, Dong Z, Ji G, Yang J. Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed Signal Proces 2015; 21: 58–73. doi: 10.1016/j.bspc.2015.05.014
[61] Zhang Y, Wu L, Wang S. UCAV path planning by fitness-scaling adaptive chaotic particle swarm optimization. Math Probl Eng 2013; 2013: 9. doi: 10.1155/2013/705238
[62] Zhang Y-D, Wang SH, Yang XJ, et al. Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine. SpringerPlus 2015; 4: 716. doi: 10.1186/s40064-015-1523-4
[63] Zhou X, Wang S, Xu W, et al. Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier. In: Ortuño F, Rojas I, editors. Bioinformatics and biomedical engineering. Vol. 9043. Granada, Spain: Springer International Publishing, 2015: 201–209.

Received: 2015-7-30
Accepted: 2016-1-18
Published Online: 2016-2-25
Published in Print: 2016-8-1

©2016 Walter de Gruyter GmbH, Berlin/Boston
