Applied Soft Computing

Volume 11, Issue 2, March 2011, Pages 2313-2325

GMM based SPECT image classification for the diagnosis of Alzheimer’s disease

https://doi.org/10.1016/j.asoc.2010.08.012

Abstract

We present a novel classification method for SPECT images based on Gaussian mixture models (GMMs) for the diagnosis of Alzheimer’s disease. The aim of the model-based approach to density estimation is to automatically select regions of interest (ROIs) and to effectively reduce the dimensionality of the problem. The Gaussians are fitted according to a maximum likelihood criterion using the Expectation Maximization (EM) algorithm. By considering only the intensity levels inside the Gaussians, the resulting feature space has a significantly reduced dimensionality compared with former approaches that use the voxel intensities directly as features (VAF). This feature extraction method relieves the effects of the so-called small sample size problem, so that nonlinear classifiers may be used to distinguish between the brain images of normal subjects and Alzheimer patients. Our results show that, for various classifiers, the GMM-based method yields higher accuracy rates than classification based on all voxel values.

Introduction

Single Photon Emission Computed Tomography (SPECT) is a widely used technique to study the functional properties of the brain [1], [2]. After reconstruction and proper normalization of the SPECT raw data, taken with Tc-99m ethyl cysteinate dimer (ECD) as a tracer, one obtains an activation map displaying the local intensity of the regional cerebral blood flow (rCBF). This technique is therefore particularly applicable to the diagnosis of neuro-degenerative diseases such as Alzheimer’s disease (AD) [3], [4], [5], [6], [7], [8], [9]. This functional modality has lower resolution and higher variability than others such as positron emission tomography (PET), but SPECT tracers are relatively cheap, and their longer half-lives compared to PET tracers make SPECT well suited, if not required, when biologically active radiopharmaceuticals have slow kinetics. SPECT also eliminates the need for the expensive on-site cyclotron/radiochemistry production facility typically required for PET tracers; as a result, SPECT is very popular and widely used in clinical practice.

In order to improve the prediction accuracy, especially in the early stage of the disease where the patient would benefit most from drugs, computer aided diagnosis (CAD) tools are desirable. At this stage in the development of CAD systems, the main goal is to reproduce the knowledge of medical experts in the evaluation of a complete image database, i.e. distinguishing AD patients from normal controls; this avoids errors from single-observer evaluation and yields a method for assisting the identification of early signs of AD.

In this sense, several approaches for a computer aided diagnosis (CAD) system have been proposed to analyze SPECT and other medical images. The most relevant univariate approach to date is the widely used Statistical Parametric Mapping (SPM) and its numerous variants [10]. SPM performs a voxelwise statistical test, e.g. a two-sample t-test, comparing the values of the image under study to the mean values of the group of normal images; the significant voxels are then inferred using random field theory [11]. The framework was originally developed for the analysis of SPECT and PET studies, but it is now mainly used for the analysis of functional MRI (magnetic resonance imaging) data. However, SPM is not intended for the diagnosis problem using a single patient image but for comparing groups of images. Its application to this problem yields poor classification results, since one of the populations under study consists of a single test patient (a biased estimate of the population mean) and the other consists of a set of normal patients (the t-test does not include any information about the pathology under study) [6]. In addition, this method suffers from the problems of local, mono-variate approaches. On the other hand, multivariate approaches such as MANCOVA consider all the voxels in a single scan as one observation in order to make inferences about distributed activation effects. Their importance lies in the fact that the effects due to activations, confounding effects and error effects are assessed statistically both at each voxel and in terms of interactions among voxels [12]. Nevertheless, these techniques cannot make statistical inferences about regionally specific changes, and they require the number of observations (i.e. scans) to be greater than the number of components of the multivariate observation (i.e. voxels). Clearly this is not the case for most functional imaging studies (SPECT, PET, fMRI).
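For readers unfamiliar with the SPM baseline, the following minimal sketch illustrates the voxelwise two-sample t-test described above. It is illustrative only (SPM additionally corrects for multiple comparisons via random field theory [11]); the array shapes and synthetic data are our assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats

def voxelwise_ttest(controls, patients):
    """Two-sample t-test at every voxel.

    controls, patients: arrays of shape (n_subjects, n_voxels).
    Returns per-voxel t statistics and (uncorrected) p-values.
    """
    t, p = stats.ttest_ind(controls, patients, axis=0)
    return t, p

# Synthetic example: 40 controls vs. 40 patients, 10,000 voxels,
# with a small simulated hypo-perfusion in the patient group.
rng = np.random.default_rng(0)
controls = rng.normal(1.00, 0.1, size=(40, 10_000))
patients = rng.normal(0.95, 0.1, size=(40, 10_000))
t_map, p_map = voxelwise_ttest(controls, patients)
print((p_map < 0.001).sum(), "voxels significant at p < 0.001 (uncorrected)")
```

Note that this compares two groups; as discussed above, the test degenerates when one “group” is a single test image, which is precisely why SPM is ill-suited to single-subject diagnosis.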

In the context of supervised multivariate approaches [13], [14], [15], classification is usually done by defining feature vectors representing the different SPECT images and training a classifier with a given set of known samples [6], [7], [8]. After the training process the classifier is used to distinguish between the brain images of normal subjects and Alzheimer patients. Rooted in statistical learning theory, whose foundations date back to the late seventies, support vector machines (SVMs) marked the beginning of a new era in the learning-from-examples paradigm [16]. SVMs have attracted considerable attention from the pattern recognition community due to a number of theoretical and computational merits derived from the Statistical Learning Theory [16] developed by Vladimir Vapnik at AT&T. These techniques have been successfully used in a number of applications including voice activity detection (VAD) [17], content-based image retrieval [18], texture classification [19] and medical image diagnosis [8], [14].

The advantage of such a statistical learning approach is that no specific knowledge about the disease is necessary and the method is applicable to different types of brain diseases and brain imaging techniques. In a straightforward approach the voxel intensities I_n of the SPECT image are directly used to construct the feature vectors v = (I_1, …, I_N), see [6], [7]. Even after downsampling the image resolution and applying a brain mask, this results in N ≈ 10,000 entries in the feature vectors. The dimensionality of the feature space is therefore extremely large compared to the number of available training samples (50–100 is a realistic number), which leads to the so-called small sample size problem [20].
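As a concrete illustration of this voxels-as-features construction, the sketch below flattens the masked voxels of one scan into a single feature vector; the volume shape, threshold, and crude mask are illustrative assumptions.

```python
import numpy as np

# One (synthetic) SPECT volume after downsampling.
volume = np.random.default_rng(1).random((64, 64, 32))
mask = volume > 0.3          # crude stand-in for a brain mask

v = volume[mask]             # feature vector v = (I_1, ..., I_N)
print(v.shape)               # N is on the order of 10^4, while only
                             # 50-100 labeled training scans exist
```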

Principal Component Analysis (PCA) is a standard technique for extracting the most significant features from a dataset, frequently used to reduce the raw data to a subset of features that contains the largest amount of variance. It is based on a linear transformation, acting on a zero-mean dataset, that diagonalizes its covariance matrix. The resulting eigenvectors form a new set of uncorrelated variables, whose variances are given by the corresponding eigenvalues. Several supervised learning based approaches have been successfully proposed as tools for extracting a decorrelated eigenvector basis [14], [15], [21]; however, the main drawback of these techniques is that they only take into account pair-wise relationships between voxels of the brain images. For this reason, Independent Component Analysis (ICA) has also been applied to brain image analysis, since it seems reasonable that important information may be contained in the higher-order relationships among voxels. Nevertheless, the high variability of the classes under study makes an improvement over PCA difficult to achieve using ICA and, again, all these techniques include, at the feature extraction stage, non-relevant information which may degrade performance at the classification stage.
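A minimal sketch of the PCA-based reduction discussed above, using scikit-learn (our choice of library, not that of the cited works):

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows are scans (voxels-as-features), columns are voxels.
X = np.random.default_rng(2).normal(size=(80, 10_000))

pca = PCA(n_components=20)        # keep the 20 highest-variance directions
X_reduced = pca.fit_transform(X)  # shape: (80, 20)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```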

On the other hand, clustering or parcellation methods are often employed [22], [23] for the purpose of data segmentation or compression. The basic idea is to group data points which are similar in some sense into subsets or parcels. In the case of color images, for instance, these can be contiguous areas of similar color. Recently, parcels or clusters based on Gaussian mixtures have been used to quantify the spatial color distribution of images [24]. Moreover, clustering techniques are successfully applied in various fields like pattern recognition [25], speech detection [26], [27], or image segmentation [28]. In functional imaging studies, model-based clustering or parcellation has been employed in fMRI analysis for grouping relevant coordinates in the Talairach space [29]. For this task, Activation Likelihood Estimation (ALE) is first employed to reduce the list of activation maxima to those which have one or more maxima in their vicinity; then these coordinates x_i, together with their memberships z_i to each cluster, are subjected to clustering based on a finite mixture of probability distributions [30]. The main drawback of this method, derived from the use of ALE, is that hypo-perfusion patterns are not included in the model, even though the use of the SPECT image modality for the diagnosis of AD is mainly based on the detection of such regions. The application of model-based image analysis should take into account not only the coordinates of activation maxima but also the intensity value of each voxel.

In this work we present a different parcellation approach using Gaussian mixture models (GMMs) for density estimation of the intensity profile, which allows us to reduce the dimension of the feature vector drastically. We approximate the intensity profile of a SPECT image by a sum of Gaussians satisfying a maximum likelihood criterion. For this purpose we use the well-known Expectation Maximization (EM) algorithm, due to its simplicity and robustness [29], although other maximization methods could be employed as well.
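A hedged sketch of this density-estimation step is given below: the (normalized) intensity profile is treated as a spatial density over voxel coordinates, samples are drawn proportional to intensity, and a Gaussian mixture is fitted with EM. The use of scikit-learn’s GaussianMixture and the sampling shortcut are our assumptions, not the authors’ exact implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_intensity_gmm(volume, n_gaussians=32, n_samples=50_000, seed=0):
    """Fit a spatial GMM to the intensity profile of a 3-D volume."""
    coords = np.argwhere(volume > 0)            # voxel positions x_j
    weights = volume[volume > 0].astype(float)
    weights /= weights.sum()                    # intensity as a density
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(coords), size=n_samples, p=weights)
    gmm = GaussianMixture(n_components=n_gaussians, covariance_type="full")
    gmm.fit(coords[idx])                        # EM runs inside fit()
    return gmm      # means_, covariances_, weights_ define the ROIs
```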

Each region of interest (ROI) is then represented by a single Gaussian with a certain center, shape and weight. The feature vectors are constructed from the mean intensities within the different Gaussians, so that the dimensionality of the feature space equals the number of Gaussians. In our case we reach a situation where the number of training samples exceeds the number of features by one order of magnitude, so that we relieve the effects of the small sample size problem. We therefore enter a regime where the use of nonlinear classifiers makes sense, which may increase the reliability of the classification. In this sense, our approach can be seen as a hybrid of univariate and multivariate approaches. First, we reduce the required number of observations by selecting ROIs using GMMs, and are thus able to preserve (i) regional specificity (e.g. local hypo-perfusion regions) at each ROI, and (ii) global changes in the brain activation map given by the GMM configuration, which takes explicit account of interactions among brain regions. Second, we apply supervised statistical learning to the feature vectors obtained from these ROIs using SVMs [16].
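The sketch below illustrates one plausible reading of this feature construction: each feature is the kernel-weighted mean intensity inside one fitted Gaussian, and an SVM is then trained on the resulting k-dimensional vectors. The helper names and the weighting scheme are our assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.svm import SVC

def roi_features(volume, gmm):
    """One feature per Gaussian: kernel-weighted mean intensity."""
    coords = np.argwhere(volume > 0)
    intensities = volume[volume > 0].astype(float)
    features = []
    for mu, cov in zip(gmm.means_, gmm.covariances_):
        w = multivariate_normal.pdf(coords, mean=mu, cov=cov)
        features.append((w * intensities).sum() / w.sum())
    return np.array(features)    # dimension k, far below the voxel count

# Training (X: one k-dimensional row per scan, y: AD/normal labels):
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# predictions = clf.predict(X_test)
```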

The paper is organized as follows. In Section 2.1 we describe the database used in this paper, and in Section 2.2 we introduce the basic principles later used for SPECT image modeling; we demonstrate how to extract Gaussian mixtures from general statistical data given in the form of a histogram. In Section 2.3 we outline the classification methods used in this work for the quantitative assessment of medical images. In Section 3 the GMM-based parcellation method is applied to the intensity profile; in this context, (i) each Gaussian of the proposed model defines a ROI and (ii) each voxel contributes to each ROI with an implicit “membership” determined by the Gaussians’ positions and variances. In this section we also show how to apply the method to SPECT images and present the resulting ROIs (Gaussians) obtained with our approach. After the definition of the ROIs we discuss the construction of feature vectors in Section 3.1. The main application of the proposed approach is image classification; accordingly, we summarize the classification performance obtained for various linear and nonlinear classifiers in Section 4, and give an outlook on different future applications of the proposed GMM-based method.

Section snippets

Subjects and preprocessing

Baseline SPECT data from 97 participants were collected at the “Virgen de las Nieves” hospital in Granada (Spain). The patients were injected with a gamma-emitting 99mTc-ECD radiopharmaceutical and the SPECT raw data were acquired with a three-head gamma camera (Picker Prism 3000). A total of 180 projections were taken with a 2-degree angular resolution. The images of the brain cross-sections were reconstructed from the projection data using the filtered backprojection (FBP) algorithm in

Feature extraction

We now apply the method described in the preceding section to extract ROIs from SPECT images. Such images are 3-dimensional intensity distributions discretized into V voxels with positions x_j, j = 1, …, V. In functional imaging each voxel carries a gray-level intensity I(x_j), which is related to the regional blood flow [4], glucose metabolism [42], etc. in the brain of a patient, depending on the image acquisition modality. We aim to fit the intensity profile by a mixture of k Gaussians according to
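The snippet is cut off before the equation; the standard mixture form it presumably refers to (our reconstruction, with mixing weights w_l, centers μ_l and covariances Σ_l) is

```latex
I(\mathbf{x}_j) \approx \sum_{l=1}^{k} w_l\,
\mathcal{N}(\mathbf{x}_j;\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l),
\qquad
\mathcal{N}(\mathbf{x};\,\boldsymbol{\mu},\boldsymbol{\Sigma}) =
\frac{\exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big)}
{(2\pi)^{3/2}\,|\boldsymbol{\Sigma}|^{1/2}}
```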

Results

In this section we show the experimental results of applying the proposed method to ROI extraction and classification. The method should achieve several objectives:

  • (i) The model should be suitable for the characterization of the SPECT images. Independently of the image label, we should be able to reconstruct the whole image from the resulting GMM configuration. This could be useful for several purposes in other image modalities, e.g. artifact extraction in fMRI analysis.

  • (ii) The model

Discussion

The use of imbalanced training datasets of Groups 1 and 2 causes two major problems. Firstly, the use of performance parameters such as specificity or sensitivity is inappropriate, since they are sample-prevalence dependent. For instance, a low specificity value in an imbalanced dataset may not reflect a high false negative rate. This problem is solved by using other, sample-prevalence independent parameters, such as the positive (PL) or negative (NL) likelihood ratios, and/or by selecting balanced data sets such as
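For reference, the standard (sample-prevalence independent) likelihood-ratio definitions mentioned above, not taken from the paper’s own equations, are PL = sensitivity/(1 − specificity) and NL = (1 − sensitivity)/specificity:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios (standard definitions)."""
    pl = sensitivity / (1.0 - specificity)
    nl = (1.0 - sensitivity) / specificity
    return pl, nl

print(likelihood_ratios(0.9, 0.8))   # -> (4.5, 0.125)
```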

Conclusions and outlook

The core idea of this work is to quantize space by populating it with Gaussian kernels whose linear combination approximates the image intensity. The resulting kernel locations act as new “super-voxels” whose intensity is estimated by projecting (integrating) the image onto the kernel function. This parcellation technique based on Gaussian mixtures provides a stable and successful method to efficiently compress the information contained in smooth gray-scale images. We applied the method to

Acknowledgments

This work was partly supported by the MICINN under the PETRI DENCLASES (PET2006-0253), TEC2008-02113, NAPOLEON (TEC2007-68030-C02-01) and HD2008-0029 projects and the Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain) under the Excellence Projects (TIC-02566 and TIC-4530). Furthermore it was supported by a fellowship within the Postdoc-Programme of the German Academic Exchange Service (DAAD). We are grateful to M. Gómez-Río and coworkers from the “Virgen de las Nieves”

References (51)

  • G. Fung et al.

    SVM feature selection for classification of SPECT images of Alzheimer’s disease using spatial information

    Knowledge and Information Systems

    (2007)
  • E. Johnson et al.

    Modeling the effect of Alzheimer’s disease on mortality

    The International Journal of Biostatistics

    (2007)
  • R. Adler

    The Geometry of Random Fields

    (1981)
  • R.S.J. Frackowiak et al.

    Human Brain Function

    (2003)
  • J. Ramírez, J. Górriz, D. Salas-Gonzalez, A. Romero, M. López, I. Álvarez, M. Gómez-Río, Computer-aided diagnosis of...
  • I. Álvarez et al.

    Alzheimer’s diagnosis using eigenbrains and support vector machines

    Electronics Letters

    (2009)
  • V. Vapnik

    Statistical Learning Theory

    (1998)
  • J. Ramírez et al.

    SVM-based speech endpoint detection using contextual speech features

    Electronics Letters

    (2006)
  • D. Tao et al.

    Asymmetric bagging and random sub-space for support vector machines-based relevance feedback in image retrieval

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2006)
  • K.I. Kim et al.

    Support vector machines for texture classification

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2002)
  • R.P.W. Duin, Classifiers in almost empty spaces, in: Proceedings 15th International Conference on Pattern Recognition,...
  • M. López et al.

    Automatic tool for Alzheimer’s disease diagnosis using PCA and Bayesian classification rules

    Electronics Letters

    (2009)
  • A.K. Jain et al.

    Data clustering: a review

    ACM Computing Surveys

    (1999)
  • R. Xu et al.

    Survey of clustering algorithms

    IEEE Transactions on Neural Networks

    (2005)