Reliability of tissue volumes and their spatial distribution for segmented magnetic resonance images

https://doi.org/10.1016/S0925-4927(01)00075-0

Abstract

Before using MRI tissue segmentation in clinical studies as a dependent variable or as a means to correct functional data for differential tissue contribution, we must first establish the volume reliability and spatial distribution reproducibility of the segmentation method. Although several reports of volume reliability can be found in the literature, there are no articles assessing the reproducibility of the spatial distribution of tissue. In this report, we examine the validity, volume reliability, and spatial distribution reproducibility for our K-means cluster segmentation. Validation was examined by classifying gray matter, white matter, and CSF on images constructed using an MRI simulator and digital brain phantom, with percentage volume differences of less than 5% and spatial distribution overlaps greater than 0.94 (1.0 is perfect). We also segmented repeat scan MRIs from 10 healthy subjects, with intraclass correlation coefficients greater than 0.92 for cortical gray matter, white matter, sulcal CSF, and ventricular CSF. The original scans were also coregistered to the repeat scan of the same subject, and the spatial overlap for each tissue was then computed. Our overlaps ranged from 0.75 to 0.86 for these tissues. Our results support the use of K-means cluster segmentation, and the use of segmented structural MRIs to guide the analysis of functional and other images.

Introduction

Use of structural magnetic resonance images (MRIs) to guide analysis of functional images (e.g. PET, SPECT, fMRI) or data from other imaging modalities (e.g. magnetization transfer imaging, MR spectroscopy, diffusion weighted imaging) is becoming increasingly popular. Anatomic regions of interest (ROIs) can be better delineated on high resolution structural MRIs than on lower resolution functional and other images, and these ROIs can then be used to guide quantitative analysis of the lower resolution images (e.g. Migneco et al., 1994, Mountz et al., 1994, Weiner et al., 1998). Tissue segmentation from the structural MRIs can be used to correct the lower resolution images for atrophic and differential tissue compartment contribution effects (Meltzer et al., 1990, Meltzer et al., 1996a, Meltzer et al., 1996b, Müller-Gärtner et al., 1992, Weiner et al., 1998).

MRI segmentation is most often used to assess tissue specific volumes as measures of atrophy or as differences in brain organization between diagnostic groups. The validity of the segmentation tissue classification can only be assessed against a true gold standard. Investigators have assessed the reliability of MRI segmentation by evaluating the repeatability of the resultant tissue volume measurements (Bonar et al., 1993, Byrum et al., 1996, Cohen et al., 1992, Fisher et al., 1997, Harris et al., 1999, Kikinis et al., 1992, Reiss et al., 1998). Sources of unreliability of MRI segmentation volumetric measures identified by these investigators include intra- and inter-operator variability, imperfections in data acquisition (RF inhomogeneity, motion and flow artifacts), drift in imager function over time, and partial volume effects (PVE). When segmented structural MRIs are used to guide the analysis of functional and other images, their utility depends on the validity and reliability of the segmentation on a pixel-by-pixel basis. In these cases, what matters is not just the volume of cortical gray matter but also the spatial location of the gray matter voxels, because that location is what guides the analysis of the other imaging modality. We have found no publications assessing the reliability of segmentation algorithms on a pixel-by-pixel basis.
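As an illustration of how volume repeatability across repeat scans can be summarised, the sketch below computes an intraclass correlation coefficient of the kind reported in the abstract. It assumes the one-way random-effects form, ICC(1,1); the excerpt does not say which ICC variant was used, and the function name and the example volumes are hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for a subjects x sessions table.

    scores: list of rows, one row per subject, one value per repeat scan.
    This ICC form is an assumption, not necessarily the one used in the paper.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of repeat scans per subject
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical gray matter volumes (ml) for four subjects, scan and rescan.
volumes = [[610.0, 612.0], [655.0, 651.0], [580.0, 583.0], [702.0, 699.0]]
icc = icc_oneway(volumes)
```

A value near 1.0 means that differences between subjects dominate scan-to-scan differences within a subject. A high ICC says nothing about whether the same voxels were labelled on both occasions, which is why spatial overlap is examined separately.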

It is difficult to validate tissue segmentation algorithms using data acquired in vivo, since there is no way of determining the true tissue classifications of each MRI voxel. Manual segmentation by an expert is often used as a gold standard for validating segmentation algorithms (Harris et al., 1999), but such efforts are hindered by intra- and inter-rater variability. Investigators at the Montreal Neurological Institute (MNI) (Collins et al., 1998, Kwan et al., 1996) have developed a realistic digital brain phantom and MRI simulator, which can be used to evaluate image-processing methods. Using a web interface (http://www.bic.mni.mcgill.ca/brainweb), differently weighted simulated MRIs can be downloaded and used to test segmentation algorithms. The output of segmentation can be compared to the digital brain phantom to compute an objective measure of performance.
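The comparison against the phantom can be sketched as follows. The excerpt does not define the overlap measure, so this sketch assumes a Dice-style similarity index, which equals 1.0 for perfect agreement, together with a signed percentage volume difference. The function names, label codes, and toy label arrays are hypothetical.

```python
def dice_overlap(seg, truth, label):
    """Dice similarity index for one tissue label: 2|A and B| / (|A| + |B|).

    seg, truth: flat sequences of per-voxel labels in the same voxel order.
    """
    in_seg = [s == label for s in seg]
    in_truth = [t == label for t in truth]
    both = sum(1 for a, b in zip(in_seg, in_truth) if a and b)
    total = sum(in_seg) + sum(in_truth)
    # Convention chosen here: a label absent from both images counts as perfect agreement.
    return 2.0 * both / total if total else 1.0


def percent_volume_difference(seg, truth, label):
    """Signed volume difference of the segmentation relative to the phantom, in percent."""
    seg_count = sum(1 for s in seg if s == label)
    truth_count = sum(1 for t in truth if t == label)
    return 100.0 * (seg_count - truth_count) / truth_count


# Hypothetical label codes: 0 = CSF, 1 = gray matter, 2 = white matter.
truth = [1, 1, 1, 1, 2, 2, 2, 2, 0, 0]
seg = [1, 1, 1, 2, 2, 2, 2, 2, 0, 0]   # one gray matter voxel mislabelled as white matter

gm_overlap = dice_overlap(seg, truth, 1)
gm_volume_diff = percent_volume_difference(seg, truth, 1)
```

The two measures answer different questions. Volume difference can be zero even when many voxels are misplaced, provided the errors cancel, whereas the overlap penalises every voxel whose label disagrees with the phantom.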

This report examines the validity of our K-means clustering segmentation approach by applying it to MRI phantom data, and then focuses on the reproducibility or reliability of the volumetric measures and of the spatial distribution of tissue categories in serially collected, segmented, anatomic images. Only by comparing the reproducibility of the spatial distribution of tissue categories across repeat imaging studies can the utility of segmentation for voxel-by-voxel co-analysis of functional images be assessed.
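For readers unfamiliar with K-means clustering, a minimal sketch of the generic algorithm applied to multichannel voxel intensities is given below. This is not the authors' implementation: the initial cluster centres, the intensity values, and the function name are hypothetical, and a real pipeline would involve further steps that the excerpt does not describe.

```python
def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; `centers` are the initial cluster centres.

    points: list of per-voxel intensity tuples, e.g. (T1, PD, T2).
    Returns (labels, centers), with labels[i] the cluster index of points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each voxel joins its nearest centre (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
            )
            for point in points
        ]
        # Update step: each centre moves to the mean of the voxels assigned to it.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its previous centre
        if new_centers == centers:
            break  # converged: the centres no longer move
        centers = new_centers
    return labels, centers


# Hypothetical (T1, PD, T2) intensities for six voxels: two CSF-like,
# two gray-matter-like, and two white-matter-like.
voxels = [(20, 60, 200), (25, 65, 190), (90, 110, 110),
          (95, 105, 115), (140, 90, 80), (145, 95, 75)]
initial = [(0, 50, 220), (100, 100, 100), (150, 100, 70)]
labels, centers = kmeans(voxels, initial)
```

Because K-means assigns each voxel to exactly one cluster, voxels that contain a mixture of tissues are forced into a single class. This is one way in which partial volume effects can reduce the reproducibility of the spatial distribution.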

Section snippets

McGill University brainweb images

We obtained simulated MRIs from http://www.bic.mni.mcgill.ca/brainweb, using their normal brain model. All images had 1-mm slice thickness with 1×1 mm² in-plane resolution. We obtained T1-, PD-, and T2-weighted images with the following parameters: T1 (simulated 3D spoiled FLASH; TR/TE, FA: 18/10, 30°), 3% noise level, 0% inhomogeneity; PD (simulated early echo from 2D multislice dual spin echo; TR/TE: 3300/35), 3% noise level, 20% inhomogeneity; T2 (simulated late echo from 2D multislice dual

Results

Table 1 shows the volumes obtained from the discrete anatomical brain phantom for gray matter, white matter, and CSF compared to the volumes output by the segmentation of the T1-, PD-, and T2-weighted images generated by the MRI simulator based on the discrete anatomical brain phantom. The percentage differences between these volumes were all less than 5%, and the overlaps were 0.94 or greater. Fig. 4 shows slices from the discrete anatomical brain phantom and the corresponding slices from the

Discussion

Our K-means clustering segmentation method, which utilizes intensity information from T1-, PD-, and T2-weighted images, performed extremely well on the data generated by MNI using their realistic digital brain phantom and MRI simulator. Our method produces a classification into gray matter, white matter, and CSF, and the tissue volume differences between the segmentation image and ‘truth’ were less than 5% for these three tissues. These tissue volume differences are similar to those for other

Acknowledgements

The authors would like to thank Diana Truran, Alanna McAlorum, Rosanna Jeremias, and Dawn Hardin for their assistance in recruiting subjects, running the magnet, and processing the MRIs. This work was supported by NIA grant AG12435, NIAAA grant P01AA11493, NIDA grant R01DA08365 and a DVA Research Career Scientist Award (George Fein).

References (25)

  • T.L. Jernigan et al. Methods for measuring brain morphologic features on magnetic resonance images. Archives of Neurology (1990)
  • R. Kikinis et al. Routine quantitative analysis of brain and cerebrospinal fluid spaces with MR imaging. Journal of Magnetic Resonance Imaging (1992)