NeuroImage

Volume 56, Issue 3, 1 June 2011, Pages 1398-1411

Quantification of accuracy and precision of multi-center DTI measurements: A diffusion phantom and human brain study

https://doi.org/10.1016/j.neuroimage.2011.02.010

Abstract

The inter-site and intra-site variability of system performance of MRI scanners (due to site-dependent and time-variant variations) can have significant adverse effects on the integration of multi-center DTI data. Measurement errors in accuracy and precision of each acquisition determine both the inter-site and intra-site variability. In this study, multiple scans of an identical isotropic diffusion phantom and of the brain of a traveling human volunteer were acquired on MRI scanners from the same vendor and with similar configurations at three sites. We assessed the feasibility of multi-center DTI studies by directly quantifying the accuracy and precision of each dataset. Accuracy was quantified via comparison to carefully constructed gold standard datasets, while precision (the within-scan variability) was estimated by wild bootstrap analysis. The results from both the phantom and human data suggest that the inter-site variation in system performance, although relatively small among scanners of the same vendor, significantly affects DTI measurement accuracy and precision and therefore the effectiveness of integrating multi-center DTI measurements. Our results also highlight the value of a DTI-specific phantom in identifying and quantifying measurement errors due to site-dependent variations in system performance, and its usefulness for quality assurance/quality control in multi-center DTI studies. In addition, we observed that the within-scan variability of each data acquisition, as assessed by wild bootstrap analysis, is of the same magnitude as the inter-site and intra-site variability. We propose that by weighting datasets based on their variability, as evaluated by wild bootstrap analysis, one can improve the quality of the dataset. This approach will provide a more effective integration of datasets from multi-center DTI studies.

Highlights

► Variations (intra- and inter-site, within-scan) affect integration of multi-center DTI.
► DTI-specific phantom is useful for QA/QC in multi-center DTI studies.
► Combination of WBT and weighting statistics improves multi-site data integration.

Introduction

Diffusion tensor imaging (DTI) (Basser et al., 1994) is now widely used in the investigation of brain microstructural integrity. DTI-derived parameters, such as fractional anisotropy (FA) and mean diffusivity (MD), are often used to detect subtle changes of tissue diffusion characteristics in the early stage of disease, when no differences are detectable with other traditional MRI methods.

Studies using advanced MRI data, such as DTI, are unique in that the data consist of a large number of image elements (voxels) for each study subject, but typically only a relatively small number of subjects can be recruited at a single research site. This motivates the implementation of multi-center studies to acquire an adequate sample size. A critical question common to such multi-center studies is whether data from multiple scanners, either from the same or different vendors, can be integrated into a single dataset, i.e., whether the site-dependent and time-variant measurement errors associated with the data acquisition are negligible.

Measurement error is traditionally attributed to both the accuracy and precision of the measurement (Bevington and Robinson, 1992). Measurement accuracy, δ(X), in general can be quantified by the difference between the true value and the mean value of a large number of repeated measurements of the same parameter X. Precision, σ(X), can be described by the standard deviation of these repeated measurements. The measurement accuracy and precision can affect the power of statistical inference. In the example of a two-group comparison in a single-center study, precision contributes mainly to the spread of data within each group, while the effect of bias in accuracy is more complicated. If the bias of each measurement is constant and time-invariant, in general, the accuracy of the data will not affect the statistical power. However, the bias is more often time-variant (e.g., due to unavoidable drift of scanners over time), and it will increase the standard deviation of the data. The power of statistical comparison will further decrease when data come from scanners of different vendors. Different accuracy levels due to intrinsic system differences will increase the inter-site variability, while the time-variant system performance within each site will result in acquisition-dependent variations of accuracy and precision and consequently increase the intra-site variability.
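For N repeated measurements X_1, …, X_N of a quantity with true value X_true, these definitions can be written compactly as follows (a minimal formalization; the sign convention for the bias δ is arbitrary):

\[
\delta(X) = \bar{X} - X_{\mathrm{true}},
\qquad
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i,
\qquad
\sigma(X) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}.
\]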

A multi-center DTI study faces even more challenges. Clinical DTI applications typically use data with low signal-to-noise ratio (SNR), contaminated by physiological noise, artifacts due to field inhomogeneity and eddy currents, and variability due to hardware instability during the lengthy image acquisition (Le Bihan et al., 2006). Except for physiological noise, all of these adverse effects are directly related to the scanner's performance and are therefore usually site-dependent and time-variant. Although all of these sources of error are frequently noted by DTI researchers, no comprehensive quantitative models have been reported to quantify their contributions other than that of thermal noise (Pierpaoli and Basser, 1996, Jones et al., 1999, Hasan et al., 2001, Jones, 2004, Poonawalla and Zhou, 2004, Kingsley, 2005).

Recently, nonparametric bootstrap techniques such as the wild bootstrap (Whitcher et al., 2008, Jones, 2008, Zhu et al., 2009) and the residual bootstrap (Chung et al., 2006) have been introduced as robust estimators of precision for DTI measurements. They are particularly applicable to DTI acquisitions within the usual scanning time, since only one complete DTI measurement is required. Previous studies (Tofts et al., 2000, Delakis et al., 2004, Nagy et al., 2007) have also demonstrated feasible approaches to quantify bias and to calibrate a scanner's performance by scanning phantoms of isotropic solutions with known diffusivities.
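To illustrate the general idea of the wild bootstrap, the following minimal single-voxel sketch resamples the residuals of a simple log-linear tensor fit with random sign flips and reports the within-scan standard deviation of FA and MD. This is an illustration of the technique under simplifying assumptions (ordinary least-squares fit, Rademacher weights), not the optimized implementation used in this study (Zhu et al., 2008); all function names are chosen here for illustration only.

```python
# A minimal single-voxel sketch of wild bootstrap precision estimation for DTI.
import numpy as np

def design_matrix(bvecs, bvals):
    """Log-linear DTI design matrix: ln S = X @ [Dxx, Dyy, Dzz, Dxy, Dxz, Dyz, ln S0]."""
    g, b = np.asarray(bvecs, float), np.asarray(bvals, float)[:, None]
    return np.hstack([
        -b * g[:, [0]] ** 2, -b * g[:, [1]] ** 2, -b * g[:, [2]] ** 2,
        -2 * b * g[:, [0]] * g[:, [1]],
        -2 * b * g[:, [0]] * g[:, [2]],
        -2 * b * g[:, [1]] * g[:, [2]],
        np.ones_like(b),
    ])

def fa_md(d6):
    """FA and MD from the six unique tensor elements [Dxx, Dyy, Dzz, Dxy, Dxz, Dyz]."""
    dxx, dyy, dzz, dxy, dxz, dyz = d6
    evals = np.linalg.eigvalsh([[dxx, dxy, dxz], [dxy, dyy, dyz], [dxz, dyz, dzz]])
    md = evals.mean()
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    return fa, md

def wild_bootstrap_sd(signal, bvecs, bvals, n_boot=500, seed=0):
    """Within-scan SD of (FA, MD) for one voxel, estimated from a single acquisition."""
    rng = np.random.default_rng(seed)
    X = design_matrix(bvecs, bvals)
    y = np.log(np.asarray(signal, float))          # assumes strictly positive signal
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least-squares tensor fit
    resid = y - X @ beta
    stats = []
    for _ in range(n_boot):
        flips = rng.choice([-1.0, 1.0], size=resid.shape)  # Rademacher sign flips
        y_star = X @ beta + flips * resid                  # wild-bootstrap log-signal
        b_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        stats.append(fa_md(b_star[:6]))
    return np.asarray(stats).std(axis=0)           # [SD(FA), SD(MD)]
```

The key design point is that only one acquisition is resampled: the fitted model plus sign-flipped residuals generate surrogate datasets, so no scan/re-scan repetition is needed to estimate the within-scan variability.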

Instead of direct measurement of accuracy and precision, prevailing designs of current multi-center studies rely on analyses of the reproducibility and repeatability of data from pilot studies, in which inter-site (a measure of reproducibility) and intra-site (a measure of repeatability) variance components are quantitatively analyzed (Zou et al., 2005, Friedman et al., 2008). To quantify measurement errors in multi-center DTI studies, a scan/re-scan design was adopted in several DTI studies (Pfefferbaum et al., 2003, Marenco et al., 2006, Farrell et al., 2007). These studies provided reliable methods for quantifying data reproducibility and repeatability. However, outcomes from these analyses cannot be used to establish quantitative rejection/acceptance criteria for a given dataset, and there is no approach for utilizing the known variability in the accepted data to boost statistical power. One common approach proposed to deal with data integration in multi-center studies is to incorporate the site effect as a random variable in models of advanced variance component analysis (Zou et al., 2005, Friedman et al., 2008). Since the variability due to the site effect is still included in the model, a relatively large sample size is required with this approach, although site-dependent errors will no longer significantly bias statistical results.
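For reference, the sketch below shows the kind of variance-component decomposition such pilot designs rely on, assuming a balanced design in which each of n_site sites rescans the same object n_rep times. It is the textbook one-way random-effects ANOVA estimator, not the specific models of Zou et al. (2005) or Friedman et al. (2008); the function name and inputs are illustrative.

```python
# A minimal sketch of a balanced one-way random-effects decomposition of a scalar
# DTI metric (e.g., ROI-mean FA) into inter-site and intra-site variance components.
import numpy as np

def variance_components(metric):
    """metric: array of shape (n_site, n_rep); one value per repeated scan per site."""
    metric = np.asarray(metric, float)
    n_site, n_rep = metric.shape
    site_means = metric.mean(axis=1)
    grand_mean = metric.mean()
    ms_between = n_rep * np.sum((site_means - grand_mean) ** 2) / (n_site - 1)
    ms_within = np.sum((metric - site_means[:, None]) ** 2) / (n_site * (n_rep - 1))
    var_intra = ms_within                                    # repeatability (within-site)
    var_inter = max((ms_between - ms_within) / n_rep, 0.0)   # reproducibility (between-site)
    return var_inter, var_intra
```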

In this study, we directly quantified the accuracy and precision of each dataset. For each acquisition, accuracy was estimated using a carefully constructed gold standard dataset, while precision (the within-scan variability) was quantified using an optimized wild bootstrap analysis (Zhu et al., 2008). The study was specifically designed to address the following objectives: 1) to investigate inter-site and intra-site differences in DTI measurement accuracy and precision that are due to the site-dependent and time-variant performance of MR scanners in a typical multi-center DTI study; 2) to quantitatively compare the within-scan variability of each acquisition (typically not measurable by ANOVA without repeated measurements) with the inter-site and intra-site variance components; and 3) to evaluate the effectiveness of the weighting statistics, which integrate wild bootstrap estimates of the within-scan variability, in reducing the inter-site and intra-site variance, thereby improving the quality of datasets from multi-center DTI studies.
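A minimal sketch of the weighting idea is shown below, under the assumption that each scan's estimate is weighted by the reciprocal of its squared wild-bootstrap standard deviation, in the spirit of a weighted comparison of means (cf. Bland et al., 1998). The exact weighting scheme used in the study may differ; names and numbers here are illustrative only.

```python
# A minimal sketch of inverse-variance weighting of per-scan estimates, where the
# weight of each scan is 1 / SD^2 with SD taken from the wild bootstrap analysis.
import numpy as np

def weighted_mean(estimates, boot_sd):
    """Combine per-scan estimates (e.g., ROI-mean FA) using inverse-variance weights."""
    estimates = np.asarray(estimates, float)
    w = 1.0 / np.asarray(boot_sd, float) ** 2
    mean = np.sum(w * estimates) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))                # standard error of the weighted mean
    return mean, se

# Example: three scans of the same region with differing within-scan variability;
# the noisiest scan contributes the least to the combined estimate.
print(weighted_mean([0.48, 0.51, 0.47], [0.010, 0.030, 0.015]))
```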

Section snippets

Material and methods

The effects of site-dependent and time-variant performance of scanners on data integration from multi-center DTI measurements were investigated using multiple scans of identical isotropic DTI phantoms and a human volunteer at three MRI sites of the HIV Neuroimaging consortium (University of Rochester, University of California at San Diego and Stanford University), with scanners from the same vendor and with similar system configurations.

Sources of measurement errors and validation for wild bootstrap estimate: phantom study

Fig. 1A shows a photograph and a representative MD map of the isotropic DTI phantom. Across all 15 phantom acquisitions, the average scan temperature was 22.3 °C (± 1.02 °C). The average MD values (in units of × 10⁻³ mm²/s) of the three chemicals were 0.55 for cyclooctane, 0.97 for cycloheptane, and 1.50 for cyclohexane, in close agreement with their known values. The average SNR values of the non-diffusion-weighted image at the three sites were 76.46, 78.95, and 75.39, respectively.

The average …
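To make the accuracy (bias) computation behind this phantom comparison explicit, a small sketch is given below. The measured means are those reported above; the reference diffusivities are deliberately left as placeholders rather than quoting values not given in this excerpt, and should be replaced with temperature-corrected literature values.

```python
# A small sketch of the accuracy (bias) check implied by the phantom comparison.
# Reference diffusivities are PLACEHOLDERS (None), not values from this study.
measured_md = {"cyclooctane": 0.55, "cycloheptane": 0.97, "cyclohexane": 1.50}  # x10^-3 mm^2/s
reference_md = {"cyclooctane": None, "cycloheptane": None, "cyclohexane": None}

for chemical, md in measured_md.items():
    ref = reference_md[chemical]
    if ref is None:
        print(f"{chemical}: reference value not set")
    else:
        print(f"{chemical}: bias = {100.0 * (md - ref) / ref:+.1f}%")
```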

Discussion

Multi-center DTI studies inevitably encounter systematic variations due to intrinsic differences among MRI systems and due to time-dependent variations that occur in the context of a longitudinal study. Pooling data without quantifying and controlling the inter-site and intra-site variability will significantly affect the statistical analysis of the study and may require a much larger sample size to compensate for such variability. A commonly used approach to deal …

Conclusions

Consistent results from both phantom and human data show that inter-site variations, although small among scanners of the same vendor, will affect the integration of multi-center DTI measurements. Results from this study also indicate that with a DTI-specific phantom, such as the isotropic phantom used in this study, it is possible to identify and quantify measurement errors due to site-dependent variations in system performance. We have also shown the usefulness of wild bootstrap analysis …

Acknowledgments

The authors appreciate the thoughtful suggestions from the anonymous reviewers. This study is a cooperative effort of three imaging centers within the HIV Neuroimaging consortium.

References

• A.W. Anderson, Theoretical analysis of the effects of noise on diffusion tensor imaging, Magn. Reson. Med. (2001)
• M.A. Bernstein et al.
• P.R. Bevington et al., Data Reduction and Error Analysis for the Physical Sciences (1992)
• J.M. Bland et al., Weighted comparison of means, BMJ (1998)
• L.C. Chang et al., Variance of estimated DTI-derived parameters via first-order perturbation methods, Magn. Reson. Med. (2007)
• D.L. Collins, What quality control procedures should we be adopting for single- and multi-center studies? And what should the minimal reporting requirements be? ISMRM Workshop on Methods for Quantitative Diffusion MRI of Human Brain, Lake Louise, Canada (2005)
• I. Delakis et al., Developing a quality control protocol for diffusion imaging on a clinical MRI system, Phys. Med. Biol. (2004)
• B. Efron et al., An Introduction to the Bootstrap (1993)
• J.A.D. Farrell et al., Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurement at 1.5 T, J. Magn. Reson. Imaging (2007)
• L. Friedman et al., Test–retest and between-site reliability in a multicenter fMRI study, Hum. Brain Mapp. (2008)
• K.M. Hasan et al., Comparison of gradient encoding schemes for diffusion-tensor MRI, J. Magn. Reson. Imaging (2001)
• D.K. Jones, The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: a Monte Carlo study, Magn. Reson. Med. (2004)
• D.K. Jones, Tractography gone wild: probabilistic fiber tracking using the wild bootstrap with diffusion tensor MRI, IEEE Trans. Med. Imaging (2008)

    This project was supported by NIH RO1-NS036524.
