Quantification of accuracy and precision of multi-center DTI measurements: A diffusion phantom and human brain study☆
Highlights
► Variations (intra- and inter-site, within-scan) affect integration of multi-center DTI.
► DTI-specific phantom is useful for QA/QC in multi-center DTI studies.
► Combination of WBT and weighting statistics improves multi-site data integration.
Introduction
Diffusion tensor imaging (DTI) (Basser et al., 1994) is now widely used in the investigation of brain microstructural integrity. DTI-derived parameters, such as fractional anisotropy (FA) and mean diffusivity (MD), are often used to detect subtle changes of tissue diffusion characteristics in the early stage of disease, when no differences are detectable with other traditional MRI methods.
Studies using advanced MRI data, such as DTI, are unique in that the data consist of a large number of image elements (voxels) for each study subject but typically only a relatively small number of subjects can be recruited at a single research site. This motivates the implementation of multi-center studies to acquire an adequate sample size. One common critical question for such a typical multi-center study is whether the data from multiple scanners, either from the same or different vendors, can be integrated into a single dataset, i.e., with negligible site-dependent and time-variant measurement errors associated with the data acquisition.
Measurement error is traditionally attributed to both the accuracy and precision of the measurement (Bevington and Robinson, 1992). Measurement accuracy, δ(X), in general can be quantified by the difference between the true value and the mean value of a large number of repeated measurements of the same parameter X. Precision, σ(X), can be described by the standard deviation from these repeated measurements. The measurement accuracy and precision can affect the power of statistical inference. In the example of a two-group comparison in a single-center study, precision contributes mainly to the spread of data within each group, while the effect of bias in accuracy is more complicated. If the bias of each measurement is constant and time-invariant, in general, the accuracy of the data will not affect the statistical power. However, the bias is more often time-variant (e.g., due to unavoidable drift of scanners over time), and it will increase the standard deviation of the data. The power of statistical comparison will further decrease when data come from scanners of different vendors. Different accuracy levels due to intrinsic system differences will increase the inter-site variability, while the time-variant system performance within each site will result in acquisition-dependent variations of accuracy and precision and consequently increase the intra-site variability.
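As a minimal sketch of these two definitions, the following Python snippet computes the bias δ(X) and the precision σ(X) from repeated measurements of a quantity whose true value is known, as is the case for a phantom compartment. The function name, the sample MD values and the sign convention (mean minus true value) are assumptions made for illustration only and are not taken from the study.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, precision) for repeated measurements of one parameter.

    bias      = mean of the measurements minus the true value (assumed sign convention)
    precision = sample standard deviation of the measurements
    """
    mean_value = statistics.mean(measurements)
    bias = mean_value - true_value
    precision = statistics.stdev(measurements)
    return bias, precision

# Hypothetical repeated MD measurements (x 10^-3 mm^2/s) of one phantom compartment
md_repeats = [1.51, 1.54, 1.52, 1.55, 1.49]
bias, precision = accuracy_and_precision(md_repeats, true_value=1.50)
print(f"bias = {bias:+.4f}, precision = {precision:.4f}")
```

A constant bias shifts every measurement equally and leaves the spread unchanged, whereas a bias that drifts between acquisitions would show up as a larger standard deviation when the acquisitions are pooled.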
A multi-center DTI study faces even more challenges. Clinical DTI applications typically use data with low signal-to-noise ratio (SNR), contaminated by physiological noise, artifacts due to field inhomogeneity and eddy currents, and variability due to hardware instability during the lengthy image acquisition (Le Bihan et al., 2006). Except for the physiological noise, all of these adverse effects are directly related to the scanner's performance and are therefore usually site-dependent and time-variant. Although these sources of error are frequently noted by DTI researchers, no comprehensive quantitative model has been reported that quantifies the contribution of any of them other than thermal noise (Pierpaoli and Basser, 1996, Jones et al., 1999, Hasan et al., 2001, Jones, 2004, Poonawalla and Zhou, 2004, Kingsley, 2005).
Recently, nonparametric bootstrap techniques, such as the wild bootstrap (Whitcher et al., 2008, Jones, 2008, Zhu et al., 2009) and the residual bootstrap (Chung et al., 2006), have been introduced as robust estimators of precision for DTI measurements. They are particularly applicable to DTI acquisitions within the usual scanning time, since only one complete DTI measurement is required. Previous studies (Tofts et al., 2000, Delakis et al., 2004, Nagy et al., 2007) have also demonstrated feasible approaches to quantify bias and to calibrate scanner performance by scanning phantoms of isotropic solutions with known diffusivities.
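The idea of estimating precision from a single acquisition can be sketched with a deliberately simplified example. The snippet below applies a wild bootstrap to a one-parameter isotropic model, ln S = ln S0 − b·D, rather than to the full diffusion tensor: the model is fitted once, the sign of each residual is flipped at random (Rademacher weights), and the model is refitted to obtain a distribution of D. The function names, the synthetic signal values and the number of bootstrap samples are hypothetical. Published wild bootstrap variants for DTI typically also rescale the residuals by a leverage-dependent factor, which is omitted here.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def wild_bootstrap_sd(bvals, signals, n_boot=500, seed=0):
    """Estimate an isotropic diffusivity D and its SD from one acquisition.

    Assumed model: ln S = ln S0 - b * D, fitted by least squares.
    Each bootstrap sample multiplies every residual by a random +1 or -1
    and refits, so no repeated scans are needed.
    """
    rng = random.Random(seed)
    log_s = [math.log(s) for s in signals]
    intercept, slope = fit_line(bvals, log_s)
    fitted = [intercept + slope * b for b in bvals]
    residuals = [y - f for y, f in zip(log_s, fitted)]
    d_samples = []
    for _ in range(n_boot):
        resampled = [f + r * rng.choice((-1.0, 1.0))
                     for f, r in zip(fitted, residuals)]
        _, slope_star = fit_line(bvals, resampled)
        d_samples.append(-slope_star)
    return -slope, statistics.stdev(d_samples)

# Hypothetical single-voxel data: b-values in s/mm^2, signals in arbitrary units
bvals = [0, 200, 400, 600, 800, 1000]
signals = [1000.0, 745.0, 548.0, 412.0, 298.0, 225.0]
d_hat, d_sd = wild_bootstrap_sd(bvals, signals)
print(f"D = {d_hat * 1e3:.3f} x 10^-3 mm^2/s, "
      f"bootstrap SD = {d_sd * 1e3:.4f} x 10^-3 mm^2/s")
```

Keeping each residual attached to its own measurement, instead of shuffling residuals across measurements, is what lets the wild bootstrap tolerate noise levels that differ from one diffusion-weighted image to the next.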
Instead of direct measurement of accuracy and precision, prevailing designs of current multi-center studies rely on analyses of reproducibility and repeatability of data based on pilot studies, in which inter-site (a measure of reproducibility) and intra-site (a measure of repeatability) variance components are quantitatively analyzed (Zou et al., 2005, Friedman et al., 2008). To quantify measurement errors in multi-center DTI studies, a scan/re-scan scheme was adopted in several DTI studies (Pfefferbaum et al., 2003, Marenco et al., 2006, Farrell et al., 2007). These studies provided reliable methods for quantifying data reproducibility and repeatability. However, outcomes from these analyses cannot be used to establish quantitative rejection/acceptance criteria for a given dataset, and there is no approach for utilizing known variability in the accepted data to boost the statistical power. One common approach proposed to deal with data integration in multi-center studies is to incorporate the site effect as a random variable into models of advanced variance component analysis (Zou et al., 2005, Friedman et al., 2008). Since the variability due to the site effect is still included in the model, a relatively large sample size is required with this approach, although site-dependent errors will no longer significantly bias statistical results.
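To make the inter-site and intra-site variance components concrete, the sketch below estimates them with a balanced one-way random-effects ANOVA, which is one common way of carrying out such an analysis and is not necessarily the model used in the cited studies. The function name and the MD values (three sites, five scans each) are hypothetical.

```python
import statistics

def variance_components(site_scans):
    """Balanced one-way random-effects ANOVA.

    site_scans: list of per-site lists, each holding the same number of
    repeated scans of the same object.
    Returns (inter_site_variance, intra_site_variance).
    """
    k = len(site_scans)
    n = len(site_scans[0])
    if any(len(scans) != n for scans in site_scans):
        raise ValueError("balanced design expected")
    site_means = [statistics.mean(scans) for scans in site_scans]
    grand_mean = statistics.mean(site_means)
    # Mean square between sites and mean square within sites
    ms_between = n * sum((m - grand_mean) ** 2 for m in site_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for scans, m in zip(site_scans, site_means)
                    for x in scans) / (k * (n - 1))
    # Method-of-moments estimate, truncated at zero
    inter_site = max(0.0, (ms_between - ms_within) / n)
    return inter_site, ms_within

# Hypothetical MD values (x 10^-3 mm^2/s): three sites, five scans per site
scans = [
    [0.96, 0.97, 0.98, 0.97, 0.97],
    [0.99, 1.00, 0.98, 0.99, 0.99],
    [0.95, 0.96, 0.95, 0.94, 0.95],
]
inter_var, intra_var = variance_components(scans)
print(f"inter-site variance = {inter_var:.6f}, intra-site variance = {intra_var:.6f}")
```

Note that this decomposition needs repeated scans at every site and says nothing about the variability within a single acquisition.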
In this study, we directly quantified the accuracy and precision of each dataset. For each acquisition, accuracy was estimated using a carefully constructed gold standard dataset, while precision (the within-scan variability) was quantified using an optimized wild bootstrap analysis (Zhu et al., 2008). The study was specifically designed to address the following objectives: 1) to investigate inter-site and intra-site differences in DTI measurement accuracy and precision that are due to the site-dependent and time-variant performance of MR scanners in a typical multi-center DTI study; 2) to quantitatively compare the within-scan variability of each acquisition (typically not measurable by ANOVA without repeated measurements) with the inter-site and intra-site variance components; and 3) to evaluate the effectiveness of the weighting statistics, which integrate wild bootstrap estimates of the within-scan variability, in reducing the inter-site and intra-site variance. Such a reduction would improve the quality of the dataset from a multi-center DTI study.
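The excerpt does not spell out the form of the weighting statistics. As an illustration only, the sketch below assumes the standard inverse-variance weighting, in which each acquisition contributes in proportion to 1/σ², with σ taken from a within-scan (for example, wild bootstrap) estimate. The function name and the FA and SD values are hypothetical.

```python
import math

def weighted_mean(values, sds):
    """Inverse-variance weighted mean and its standard error.

    values: one measurement per acquisition
    sds:    within-scan standard deviation of each measurement
    """
    weights = [1.0 / (sd ** 2) for sd in sds]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical FA values from three acquisitions with their within-scan SDs
fa_values = [0.50, 0.56, 0.47]
fa_sds = [0.01, 0.02, 0.02]
fa_mean, fa_se = weighted_mean(fa_values, fa_sds)
print(f"weighted mean FA = {fa_mean:.4f} +/- {fa_se:.4f}")
```

With these numbers the most precise acquisition dominates, so the weighted mean lies closer to 0.50 than the unweighted mean of 0.51 does.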
Material and methods
The effects of site-dependent and time-variant performance of scanners on data integration from multi-center DTI measurements were investigated using multiple scans of identical isotropic DTI phantoms and a human volunteer at three MRI sites of the HIV Neuroimaging consortium (University of Rochester, University of California at San Diego and Stanford University), with scanners from the same vendor and with similar system configurations.
Sources of measurement errors and validation for wild bootstrap estimate: phantom study
Fig. 1A shows a photograph and a representative MD map of the isotropic DTI phantom. Among all 15 phantom acquisitions, the average scan temperature was 22.3 °C (± 1.02 °C). The average value for MD (unit: × 10⁻³ mm²/s) measurements of three chemicals was 0.55 for cyclooctane, 0.97 for cycloheptane and 1.50 for cyclohexane, in close agreement with their known values. The average SNR values of the non-diffusion weighted image at three sites were 76.46, 78.95 and 75.39 respectively.
The average
Discussion
Multi-center DTI studies inevitably encounter systematic variations due to intrinsic differences among MRI systems and to time-dependent variations that occur over the course of a longitudinal study. Pooling data without quantifying and controlling the inter-site and intra-site variability will significantly affect the statistical analysis of the study, and a much larger sample size may then be needed to compensate for such variability. A commonly used approach to deal
Conclusions
Consistent results from both phantom and human data show that inter-site variations, although small among scanners of the same vendor, will affect the integration of multi-center DTI measurements. Results from this study also indicate that with a DTI-specific phantom, such as the isotropic phantom used in this study, it is possible to identify and quantify measurement errors due to site-dependent variations in system performance. We have also shown the usefulness of wild bootstrap analysis
Acknowledgments
The authors appreciate the thoughtful suggestions from anonymous reviewers. This study is a cooperative effort from three imaging centers within the HIV Neuroimaging consortium.
References (46)
- et al. MR diffusion tensor spectroscopy and imaging. Biophys. J. (1994)
- et al. Comparison of bootstrap approaches for estimation of uncertainties of DTI parameters. Neuroimage (2006)
- et al. Bootstrap quantification of cardiac pulsation artifact in DTI. Neuroimage (2010)
- et al. Simulation and experimental verification of the diffusion in an anisotropic fiber phantom. J. Magn. Reson. (2008)
- et al. Reducing Scanner-to-scanner variability of activation in a multi-center fMRI study: role of smoothness equalization. Neuroimage (2006)
- et al. Reducing interscanner variability of activation in a multicenter fMRI study: controlling for signal-to-fluctuation-noise-ratio (SFNR) differences. Neuroimage (2006)
- et al. Effects of physiological noise in population analysis of diffusion tensor MRI data. Neuroimage (2011)
- et al. An empirical characterization of the quality of DTI data and the efficacy of dyadic sorting. Magn. Reson. Imaging (2008)
- et al. Effects of susceptibility variations on NMR measurements of diffusion. J. Magn. Reson. (1991)
- et al. An optimized wild bootstrap method for evaluation of measurement uncertainties of DTI-derived parameters in human brain. Neuroimage (2008)
- Theoretical analysis of the effects of noise on diffusion tensor imaging. Magn. Reson. Med.
- Data Reduction and Error Analysis for the Physical Sciences
- Weighted comparison of means. BMJ
- Variance of estimated DTI-derived parameters via first-order perturbation methods. Magn. Reson. Med.
- What quality control procedures should we be adopting for single- and multi-center studies? And what should the minimal reporting requirements be? ISMRM Workshop on Methods for Quantitative Diffusion MRI of Human Brain, Lake Louise, Canada
- Developing a quality control protocol for diffusion imaging on a clinical MRI system. Phys. Med. Biol.
- An Introduction to the Bootstrap
- Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurement at 1.5 T. J. Magn. Reson. Imaging
- Test–retest and between-site reliability in a multicenter fMRI study. Hum. Brain Mapp.
- Comparison of gradient encoding schemes for diffusion-tensor MRI. J. Magn. Reson. Imaging
- The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: a Monte Carlo study. Magn. Reson. Med.
- Tractography gone wild: probabilistic fiber tracking using the wild bootstrap with diffusion tensor MRI. IEEE Trans. Med. Imaging
Cited by (117)
Contrastive semi-supervised harmonization of single-shell to multi-shell diffusion MRI
2022, Magnetic Resonance Imaging
Diffusion MRI harmonization via personalized template mapping
2024, Human Brain Mapping
Reduced cross-scanner variability using vendor-agnostic sequences for single-shell diffusion MRI
2024, Magnetic Resonance in Medicine
☆ This project was supported by NIH RO1-NS036524.