Quantification of accuracy and precision of multi-center DTI measurements: A diffusion phantom and human brain study

doi:10.1016/j.neuroimage.2011.02.010

NeuroImage

Volume 56, Issue 3, 1 June 2011, Pages 1398-1411

https://doi.org/10.1016/j.neuroimage.2011.02.010

Abstract

The inter-site and intra-site variability of system performance of MRI scanners (due to site-dependent and time-variant variations) can have significant adverse effects on the integration of multi-center DTI data. Measurement errors in accuracy and precision of each acquisition determine both the inter-site and intra-site variability. In this study, multiple scans of an identical isotropic diffusion phantom and of the brain of a traveling human volunteer were acquired at MRI scanners from the same vendor and with similar configurations at three sites. We assessed the feasibility of multi-center DTI studies by direct quantification of accuracy and precision of each dataset. Accuracy was quantified via comparison to carefully constructed gold standard datasets while precision (the within-scan variability) was estimated by wild bootstrap analysis. The results from both the phantom and human data suggest that the inter-site variation in system performance, although relatively small among scanners of the same vendor, significantly affects DTI measurement accuracy and precision and therefore the effectiveness of integrating multi-center DTI measurements. Our results also highlight the value of a DTI-specific phantom in identifying and quantifying measurement errors due to site-dependent variations in the system performance, and its usefulness for quality assurance/quality control in multi-center DTI studies. In addition, we observed that the within-scan variability of each data acquisition, as assessed by wild bootstrap analysis, is of the same magnitude as the inter-site and intra-site variability. We propose that by weighting datasets based on their variability, as evaluated by wild bootstrap analysis, one can improve the quality of the dataset. This approach will provide a more effective integration of datasets from multi-center DTI studies.

Highlights

► Variations (intra- and inter-site, within-scan) affect integration of multi-center DTI.
► DTI-specific phantom is useful for QA/QC in multi-center DTI studies.
► Combination of WBT and weighting statistics improves multi-site data integration.

Introduction

Diffusion tensor imaging (DTI) (Basser et al., 1994) is now widely used in the investigation of brain microstructural integrity. DTI-derived parameters, such as fractional anisotropy (FA) and mean diffusivity (MD), are often used to detect subtle changes of tissue diffusion characteristics in the early stage of disease, when no differences are detectable with other traditional MRI methods.

Studies using advanced MRI data, such as DTI, are unique in that the data consist of a large number of image elements (voxels) for each study subject but typically only a relatively small number of subjects can be recruited at a single research site. This motivates the implementation of multi-center studies to acquire an adequate sample size. One common critical question for such a typical multi-center study is whether the data from multiple scanners, either from the same or different vendors, can be integrated into a single dataset, i.e., with negligible site-dependent and time-variant measurement errors associated with the data acquisition.

Measurement error is traditionally attributed to both the accuracy and precision of the measurement (Bevington and Robinson, 1992). Measurement accuracy, δ(X), in general can be quantified by the difference between the true value and the mean value of a large number of repeated measurements of the same parameter X. Precision, σ(X), can be described by the standard deviation from these repeated measurements. The measurement accuracy and precision can affect the power of statistical inference. In the example of a two-group comparison in a single-center study, precision contributes mainly to the spread of data within each group, while the effect of bias in accuracy is more complicated. If the bias of each measurement is constant and time-invariant, in general, the accuracy of the data will not affect the statistical power. However, the bias is more often time-variant (e.g., due to unavoidable drift of scanners over time), and it will increase the standard deviation of the data. The power of statistical comparison will further decrease when data come from scanners of different vendors. Different accuracy levels due to intrinsic system differences will increase the inter-site variability, while the time-variant system performance within each site will result in acquisition-dependent variations of accuracy and precision and consequently increase the intra-site variability.
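As a minimal sketch of the two definitions above, the following Python snippet computes the bias δ(X) and the standard deviation σ(X) from a set of repeated measurements. The function name, the sign convention (mean minus true value), and the MD values are illustrative assumptions and are not data from this study.

```python
import statistics


def accuracy_and_precision(measurements, true_value):
    """Return (bias, sd) for repeated measurements of one parameter X.

    bias: mean of the measurements minus the true value (accuracy).
    sd:   sample standard deviation of the measurements (precision).
    """
    bias = statistics.fmean(measurements) - true_value
    sd = statistics.stdev(measurements)
    return bias, sd


# Hypothetical repeated MD measurements (x 10^-3 mm^2/s) of a liquid
# whose reference diffusivity is assumed to be 1.48.
md_scans = [1.52, 1.49, 1.51, 1.48, 1.50]
bias, sd = accuracy_and_precision(md_scans, true_value=1.48)
print(f"bias = {bias:.3f}, sd = {sd:.4f}")
```

A constant bias shifts every measurement by the same amount and leaves `sd` unchanged, whereas a bias that drifts between scans adds to the spread, which is the distinction the paragraph above draws.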

A multi-center DTI study faces even more challenges. Clinical DTI applications typically use data with low signal-to-noise ratio (SNR), contaminated by physiological noise, artifacts due to field inhomogeneity and eddy currents, and variability due to hardware instability during the lengthy image acquisition (Le Bihan et al., 2006). Except for the physiological noise, all other adverse effects are directly related to the scanner's performance and, therefore, are usually site-dependent and time-variant. Although all these sources of errors are frequently noted by DTI researchers, no comprehensive quantitative models have been reported to quantify their contributions other than the thermal noise (Pierpaoli and Basser, 1996, Jones et al., 1999, Hasan et al., 2001, Jones, 2004, Poonawalla and Zhou, 2004, Kingsley, 2005).

Recently, nonparametric bootstrap techniques such as wild bootstrap (Whitcher et al., 2008, Jones, 2008, Zhu et al., 2009) and residual bootstrap (Chung et al., 2006) have been introduced as robust estimators of precision for DTI measurements. They are particularly applicable to DTI acquisitions within the usual scanning time since only one complete DTI measurement is required. Previous studies (Tofts et al., 2000, Delakis et al., 2004, Nagy et al., 2007) have also demonstrated feasible approaches to quantify bias and to calibrate scanner performance by scanning phantoms of isotropic solutions with known diffusivities.
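To illustrate how a single acquisition can yield a precision estimate, the following is a simplified Python sketch of a wild bootstrap for one voxel. It assumes isotropic diffusion, so that ln S = ln S0 − b·D can be fitted by ordinary least squares, and it resamples by flipping the sign of each fit residual at random. The helper names, b-values, and signal values are hypothetical. The cited methods fit the full diffusion tensor and may apply refinements (for example, rescaling of residuals) that are omitted here.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def wild_bootstrap_diffusivity(b_values, signals, n_boot=1000, seed=0):
    """Return (D estimate, bootstrap SD of D) from one acquisition."""
    rng = random.Random(seed)
    log_s = [math.log(s) for s in signals]
    intercept, slope = fit_line(b_values, log_s)
    fitted = [intercept + slope * b for b in b_values]
    residuals = [y - f for y, f in zip(log_s, fitted)]

    d_samples = []
    for _ in range(n_boot):
        # Flip the sign of each residual at random (Rademacher multipliers)
        # and add it back to the fitted value at the same b-value.
        y_star = [f + r * rng.choice((-1.0, 1.0))
                  for f, r in zip(fitted, residuals)]
        _, slope_star = fit_line(b_values, y_star)
        d_samples.append(-slope_star)
    return -slope, statistics.stdev(d_samples)


# Hypothetical signals at b = 0, 500, 1000 s/mm^2 (two measurements each).
b_values = [0, 0, 500, 500, 1000, 1000]
signals = [1005.0, 995.0, 476.0, 469.0, 226.0, 220.0]
d_hat, d_sd = wild_bootstrap_diffusivity(b_values, signals)
print(f"D = {d_hat:.3e} mm^2/s, wild bootstrap SD = {d_sd:.1e} mm^2/s")
```

Because each residual stays attached to its own b-value, the resampling preserves any dependence of the noise level on the diffusion weighting, which is why this family of bootstraps needs only one complete measurement.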

Instead of direct measurement of accuracy and precision, prevailing designs of current multi-center studies rely on analyses of reproducibility and repeatability of data based on pilot studies, in which inter-site (measure for reproducibility) and intra-site (measure for repeatability) variance components are quantitatively analyzed (Zou et al., 2005, Friedman et al., 2008). To quantify measurement errors in multi-center DTI studies, a scan/re-scan scheme was adopted in several DTI studies (Pfefferbaum et al., 2003, Marenco et al., 2006, Farrell et al., 2007). These studies provided reliable methods for quantification of data reproducibility and repeatability. However, outcomes from these analyses cannot be used to establish quantitative rejection/acceptance criteria for a given dataset, and there is no approach for utilizing known variability in the accepted data to boost the statistical power. One common approach proposed to deal with data integration in multi-center studies is to incorporate the site-effect as a random variable into models of advanced variance component analysis (Zou et al., 2005, Friedman et al., 2008). Since the variability due to the site-effect is still included in the model, a relatively large sample size is required with this approach, although site-dependent errors will no longer significantly bias statistical results.
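For readers unfamiliar with variance component analysis, the sketch below shows one common textbook estimator: a balanced one-way random-effects ANOVA that splits the variance of repeated scans into an inter-site and an intra-site component. This is an illustrative assumption about how such components can be computed, not the procedure used in the cited studies, and the FA values are hypothetical.

```python
import statistics


def variance_components(site_scans):
    """Balanced one-way random-effects ANOVA (method of moments).

    site_scans: one list of repeated measurements per site, equal lengths.
    Returns (inter_site_variance, intra_site_variance).
    """
    k = len(site_scans)        # number of sites
    n = len(site_scans[0])     # scans per site
    if k < 2 or n < 2 or any(len(s) != n for s in site_scans):
        raise ValueError("need >= 2 sites with the same number (>= 2) of scans")

    site_means = [statistics.fmean(s) for s in site_scans]
    grand_mean = statistics.fmean(site_means)

    ms_between = n * sum((m - grand_mean) ** 2 for m in site_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for scans, m in zip(site_scans, site_means)
                    for x in scans) / (k * (n - 1))

    # The between-site mean square also contains within-site noise,
    # so subtract it and clip at zero.
    inter_site = max(0.0, (ms_between - ms_within) / n)
    return inter_site, ms_within


# Hypothetical FA values from three scans at each of three sites.
fa_by_site = [
    [0.50, 0.52, 0.51],
    [0.54, 0.55, 0.53],
    [0.49, 0.50, 0.51],
]
inter_var, intra_var = variance_components(fa_by_site)
print(f"inter-site variance = {inter_var:.1e}, intra-site variance = {intra_var:.1e}")
```

Note that this decomposition needs repeated scans at every site; it says nothing about the uncertainty inside any single scan.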

In this study, we directly quantified accuracy and precision of each dataset. For each acquisition, accuracy was estimated using a carefully constructed gold standard dataset while precision (the within-scan variability) was quantified using an optimized wild bootstrap analysis (Zhu et al., 2008). The study was specifically designed to address the following objectives: 1) to investigate inter-site and intra-site differences in DTI measurement accuracy and precision that are due to the site-dependent and time-variant performance of MR scanners in a typical multi-center DTI study; 2) to quantitatively compare the within-scan variability of each acquisition (typically not measurable by ANOVA without repeated measurements) with the inter-site and intra-site variance components, and 3) to evaluate the effectiveness of the weighting statistics, which integrates wild bootstrap estimations of the within-scan variability, in reducing the inter-site and intra-site variance. This would improve the quality of the dataset from a multi-center DTI study.
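The third objective can be pictured with a simple inverse-variance weighted mean, in which each acquisition contributes in proportion to 1/σ², with σ taken from its within-scan variability estimate. This is only a sketch of the general idea; the exact weighting statistics used in the study are not given in this excerpt, and the function name and numbers below are hypothetical.

```python
import math


def inverse_variance_mean(values, sds):
    """Weighted mean with weights 1/sd^2, plus its standard error."""
    if len(values) != len(sds) or not values:
        raise ValueError("values and sds must be non-empty and equally long")
    weights = [1.0 / sd ** 2 for sd in sds]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical FA values from three acquisitions and their
# within-scan SDs (for example, from a wild bootstrap analysis).
fa_values = [0.50, 0.54, 0.52]
fa_sds = [0.01, 0.02, 0.01]
w_mean, w_se = inverse_variance_mean(fa_values, fa_sds)
print(f"weighted mean = {w_mean:.4f}, standard error = {w_se:.4f}")
```

In this example the noisier second acquisition receives a quarter of the weight of the other two, so it pulls the pooled estimate less than it would in a plain average.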

Material and methods

The effects of site-dependent and time-variant performance of scanners on data integration from multi-center DTI measurements were investigated using multiple scans of identical isotropic DTI phantoms and a human volunteer at three MRI sites of the HIV Neuroimaging consortium (University of Rochester, University of California at San Diego and Stanford University), with scanners from the same vendor and with similar system configurations.

Sources of measurement errors and validation for wild bootstrap estimate: phantom study

Fig. 1A shows a photograph and a representative MD map of the isotropic DTI phantom. Across all 15 phantom acquisitions, the average scan temperature was 22.3 °C (± 1.02 °C). The average MD values (unit: × 10⁻³ mm²/s) of the three chemicals were 0.55 for cyclooctane, 0.97 for cycloheptane and 1.50 for cyclohexane, in close agreement with their known values. The average SNR values of the non-diffusion-weighted image at the three sites were 76.46, 78.95 and 75.39, respectively.

The average

Discussion

Multi-center DTI studies inevitably encounter systematic variations due to intrinsic differences among different MRI systems and due to time-dependent variations that occur in the context of a longitudinal study. Pooling data together without quantification and control of the inter-site and intra-site variability will significantly affect statistical analysis of the study which may lead to the need of a much larger sample size to compensate for such variability. A commonly used approach to deal

Conclusions

Consistent results from both phantom and human data show that inter-site variations, although small among scanners of the same vendor, will affect the integration of multi-center DTI measurements. Results from this study also indicate that with a DTI-specific phantom, such as the isotropic phantom applied in this study, it is possible to identify and quantify measurement errors due to site-dependent variations in system performances. We have also shown the usefulness of wild bootstrap analysis

Acknowledgments

The authors appreciate the thoughtful suggestions from anonymous reviewers. This study is a cooperative effort from three imaging centers within the HIV Neuroimaging consortium.

References (46)

P.J. Basser et al.
MR diffusion tensor spectroscopy and imaging
Biophys. J.
(1994)
S. Chung et al.
Comparison of bootstrap approaches for estimation of uncertainties of DTI parameters
Neuroimage
(2006)
S. Chung et al.
Bootstrap quantification of cardiac pulsation artifact in DTI
Neuroimage
(2010)
E. Fieremans et al.
Simulation and experimental verification of the diffusion in an anisotropic fiber phantom
J. Magn. Reson.
(2008)
L. Friedman et al.
Reducing scanner-to-scanner variability of activation in a multi-center fMRI study: role of smoothness equalization
Neuroimage
(2006)
L. Friedman et al.
Reducing interscanner variability of activation in a multicenter fMRI study: controlling for signal-to-fluctuation-noise-ratio (SFNR) differences
Neuroimage
(2006)
L. Walker et al.
Effects of physiological noise in population analysis of diffusion tensor MRI data
Neuroimage
(2011)
N.E. Yanasak et al.
An empirical characterization of the quality of DTI data and the efficacy of dyadic sorting
Magn. Reson. Imaging
(2008)
J.H. Zhong et al.
Effects of susceptibility variations on NMR measurements of diffusion
J. Magn. Reson.
(1991)
T. Zhu et al.
An optimized wild bootstrap method for evaluation of measurement uncertainties of DTI-derived parameters in human brain
Neuroimage
(2008)

A.W. Anderson
Theoretical analysis of the effects of noise on diffusion tensor imaging
Magn. Reson. Med.
(2001)
M.A. Bernstein et al.
P.R. Bevington et al.
Data Reduction and Error Analysis for the Physical Sciences
(1992)
J.M. Bland et al.
Weighted comparison of means
BMJ
(1998)
L.C. Chang et al.
Variance of estimated DTI-derived parameters via first-order perturbation methods
Magn. Reson. Med.
(2007)
D.L. Collins
What quality control procedures should we be adopting for single- and multi-center studies? And what should the minimal reporting requirements be?
ISMRM Workshop on Methods for Quantitative Diffusion MRI of Human Brain, Lake Louise, Canada
(2005)
I. Delakis et al.
Developing a quality control protocol for diffusion imaging on a clinical MRI system
Phys. Med. Biol.
(2004)
B. Efron et al.
An Introduction to the Bootstrap
(1993)
J.A.D. Farrell et al.
Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurement at 1.5 T
J. Magn. Reson. Imaging
(2007)
L. Friedman et al.
Test–retest and between-site reliability in a multicenter fMRI study
Hum. Brain Mapp.
(2008)
K.M. Hasan et al.
Comparison of gradient encoding schemes for diffusion-tensor MRI
J. Magn. Reson. Imaging
(2001)
D.K. Jones
The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: a Monte Carlo study
Magn. Reson. Med.
(2004)
D.K. Jones
Tractography gone wild: probabilistic fiber tracking using the wild bootstrap with diffusion tensor MRI
IEEE Trans. Med. Imaging
(2008)


☆ This project was supported by NIH RO1-NS036524.
