Introduction

In extracranial carotid artery disease, contrast-enhanced magnetic resonance angiography (CE-MRA) has become a noninvasive imaging alternative to catheter angiography [1–19]. The major task of carotid CE-MRA is to assess carotid artery stenosis. Severe carotid stenosis (70–99% by North American Symptomatic Carotid Endarterectomy Trial (NASCET) criteria) is a major risk factor for ischaemic stroke, causing about 20% of cases [20]. Surgical endarterectomy and transluminal angioplasty make it a treatable risk factor [21, 22].

According to two recent meta-analyses, carotid CE-MRA has a high sensitivity of about 94% and a high specificity of about 92% for the diagnosis of severe carotid artery stenosis [23, 24]. Both meta-analyses treated carotid CE-MRA as one entity, although this entity comprises several competing methods. For example, the primary studies used three different timing techniques: bolus-timed MRA, fluoroscopically triggered MRA and time-resolved MRA. However, it is unclear whether these three timing techniques have similar diagnostic accuracy for the diagnosis of severe carotid disease. The primary studies also differed in the prevalence of severe carotid disease, the applied spatial resolution and other known or unknown factors.

The purpose of this meta-analysis of the diagnostic accuracy of CE-MRA in severe carotid stenosis was to identify possible sources of heterogeneity and to compare different CE-MRA techniques by metaregression.

Materials and methods

Literature search

The PubMed library was searched for articles published from January 2000 to October 2008 concerning the diagnostic accuracy of CE-MRA in the assessment of carotid stenosis (Table 1). The Cochrane and Embase libraries were searched with similar terms. Additionally, the reference lists of retrieved articles were hand-searched. English- and non-English-language articles were included.

Table 1 Literature search terms

Literature selection

Primary literature sources were included that had studied (a) the index test CE-MRA; (b) versus the reference test digital subtraction angiography; (c) in the diagnosis of severe 70–99% carotid stenosis (NASCET criteria); (d) in adults with known or suspected carotid artery disease and with a clinical indication for imaging. Further inclusion criteria for this meta-analysis were: (e) CE-MRA was performed with 1.0- or 1.5-T magnets; (f) the CE-MRA technique was not changed during the study; (g) at least 20 patients were studied; (h) the publication provided sufficient data to construct 2 × 2 contingency tables of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) test results, where true-positives are those with a diagnosis of 70–99% stenosis (NASCET criteria) by both the index and the reference tests; (i) if carotid stenoses were reported by European Carotid Surgery Trial (ECST) criteria, the results could be transformed to NASCET grading including a subgroup with severe 70–99% (NASCET) stenosis [25]; and (j) data were reported for the six covariates defined below. Literature sources that did not fulfil all of these criteria were excluded.

Data extraction

The primary study data were extracted using a standardised form. When a study had more than one MRA reader, the reported results were averaged to give one 2 × 2 contingency table per study. Occasionally, this gave averaged counts with decimals such as 7.5; these numbers were used without rounding. In addition to the 2 × 2 count data, six covariates were extracted. The first three covariates were continuous: (a) percentage prevalence of severe carotid artery disease within the study group, including those with 70–99% (NASCET criteria) stenosis and those with occlusion; (b) acquisition time (in seconds) of the CE-MRA sequence; (c) voxel volume (in mm³) of the CE-MRA before interpolation. This voxel volume was normalised to an isotropic voxel size (in millimetres) by calculating (voxel volume)^(1/3). The other three covariates were categorical factors with different classes: (d) MRA timing technique, with index “1” for bolus-timed, index “2” for fluoroscopically triggered and index “3” for time-resolved; (e) MRA image type, with index “1” for hardcopy films and index “2” for electronic monitor images; (f) MRA reading mode, with index “1” for using solely maximum intensity projection (MIP) images for diagnosis and index “2” for additionally using the original 3D MRA images or postprocessed 3D images like multiplanar reformations.
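Two extraction steps above are simple arithmetic: averaging several readers' 2 × 2 tables into one table without rounding, and normalising a voxel volume to an isotropic edge length by taking its cube root. The following is a minimal sketch of both; the function names and the dictionary layout of a table (keys TP, FP, FN, TN) are illustrative assumptions, not part of the original extraction form.

```python
from statistics import mean

CELLS = ("TP", "FP", "FN", "TN")

def average_readers(tables):
    """Average per-reader 2 x 2 tables into one table per study.
    Hypothetical helper; fractional counts such as 7.5 are kept unrounded."""
    return {cell: mean(t[cell] for t in tables) for cell in CELLS}

def isotropic_voxel_size(voxel_volume_mm3):
    """Normalised isotropic voxel size in mm: (voxel volume)^(1/3)."""
    return voxel_volume_mm3 ** (1 / 3)

# Usage with two hypothetical readers of the same study
reader_a = {"TP": 7, "FP": 2, "FN": 1, "TN": 40}
reader_b = {"TP": 8, "FP": 3, "FN": 0, "TN": 39}
averaged = average_readers([reader_a, reader_b])
```

With these hypothetical readers the averaged table is TP = 7.5, FP = 2.5, FN = 0.5 and TN = 39.5, which is the kind of non-integer count the text describes.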

Diagnostic accuracy parameters

Table 2 summarises the diagnostic accuracy parameters that were calculated for each study from the 2 × 2 contingency tables. If the false-positives (FP) or false-negatives (FN) were zero, 0.5 was added to all 2 × 2 table cells in the calculation of parameters (c) to (i) to prevent division by zero. These diagnostic accuracy parameters were solely used to place study symbols in the figures. The bivariate meta-analysis itself used the 2 × 2 count data without the +0.5 correction for zero counts.
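Table 2 itself is not reproduced here, so the parameter set below (sensitivity, specificity, their logits, the LOR and its standard error) is an assumption based on the quantities used elsewhere in the paper. It is a minimal sketch of the per-study calculation with the +0.5 correction applied to all four cells when FP or FN is zero. Note that logit(sensitivity) simplifies to ln(TP/FN) and logit(specificity) to ln(TN/FP). As in the text, these values serve only for plotting; the bivariate model uses the raw counts.

```python
import math

def accuracy_parameters(tp, fp, fn, tn):
    """Per-study accuracy parameters used only to place study symbols in plots."""
    # Sensitivity and specificity from the uncorrected counts (assumption)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Continuity correction: add 0.5 to every cell if FP or FN is zero.
    # (As in the text, zero TP or TN cells are not handled.)
    if fp == 0 or fn == 0:
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    logit_sens = math.log(tp / fn)   # = logit(TP / (TP + FN))
    logit_spec = math.log(tn / fp)   # = logit(TN / (TN + FP))
    lor = logit_sens + logit_spec    # log diagnostic odds ratio
    se_lor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)  # Woolf standard error
    return {"sens": sens, "spec": spec, "logit_sens": logit_sens,
            "logit_spec": logit_spec, "lor": lor, "se_lor": se_lor}
```

For a hypothetical study with TP = 10, FP = 0, FN = 1 and TN = 40, the corrected cells give LOR = ln(10.5 × 40.5 / (0.5 × 1.5)) = ln 567.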

Table 2 Diagnostic accuracy parameters

Study quality and publication bias

Publication bias was visually assessed by a funnel plot of the studies’ LOR (logarithm of the diagnostic odds ratios) against its SE (standard error) [26]. Further bias sources and the methodological study quality were assessed with 11 predefined standard quality items from the QUADAS tool, using the meta-analysis program RevMan 5 [27, 28].

Study heterogeneity

Heterogeneity of sensitivity, specificity and LOR was assessed with the meta-analysis program Meta-DiSC 1.4 [29, 30]. In addition to Cochran’s Q test, I-squared statistics were obtained [29–31]. These quantify the amount of heterogeneity as the percentage of the total variation across studies that remains unexplained after subtracting the variation expected from within-study measurement error.
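A minimal sketch of the two heterogeneity statistics follows, assuming the standard inverse-variance form of Cochran’s Q and the usual definition I² = (Q − df)/Q truncated at zero, with df = k − 1 for k studies. Meta-DiSC’s internal implementation is not reproduced. With 17 studies (df = 16), this formula reproduces the I² values reported in the Results from the reported Q values.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))

def i_squared(q, k):
    """I^2 = (Q - df) / Q for k studies, truncated at 0 (returned as a fraction)."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)

# Reported Q values for 17 studies: specificity, LOR, sensitivity
spec_i2 = i_squared(58.4, 17)   # about 0.73
lor_i2 = i_squared(20.1, 17)    # about 0.20
sens_i2 = i_squared(15.6, 17)   # Q < df, so truncated to 0
```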

Bivariate random effects meta-analysis

Pooled estimates for sensitivity and specificity were calculated with a bivariate generalized linear random effects model, using the new PROC GLIMMIX of SAS 9.2 (SAS Institute, Cary, NC, USA) [32, 33], according to the formula presented by Chu and Cole [34]. With PROC GLIMMIX the observed binomial data of sensitivity = TP/(TP + FN) and of specificity = TN/(TN + FP) enter the model directly with an events/trials syntax. The model uses an internal logit link function to link the observed data to the so-called linear predictor. This linear predictor is defined on the linearised logit scale and consists of fixed effects and random effects. The estimated pooled sensitivity and specificity belong to the model’s fixed effects. Further covariates may be added to the fixed effects to model a bivariate metaregression. The linear predictor also includes random effects to model unexplained between-study heterogeneity [32, 33]. These random effects account for randomly distributed differences between study populations, study methods or other unknown factors [31]. PROC GLIMMIX was used with the default RSPL pseudolikelihood estimation method. The unstructured covariance matrix was modelled in terms of its Cholesky parameterisation with TYPE = CHOL in the RANDOM statement, except for one metaregression where TYPE = UN was used to achieve convergence of the model’s iterative estimation [33]. Standard Wald confidence intervals and P values were obtained by setting DF = 1,000, so that the t reference distribution closely approximates the standard normal distribution, as described by van Houwelingen et al. and Reitsma et al. [35, 36]. The significance level was generally set at P < 0.05.
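The model structure described above can be made concrete by simulating data from it. The sketch below does not fit the model, which is what PROC GLIMMIX does. It only draws synthetic studies: study-specific logit(sensitivity) and logit(specificity) are fixed effects plus correlated random effects, and the observed TP and TN counts are binomial events out of the diseased and non-diseased trials. The random effects are generated from a lower-triangular Cholesky factor L, so that their covariance is Σ = L Lᵀ, mirroring the idea behind TYPE = CHOL. All function and parameter names are illustrative assumptions.

```python
import math
import random

def inv_logit(x):
    """Inverse of the logit link."""
    return 1 / (1 + math.exp(-x))

def covariance_from_cholesky(l11, l21, l22):
    """Between-study covariance Sigma = L L^T for L = [[l11, 0], [l21, l22]]."""
    return [[l11 * l11, l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]

def simulate_bivariate_studies(mu_sens, mu_spec, l11, l21, l22,
                               n_diseased, n_healthy, seed=0):
    """Draw (TP, n_diseased, TN, n_healthy) per study from the bivariate model.

    mu_sens, mu_spec: fixed effects (pooled logits); (l11, l21, l22): Cholesky factor.
    """
    rng = random.Random(seed)
    studies = []
    for n1, n0 in zip(n_diseased, n_healthy):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = l11 * z1               # random effect on logit(sensitivity)
        b = l21 * z1 + l22 * z2    # correlated random effect on logit(specificity)
        p_sens = inv_logit(mu_sens + a)
        p_spec = inv_logit(mu_spec + b)
        # Binomial draws as sums of Bernoulli trials (events/trials data)
        tp = sum(rng.random() < p_sens for _ in range(n1))
        tn = sum(rng.random() < p_spec for _ in range(n0))
        studies.append((tp, n1, tn, n0))
    return studies
```

Parameterising Σ through L guarantees a valid (positive semidefinite) covariance matrix for any real l11, l21, l22, which is one reason a Cholesky parameterisation can help an iterative fit behave well.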

Bivariate random effects metaregression

Six bivariate random effects metaregressions were performed with the covariates described to study possible reasons for heterogeneity among the primary studies [35]. Meta-analytic plots were generated with PROC SGPLOT. In these plots the primary studies are represented by circles. The area of each study’s circle is inversely proportional to the variance of the study’s LOR and indicates the relative weight of that study (Table 2) [37].
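Because each circle’s area, not its radius, is proportional to the study weight 1/var(LOR), the radius has to scale with the square root of the weight. Otherwise large studies would appear disproportionately dominant. A minimal sketch follows; the function name and the choice to scale the heaviest study to a fixed maximum radius are assumptions, and PROC SGPLOT’s own bubble scaling is not reproduced.

```python
import math

def circle_radii(lor_variances, max_radius=1.0):
    """Radius per study such that circle AREA is proportional to 1 / var(LOR).

    Radii are scaled so that the heaviest study gets max_radius.
    """
    weights = [1 / v for v in lor_variances]
    w_max = max(weights)
    # area ~ weight  =>  radius ~ sqrt(weight)
    return [max_radius * math.sqrt(w / w_max) for w in weights]

# A study with four times the LOR variance gets a quarter of the area,
# i.e. half the radius
radii = circle_radii([1.0, 4.0])
```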

Results

Primary study data

In total 2,505 potential sources were identified by the literature search (Fig. 1). After screening titles and abstracts, 40 sources were left. After reading the full text, 17 sources remained for the meta-analysis. The primary data comprise the imaging findings of 1,714 carotids in 905 patients. All patients had suspected or known carotid artery disease. Eight primary literature sources gave additional information about the numbers of symptomatic (stroke or transient ischaemic attacks) and asymptomatic patients, with the majority (448 of 495 patients) being symptomatic. The count data and the covariates are summarised in Table 3. This table also shows a forest plot of individual sensitivities and specificities.

Fig. 1

Literature search

Table 3 Primary 2 × 2 count data with forest plots

Study quality and heterogeneity

The study quality was generally excellent. The funnel plot (Fig. 2) showed no major publication bias. The sensitivities were homogeneously distributed among the 17 studies (I² = 0%, Cochran Q = 15.6, P = 0.49). However, the specificities showed high between-study heterogeneity (I² = 73%, Cochran Q = 58.4, P < 0.01). This heterogeneity of the specificities was not obvious from the LORs, which were slightly but not significantly overdispersed (I² = 20%, Cochran Q = 20.1, P = 0.22).

Fig. 2

Funnel plot to assess publication bias. The funnel plot is approximately symmetrical and indicates no major publication bias. The 17 primary studies are represented by circles. The LOR (logarithm of the diagnostic odds ratio) is given on the horizontal axis and its SE (standard error) on the vertical axis. The dashed line represents the pooled LOR (= 5.40) from the bivariate random effects meta-analysis. The two diagonal lines are given by 5.40 − 1.96 × SE and 5.40 + 1.96 × SE. They represent the 95% confidence intervals of the LOR at different standard errors. In the absence of between-study heterogeneity 95% of the studies should lie within the funnel that is defined by these diagonal lines
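The funnel boundaries in Fig. 2 follow directly from the pooled LOR and each study’s standard error. The sketch below computes the two limits for a given SE and the share of studies that fall inside the funnel. The pooled value 5.40 is taken from the caption; the function names and the example inputs are hypothetical.

```python
def funnel_limits(se, pooled_lor=5.40, z=1.96):
    """Lower and upper funnel boundary at a given standard error."""
    return pooled_lor - z * se, pooled_lor + z * se

def share_inside_funnel(lors, ses, pooled_lor=5.40):
    """Fraction of studies whose LOR lies within the funnel at their own SE."""
    inside = 0
    for lor, se in zip(lors, ses):
        lo, hi = funnel_limits(se, pooled_lor)
        inside += lo <= lor <= hi
    return inside / len(lors)

# Two hypothetical studies: the second lies far outside its funnel limits
share = share_inside_funnel([5.0, 9.0], [1.0, 0.5])
```

Without between-study heterogeneity, this share should be close to 0.95 across many studies.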

Meta-analysis

The bivariate random effects meta-analysis gave the following pooled estimates (with 95% confidence intervals in parentheses): sensitivity 94.3% (92.0–96.0%), specificity 93.0% (89.8–95.3%) and LOR 5.40 (4.85–5.95). A receiver operating characteristic (ROC) plot of sensitivity versus specificity is shown in Fig. 3.
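Two properties of these results can be checked by hand. First, because LOR = logit(sens) + logit(spec), the pooled sensitivity and specificity imply the pooled LOR up to rounding. Second, a symmetric Wald interval on the logit scale becomes asymmetric after back-transformation to percentages, which is also why the confidence ellipse in Fig. 3 is asymmetrical. The sketch below approximately back-calculates a logit-scale standard error from the reported sensitivity interval. This is a derived illustration, not the SE reported by the model.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Pooled estimates reported above (original scale)
sens, spec = 0.943, 0.930
lor = logit(sens) + logit(spec)   # about 5.39, i.e. 5.40 up to rounding

# Approximate logit-scale SE back-calculated from the reported 95% CI (92.0-96.0%)
lo_reported, hi_reported = 0.920, 0.960
se = (logit(hi_reported) - logit(lo_reported)) / (2 * 1.96)

# Symmetric Wald interval on the logit scale ...
ci_logit = (logit(sens) - 1.96 * se, logit(sens) + 1.96 * se)
# ... becomes asymmetric around 94.3% after back-transformation
ci = tuple(inv_logit(x) for x in ci_logit)
```

The resulting interval is roughly 92.0–96.0%, with the lower arm (about 2.3 percentage points) longer than the upper arm (about 1.7 points).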

Fig. 3

ROC plot of sensitivity versus specificity. The sensitivities and specificities of the 17 primary studies are represented by the small grey circles. The central black spot represents the bivariate summary estimate from the random effects meta-analysis. The surrounding confidence ellipse shows the corresponding bivariate 95% confidence interval. Here the confidence ellipse is asymmetrical, because it is shown on the original sensitivity/specificity scale instead of the linearised logit-link scale

Metaregressions with continuous covariates

The three continuous covariates showed trends that were, however, not significant (Fig. 4 and Table 4). Sensitivity and specificity increased slightly but not significantly if there were more cases with severe carotid artery disease (defined as 70–99% stenosis plus occlusion) within the study group (Fig. 4a–c). Specificity tended to be higher with larger voxels (Fig. 4e), whereas sensitivity did not benefit much from smaller voxels (Fig. 4d). The acquisition time had no major influence on sensitivity, but specificity tended to be higher with shorter acquisition times.

Fig. 4

Bivariate random effects metaregression with continuous covariates. In each panel the circles represent the 17 primary studies. The area of each study’s circle is inversely proportional to the variance of the study’s LOR and indicates the relative weight of that study. The central line is the regression line. The upper and lower boundaries indicate the corresponding 95% confidence intervals. For further details see Results and Table 4. Upper row percentage of severe disease (70–100% stenosis) within the study group. Central row normalised isotropic voxel size (in millimetres) as a measure for the spatial resolution. Lower row acquisition time of the contrast-enhanced MRA sequence (in seconds). Left column logit(sens) = sensitivity on the linearised logit scale. Middle column logit(spec) = specificity on the linearised logit scale. Right column LOR (logarithm of the diagnostic odds ratio) = logit(sens) + logit(spec). The numbers in brackets on the right vertical axes of the left and middle columns are the sensitivities and specificities on their original percentage scale

Table 4 Metaregressions with continuous covariates (logit scale)

Metaregressions with categorical covariates

The results of the metaregressions with the three categorical covariates are depicted in Fig. 5 and numerically summarised in Tables 5 and 6. The three different timing techniques (1 = bolus-timed; 2 = fluoroscopically triggered; 3 = time-resolved) showed no significant differences in diagnostic accuracy. There was only a slight nonsignificant trend in favour of bolus-timing (Fig. 5a–c). With hardcopy films the pooled mean specificity was 89.6%. With electronic images on a workstation’s monitor the pooled mean specificity was 95.6% and thus significantly higher (Fig. 5e). For the sensitivity, a corresponding slight difference did not reach significance (Fig. 5d). The pooled mean specificity was 88.8% when using only MIP images for the diagnosis. The pooled mean specificity was 95.8%, and thus significantly higher, when using 3D images in addition to MIP images (Fig. 5h). A corresponding slight difference in sensitivity was not significant (Fig. 5g). These metaregression results are presented in more detail in Table 5 for the logit scale, and in Table 6 for the original sensitivity/specificity scale.
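With a two-class covariate, the metaregression adds an indicator x (0 for class 1, 1 for class 2, i.e. the paper’s index minus one) to the linear predictor: logit(spec) = b0 + b1·x. Here b0 is the reference subgroup’s pooled logit and b1 the logit difference, so exp(b1) is an odds ratio. A three-class covariate such as the timing technique would need two such indicators. The sketch below back-calculates b0, b1 and exp(b1) from the rounded subgroup specificities quoted above. It therefore only approximates the Table 5 coefficients, which are not reproduced here, and the function name is illustrative.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def subgroup_contrast(p_reference, p_other):
    """Coefficients of logit(p) = b0 + b1 * x for a 0/1 indicator x, and exp(b1)."""
    b0 = logit(p_reference)
    b1 = logit(p_other) - b0
    return b0, b1, math.exp(b1)

# Image type: hardcopy films (x = 0) versus electronic monitor images (x = 1)
b0, b1, odds_ratio = subgroup_contrast(0.896, 0.956)     # b1 about 0.93, OR about 2.5
# Reading mode: MIP only (x = 0) versus MIP plus 3D images (x = 1)
_, b1_read, odds_ratio_read = subgroup_contrast(0.888, 0.958)  # OR about 2.9
```

On the odds scale, the odds of a correct negative reading were thus roughly 2.5 to 3 times higher with electronic images or with additional 3D reading, within the limits of the rounded inputs.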

Fig. 5

Bivariate random effects metaregression with categorical covariates. In each panel the circles represent the 17 primary studies. The area of each study’s circle is inversely proportional to the variance of the study’s LOR and indicates the relative weight of that study. The centre line of each box represents that subgroup’s pooled estimate, while the box itself represents the corresponding 95% confidence interval. Further details are given in Results and Tables 5 and 6. Upper row subgroup analysis for the three different MRA timing techniques. Central row analysis for using 1 hardcopy films versus 2 electronic monitor images. Lower row reading 1 only MIP images versus 2 reading MIP plus 3D images. Left column logit(sens) = sensitivity on the linearised logit scale. Middle column logit(spec) = specificity on the linearised logit scale. Right column LOR (logarithm of the diagnostic odds ratio) = logit(sens) + logit(spec). The numbers in brackets on the right vertical axes of the left and middle columns are the sensitivities and specificities on their original percentage scale

Table 5 Metaregressions with categorical covariates (logit scale)
Table 6 Pooled subgroup estimates of sensitivity and specificity

Discussion

Relevance of carotid CE-MRA

In symptomatic patients with recent transient ischaemic attacks or ischaemic stroke within the last 6 months and ipsilateral severe (70–99%) extracranial carotid stenosis, carotid endarterectomy or angioplasty with carotid artery stenting is recommended according to current guidelines to reduce the stroke risk [18, 21]. Additionally, endarterectomy may be considered in selected asymptomatic patients with severe carotid stenosis for the primary prevention of ischaemic stroke [18, 22]. To apply these guidelines it is necessary to distinguish severe carotid stenosis from moderate stenosis and from occlusion. Currently digital subtraction angiography (DSA) is the diagnostic gold standard that offers angiographic images with high spatial resolution and high contrast between vessels and surrounding tissues [18]. However, DSA is invasive and has complications in about 1% of cases [1, 5, 18]. For diagnostic purposes carotid CE-MRA has become a well-established noninvasive alternative [5, 18].

Meta-analysis

In appropriate patients carotid MRA has a high sensitivity and specificity for diagnosing or ruling out severe carotid stenosis [18, 23, 24]. The presented bivariate random effects meta-analysis found a pooled sensitivity (with 95% confidence interval) of 94.3% (92.0–96.0%) and a pooled specificity of 93.0% (89.8–95.3%) based on 17 primary studies. These results are similar to those of two other recent meta-analyses: Wardlaw et al. used a random effects model and found a pooled sensitivity of 94% (88–97%) and specificity of 93% (89–96%) [23]. Debrey et al. used a fixed effects model and found a pooled sensitivity of 94.6% (92.4–96.4%) and specificity of 91.9% (90.3–93.4%) [24]. In this case the results from random effects and fixed effects meta-analytic models were relatively similar.

Heterogeneity of diagnostic accuracy among the studies

Both of the previously mentioned meta-analyses found a homogeneous distribution of sensitivities, but a heterogeneous distribution of specificities among the primary studies, similar to the meta-analysis presented here [23, 24]. Wardlaw et al. focused on comparing several noninvasive carotid imaging techniques in general and, therefore, did not explore sources of heterogeneity in more detail [23]. Debrey et al. focused on comparing CE-MRA with time-of-flight MRA and studied sources of heterogeneity predominantly for time-of-flight MRA [24]. The presented meta-analysis focused on carotid CE-MRA and analysed possible sources of heterogeneity that have not been studied by either of the other meta-analyses [23, 24]. This was done by bivariate random effects metaregressions of sensitivity and specificity with six different covariates. The results are discussed below.

Prevalence of severe carotid artery disease in the study group

The 17 studies included had a high prevalence of severe carotid stenosis and occlusions, ranging from 21 to 59%. In the metaregression, sensitivity slightly increased with disease prevalence (Fig. 4a–c and Table 4). This has also been observed empirically in other settings, although sensitivity and specificity do not depend mathematically on prevalence [38, 39]. Here, not only sensitivity but also specificity slightly increased. However, these trends were not significant. This suggests that sensitivity and specificity are suitable diagnostic accuracy parameters across a wide range of prevalence percentages.

Spatial resolution of the CE-MRA

Within the isotropic voxel size range of 0.8 to 1.8 mm under study, the metaregression found no significant change in sensitivity with voxel size (Fig. 4d–e and Table 4). There was a trend for specificity to increase with larger voxels, but this trend was not significant either. These meta-analysis findings are consistent with the empirical findings of Cosottini et al., who studied the diagnostic accuracy of carotid CE-MRA by sequences with different voxel sizes [10]. They found that smaller voxels with a volume of 0.8 mm³ did not cause a significant improvement in diagnostic accuracy compared with larger voxels with a volume of 2.15 mm³. With reference to Fig. 4d–e, these voxel volumes correspond to normalised isotropic voxel sizes of (0.8)^(1/3) ≈ 0.9 mm and (2.15)^(1/3) ≈ 1.3 mm. However, their findings and this study’s metaregression findings are limited to the techniques applied. With other carotid CE-MRA techniques, smaller voxel sizes may allow for increased diagnostic accuracy, for instance when using 3-T magnets, parallel imaging or blood pool contrast agents with their prolonged vessel enhancement [40–44].

Acquisition time of the CE-MRA sequence

The metaregression showed a slight increase in diagnostic accuracy with shorter acquisition times, although this trend was not significant (Fig. 4g–i and Table 4). Generally, in MRA a better temporal resolution comes at the expense of spatial resolution [1]. In carotid CE-MRA this trade-off between temporal and spatial resolution seems to be slightly in favour of temporal resolution. This may explain why time-resolved CE-MRA, with its inherently lower spatial resolution, provides high diagnostic accuracy and thus remains competitive with bolus-timed and fluoroscopically triggered sequences, which have longer acquisition times and higher spatial resolution. A further explanation in favour of shorter acquisition times may be that the risk of slight motion artefacts increases with measurement time, perhaps not only in noncompliant patients [13].

MRA timing technique

In the literature there is currently no evidence that any of the three MRA timing techniques (bolus-timed, fluoroscopically triggered, time-resolved) has a higher diagnostic accuracy than the others. Only one primary study compared fluoroscopically triggered versus time-resolved CE-MRA intraindividually, in a small sample of 21 patients [15]. This study found sensitivities of 100% (11/11 carotids) with both timing techniques. The observed specificity was 80.6% (25/31 carotids) with fluoroscopically triggered CE-MRA and elliptic acquisition. The specificity was 83.9% (26/31 carotids) with time-resolved CE-MRA, and thus slightly but not significantly higher. In their discussion the authors favoured fluoroscopically triggered CE-MRA because of its higher spatial resolution, which allowed for greater diagnostic confidence and interobserver agreement [15]. Another study used time-resolved CE-MRA in 19 patients and examined the next 20 patients with bolus-timed CE-MRA [45]. The authors found no significant differences between the two timing techniques regarding the diagnostic accuracy in severe carotid stenosis [45]. Currently, there are no other studies that compare the different CE-MRA timing techniques at the person-level. At the study-level the metaregression presented here found a trend in favour of bolus-timed CE-MRA, but the differences in diagnostic accuracy observed among the three CE-MRA timing techniques did not reach significance (Fig. 5a–c, Tables 5 and 6). Therefore, it seems justified to apply and to further develop all three CE-MRA timing techniques equally.

MRA image type

There are no primary studies that compare the use of hardcopy films versus electronic monitor images in carotid CE-MRA at the person-level. At the study-level the metaregression showed a significant improvement in specificity with electronic monitor images (P = 0.02) (Fig. 5d–e, Tables 5 and 6). This supports the trend in clinical practice, where conventional film systems are progressively being replaced by electronic viewing facilities and picture archiving and communication systems (PACS). This requires suitable monitors not only for the radiologist, but also for the admitting clinician. However, the net costs of electronic systems are generally lower than those of hardcopy films, and fast image access is a further advantage.

MRA reading mode

In the metaregression of two different MRA reading modes, the pooled specificity was 88.8% with (1) using solely MIP images; with (2) using MIP plus 3D images the pooled specificity was 95.8% and thus significantly higher (P = 0.01) (Fig. 5g–i, Tables 5 and 6). This can be explained by the maximum intensity projection (MIP) algorithm, which generates the MIP images from the original 3D images [46]. In MIP images it is easier to detect vessels with large diameters than those with small diameters. A small vessel may be hidden in the MIP image, although it is visible in the original 3D data [46]. The same applies to a vessel stenosis, which also has a small diameter. Therefore, a vessel stenosis may be overestimated on the MIP image, although it is less severe on the 3D source images. This increases the rate of false-positives and thus reduces the specificity of method (1) compared with method (2), which explains the metaregression findings. However, according to the metaregression, the sensitivity with method (1) was nearly the same as that with method (2). These findings give statistical evidence to support the following reading algorithm that has naturally developed in clinical practice: first, the MIP images are viewed to search for severe carotid stenoses; then, the 3D images are assessed, paying specific attention to the suspected stenoses to reduce the rate of false-positives. This image-reading algorithm is consistent with the findings of Huston et al., who compared the diagnostic accuracy of MIP images and 3D source images in 50 patients [3]. They found that MIP images have the highest sensitivity (93.3%) for the detection of severe carotid stenosis, whereas 3D images have the highest specificity (97.0%) for ruling out severe carotid stenosis.
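The MIP mechanism described above can be shown with a toy example. Each projected pixel keeps only the brightest voxel along its ray, so a faint, partially volumed residual lumen is replaced by any brighter structure elsewhere on the same ray. The sketch below uses a hypothetical 3 × 1 × 3 volume with assumed intensities, not real MRA data. Column 0 is a normal lumen, column 1 a narrow stenotic lumen (0.45) and column 2 background. In the source slice (z = 1) the stenotic lumen stands out against its background (0.45 versus 0.3). In the MIP, however, it shows the same value as the background column (0.5), so the residual lumen is lost and the stenosis would appear more severe.

```python
def mip_along_depth(volume):
    """Maximum intensity projection of volume[z][y][x] along z -> image[y][x]."""
    depth = len(volume)
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(cols)]
            for y in range(rows)]

# Toy volume with assumed intensities: 3 slices, 1 row, 3 columns
toy_volume = [
    [[0.3, 0.3, 0.3]],   # z = 0: background only
    [[1.0, 0.45, 0.3]],  # z = 1: vessel plane (normal lumen, stenotic lumen, background)
    [[0.3, 0.5, 0.5]],   # z = 2: brighter superimposed background on the same rays
]
projection = mip_along_depth(toy_volume)   # [[1.0, 0.5, 0.5]]
```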

Study limitations

The metaregression findings presented here are limited to carotid CE-MRA. Transferring these findings to other imaging techniques such as carotid computed tomographic angiography would mean an extrapolation of results and may be invalid. It would be better to study the possible sources of heterogeneity directly for the other noninvasive carotid imaging techniques.

The metaregressions were limited to analysing the six covariates at the study-level. In the literature there are only a few primary studies that studied these covariates at the person-level. Future diagnostic accuracy studies might, for example, compare the different MRA timing techniques intraindividually [15, 45]. Another limitation of this meta-analysis results from analysing only six covariates. It is possible that the diagnostic accuracy of carotid CE-MRA is influenced by further factors. To some degree such factors are statistically considered by the random effects component of this meta-analysis.

The 17 primary studies applied DSA as their gold standard for the grading of carotid stenosis. However, DSA has its own measurement errors and may additionally underestimate noncircular carotid stenoses because of the angle-dependent projection. This has, for example, been illustrated by Serfaty et al. (Fig. 5 in ref. [1]). Rotational angiography, with its multiple projections at different angles, overcomes this problem and may thus become a more accurate gold standard than DSA [16]. However, for a meta-analysis there are currently not enough studies that compare carotid CE-MRA with rotational angiography [16]. In summary, the true diagnostic accuracy of CE-MRA may be somewhat higher than when assessed by DSA.

The populations of the 17 primary studies consisted of preselected patients with a high probability of severe carotid artery disease. Thus, there is no major selection bias when applying the meta-analysis results to similar, clinically high-risk populations. However, extrapolating the meta-analysis results to asymptomatic screening populations with less than 20% disease prevalence might introduce such selection bias. Currently, this is not relevant, because Doppler sonography, not CE-MRA, is the first-line technique in the general population. Furthermore, the net benefit of such screening for carotid artery disease is unclear [47–49].

This meta-analysis is limited to primary CE-MRA studies with 1.0- or 1.5-T magnets. In principle, CE-MRA with 3-T magnets allows for higher spatial resolution and image contrast [40, 41]. It may be hypothesised that the diagnostic accuracy of carotid CE-MRA at 3-T might be somewhat higher than with 1.5-T magnets. However, currently, there are not many publications on the sensitivity and specificity of carotid CE-MRA at 3-T [41, 50]. Perhaps such studies could be conducted with rotational angiography as the reference standard [16].

Conclusion

In CE-MRA of severe carotid artery stenosis the three major timing techniques yield comparably high diagnostic accuracy, electronic images are more specific than hardcopies, and 3D images should be used in addition to MIP images to increase specificity.