Background
The ultimate goal in the development of pharmacological therapies for acute myocardial infarction (AMI) is a reduction in mortality. Current treatment strategies in AMI are quite effective, and further reduction in mortality with novel therapies will require increasingly large sample sizes. The resources associated with large sample sizes limit the number of new therapies that can be tested in clinical trials. Hence, surrogate endpoints of mortality that can assess the efficacy of novel therapies are of interest, and infarct size appears particularly attractive given its strong link with outcome. [1, 2] Late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) is considered the imaging reference standard for the assessment of AMI, [2, 3] offering advantages in detecting small and subendocardial infarcts. [4, 5]
Quantification of LGE infarct size can be accomplished by manual planimetry. [6–8] Automated methods, which use the image signal intensity of the infarct and/or normal myocardium to define infarct borders, are believed to be more objective and, therefore, more reproducible. [8–11] However, all automated methods require manual tracing of the LV myocardial contours, because no automated algorithm can reliably distinguish the bright LV cavity from the bright endocardial border of the infarct using conventional pulse sequences, although there is some work attempting to tackle this problem. [12–15] The importance of this component in the overall reproducibility of infarct size quantification is unknown. Prior studies evaluating the reproducibility of methods for infarct quantification reported results only after the step of manually tracing the endocardial/epicardial contours had already been performed. [6–11]
A simple method of infarct size quantification is visual scoring of hyperenhanced tissue on a standard 17-segment model with a 5-point scale for each segment. [2, 5] This method allows rapid assessment of infarct size without the need for planimetry of endocardial/epicardial borders. Previous investigations evaluating the reproducibility of visual and manual planimetry methods did not explicitly define how a user should treat partially bright regions with intermediate signal intensities, which are typically located at the infarct border zone and result from partial volume or other effects. [16]
A limitation in the use of LGE for infarct size quantification in clinical trials is the lack of studies evaluating the reproducibility of infarct size measurements at multiple centers. [17] The aim of the present study was to assess sources of variability among automated, manual, and visual methods in the quantification of AMI size. Unlike prior reports, we (a) compared measurements at 3 separate core laboratories, (b) included the step of tracing endocardial/epicardial borders for a complete assessment of reproducibility (e.g. to assess interobserver variability), and (c) explicitly defined how users should treat intermediate signal intensities for manual and visual methods. Finally, to illustrate the significance of the findings in the context of clinical trials, the impact of the findings on sample size was calculated.
Discussion
In this study we found that automated quantification with a computer algorithm, manual planimetry, and visual scoring can have similar reproducibility when used in core laboratories for infarct size quantification. This is a surprising finding, as many consider a computer algorithm objective and therefore more reproducible than the subjective judgment of a human user. Considerable research has been dedicated to developing and evaluating various thresholding algorithms for infarct size quantification. Bondarenko et al. tested the “n-SD” approach, which is based on measuring the mean and standard deviation of signal in normal, noninfarcted myocardium. [8] Amado et al. advocated the full-width at half-maximum (FWHM) technique, which uses the signal intensity of the infarct rather than of normal myocardium to find the appropriate threshold. [9] Heiberg et al. validated an algorithm that assigns a weighting to each myocardial voxel depending on its signal intensity above a fixed number of standard deviations above remote, with infarct size calculated by summing weighted rather than dichotomous volumes. [21] In patient studies, manual planimetry by “experienced observers” is often used as the reference standard since pathology is not available, and excellent agreement between the computer algorithm approach and manual planimetry is usually reported. [8, 21] On the other hand, Flett et al. compared the reproducibility of infarct size quantification methods and found the FWHM technique to have superior reproducibility compared with manual planimetry and the n-SD approach. [10] It is important to note, however, that none of these prior studies has taken into account the subjective determination of endocardial/epicardial borders, which all methods require as a necessary first step before determining the infarct borders. In the current study, the results show there can be considerable within-patient variability in infarct size measurements even if the infarct border is determined solely by computer, since variability is introduced during the planimetry of endocardial/epicardial contours.
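To illustrate how the two thresholding rules discussed above differ, the following is a minimal sketch of the “n-SD” and FWHM criteria applied to a one-dimensional stand-in for segmented myocardial pixels. All array values, masks, and parameters here are hypothetical; actual implementations operate on 2-D short-axis images with user-defined remote and infarct-core regions.

```python
import numpy as np

def n_sd_threshold(myocardium, remote_mask, n=5):
    """Threshold at mean + n*SD of remote (normal, noninfarcted) myocardium."""
    remote = myocardium[remote_mask]
    return remote.mean() + n * remote.std()

def fwhm_threshold(myocardium, infarct_core_mask):
    """Threshold at half the maximum infarct signal (full-width at half-maximum)."""
    return 0.5 * myocardium[infarct_core_mask].max()

# toy data: 500 remote pixels and 100 hyperenhanced infarct pixels
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(100, 10, 500),   # remote myocardium
                         rng.normal(400, 30, 100)])  # hyperenhanced infarct
remote_mask = np.zeros(signal.size, dtype=bool); remote_mask[:500] = True
core_mask = np.zeros(signal.size, dtype=bool); core_mask[500:] = True

infarct_nsd = signal > n_sd_threshold(signal, remote_mask, n=5)
infarct_fwhm = signal > fwhm_threshold(signal, core_mask)
print(infarct_nsd.sum(), infarct_fwhm.sum())
```

On well-separated toy data such as this, both rules recover essentially the same infarct pixels; in real images with partial-volume effects the chosen rule materially affects the threshold.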
The importance of this finding is that the variability in endocardial/epicardial borders must be considered for a thorough comparison between quantification methods and for an accurate calculation of sample size in a clinical trial, since it constitutes a substantial portion of the variability in reproducing measurements. Not surprisingly, Flett et al. in AMI patients reported intraclass correlation coefficients ranging from approximately 0.94 to 0.99 based on predrawn endocardial/epicardial contours, [10] whereas in the current study ICCs were lower, ranging from 0.85 to 0.96. The appreciable variability in endocardial/epicardial borders also highlights that there may be an upper limit to improving reproducibility by means of computer algorithms, and suggests that moderate differences in reproducibility between analysis methods may have limited practical significance if those differences are based on predrawn LV myocardial contours.
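For readers less familiar with the intraclass correlation coefficient, the following is a minimal sketch of one common formulation, the two-way random-effects, absolute-agreement ICC(2,1); the choice of this particular ICC variant and the toy measurement data are assumptions for illustration only.

```python
import numpy as np

def icc_2_1(Y):
    """Two-way random-effects, absolute-agreement ICC(2,1).
    Y: (n subjects x k observers) matrix of repeated measurements."""
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)   # per-subject means
    col_means = Y.mean(axis=0)   # per-observer means
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# toy data: infarct size (%LV) for 6 patients, each measured by 2 observers
rng = np.random.default_rng(1)
truth = np.array([5.0, 12.0, 18.0, 25.0, 33.0, 40.0])
Y = np.column_stack([truth + rng.normal(0, 1.0, 6),
                     truth + rng.normal(0, 1.0, 6)])
print(round(icc_2_1(Y), 3))
```

Because the ICC scales observer agreement against between-patient spread, a cohort with widely varying infarct sizes can yield a high ICC even when absolute measurement error is non-trivial.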
One could try to avoid the variability introduced by subjective planimetry of LV myocardial borders by expressing infarct size in terms of absolute mass (i.e. the numerator alone) rather than as a percentage of LV myocardial mass (i.e. a numerator/denominator ratio). This would, however, introduce the variability in heart size. Furthermore, this approach is unlikely to succeed in the setting of a subendocardial MI, since the endocardial border of the infarct is almost always the same as the local LV myocardial–blood-pool border. Hence, this portion of the infarct contour will be the result of manual planimetry even if a computer algorithm is used to determine the infarct borders. In the setting of a transmural MI, both the endocardial and epicardial aspects of the infarct will result from manual planimetry. Another theoretical approach to reduce variability would be to use an automated method to determine LV myocardial borders on LGE images. However, we are not aware of any automated tool that is publicly available, [12–15] and any attempt to develop such a method will likely be troubled by the fact that both infarction and LV blood-pool are bright and have similar signal intensities. [25] Because the endocardial border of the infarct displays the smallest gradient in image intensities, and can constitute up to 50% of the infarct perimeter, this portion of the infarct border is likely the largest source of variability in infarct size measurements.
To our knowledge, the present study is the first to explicitly define how myocardial regions with intermediate signal intensity should be treated in quantitative infarct size measurement by manual planimetry or visual scoring. Without explicit instruction, readers might include all, include part, or exclude such regions as part of the infarct. In the current study we tested two approaches (see Fig. 2): (a) to include all regions with intermediate signal intensity, and (b) to include an adjusted percentage of regions with intermediate signal intensity. The observation that the two approaches lead to appreciable differences in infarct size for both manual planimetry and visual scoring (e.g. MANUAL-ISI vs MANUAL and VISUAL-ISI vs VISUAL) indicates that the spatial extent of these regions can be substantial. It also suggests that, without explicit instruction, reader inconsistency in interpreting regions with intermediate signal intensity could in part explain some of the variability found previously with non-automated methods.
Interestingly, explicit instructions to include an adjusted percentage of regions with intermediate signal intensity, rather than all such regions, improved the reproducibility of infarct size measurements for manual planimetry and visual scoring. The reason for this is not clear; however, it is possible that incorporating a process to “weight” regions with intermediate signal intensity provides a self-correcting mechanism for some of the more idiosyncratic subjective assessments of infarct size.
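The “adjusted percentage” idea can be sketched as a simple weighting rule over pixel intensities. The thresholds and the 50% weight below are hypothetical placeholders for illustration, not the values specified in the study protocol.

```python
import numpy as np

def weighted_infarct_fraction(pixels, thr_intermediate, thr_bright, weight=0.5):
    """Count fully hyperenhanced pixels as infarct, and intermediate-intensity
    pixels at a fixed fractional weight (assumed 50% here)."""
    bright = pixels >= thr_bright
    intermediate = (pixels >= thr_intermediate) & (pixels < thr_bright)
    return (bright.sum() + weight * intermediate.sum()) / pixels.size

# toy myocardial intensities: 2 normal, 2 intermediate, 2 fully bright pixels
pixels = np.array([10.0, 20.0, 120.0, 160.0, 210.0, 300.0])
print(weighted_infarct_fraction(pixels, thr_intermediate=100, thr_bright=200))
```

The weighting makes the border zone contribute partially rather than all-or-nothing, which is one plausible mechanism for the damping of idiosyncratic reader decisions described above.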
Similarly, it may seem paradoxical that incorporating subjective user input with AUTO could improve reproducibility compared with excluding user input (AUTO-UC-Segment versus AUTO-Segment: ICC, 0.96 vs 0.91; CV, 8.3% vs 10.6%). On this point, however, recall that AUTO includes the variability introduced during manual planimetry of endocardial/epicardial borders. Hence, imprecise endocardial contours may lead to bright LV cavity blood-pool or epicardial fat pixels being mistakenly included in quantitative infarct size, even when remote from the infarct zone. In this situation, allowing user input could reduce variability in infarct size measurements, since users could “self-correct” for obvious imperfections in the endocardial/epicardial contours. We note that this process of user correction of the endocardial/epicardial contours reflects the actual process by which infarct size quantification is commonly performed in core laboratories. Hence AUTO-UC (with any thresholding technique) most closely reflects the standard process, and differences between AUTO-UC and AUTO provide a quantitative assessment of the user-correction step.
In the setting of an acute MI trial, it would be difficult to obtain a baseline MRI before treatment. Hence, infarct size measurements cannot be compared before and after therapy, and efficacy will be based only on an unpaired analysis of the MRI after therapy. In the current study, the best method in each category had excellent and similar reproducibility (AUTO-UC-Segment, MANUAL-ISI, and VISUAL-ISI: CV = 8.3%, 8.3%, and 8.4%, respectively). Moreover, for these three methods, the within-patient variability due to the method was less than 10% of total variability. In other words, the inherent variability in infarct size in a STEMI cohort—the between-patient variability—was far larger than the variability due to the analysis method. The consequence was minimal differences in sample size calculations among the 3 optimized methods (see Table 5). This finding suggests that if measurements are performed in a trained core laboratory, and explicit instructions are given on how to account for intermediate signal intensities, manual planimetry and visual scoring may have reproducibility comparable to that of an automated technique.
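To make the sample-size reasoning concrete, the sketch below applies the standard normal-approximation formula for an unpaired two-group comparison of means, with total variance as the sum of between-patient and method variance. The between-patient SD, mean infarct size, and detectable difference are illustrative assumptions, not values from Table 5.

```python
import math
from statistics import NormalDist

def n_per_group(sd_total, delta, alpha=0.05, power=0.80):
    """Per-group sample size for an unpaired two-sample comparison of means
    (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd_total / delta) ** 2)

# illustrative assumptions: between-patient SD of 10 %LV, and a method
# CV of 8.3% around a mean infarct size of 20 %LV
sd_between = 10.0
sd_method = 0.083 * 20.0
delta = 3.0  # treatment effect to detect, %LV

n_no_method_error = n_per_group(sd_between, delta)
n_with_method_error = n_per_group(math.hypot(sd_between, sd_method), delta)
print(n_no_method_error, n_with_method_error)
```

Under these assumed numbers, folding in a method CV of roughly 8% enlarges the per-group sample size by only a few patients, mirroring the finding that between-patient variability, not the analysis method, dominates the calculation.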
Study limitations
In this study there is no pathology-based reference standard for infarct size. However, the primary aim of the study was to examine the reproducibility of methods for infarct size quantification, which is highly relevant for clinical trials using CMR infarct size as a surrogate endpoint. We tested only two computer algorithms for the automated approach (Segment and FWHM). Previous investigations have compared the reproducibility of various infarct contouring algorithms, [10, 21] and our goal was not to confirm these findings. Instead, we aimed to evaluate the variability introduced by manual planimetry of LV endocardial/epicardial borders, for which there are no prior data. Since this is a required first step before the application of any automated algorithm, it is independent of the specific algorithm and is a relevant component for accurate sample size calculations. Visual identification of regions with intermediate signal intensities requires an experienced reader, and even then is subjective. However, the current study was designed to simulate the ‘real-life’ situation of a CMR core laboratory for a clinical trial, which typically involves experienced readers and in which many steps are ultimately subjective. Moreover, the main goal was to show that, despite some variability associated with this subjective step, if explicit instructions are provided the variability of a subjective approach can be reduced to a level at which it no longer significantly impacts sample size calculations. That said, it is important to highlight that our results are based on experienced readers, group completion of training sessions prior to performing measurements, and specific protocols on two scanner platforms with one particular sequence for LGE. These findings should not be extrapolated outside this scenario, and it is possible that an untrained reader would have more reproducible infarct size measurements with an automated algorithm than with manual planimetry or visual scoring. Finally, the sample sizes reported in Table 5 should be treated with caution, since they are highly dependent on the standard deviation of infarct size in the enrolled population.