Introduction
Positron-emission tomography (PET) enables in vivo assessment of metabolic and intracellular processes. Whereas PET is predominantly used in clinical practice to assess tracer uptake qualitatively, PET(/computed tomography [CT]) may also serve as a surrogate quantitative biomarker of, for example, tumor metabolism and proliferation. The application of quantitative tumor assessment methods for distinguishing benign from malignant lesions, staging, prognostication, and determining or predicting response to therapy has garnered increasing interest [1–4].
Accurate quantification of metabolic volumes smaller than 2–3 times the spatial resolution of PET is hampered by partial-volume effects, which lead to underestimation of the standardized uptake value (SUV) and may compromise lesion detection [5, 6]. Many methods for partial-volume correction (PVC) have been advocated [7]. The simplest technique uses recovery coefficients (RC) obtained from phantom experiments, under the assumption that the true metabolic volume is known and that lesions are spherical with homogeneous uptake. More sophisticated methods have been developed, but all suffer from limitations [7, 8]. Voxel-wise resolution recovery methods, which either incorporate the point spread function (PSF) within iterative reconstruction [9] (PSF reconstruction) or perform post-reconstruction iterative deconvolution [10], could improve both qualitative and quantitative reads. To date, consensus on standardized application of PVC in oncological PET/CT studies is lacking, and perhaps as a consequence PVC is not yet routinely applied. In fact, most current clinical quantitative PET studies merely exclude small lesions (e.g. <2 cm in diameter), as recommended in the PET Response Criteria in Solid Tumors (PERCIST) [3].
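The recovery-coefficient approach described above can be illustrated with a minimal sketch. The lookup table `RC_TABLE`, its values, and the function names are hypothetical and chosen only for illustration; real coefficients would come from phantom measurements on the scanner in question. The sketch also ignores background activity (spill-in), which some RC formulations account for.

```python
from bisect import bisect_left

# Hypothetical (sphere diameter in mm, recovery coefficient) pairs.
# Illustrative values only, not taken from any phantom experiment.
RC_TABLE = [(10.0, 0.30), (13.0, 0.45), (17.0, 0.60),
            (22.0, 0.75), (28.0, 0.88), (37.0, 0.97)]


def recovery_coefficient(diameter_mm):
    """Linearly interpolate the RC for a lesion assumed to be spherical."""
    diameters = [d for d, _ in RC_TABLE]
    if diameter_mm <= diameters[0]:
        return RC_TABLE[0][1]
    if diameter_mm >= diameters[-1]:
        return RC_TABLE[-1][1]
    i = bisect_left(diameters, diameter_mm)
    (d0, rc0), (d1, rc1) = RC_TABLE[i - 1], RC_TABLE[i]
    return rc0 + (rc1 - rc0) * (diameter_mm - d0) / (d1 - d0)


def rc_corrected_suv(measured_suv, diameter_mm):
    """Divide the measured SUV by the RC (background spill-in is ignored)."""
    return measured_suv / recovery_coefficient(diameter_mm)
```

Because the correction depends entirely on the assumed diameter, an error in the size measurement propagates directly into the corrected SUV, which is one of the limitations noted for this method.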
The clinical impact of PVC in an oncological setting, and thus the need for standardized application, is not yet fully elucidated [7]. We performed a systematic review and meta-analysis to assess the impact of PVC in clinical PET studies, focusing on diagnosis, staging, prognostication, and response assessment.
Materials and methods
Search strategy
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A comprehensive search (Supplemental Tables 1 and 2) was performed in PubMed and Embase.com, in collaboration with a medical librarian (LJS), from inception to May 9, 2016. Both controlled terms (MeSH in PubMed, Emtree in Embase) and free-text terms were included in the search. The following were used (including synonyms and closely related words) as index terms or free-text words: (‘positron-emission tomography’ or ‘PET’) and (‘partial volume correction’ or ‘point spread function reconstruction’) and (‘neoplasms’ or ‘cancer’).
Selection process
Abstracts and titles of all studies retrieved from the search were independently screened by two researchers (MCFC and GMK). Afterwards, eligible articles were studied in full text. In case of differences in judgment, consensus was reached through discussion. Cross-referencing was performed to further identify relevant articles.
The following were included: studies applying PVC in clinical PET studies, using oncological patients, reporting PET data with and without PVC, and investigating clinical impact of PVC on either diagnosis, staging, prognostication (reporting survival data), or response assessment.
Exclusion criteria were as follows: reviews, letters, editorials, conference abstracts, case reports, full text not available or not in English, no adequate reference data, no description of or reference to PVC method, combined PVC and motion blur correction method, or patient cohort overlapping with another included study.
Quality assessment
The quality of included articles was assessed (independently by MCFC and GMK) according to the QUADAS-2 [11] (n = 25) or QUIPS [12] (n = 12) tools. QUADAS-2 assesses bias and applicability of diagnostic studies, whereas QUIPS assesses bias of studies investigating prognostic factors. Staging and response assessment studies were assigned to either of the quality assessment tools. Consensus was reached through discussion.
Both researchers independently extracted results regarding impact of PVC on diagnostic accuracy (for diagnosis and staging), prediction of survival (for prognostication), and response assessment. Measures of diagnostic accuracy were derived with and without PVC. If test characteristics were described for subgroups, overall measures of accuracy were calculated when possible. When p-values of differences in accuracy between uncorrected and PVC data were not reported, these differences were deemed not statistically significant. Descriptive data regarding cancer type, number of patients, lesion sizes, scanner type, and PVC method were also extracted. Unless stated otherwise, we presented data on SUV quantification.
Diagnostic studies on the same topic were pooled using bivariate random effects meta-regression analysis, which is the recommended method for meta-analysis of diagnostic studies [13]. This method provides summary estimates of sensitivity and specificity with 95% confidence intervals, taking into account the correlation between sensitivity and specificity and heterogeneity in results between studies. We tested for differences in overall diagnostic accuracy between different diagnostic tests using a likelihood ratio test, comparing models that included and excluded a covariate for the diagnostic test. For illustrative purposes, summary receiver operating characteristic (ROC) curves were calculated according to the Moses-Littenberg method [14]. We used Stata software (version 14; StataCorp LP, College Station, TX) for statistical analyses.
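For readers unfamiliar with the Moses-Littenberg approach, the following is a minimal sketch of its usual textbook form, not the Stata routine used for the analyses here. For each study, D = logit(TPR) − logit(FPR) is regressed on S = logit(TPR) + logit(FPR) by ordinary least squares, and the fitted line D = a + bS is transformed back into ROC space. The function names and the 0.5 continuity correction applied to every cell are assumptions of this sketch.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def moses_littenberg_fit(counts):
    """Fit D = a + b*S by ordinary least squares.

    counts: list of (tp, fp, fn, tn) tuples, one per study.
    Assumption: 0.5 is added to every cell as a continuity correction.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in counts:
        tpr = (tp + 0.5) / (tp + fn + 1.0)
        fpr = (fp + 0.5) / (fp + tn + 1.0)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(counts)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Sensitivity on the summary ROC curve at a given false-positive rate.

    Rearranging D = a + b*S gives
    logit(TPR) = a / (1 - b) + ((1 + b) / (1 - b)) * logit(FPR).
    """
    z = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return 1.0 / (1.0 + math.exp(-z))
```

Unlike the bivariate model used for the pooled estimates, this regression treats studies as fixed points and does not model between-study heterogeneity, which is why it serves here only to draw an illustrative curve.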
Discussion
Quantification of functional tumor characteristics with PET is considered to be useful in clinical oncology, and often uses semi-quantitative analyses, resulting in SUVs. Unfortunately, partial-volume effects are known to cause underestimation of tumor activity, and hence the necessity of PVC for accurate semi-quantitative reads for small lesions is well recognized [5]. However, many factors affect its accuracy and potentially hamper its optimal usage. Perhaps as a consequence, its resulting advantage in oncological PET studies is not yet evident. Additionally, the lack of consensus on the preferred PVC and delineation method may result in suboptimal results and could hamper comparisons between studies. This review discusses the clinical impact of PVC and provides recommendations for specific research questions and analyses to be included in future studies applying PVC.
When applied to diagnosis of primary lesions and (mainly nodal) staging, PVC often yielded higher sensitivity at the expense of specificity (Tables 1 and 2 and Figs. 3 and 4), which is an expected consequence of using the same SUV thresholds for test positivity for uncorrected and PVC data. In the subset of studies which allowed statistical pooling (679 lesions), meta-analysis showed that PVC did not significantly alter the overall diagnostic accuracy in characterizing pulmonary lesions with PET (Fig. 5). When estimating the effect of PVC, the optimal trade-off between sensitivity and specificity (the SUV threshold of test positivity) may be different for PVC and uncorrected data. At an exploratory level, one should define this cut-off separately for each method. Of note, Degirmenci et al. (on pulmonary nodules) used data-driven SUV cut-offs of 2.4 and 2.9 for uncorrected and PVC data, respectively, which yielded a specificity fixed at 80%, with sensitivity of 62% and 73% for uncorrected and PVC data, respectively [21]. We performed a similar analysis using the (individual patient) data from Hickeson et al. [18]. At a predefined SUV cut-off of 2.5, PVC decreased specificity and increased sensitivity (Table 1). However, when applying cut-offs of 2.55 and 2.8 (as derived from ROC analysis) for uncorrected and PVC data, respectively, PVC increased sensitivity from 72% to 94%, while specificity remained constant at 91%. This further demonstrates that PVC may indeed increase diagnostic accuracy when SUV cut-offs are adequately adapted for this correction. Obviously, each proposed threshold requires external validation.
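The dependence of sensitivity and specificity on the chosen SUV cut-off can be sketched as follows. This is a minimal illustration, assuming that a lesion is called positive when its SUV is at or above the cut-off and that the "optimal" cut-off maximizes the Youden index (sensitivity + specificity − 1). The criterion actually used in the ROC analyses above is not stated, so Youden's index is an assumption, as are the function names.

```python
def sens_spec(suvs, malignant, cutoff):
    """Sensitivity and specificity when SUV >= cutoff is called positive."""
    tp = sum(1 for s, m in zip(suvs, malignant) if m and s >= cutoff)
    fn = sum(1 for s, m in zip(suvs, malignant) if m and s < cutoff)
    tn = sum(1 for s, m in zip(suvs, malignant) if not m and s < cutoff)
    fp = sum(1 for s, m in zip(suvs, malignant) if not m and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(suvs, malignant):
    """Return the observed SUV that maximizes sensitivity + specificity - 1.

    Ties are resolved in favor of the lowest candidate cut-off.
    """
    candidates = sorted(set(suvs))
    return max(candidates, key=lambda c: sum(sens_spec(suvs, malignant, c)) - 1.0)
```

Running `youden_cutoff` separately on the uncorrected and the PVC SUVs of the same lesions yields one cut-off per method, which is the comparison advocated in the paragraph above. Any cut-off derived this way is data-driven and would still need external validation.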
Another explanation for the limited impact of PVC on diagnostic accuracy as published in the literature may relate to the size spectra of included lesions, with the distribution of benign and malignant lesions therein. When performing PVC analysis simultaneously on all lesions, both large and small, the overall impact of PVC on diagnostic accuracy will be diminished. Indeed, several studies demonstrated a high impact of PVC on accuracy for small lesions (when stratifying for lesion size), but less so when including all lesions regardless of size [18, 32]. Therefore, we suggest that investigators stratify diagnostic performance results for lesion size in secondary analyses. However, since partial-volume effects are not merely size-dependent, but are also affected by lesion contrast and shape, reliable classification of lesions that are (most) affected by partial-volume effects will be difficult. In our previous simulation study, we observed that for high-contrast spherical lesions, partial-volume effects started to occur below 3-cm diameter [8]. A practical approach for stratification would thus be to stratify results using a 3-cm lesion diameter or a 14-mL metabolic volume cut-off (corresponding to a 3-cm-diameter sphere). Even though larger lesions may also be somewhat affected by partial-volume effects, depending on their shape and contrast, such a size cut-off will ensure that lesions that are most affected by partial-volume effects are separated. Another approach would be to plot the percentage increases in SUV after PVC as a function of metabolic tumor volume to determine an appropriate size cut-off for stratification of results within studies (not possible when applying the RC method).
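The 14-mL figure follows from the volume of a sphere, V = πd³/6, which for d = 3 cm gives about 14.1 cm³ (1 cm³ = 1 mL). The sketch below reproduces that arithmetic and shows one way the suggested stratification and the percentage SUV increase could be computed; the function names and the dictionary keys `volume_ml`, `suv`, and `suv_pvc` are hypothetical.

```python
import math


def sphere_volume_ml(diameter_cm):
    """Volume of a sphere in mL (1 cm^3 = 1 mL): V = pi * d^3 / 6."""
    return math.pi * diameter_cm ** 3 / 6.0


def percent_increase(suv_uncorrected, suv_pvc):
    """Percentage increase in SUV after partial-volume correction."""
    return 100.0 * (suv_pvc - suv_uncorrected) / suv_uncorrected


def stratify_by_volume(lesions, cutoff_ml=14.0):
    """Split lesions into (small, large) groups by metabolic volume.

    lesions: list of dicts with hypothetical keys
    'volume_ml', 'suv', and 'suv_pvc'.
    """
    small = [l for l in lesions if l["volume_ml"] < cutoff_ml]
    large = [l for l in lesions if l["volume_ml"] >= cutoff_ml]
    return small, large
```

Plotting `percent_increase` against `volume_ml` for every lesion in a study would give the curve proposed above for choosing a study-specific cut-off in place of the fixed 14-mL value.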
Regarding visual nodal staging, PSF reconstruction did not significantly alter accuracy, but tended to increase sensitivity in lung, breast, and colorectal cancer (Table 2) [28, 30, 34]. This may be attributed to improved qualitative reads, improved (small) lesion detection, and higher diagnostic confidence [28, 30, 34]. Therefore, it may be worthwhile to validate these higher-resolution reconstruction algorithms for use in clinical practice, especially for detection of small lymph node metastases and lesions embedded in high background activity such as in the liver or mediastinum. However, PSF reconstructions may suffer from Gibbs artifacts (overshoot in activity); moreover, they are known not to guarantee full signal recovery [9]. Also, further research into their impact on compliance with European Association of Nuclear Medicine (EANM) standards is needed to ensure equal scanner calibration in multicenter quantitative PET/CT studies, which may require an SUV harmonization procedure [46].
We found that PVC might improve prognostication in head and neck cancer [39, 40], but these studies did not stratify for human papillomavirus status, a prognostic marker associated with lower tumor SUV and smaller metabolically active tumor volume (MATV) [47]. Future studies should note that appropriate PVC may not necessarily improve prognostication with SUV, but instead may enable SUV to reflect its true prognostic value. For example, Vesselle et al. found that PVC weakened the correlation between primary tumor SUV and overall survival in patients with non-small cell lung cancer (NSCLC). They also observed that the correlation between SUV and overall TNM stage, which in essence is based on patient prognosis, disappeared after PVC, suggesting that the ‘prognostic value’ of uncorrected SUV was based on tumor volume rather than metabolic activity [5, 25, 48].
For response assessment, no conclusions regarding the effect of PVC can be drawn at this point, owing to the small number of heterogeneous studies. One included study demonstrated that, after PVC, PERCIST classification of response was altered for 5 of 24 NSCLC lesions during radio- or radiochemotherapy [45]. This is an important observation, since, conceptually, PVC may correct changes in SUV during treatment for changes in tumor volume and contrast, allowing for more appropriate PET-based classification of tumor response. Interestingly, two studies (excluded since no clinical verification was performed) demonstrated PVC to alter response classifications according to European Organisation for Research and Treatment of Cancer (EORTC) or PERCIST criteria in patients with bone metastases and NSCLC [39, 49]. In conclusion, future PET response assessment studies should include PVC to allow for metabolic response assessment, irrespective of tumor shrinkage or growth, and quantify its clinical impact.
To improve comparison of PVC’s impact between studies, consensus on the preferred combination of PVC and lesion delineation methodologies should be reached. Many PVC methods have been advocated, some specific for oncological application [5, 7, 50, 51]. Still, most studies in this review applied an RC method, a quite simple method assuming spherically shaped lesions, homogeneous activity distributions, and known tumor sizes. Using this method, even small errors in tumor size measurements may result in over- or underestimations of true SUVs. Also, size measurements are often CT-based, whereas partial-volume effects affect metabolic volumes, which may be different from anatomical tumor volume [52] (e.g. due to necrosis and treatment effects). In a previous phantom and simulation study we found that voxel-wise PVC methods such as iterative deconvolution may be preferred, because iterative deconvolution assumes only approximate knowledge of the PET/CT system’s resolution kernel size, has low dependency on accurate delineation, and has only a limited effect on precision [8]. Additionally, such a voxel-wise PVC method could allow for more accurate delineation of tumors [53] and, theoretically, heterogeneous tumor background. However, iterative deconvolution is known to increase image noise levels, which may require some form of denoising algorithm to be applied [37]. Iterative deconvolution may be relatively easy to implement, and has been demonstrated to perform well using commonly applied background-adapted threshold-based delineation methods [8]. To date, iterative deconvolution has been applied predominantly by the same research group (Supplemental Table 3); more extensive clinical evaluation is warranted. Our previous phantom and simulation study showed that for lesions ≤10 mm in diameter, even with PVC, the acquisition of fully accurate results was not yet possible [8], which may contribute to the relatively low impact of PVC. Owing to heterogeneity between studies, the impact of chosen PVC methods on outcomes cannot be established in this review.
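To make the idea of post-reconstruction iterative deconvolution concrete, the following is a minimal one-dimensional sketch. It uses a Van Cittert-style update, estimate ← estimate + α·(blurred − kernel ∗ estimate), chosen here purely for brevity; the studies discussed above may use a different algorithm. The Gaussian kernel width, the number of iterations, and the test profile are all illustrative assumptions, and no denoising step is included.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian kernel standing in for the scanner PSF (assumed)."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """Zero-padded convolution returning an output as long as the input."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def van_cittert(blurred, kernel, iterations=5, alpha=1.0):
    """Van Cittert iteration: add back the residual between data and re-blurred estimate."""
    estimate = list(blurred)
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        estimate = [e + alpha * (b - r)
                    for e, b, r in zip(estimate, blurred, reblurred)]
    return estimate


# Illustrative profile: a 5-sample "lesion" of activity 10 on a cold background.
truth = [0.0] * 18 + [10.0] * 5 + [0.0] * 18
kernel = gaussian_kernel(sigma=2.0, radius=8)
blurred = convolve_same(truth, kernel)
restored = van_cittert(blurred, kernel)
```

In this noise-free example the restored peak is higher than the blurred peak, illustrating partial recovery of the lost signal. With real, noisy data each iteration also amplifies noise, which is why a denoising step is often paired with this kind of correction.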
A limitation of this systematic review and meta-analysis was the small number of included studies (only six diagnostic studies could be pooled, which is the maximum number of studies in any of the other subsections), with several sources of heterogeneity, such as the included lesion types, malignancy prevalence, lesion size spectra, PET acquisition and reconstruction settings, quantitation methods, and methodological quality. The overall study quality as assessed by QUADAS-2 and QUIPS was good (Fig. 2), but more specific research questions regarding PVC are needed, along with more rigorous designs. Although it was a limitation in this review, the small number of retrieved studies applying PVC in oncology is also an important finding, highlighting the limited application of PVC in recent decades.