Introduction
Over the last 20 years, 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) has played an increasing role in the management of non-small cell lung cancer (NSCLC) patients for staging [1] and restaging [2, 3]. More recently, 18F-FDG PET has been used for response evaluation of chemotherapy and molecularly targeted therapies [4–6]. The standardized uptake value (SUV) is the most frequently used quantitative parameter in oncology [7]. When using SUV as a diagnostic [8, 9] or prognostic [10, 11] tool (i.e. single measurement) or for therapy monitoring (i.e. longitudinal studies) in multicentre trials or in sites equipped with multiple scanners, one needs to minimize the variability in semi-quantitative measurements by harmonizing both patient preparation in the PET unit and acquisition and reconstruction parameters [12–14].
The European Association of Nuclear Medicine (EANM) and the Society of Nuclear Medicine (SNM) have published guidelines [15, 16] regarding patient preparation, data acquisition, reconstruction parameters and definition of volume of interest (VOI) in or around the tumours. With regard to reconstruction parameters, the EANM guidelines, in line with the Netherlands protocol for standardization and quantification of 18F-FDG PET studies in multicentre trials [17], provide recommendations based on an expected spatial resolution of the PET system equal to 7 mm. These recommendations include the use of the NEMA NU-2 phantom to check that activity concentration recoveries are concordant with those expected. Regarding quantitative analysis, SUVmax is currently the most frequently used quantitative parameter in oncological studies [18] despite being a suboptimal parameter due to noise-induced bias [19]. Therefore the EANM guidelines focus on getting comparable SUVs when using SUVmax in multicentre studies.
Hardware and software evolutions can lead to important device-dependent and reconstruction-dependent variations in quantitative values [20–22]. For instance, point spread function (PSF) reconstruction, which improves spatial resolution throughout the entire field of view, has recently become commercially available in clinical PET/CT systems. Our group has shown that, by improving activity recovery, especially for non-enlarged nodes, PSF reconstruction significantly improves the diagnostic performance of 18F-FDG PET for nodal staging in NSCLC [23]. On average, PSF reconstruction increases SUVmax and SUVmean by 48 and 28 %, respectively. As a result, recovery coefficient (RC) values obtained with PSF reconstruction are much higher than EANM's expected activity concentration recoveries as shown recently by Boellaard [24].
There is therefore a need for standardization of reconstruction protocols, keeping in mind that centres running PET systems with advanced reconstruction algorithms that participate in multicentre trials often wish to use their PET system with parameters chosen in order to achieve optimal lesion detection. One way to optimize PET image quality for diagnostic purposes while still being able to use quantitative values within the framework of multicentre trials is to apply an additional filtering step [25] or to generate two sets of images: one to provide optimal diagnostic quality and a second one to meet quantitative harmonizing standards [24], with NEMA NU-2 phantom-based filtering chosen so that activity concentration recoveries are as close as possible to those recommended by EANM guidelines.
We aimed to prospectively evaluate such a strategy in NSCLC patients imaged on a PET/CT system equipped with PSF reconstruction. To mimic a situation in which a patient would undergo pre- and post-treatment scans on different generation PET systems, the same PET raw data were reconstructed in three ways: with an ordered subset expectation maximization (OSEM) algorithm known to produce activity concentration recoveries meeting EANM requirements, with PSF reconstruction for optimal tumour detection and with PSF reconstruction with a filter optimized to fulfil EANM requirements. In addition, the potential impact of several confounding factors [tumour size, location and type as well as patient body mass index (BMI) and image noise] on the accuracy of our method was studied.
Materials and methods
Patient population
Over a 6-month period, 52 patients referred to our institution for staging or restaging of NSCLC were included in this study. The study was approved by the local Ethics Committee (ref A12-D24-VOL13, Comité de protection des personnes Nord Ouest III), which waived the requirement for signed informed consent. Among these patients, ten underwent two PET examinations for the purpose of therapy monitoring. Patient demographics are described in Table 1.
Table 1
Patient demographics
Sex ratio (M/F) | 7.7 |
Age (years) |
Range | 46–80 |
Mean (SD) | 63.9 (7.9) |
Body habitus, n (%) |
BMI < 25 | 22 (42.3) |
BMI ≥ 25 to < 30 | 22 (42.3) |
BMI ≥ 30 | 8 (15.4) |
Histological diagnosis, n (%) |
Adenocarcinoma | 26 (50.0) |
Squamous cell carcinoma | 18 (34.6) |
Undifferentiated carcinoma | 4 (7.7) |
Large cell carcinoma | 2 (3.9) |
Adenosquamous carcinoma | 1 (1.9) |
Neuroendocrine carcinoma | 1 (1.9) |
Calibration and cross-calibration of the PET system
The calibration of the PET system was performed daily with a 68Ge cylinder with a known radioactive concentration. In addition, a cross-calibration procedure was performed twice during the present study. A solution of 18F-FDG (70.6 and 70.5 MBq, as assessed by the dose calibrator) was introduced into a cylindrical phantom with an exactly known volume and topped up with water, which resulted in a solution with an exactly known concentration. A two-bed acquisition of the phantom was performed and images were reconstructed with attenuation and scatter correction identical to patient studies. Twelve VOIs were drawn on consecutive axial slices to determine the average activity concentration of 18F-FDG within the phantom. The cross-calibration factor was calculated as the ratio of the measured activity concentration to the true activity concentration. The cross-calibration factors were found to be 0.99 and 1.04.
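The cross-calibration ratio described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the phantom volume and the VOI readings are hypothetical, and decay correction between the dose calibrator measurement and the scan is assumed to have been applied beforehand.

```python
from statistics import mean


def cross_calibration_factor(voi_concentrations_bq_ml, activity_bq, phantom_volume_ml):
    """Ratio of the PET-measured to the true activity concentration.

    voi_concentrations_bq_ml: mean concentration read in each VOI (Bq/ml).
    activity_bq: activity measured in the dose calibrator (Bq), assumed
        already decay-corrected to scan time (assumption, not from the text).
    phantom_volume_ml: known phantom volume (ml).
    """
    if activity_bq <= 0 or phantom_volume_ml <= 0:
        raise ValueError("activity and volume must be positive")
    measured = mean(voi_concentrations_bq_ml)   # average over the VOIs
    true = activity_bq / phantom_volume_ml      # known concentration
    return measured / true


# Hypothetical example: twelve VOIs all reading 10 kBq/ml in a 6,000-ml phantom
example_factor = cross_calibration_factor([10_000.0] * 12, 60e6, 6_000.0)
```

A factor close to 1 indicates that the scanner and the dose calibrator agree.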
Phantom preparation
The phantom set is the International Electrotechnical Commission body phantom set, which consists of a torso cavity containing a 5-cm-diameter cylindrical insert filled with foam pellets with an average density of 0.30 g/ml positioned in the centre of the phantom to simulate lung tissue and six coaxial isocentred spheres with internal diameters of 10, 13, 17, 22, 28 and 37 mm. According to the EANM guidelines, the phantom background was filled with a solution of 18F-FDG (2.0 kBq/ml) and all of the spheres were filled with a radioactivity concentration of 20.0 kBq/ml, resulting in a lesion-to-background activity ratio equal to 10.
Patient studies
The weight and height of patients on the day of the PET examination were recorded. BMI was computed as follows and was used to separate overweight (BMI ≥ 25 to < 30 kg/m²) and obese patients (BMI ≥ 30 kg/m²) from low to normal weight patients (BMI < 25 kg/m²):
$$ BMI=\frac{{weight\,\left( {kg} \right)}}{{{height^2}\,\left( {{m^2}} \right)}} $$
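As a minimal sketch of the grouping, the snippet below computes BMI with the standard definition (weight in kg divided by the square of height in m) and applies the three categories listed in Table 1. The function names are illustrative and not taken from the study.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (standard definition)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_group(bmi_value):
    """Map a BMI value to the three groups used in Table 1."""
    if bmi_value < 25:
        return "low to normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obese"


# Hypothetical patient: 70 kg, 1.75 m
example_group = bmi_group(bmi(70.0, 1.75))
```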
After a 15-min rest in a warm room, patients who had been fasting for 6 h were injected with 18F-FDG. Mean (SD) injected activity was 4 (0.2) MBq per kg of body weight. The mean (SD) delay between tracer injection and image acquisition was 62 (4) min, thus meeting EANM guidelines [15].
PET/CT acquisition and reconstruction parameters
All PET imaging studies were performed on a Biograph TrueV (Siemens Medical Solutions) with a 6-slice spiral CT component. The technical and performance characteristics of the PET component of the TrueV system can be found elsewhere [26].
CT acquisition was performed first, with the following parameters: 60 mAs, 130 kVp, pitch 1 and 6 × 2 mm collimation. Subsequently, the PET emission acquisition was performed in 3-D mode. Patients were scanned from the skull base to the mid-thighs. For low to normal weight and overweight to obese patients, the duration per bed position was 2 min 40 s and 3 min 40 s, respectively. For phantom scanning, two bed positions were performed. The duration of each bed position was set to 2 min 40 s and 10 min, as per EANM guidelines. In addition, phantom studies with durations of 1 min 40 s and 3 min 40 s were performed in order to study the impact of image noise on the accuracy of our method.
In our department, PET images are reconstructed with a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) without filtering (PSF_allpass), as modelling the PSF during iterative reconstruction introduces correlations between neighbouring voxels in a manner similar to smoothing filters and thus has been shown to achieve maximal performance with little or no filtering [27].
For the purpose of this study, raw data were also reconstructed with the OSEM 3-D reconstruction algorithm (4 iterations and 8 subsets) and the PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) using a Gaussian filter with a kernel size increasing from 6 to 8 mm in 0.5-mm increments. Only the PSF-reconstructed data without filtering were used for the purpose of diagnostic workup. The OSEM reconstruction parameters were chosen as recommended by the manufacturer. These parameters meet the EANM requirements regarding activity recoveries and they were recently used by another group with the same PET system [28]. For all reconstructions, matrix size was 168 × 168, resulting in a 4.07 × 4.07 × 4.07 mm voxel size. Scatter and attenuation corrections were applied.
PET/CT analysis
Phantom studies
Activity concentration RCs as a function of sphere (tumour) size were measured. RCs are defined as the ratio between measured and true activity concentration in a sphere. For that purpose, 3-D 50 % isocontour VOIs were drawn over each sphere for each set of reconstructed data and maximum and mean pixel values were recorded.
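The RC definition above can be illustrated as follows. This is a simplified sketch: it thresholds a flat list of voxel values at 50 % of the maximum and ignores spatial connectivity, whereas a true 3-D isocontour VOI is a connected region. The function name and the voxel values are hypothetical.

```python
from statistics import mean


def recovery_coefficients(voxels_bq_ml, true_bq_ml, threshold=0.5):
    """Return (RC_max, RC_mean) for one sphere.

    voxels_bq_ml: voxel values around the sphere (Bq/ml).
    true_bq_ml: true activity concentration in the sphere (Bq/ml).
    threshold: isocontour level as a fraction of the maximum voxel value.
    """
    if true_bq_ml <= 0:
        raise ValueError("true concentration must be positive")
    peak = max(voxels_bq_ml)
    # Keep voxels at or above the isocontour level (connectivity ignored)
    inside = [v for v in voxels_bq_ml if v >= threshold * peak]
    return peak / true_bq_ml, mean(inside) / true_bq_ml


# Hypothetical voxel values for a sphere filled at 20 kBq/ml
rc_max, rc_mean = recovery_coefficients(
    [20_000.0, 18_000.0, 12_000.0, 9_000.0, 2_000.0], 20_000.0
)
```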
Patient analysis
The same reader (CL) analysed all PET data sets to extract PET quantitative values for OSEM and PSF reconstructions. Regions of interest (ROIs) were drawn over primary tumour lesions, mediastinal and hilar nodes considered to have pathologically increased uptake and metastatic lesions. ROIs were drawn on the axial slice on which lesions displayed the highest 18F-FDG uptake, by means of a 50 % isocontour method.
The mean and maximum pixel values were extracted from each ROI and mean and maximum SUVs were computed as follows:
$$ SUV=\frac{{tumour\,activity\,\left( {Bq/ml} \right)\times body\,weight\,\left( g \right)}}{{injected\,dose\,\left( {Bq} \right)}} $$
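A short worked example of the SUV formula follows. It assumes that the injected dose has already been decay-corrected to scan time and that tissue density is 1 g/ml, so that the result is dimensionless. The function name and the numbers are illustrative only.

```python
def suv(activity_bq_per_ml, body_weight_g, injected_dose_bq):
    """Body-weight-normalized SUV: concentration x weight / injected dose."""
    if injected_dose_bq <= 0 or body_weight_g <= 0:
        raise ValueError("dose and weight must be positive")
    return activity_bq_per_ml * body_weight_g / injected_dose_bq


# Hypothetical lesion: 5 kBq/ml in a 70-kg patient injected with 280 MBq
# (i.e. 4 MBq/kg, the mean injected activity reported in this study)
example_suv = suv(5_000.0, 70_000.0, 280e6)
```

The same function applies to SUVmax and SUVmean; only the input concentration (maximum or mean pixel value in the ROI) changes.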
Finally, short axis size (cm), as determined on axial CT slices, was recorded for each mediastinal and hilar lymph node.
For patients who underwent a post-therapeutic examination, the post-therapeutic status of each lesion was determined by using European Organization for Research and Treatment of Cancer (EORTC) criteria [29, 30]. SUVmax, recorded as described above, was used. The changes in SUVmax between the PET1 and PET2 scans were recorded for all lesions. The percentage change in SUVmax allowed classification into the following groups:
- Complete metabolic response (CMR): complete resolution of 18F-FDG uptake in the tumour volume (indistinguishable from surrounding normal tissue)
- Partial metabolic response (PMR): at least 25 % reduction in tumour uptake
- Stable metabolic disease (SMD): less than 25 % increase or less than 25 % decrease in tumour 18F-FDG SUV and no visible increase in extent of tumour uptake
- Progressive metabolic disease (PMD): greater than 25 % increase in 18F-FDG tumour SUV within the tumour
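The thresholds in the list above can be expressed as a small classifier. This is a sketch, not the study's software: CMR is passed in as a reader-supplied flag because it rests on visual assessment, the criteria relating to the visible extent of uptake are omitted, and assigning an increase of exactly 25 % to SMD is an assumption, since the list leaves that boundary open.

```python
def percent_change(suv_pet1, suv_pet2):
    """Percentage change in SUVmax from PET1 (baseline) to PET2."""
    if suv_pet1 <= 0:
        raise ValueError("baseline SUV must be positive")
    return 100.0 * (suv_pet2 - suv_pet1) / suv_pet1


def eortc_class(suv_pet1, suv_pet2, uptake_resolved=False):
    """Classify a lesion as CMR, PMR, SMD or PMD from its SUVmax change."""
    if uptake_resolved:          # visual judgement supplied by the reader
        return "CMR"
    change = percent_change(suv_pet1, suv_pet2)
    if change <= -25.0:          # at least 25 % reduction
        return "PMR"
    if change > 25.0:            # greater than 25 % increase
        return "PMD"
    return "SMD"                 # everything in between


# Hypothetical lesion: SUVmax falls from 10.0 to 7.0 (a 30 % reduction)
example_class = eortc_class(10.0, 7.0)
```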
Statistical analysis
The first step of the analysis was to determine the optimal filter settings for PSF reconstruction to meet EANM harmonizing standards. For that purpose, for all sets of reconstructed data, RCs for all spheres were compared to EANM expected values by means of the root mean square error (RMSE) method. The kernel size that minimized the RMSE when compared to EANM expected values was selected as the optimal filter for PSF reconstruction on our PET/CT system. RMSE values were computed with R, a free statistical package (http://www.r-project.org/foundation/).
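The kernel selection step can be sketched as below. The study used R for this computation; Python is used here purely for illustration. All RC values in the example are made-up placeholders, not EANM expected values or measurements from this study, and the function names are hypothetical.

```python
from math import sqrt


def rmse(measured, expected):
    """Root mean square error between two equally long RC series."""
    if len(measured) != len(expected) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    return sqrt(sum((m - e) ** 2 for m, e in zip(measured, expected)) / len(measured))


def best_kernel(rc_by_kernel_mm, expected_rc):
    """Return the kernel size (mm) whose RCs are closest to the expected ones."""
    return min(rc_by_kernel_mm, key=lambda k: rmse(rc_by_kernel_mm[k], expected_rc))


# Placeholder RCs for three spheres (illustrative numbers only)
expected = [0.5, 0.7, 0.9]
candidates = {
    6.0: [0.7, 0.9, 1.1],   # too little smoothing: recoveries too high
    7.0: [0.5, 0.7, 0.9],   # matches the expected values
    8.0: [0.4, 0.6, 0.8],   # too much smoothing: recoveries too low
}
selected_kernel = best_kernel(candidates, expected)
```

In practice the dictionary would hold one entry per tested kernel (6 to 8 mm in 0.5-mm steps) with the RCs of all six spheres.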
Quantitative data extracted from clinical PET/CT examinations are presented as mean (standard deviation, SD). In all statistical tests, a two-tailed p value of less than 0.05 was considered statistically significant. The ratios between PSF_EANM and OSEM quantitative values (SUVmean, SUVmax), according to lesion size, location and type (heterogeneous vs homogeneous uptake), BMI (low to normal weight vs overweight vs obese patients) and acquisition time per bed position (2 min 40 s vs 3 min 40 s) were compared using the Mann–Whitney test for unpaired samples and the Kruskal-Wallis test to compare multiple groups. The relationship between PSF_allpass or PSF_EANM and OSEM quantitative values was assessed using a linear regression analysis and Bland-Altman plots [31]. In the subset of ten patients that underwent two PET/CT examinations for therapy monitoring purposes, levels of agreement between the different types of reconstruction were evaluated using the kappa statistic. The use of OSEM reconstruction both for pre- and post-therapeutic PET examination (OSEM_PET1/OSEM_PET2) was used as the “current standard” to determine the post-treatment status of each lesion. This was compared to the use of PSF_EANM reconstruction either for pre-therapeutic PET evaluation (PSF_EANM-PET1/OSEM_PET2) or for post-therapeutic PET evaluation (OSEM_PET1/PSF_EANM-PET2), to the use of PSF_allpass reconstruction either for pre-therapeutic PET evaluation (PSF_allpass-PET1/OSEM_PET2) or for post-therapeutic PET evaluation (OSEM_PET1/PSF_allpass-PET2) and to the use of PSF_EANM reconstruction for both pre- and post-therapeutic PET evaluation (PSF_EANM-PET1/PSF_EANM-PET2). Kappa values were reported using the benchmarks of Landis and Koch [32] (0.81–1 almost perfect agreement, 0.61–0.8 substantial agreement, 0.41–0.6 moderate agreement and 0.21–0.4 fair agreement). For the kappa estimates, 95 % confidence intervals were calculated using bootstrapping. Graphs and analyses were carried out using the GraphPad software and VassarStats (http://vassarstats.net/).
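The agreement analysis can be illustrated with an unweighted Cohen's kappa and a percentile bootstrap over lesions. Both choices are assumptions made for this sketch: the text does not state whether kappa was weighted or which bootstrap variant was used. Function names and the example labels are hypothetical.

```python
import random
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two lists of category labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:            # single category everywhere: kappa undefined,
        return 1.0               # treated here as perfect agreement (assumption)
    return (observed - expected) / (1 - expected)


def bootstrap_ci(labels_a, labels_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for kappa, resampling lesions."""
    rng = random.Random(seed)
    n = len(labels_a)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stats.append(cohen_kappa([labels_a[i] for i in idx],
                                 [labels_b[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical classifications of four lesions under two reconstruction pairs
reference = ["PMR", "PMR", "SMD", "SMD"]
candidate = ["PMR", "PMR", "SMD", "PMR"]
example_kappa = cohen_kappa(reference, candidate)
```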
Discussion
18F-FDG PET has an increasing role in oncology for staging, restaging and therapy monitoring of chemotherapy and molecularly targeted therapies and is being increasingly implemented in clinical trials, especially for the early assessment of antineoplastic treatments. This prospective study in NSCLC patients validates a strategy allowing the use of quantitative values within the framework of multicentre trials, which is based on the production of protocol-specific images, in addition to images optimized for diagnostic purpose.
Standardized quantification of PET data in multicentre trials as described in the EANM guidelines allows for reliable and reproducible treatment response assessment. However, standardization remains a major challenge as new, more sensitive PET systems and reconstruction algorithms are continuously being developed and introduced into clinical practice [20, 23, 35]. In the present study, we validated a strategy in which the recently introduced PSF reconstruction algorithm can be used not only for visual but also for quantitative analysis of PET imaging, whilst adhering to the EANM guidelines. Our results demonstrate, by mimicking a situation in which a patient would undergo the pre- and post-therapy PET scans on different generation PET systems, that it is possible to minimize reconstruction-dependent variability. Hence, Bland-Altman analysis (Fig. 2) showed that after having applied an adequate filter (PSF_EANM) the upper limit of the confidence intervals was 12 %, a value well below the 25 and 30 % cut-off values recommended by EORTC [30] and PERCIST [36], respectively, to discriminate between responders and non-responders when using 18F-FDG PET for therapy monitoring. Importantly, we confirmed this finding in a subset of ten patients who underwent two PET examinations for response assessment (Table 2). In these patients, an excellent agreement was found (kappa values 0.95 and 0.99) in the post-treatment classification of 84 lesions according to EORTC criteria when comparing PSF_EANM either pre- or post-therapy to OSEM as the current standard, and no major discordance occurred. However, when the PSF_allpass data were used either pre- or post-therapy compared to OSEM, we saw considerably less agreement. Due to system updates on existing PET systems or the purchase of a new PET machine, OSEM_PET1/PSF_allpass-PET2 is the situation most likely to occur. In this situation, our data showed discordance in 27.4 % of lesions.
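To make the comparison between the Bland-Altman limits and the response cut-offs concrete, the sketch below computes the bias and 95 % limits of agreement on percentage differences between two reconstructions. Expressing differences as a percentage of the pair mean is an assumption made for this illustration, as the exact form used in the study is not given here. The function name and the values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(values_a, values_b):
    """Return (bias, lower limit, upper limit) of percentage differences.

    Each difference is expressed relative to the mean of the pair; the limits
    are bias +/- 1.96 standard deviations.
    """
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need at least two paired values")
    diffs = [100.0 * (a - b) / ((a + b) / 2) for a, b in zip(values_a, values_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired SUVmax values from two reconstructions
bias, lower, upper = bland_altman_percent([11.0, 9.0], [9.0, 11.0])
```

If the upper limit stays well below the 25 % response cut-off, reconstruction differences alone are unlikely to move a lesion across a response category.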
The proposed strategy can be useful in the case of patients undergoing pre- and post-treatment scans on different PET systems, for example in centres running two or more PET systems or updating their equipment during the course of a trial. Of course, it would be preferable to scan the patient repeatedly on the same machine, but in practice this is often not possible. Moreover, in the setting of multicentre trials there are two other situations in which standardization of PET quantitative values is required: when pooling SUV from different PET/CT systems for diagnostic purposes (i.e. to determine a specific diagnostic threshold value for a given disease) [8, 9] or as a prognostic tool (i.e. to search for the impact of tumour tracer uptake on disease-free and overall survival) [10, 11].
Regarding practical issues related to the proposed methodology, the appropriate filter must be determined for each PET system by performing the phantom studies and reconstructions with a Gaussian filter with increasing kernel as described in the “Materials and methods” section. Once the optimal filter meeting the EANM expected values is determined, the filtered PET data can be used for both local and multicentre quantitative PET analysis. This method can be readily applied on any PET scanner equipped with PSF; the purchase of additional software is not necessary. However, this method does not obviate the need to generate a second data set, which is time consuming. Of course, the choice to use either an OSEM reconstruction or a filtered PSF algorithm for the standardized quantitative analysis remains a choice of local nuclear medicine physicians, physicists and researchers, just like the choice to reconstruct non-attenuation-corrected images systematically or only when clinically needed. Choosing PSF_EANM could be the preferred solution, as PSF reconstruction is expected to progressively replace earlier-generation reconstruction algorithms.
As pointed out by Boellaard [24], patients are frequently included in clinical trials after the first PET examination has been performed. This emphasizes the need to standardize the PET procedure from the very beginning of patient care. However, PET acquisition and reconstruction parameters are not the only source of variability that has to be taken into account. Other technical and biological factors also affect SUV measurements. These factors have been discussed extensively elsewhere [12, 24, 37]. In the present study, one technical factor, the reconstruction protocol, has been analysed. To minimize the influence of the other technical and biological factors affecting SUV measurements in this study, all PET examinations were performed according to the EANM guidelines. Of note, the injected activity per kilogram and the delay between injection and acquisition met the EANM requirements.
The potential impact of image noise on the accuracy of our method was evaluated in phantom studies by varying the acquisition time. Calculation of the RMSE values between PSF_EANM and EANM expected values showed that our strategy performed well even when image noise was higher, the values being similar for the shortest and longest acquisition times. This was confirmed by clinical data showing no difference in PSF_EANM/OSEM ratios for the 2 min 40 s and 3 min 40 s per bed position acquisition times (Fig. 3e).
We found no confounding factors (lesion size and location, tumour heterogeneity, patient BMI) affecting the accuracy of our method. However, we noticed a trend towards higher PSF_EANM/OSEM ratios in overweight and obese patients for SUVmax (Fig. 3b). This may be due to the fact that noise in PET images is higher in obese patients and SUVmax is more affected by noise than SUVmean. The observed difference was minimal and did not affect the EORTC classification based on SUVmax (Table 3). The use of SUVpeak, which is defined as the mean value within an ROI centred on the area with the highest uptake, has been reported as a slightly more robust alternative for assessing the most metabolically active part of a tumour [19]. However, SUVpeak is highly sensitive to the ROIpeak definition (i.e. shape, size and location) [38], was shown to have similar repeatability to SUVmax [39] and does not necessarily perform better than SUVmax for therapy assessment [40]. In the present study, a wide range of tumour intensities was studied and no systematic error was revealed by Bland-Altman analysis (i.e. the strategy performs equally for lesions with low 18F-FDG avidity and for those with very intense 18F-FDG uptake). This finding, taken together with the lack of confounding factors affecting our strategy, suggests that it could be applicable in other solid tumours.
Conclusion
The generation of protocol-specific images with NEMA NU-2 phantom-based filtering to meet EANM quantitative harmonizing standards, in addition to images optimized for diagnostic purposes, reduces reconstruction-dependent variation in SUVs. This can be of use in multicentre trials, when using SUV for therapy monitoring, or as a diagnostic or prognostic tool. As no confounding factors (lesion size and location, tumour heterogeneity, patient BMI, image noise) affecting the accuracy of our method were found, this strategy validated in NSCLC patients could be extrapolated to other solid tumours.