Introduction
Despite recent progress in cancer diagnosis and treatment, cancer remains the number one cause of death in the Western world [1]. Although treatment can be very effective, most regimens fail for a substantial number of patients. Early response evaluation enables the treating physician to differentiate responders from non-responders and to stop treatment in non-responders in a timely and reliable manner. This potentially helps to limit side effects of anticancer therapies and avoid delays in subsequent lines of treatment, thereby reducing patient burden and healthcare costs.
Several imaging modalities can be used to non-invasively assess response to treatment. Most modalities evaluate only morphological features, yet slow changes in tumor morphology, or even pseudoprogression as can be seen with immunotherapy, impair the use of morphological features in early response assessment [2, 3]. However, morphological changes are often preceded by changes in tumor metabolism [4]. These early functional changes can be assessed using molecular imaging techniques such as PET, which may allow for more accurate early response evaluation.
There are several different radiotracers available to assess a variety of metabolic processes. One of these tracers, 3′-deoxy-3′-[18F]fluorothymidine (18F–FLT), provides a method to evaluate cellular proliferation. Proliferation is a central hallmark of tumor growth, and previous studies have validated 18F–FLT against the immunohistochemical proliferation marker Ki67 in pathological specimens for several tumor types [5–7]. Unfortunately, 18F–FLT PET did not improve tumor detection or staging compared to 2-deoxy-2-[18F]fluoro-D-glucose (18F–FDG) due to lower sensitivity [8]. Nevertheless, as proliferation is more cancer-specific than glycolysis, 18F–FLT PET has potential as an imaging biomarker for response assessment.
Cytotoxic and cytostatic therapies aim, respectively, to kill tumor cells (mainly highly proliferating cells) and to diminish tumor growth, both leading to a decrease in cellular proliferation. After initiation of any antitumor treatment, this change in proliferation can be evaluated using 18F–FLT PET/CT. Several studies have investigated 18F–FLT PET/CT as a quantitative imaging biomarker of response [9]; however, most did not take variability into account.
For 18F–FDG, the repeatability of quantitative uptake measures has been widely investigated [10–13] and integrated into the response assessment criteria PERCIST [2]. Up to now, repeatability of quantitative 18F–FLT PET/CT has only been studied in a few small single-center cohorts (≤ 10 patients) [14–17]. Moreover, uptake intervals, tumor delineation methods, and image analyses varied between these studies. The aim of this study was therefore to perform an individual patient data meta-analysis by re-analyzing all available 18F–FLT repeatability data from previously published studies and to determine the repeatability of several quantitative 18F–FLT tumor uptake metrics using similar uptake intervals, the same tumor segmentation method, and the same repeatability metrics, as would be done in a prospective multi-center study.
Discussion
This individual patient data meta-analysis combined available data from four different 18F–FLT PET test–retest cohorts acquired in three different cancer types at three different centers. Of the quantitative 18F–FLT uptake measures commonly used in the oncological setting, SUV metrics showed better repeatability overall than the volumetric metrics. Unfortunately, we did not obtain permission from one study to re-analyze their data [17]. However, individual SUVmax, SUVpeak, and SUVmean values were reported in that article. If these values are included in the analysis, RCs of the SUV metrics improve by approximately 2%, but the results do not change significantly.
If we compare our results to those published in the original reports, similar variability was found for SUVmax [15, 16]. Repeatability of SUVmean improved when threshold-based segmentation was applied for the Trigonis et al. [16] cohort (RC: 29.8 vs. 21.1). In contrast, variability of SUVmean increased in the FBP dataset compared to manual delineation (RC: 20.6 vs. 41.9) [14]. This is also seen when other segmentation algorithms are used for lesion delineation in this FBP-reconstructed dataset and raises the issue of the appropriateness of semi-automatic segmentation in FBP-reconstructed images [21]. Unfortunately, the raw data of this dataset were not available, so no reconstruction using OSEM could be performed.
The repeatability of 18F–FLT SUV metrics in this study is better than the 30% threshold suggested by PET response criteria in solid tumors (PERCIST) for 18F–FDG PET. The repeatability is similar to that found in a recent prospective multi-center study (n = 10 patients, one lesion per patient; five institutions) on 18F–FLT in gliomas (RCs 19–23%) [22]. In addition, our results are in line with multiple other single-center repeatability studies for several different tracers [12, 23, 24]. In general, multi-institutional studies yield higher variability (RCs 28–47%) [10, 11, 13]. The lower variability found in this study might be partly explained by the fact that data were acquired in strictly controlled single-center settings. Moreover, no differences in uptake time between the test and retest scans were present because static images were generated from dynamic scans. This removed the effect of uptake-time variability on SUV that is typically encountered when acquiring static images. However, a previous study has shown that 18F–FLT tumor uptake reached equilibrium at 30 min post injection in NSCLC [19].
Several other studies also found poorer repeatability of volumetric metrics compared to SUV metrics (RCs > 30%) [12, 18]. In our study, VOIs were defined using semi-automatic segmentation to minimize user dependency. In two out of three original reports, manual delineation was used, potentially contributing to the observed differences [14, 16]. It was expected that repeatability of volumetric metrics would be slightly worse in the FBP dataset due to higher noise levels and streak artifacts. Contrary to our expectation, PET/CT data showed a higher variability of proliferative volume and TLU than PET-only data. Moreover, variability of proliferative volume was larger in our study than in the original report for the PET/CT data (RCs 43.7 vs. 30.6%) [16]. This discrepancy was mainly caused by low 18F–FLT uptake of lesions in the PET/CT dataset, resulting in low tumor-to-background ratios. As semi-automatic segmentation methods require adequate contrast between tumor and background radioactivity, accurate VOI definition can be compromised in such lesions. This is supported by the fact that results improve significantly when only lesions with SUVmax > 4.0 are included.
Two studies validating simplified quantitative metrics of 18F–FLT uptake in NSCLC showed a stronger correlation of TBR with the uptake constant Ki (estimated from kinetic analysis) compared to SUV [19, 25]. In our study, we found that normalizing SUV to blood pool radioactivity concentrations significantly increases variability for 18F–FLT images reconstructed with FBP. Moreover, TBR has been shown to be highly time dependent for 18F–FLT, limiting its use in response assessment, especially in busy clinical settings [19, 26].
It has been suggested that assessment of response per patient rather than per lesion may improve correlation with patient outcome [27]. Similar to other studies, assessing repeatability per patient improved RCs by reducing the non-systematic differences between the test and retest scans. To our knowledge, only one study has compared response assessment per patient and per lesion [28]. In that study, no significant differences in performance between the two methods were found. However, in this 18F–FDG study, the same threshold of 30% to differentiate between stable disease and progressive disease or partial response was used for both methods [28]. We therefore propose that future response assessment studies with 18F–FLT PET/CT should also assess the response per patient, while taking the per-patient variability into account.
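To illustrate why a per-patient assessment can reduce non-systematic differences, the following Python sketch collapses lesion-level test and retest values to one pair per patient. The text does not specify how lesion values were combined per patient; taking the mean over a patient's lesions is an assumption made here purely for illustration, and the function name, patient identifiers, and SUV values are hypothetical.

```python
from collections import defaultdict
import statistics


def per_patient_mean(lesions):
    """Collapse lesion-level pairs to one (test, retest) pair per patient.

    `lesions` is an iterable of (patient_id, test_value, retest_value).
    ASSUMPTION: lesions are combined by a simple mean; the article does
    not state which aggregation was used.
    """
    grouped = defaultdict(list)
    for patient_id, test_value, retest_value in lesions:
        grouped[patient_id].append((test_value, retest_value))

    result = {}
    for patient_id, pairs in grouped.items():
        test_mean = statistics.mean([t for t, _ in pairs])
        retest_mean = statistics.mean([r for _, r in pairs])
        result[patient_id] = (test_mean, retest_mean)
    return result


# Hypothetical SUV values: patient P1 has two lesions that change in
# opposite directions (+10% and -10%); patient P2 has a single lesion.
example = [
    ("P1", 4.0, 4.4),
    ("P1", 6.0, 5.4),
    ("P2", 5.0, 5.5),
]
patient_level = per_patient_mean(example)
```

In this made-up example, the two lesions of patient P1 differ by +10% and -10% between scans, whereas the per-patient means differ by only about -2%, because the opposing lesion-level differences partly cancel.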
In the current study, we have used symmetric limits to assess repeatability of quantitative 18F–FLT uptake metrics. Symmetrical RCs are commonly used in the PET repeatability literature; however, recent papers have discussed their applicability in daily clinical practice [10, 29]. In test–retest studies, often no gold standard is available, and therefore relative differences are calculated using the average of the two measurements. This differs from response assessment in the clinical setting, where change is determined relative to a single baseline value, and therefore asymmetrical RCs are suggested to be more suitable. If we calculate asymmetric RCs at lesion level, the overall upper (URC) and lower (LRC) limits of the RCs are: SUVmax (URC: 29.4%; LRC: -22.7%); SUVmean (URC: 29.0%; LRC: -22.5%); SUVpeak (URC: 26.0%; LRC: -20.6%); TLU (URC: 44.6%; LRC: -30.9%); and volume (URC: 43.7%; LRC: -30.4%). These results show a slight shift in RCs of SUV metrics compared to the symmetric limits, but they remain within 30%. On a patient level, asymmetric RCs of the SUV metrics improved; the limits are: SUVmax (URC: 21.1%; LRC: -18.3%); SUVmean (URC: 15.3%; LRC: -23.3%); SUVpeak (URC: 16.8%; LRC: -18.8%); TLU (URC: 34.1%; LRC: -27.9%); and volume (URC: 36.3%; LRC: -28.7%).
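As an illustration of the difference between symmetric and asymmetric limits, the following Python sketch computes both from paired test and retest values. It assumes that the symmetric RC is 1.96 times the standard deviation of the percentage differences relative to the mean of the two scans, and that the asymmetric limits are obtained by back-transforming 1.96 times the standard deviation of the log-ratios. Neither formula is stated in this section, so both are assumptions, and the function names and example values are hypothetical. The reported lesion-level SUVmax limits (29.4% and -22.7%) are at least consistent with a back-transformed log interval, since 1.294 × 0.773 ≈ 1.00.

```python
import math
import statistics


def symmetric_rc(test, retest):
    """Symmetric RC in percent.

    ASSUMPTION: RC = 1.96 x SD of the percentage differences, each
    expressed relative to the mean of the test and retest values.
    """
    rel_diff = [
        100.0 * (r - t) / ((r + t) / 2.0) for t, r in zip(test, retest)
    ]
    return 1.96 * statistics.stdev(rel_diff)


def asymmetric_rc(test, retest):
    """Asymmetric (LRC, URC) in percent change relative to baseline.

    ASSUMPTION: limits are +/- 1.96 x SD of ln(retest / test),
    back-transformed to a percentage of the baseline (test) value.
    """
    log_ratio = [math.log(r / t) for t, r in zip(test, retest)]
    sd = statistics.stdev(log_ratio)
    lower = 100.0 * (math.exp(-1.96 * sd) - 1.0)
    upper = 100.0 * (math.exp(1.96 * sd) - 1.0)
    return lower, upper


# Hypothetical SUVmax values for four lesions (test scan, retest scan).
test_scan = [4.0, 6.0, 8.0, 5.0]
retest_scan = [4.4, 5.4, 8.8, 4.6]

rc = symmetric_rc(test_scan, retest_scan)
lrc, urc = asymmetric_rc(test_scan, retest_scan)
```

Under these assumptions the upper limit is always larger in magnitude than the lower limit, because a given ratio corresponds to a larger percentage increase than decrease when change is expressed relative to a single baseline value.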
The use of different PET scanners and the heterogeneity in reconstruction methods between cohorts could have contributed to the variability in the uptake and volumetric metrics. However, despite these limitations, repeatability of 18F–FLT was better than that reported in several standardized multi-center studies that prospectively evaluated repeatability of 18F–FDG. In contrast to other meta-analyses, we increased robustness by re-analyzing all scans, thereby minimizing variability due to data analysis and allowing direct comparison of quantitative uptake metrics. To date, this individual patient data meta-analysis provides the largest test–retest 18F–FLT PET cohort. These results should ideally be confirmed in a large prospective multi-center PET/CT study.