Introduction
The incidence of renal cell carcinoma up to age 75 is 4.4/100 000 worldwide (2.4% of new cancer cases, 1.7% of cancer deaths) (Ferlay et al. 2015), with higher incidence rates and declining mortality in developed countries (Znaor et al. 2015). For the survival of patients, it is crucial that radiological treatment monitoring provides substantial information for therapeutic decision making. However, a recent study suggests that the value of radiologic reports depends considerably on the methodical approach of assessment (Goebel et al. 2017).
Whilst in routine clinical practice quantitative assessment criteria for morphologic CT image interpretation are rarely standardized and free text reporting (FTR) is common, clinical studies require consideration of the standardized response evaluation criteria in solid tumors (RECIST). With FTR, in general, no specific evaluation criteria are defined; radiologists routinely refer to the most recent previous finding. In contrast, RECIST-based reporting refers either to baseline or to nadir. Moreover, with RECIST, the selection of target lesions is specified. RECIST was updated in 2009 (validated RECIST 1.1 guideline) and is mainly to be applied in the case of cytotoxic chemotherapy (Eisenhauer et al. 2009), whereas immune RECIST (iRECIST consensus guideline) was proposed in 2017 and is intended for patients who receive immunotherapy (Seymour et al. 2017). With immunotherapy, even initial progression due to infiltration of various immune cells into the tumor with subsequent reduction or stabilization of tumor size (pseudoprogression) is associated with prolonged survival (Ma et al. 2019; Aykan et al. 2020). Application of iRECIST for the assessment of metastatic renal cell carcinoma (mRCC) associated tumor burden, supported by semi-automated comparison against specified references (i.e., baseline or nadir), may improve radiological assessment and reporting quality even in everyday clinical practice.
The purpose of our study was to evaluate whether assessment of disease response in patients with mRCC according to common practice FTR without specified evaluation criteria sufficiently agrees with that based on software-aided application of iRECIST.
Methods
Study population
This retrospective, single-center study was approved by the institutional review board.
We searched the institutional medical database and included 50 consecutive patients with mRCC who had been treated with immunotherapy between January 2015 and October 2020. Patients with a measurable tumor burden at baseline according to current guidelines on tumor response criteria (RECIST 1.1 and iRECIST; Eisenhauer et al. 2009; Seymour et al. 2017), who had completed at least three follow-up examinations with contrast-enhanced CT of thorax, abdomen, and pelvis, were eligible for inclusion.
CT data acquisition
CT scans were performed using Revolution CT (GE Healthcare) with a detector width of 160 mm. After intravenous administration of a 1 ml/kg body weight bolus of contrast agent (Accupaque 350 mg, GE Healthcare) followed by a 50 ml saline flush, the imaging started with a bolus-triggered technique (monitoring frequency from 10 s after contrast injection: 1 per second; trigger threshold: an increase of 100 HU in the descending aorta; delay from trigger to initiation of scan: 15 s). For CT scans, parameters were as follows: 120 kVp, automatically set mAs values, reconstructed to a slice thickness of 3 mm and a pitch of 53.
Image analysis
Based on morphological evaluation of CT scans, FTRs had been prepared as part of daily clinical practice by two radiologists in agreement, at least one of whom had more than 5 years of experience in the assessment of tumor burden. Overall, 58 radiologists had participated in preparation of the considered FTRs. FTRs covered both description of pathologies and clinical interpretation including indication of tumor development. Assessment criteria had not been specified. Tumor burden after a given treatment had usually been compared to the most recent prior CT examination; however, the reference CT was not mentioned explicitly. For study purposes, if not already done by the examiner, a resident physician of the radiology department retrospectively assigned the interpretation of each FTR to one of the following four disease response categories: complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD).
In parallel, we imported the same image datasets into a commercially available semi-automatic software (mint lesion version 3.7.3, MINT Medical GmbH) (Goebel et al. 2017). A radiologist with 13 years of experience, blinded to the FTRs, retrospectively selected target lesions for the baseline entries. Classification of target lesions was based on specified criteria including size, number per organ system, and reproducible measurability according to the RECIST 1.1 guideline. Lesions which did not meet these criteria were classified as non-target lesions (Eisenhauer et al. 2009). Subsequently, the radiologist manually measured the longest/shortest axis of each lesion with the aid of the software. The software automatically summed the longest diameters of non-nodal target lesions and the short-axis diameters of nodal target lesions. Lesions from follow-up CT scans were detected with support of the software and measured manually. The software again calculated the sum of lesion axis diameters at every follow-up, automatically compared it with the appropriate reference, and assigned the respective disease category. According to iRECIST (Seymour et al. 2017), the baseline CT serves as reference to determine response to treatment (iCR or iPR) and stable disease (iSD), whereas the nadir (smallest sum of diameters so far) serves as reference to determine progression. There are two definitions of progression: unconfirmed progressive disease (iUPD) and confirmed progressive disease (iCPD, triggered by further progression after iUPD) (Online Resource: ESM Table 1). In addition, image analysis based on RECIST 1.1 was conducted to estimate to what extent pseudoprogression contributed to different assessment of tumor development.
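The reference logic described above can be sketched as follows. This is an illustrative simplification using the standard iRECIST size thresholds (≥ 30% decrease from baseline for partial response; ≥ 20% and ≥ 5 mm increase over nadir for progression), not the actual implementation of the software used in this study; the function name is hypothetical, and non-target and new lesions, which iRECIST also evaluates, are ignored here.

```python
def irecist_category(sums, prior_upd=False):
    """Assign a simplified iRECIST category for the current time point.

    sums[0]   : baseline sum of target-lesion diameters (mm)
    sums[-1]  : sum at the current follow-up
    prior_upd : True if iUPD was assigned at an earlier follow-up
    """
    baseline, current = sums[0], sums[-1]
    nadir = min(sums[:-1])  # smallest sum before the current time point

    # Progression is judged against nadir: >= 20% and >= 5 mm absolute increase.
    if current - nadir >= 5 and (nadir == 0 or (current - nadir) / nadir >= 0.20):
        return "iCPD" if prior_upd else "iUPD"
    # Response and stable disease are judged against baseline.
    if current == 0:
        return "iCR"
    if (baseline - current) / baseline >= 0.30:
        return "iPR"
    return "iSD"
```

For example, a sum of diameters falling from 100 mm at baseline to 50 mm yields iPR; a subsequent rise to 62 mm (a 24% increase over the 50 mm nadir) yields iUPD, and further growth at the next follow-up would confirm iCPD.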
Study outcome
In this study, unstandardized FTR on disease response was compared to software-supported assessment using iRECIST. The latter was chosen as comparator because, in contrast to RECIST 1.1, iRECIST takes into account effects of immunotherapy that may mimic tumor progression (pseudoprogression) (Seymour et al. 2017; Ma et al. 2019). The outcome of primary interest was the strength of agreement regarding change in tumor burden according to FTR and iRECIST, quantified as weighted kappa. Secondary outcomes were the proportionate agreement between the two approaches and associations of selected variables with differing ratings between FTR and iRECIST-based reports.
Statistical analysis
Categorical variables are presented as counts and percentages and continuous variables as means and standard deviations. Agreement was assessed with Cohen's kappa statistics; the magnitude of rating differences was taken into account using weighted kappa. Strength of agreement was interpreted according to Landis and Koch (1977) (kappa ≤ 0.00: poor; 0.01–0.20: slight; 0.21–0.40: fair; 0.41–0.60: moderate; 0.61–0.80: substantial; 0.81–1.0: almost perfect agreement). The Mann–Kendall test was used to determine whether agreement in tumor assessment showed a trend over the series of follow-up examinations. Univariable logistic regression was applied to assess associations of selected variables with differing ratings of tumor burden between FTR and iRECIST. The p value threshold was adjusted for multiple testing (p < 0.006). Analysis was performed using StatsDirect statistical software version 2.8.0 (StatsDirect Ltd.) and XLSTAT version 2015.6.01.24026 (Addinsoft).
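As an illustration of the primary agreement metric, the following is a minimal sketch of a weighted kappa computation together with the Landis and Koch interpretation thresholds. The study's actual analysis was performed in StatsDirect and XLSTAT; the function names here are hypothetical, and ratings are assumed to be coded as ordered integers 0..n-1 (e.g., CR = 0, PR = 1, SD = 2, PD = 3).

```python
from collections import Counter

def weighted_kappa(r1, r2, n_categories, weights="linear"):
    """Cohen's weighted kappa for two ordinal rating series coded 0..n-1.

    Disagreement weights are |i-j| (linear) or (i-j)^2 (quadratic),
    normalized by the maximum possible disagreement.
    """
    assert len(r1) == len(r2)
    n = len(r1)
    power = 1 if weights == "linear" else 2
    max_d = (n_categories - 1) ** power
    # Observed weighted disagreement.
    obs = sum(abs(a - b) ** power for a, b in zip(r1, r2)) / (n * max_d)
    # Chance-expected weighted disagreement from the marginal distributions.
    p1, p2 = Counter(r1), Counter(r2)
    exp = sum(
        p1[i] * p2[j] * abs(i - j) ** power
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n * max_d)
    return 1 - obs / exp

# Landis and Koch (1977) verbal interpretation bands (upper bound, label).
LANDIS_KOCH = [(0.00, "poor"), (0.20, "slight"), (0.40, "fair"),
               (0.60, "moderate"), (0.80, "substantial"), (1.00, "almost perfect")]

def interpret(kappa):
    """Map a kappa value to its Landis-Koch label."""
    for upper, label in LANDIS_KOCH:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With linear weights, a disagreement of one category (e.g., SD rated as PR) is penalized less than a disagreement of three categories (CR rated as PD), which is why weighted kappa is preferred over unweighted kappa for ordered response categories.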
Discussion
We conducted a retrospective comparison of FTR with reporting based on software-supported application of iRECIST to assess change in tumor burden, based on unidimensional anatomical CT measurements, in patients with mRCC. Agreement between the two approaches was only moderate. According to unstandardized FTR, new lesions which were already present in recent prior follow-ups were frequently not recognized as such. Differing assessment with FTR compared to iRECIST was seen more frequently when lymph nodes were target lesions.
In daily clinical practice, evaluation of tumor burden, in addition to symptomatic criteria, represents a crucial parameter for therapy control. Although the iRECIST guideline is fundamentally not intended for clinical decision making (Seymour et al. 2017) and its criteria still need to be validated, standardized response criteria could facilitate objectivity of assessment even in daily practice.
A previous trial on tumors of various origins (Goebel et al. 2017) reported fair to moderate agreement between FTR and RECIST 1.1-based reports. In most cases, differing reporting with FTR was attributed to assignment of even minor changes in tumor burden to PR or PD instead of SD. Another reason for discrepancies was comparison to the most recent prior follow-up examination instead of to baseline or nadir. The latter was confirmed in our study. In addition, with FTR, new lesions that were already present were frequently not recognized as still existing new lesions, resulting in an overly favorable rating. In the case of lymph nodes, discrepancies might be based on lesion selection (iRECIST: a maximum of two nodes in total, even from different nodal basins) and measurement (iRECIST: at least 15 mm in short axis). Additionally, we found that the odds of an overly unfavorable rating with FTR decreased with increasing number and sum of diameters of target lesions. However, it seems plausible that in advanced disease an unfavorable rating is more frequently appropriate.
Pseudoprogression can only be determined with certainty with iRECIST after 4 weeks from the first detection of iUPD. After 4–8 weeks, disease could progress further and trigger iCPD, remain progressive compared to nadir (iUPD), or show stable disease (iSD), partial response (iPR), or complete response (iCR) compared to baseline. This reassessment should prevent patients from being withdrawn from immunotherapy in case of pseudoprogression. An earlier study found that continued immunotherapy beyond iUPD can prolong survival (George et al. 2016). Such a review of assessment is not automatically provided with FTR, but could be visualized using longitudinal analysis and graphical methods (Shen et al. 2014). In our study, FTR investigators identified pseudoprogression in some of the patients with an increased lymph node axis despite a decreased sum of parenchymal lesion diameters. The frequency of pseudoprogression determined with iRECIST was similar to previous findings in patients with mRCC (9–14%) (Queirolo et al. 2017).
For inexperienced readers, even RECIST contains pitfalls that may originate in baseline lesion selection, reassessment of lesions, and identification of new lesions (Abramson et al. 2015). Keil et al. (2014) identified the choice of target lesions as a major source of disagreement between readers, even with consideration of RECIST 1.1. Thus, target lesion selection probably remains a subjective confounder in the assessment of tumor development. Target lesions should be unequivocally metastases, not postoperative seroma or granulation tissue. Lesions within a radiation field should not be selected as target lesions unless progression is demonstrated. At follow-up, lesions should be remeasured in the same contrast phase. Finally, even in case of an axis shift, it is required to remeasure the true long axis (parenchymal lesions) or short axis (nodal lesions), respectively (Abramson et al. 2015).
RECIST/iRECIST are well described and known in clinical practice, but rarely implemented. This may be due to the need to assign the appropriate reference CT, which requires requesting baseline information, including treatment modalities, from oncologists. Moreover, reproducibility is supposed to increase when reference and follow-up CT scans are read by the same radiologist (Muenzel et al. 2012; Olthof et al. 2018). According to Krajewski et al. (2014), intra-rater variability in CT tumor size measurement is low enough that a 10% tumor shrinkage measured by a single radiologist is reproducible. However, this approach might be difficult to apply in small radiology departments. Software-aided application of RECIST/iRECIST can thus be a sufficient and time-saving means to increase reproducibility of reporting. From this, one might conclude that the large number of radiologists who prepared FTRs contributed to the decreased agreement with the iRECIST-based reports conducted by a single radiologist.
This study has some limitations. First, the assessment criteria applied for FTR were not reported. In addition, the assignment of FTR interpretations to disease categories for study purposes was not conducted by oncologists, the actual recipients, but by radiologists; moreover, the assignment left scope for interpretation. Furthermore, only a single reader conducted the assessment according to iRECIST, and we neither systematically considered treatment decisions nor clinical outcomes. However, as observed in patients in whom FTR indicated progressive but iRECIST found stable disease, the treatment decision was not necessarily associated with the CT assessment. Finally, our small-scale trial was conducted retrospectively at a single center and thus should be considered exploratory.