Review
Observer variability in RECIST-based tumour burden measurements: a meta-analysis
Introduction
Assessment of tumour burden is essential in phase II and III oncology trials to evaluate the therapeutic effect of anti-cancer drugs. Since the Response Evaluation Criteria in Solid Tumours (RECIST) guideline was introduced in 2000 [1] and revised in 2009 [2] for the standardised assessment of tumour burden, it has been widely implemented in cancer trials. According to this guideline [2], the unidimensional longest diameters of target lesions are measured on cross-sectional imaging, principally computed tomography (CT), and summed to capture tumour burden. The change in this sum between pre- and post-treatment scans is then calculated to determine tumour response. There are four response categories: complete response (disappearance of all target lesions), partial response (a decrease of at least 30% from baseline), progressive disease (an increase of at least 20%, and of at least 5 mm in absolute terms, relative to the smallest sum recorded), and stable disease (changes between these thresholds). Of these, precise determination of progressive disease is especially important, as progression-free survival is increasingly replacing overall survival as a primary end-point in current cancer trials [3].
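The categorisation above can be sketched as a small classifier over the summed longest diameters. This is a simplified illustration under stated assumptions, not the guideline itself: the function name `classify_target_response` is hypothetical, and it covers target lesions only (no non-target or new lesions, no lymph-node short-axis rule, and no handling of reappearance after a complete response).

```python
def classify_target_response(baseline_sum_mm: float,
                             nadir_sum_mm: float,
                             current_sum_mm: float) -> str:
    """Hypothetical, simplified RECIST 1.1 categorisation of target lesions.

    All arguments are sums of target-lesion longest diameters in mm.
    Non-target lesions, new lesions and the lymph-node rule are ignored.
    """
    if baseline_sum_mm <= 0 or nadir_sum_mm <= 0 or current_sum_mm < 0:
        raise ValueError("baseline and nadir must be positive, current non-negative")
    if nadir_sum_mm > baseline_sum_mm:
        raise ValueError("the nadir (smallest sum) cannot exceed the baseline sum")

    # Complete response: every target lesion has disappeared.
    if current_sum_mm == 0:
        return "CR"

    # Progressive disease: >= 20% relative AND >= 5 mm absolute increase over the nadir.
    increase_mm = current_sum_mm - nadir_sum_mm
    if increase_mm / nadir_sum_mm >= 0.20 and increase_mm >= 5:
        return "PD"

    # Partial response: >= 30% decrease relative to baseline.
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"

    # Stable disease: neither PR nor PD.
    return "SD"
```

Progression is checked before partial response, so a lesion sum that has regrown by 20% from its nadir is called progressive even if it is still well below baseline.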
The RECIST guidelines depend on reproducible measurements of tumour burden [1], [2], and response categorisation has been regarded as relatively consistent [4], [5], [6]. However, because target lesions are selected and their longest diameters are measured manually, discrepancies between repeated readings by the same observer or between different observers can cause inconsistent response categorisation [7], [8]. The impact of measurement variability on response categorisation may be underappreciated when the measured change in tumour burden lies close to the 20% cut-off for progressive disease (i.e. 10–30%), where it is vulnerable to miscategorisation [9]. Observer variability in measurements tends to attenuate estimates of treatment effect and increase type II errors; this reduces statistical power and requires larger sample sizes, which affects the design of clinical trials [10], [11]. Quantifying the extent of observer variability in manual tumour measurements is a prerequisite for defining cut-off values for semi- or fully automated measurements [12] and for introducing promising alternative continuous end-points, such as time to tumour growth, in clinical trials [13].
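To illustrate why changes of 10–30% are vulnerable to miscategorisation, the sketch below assumes, purely for illustration, that the measured percentage change equals the true change plus normally distributed observer error with standard deviation `sd_pct`. The function name, the error model and the example SD of 10 percentage points are hypothetical and are not estimates from the included studies.

```python
from statistics import NormalDist

PD_CUTOFF_PCT = 20.0  # RECIST progressive-disease threshold on the percentage change


def prob_called_progression(true_change_pct: float, sd_pct: float) -> float:
    """Probability that the measured change reaches the PD cut-off.

    Hypothetical model: measured change ~ Normal(true_change_pct, sd_pct).
    """
    if sd_pct <= 0:
        raise ValueError("sd_pct must be positive")
    measured = NormalDist(mu=true_change_pct, sigma=sd_pct)
    # P(measured >= cut-off) = 1 - CDF(cut-off)
    return 1.0 - measured.cdf(PD_CUTOFF_PCT)


# Illustrative use with an assumed (not estimated) error SD of 10 percentage points.
for true_change in (10.0, 15.0, 20.0, 25.0, 30.0):
    p = prob_called_progression(true_change, 10.0)
    print(f"true change {true_change:4.0f}%: P(called PD) = {p:.2f}")
```

Under this model a tumour whose true change sits exactly at the cut-off is called progressive half of the time, and the probability moves away from 0.5 only gradually as the true change moves away from 20%.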
To assess the reproducibility (between observers) and repeatability (within an observer) of tumour size measurements, relevant studies commonly compute the relative measurement difference (RMD) between two measurements: the difference between the two measurements divided either by the first measurement or by the average of the two. Agreement between the two measurements is then shown visually with Bland–Altman plots, which plot the RMDs against their denominator together with the 95% limits of agreement (LOA) [14], [15], [16]. Quite a few studies have investigated observer variability in RECIST-based tumour burden measurement, mainly in terms of the RMD, yet no overall recommendation based on a comprehensive summary of their results has been made so far.
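The RMD and the Bland–Altman limits described above can be computed as follows. This is a minimal sketch: the function names and the example readings are hypothetical, and the limits use the conventional form of mean RMD ± 1.96 × sample standard deviation, which assumes the RMDs are roughly normally distributed.

```python
from statistics import mean, stdev


def relative_measurement_difference(first_mm: float, second_mm: float,
                                    denominator: str = "mean") -> float:
    """RMD in percent between two measurements of the same lesion(s).

    denominator="first": divide by the first measurement.
    denominator="mean":  divide by the average of the two measurements.
    """
    if denominator == "first":
        denom = first_mm
    elif denominator == "mean":
        denom = (first_mm + second_mm) / 2
    else:
        raise ValueError("denominator must be 'first' or 'mean'")
    if denom <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * (second_mm - first_mm) / denom


def limits_of_agreement(rmds: list[float]) -> tuple[float, float]:
    """Bland-Altman 95% limits of agreement: mean RMD +/- 1.96 * SD (needs >= 2 values)."""
    bias = mean(rmds)
    half_width = 1.96 * stdev(rmds)
    return bias - half_width, bias + half_width


# Hypothetical paired readings (mm) by two observers of the same lesions.
pairs = [(32.0, 35.0), (18.0, 17.0), (54.0, 50.0), (25.0, 27.0)]
rmds = [relative_measurement_difference(a, b) for a, b in pairs]
lower, upper = limits_of_agreement(rmds)
print(f"95% LOA: {lower:.1f}% to {upper:.1f}%")
```

Dividing by the mean treats the two readings symmetrically, which suits inter-observer comparisons where neither reading is a reference; dividing by the first reading matches how an interval change from baseline is computed.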
Hence, we conducted a meta-analysis to investigate inter- and intra-observer variability in manual measurements of tumour burdens on CT scanning according to the RECIST guidelines.
Search strategy
Two authors (S.H.Y. and K.W.K.) independently performed OVID/MEDLINE and EMBASE database literature searches to identify relevant publications on observer-related variability of RECIST measurements using CT scans. Search terms included keywords relating to ‘tumor’, ‘measurement’, ‘variability’, and ‘CT’. Searches were limited to English-language publications and human studies (Supplemental Table 1). The search was current as of March 2014 and supplemented by screening the bibliographies of
Literature search
Of 9231 references identified during our initial database search, 11 studies were ultimately included in our analysis [15], [16], [20], [21], [22], [23], [24], [25], [26], [27], [28] (Table 1, Fig. 1 and Supplemental Table 2).
There were nine studies (eight inter-observer and five intra-observer) [15], [16], [20], [21], [22], [23], [24], [25], [26] evaluating observer variability in measuring the unidimensional longest diameter of a single lesion and three studies (three inter-observer and two
Discussion
Our meta-analysis revealed inter-observer variability both in measuring tumour burden and in calculating its interval change for single lesions, including in determining whether the 20% RECIST cut-off value for progressive disease had been exceeded. Measurement variability decreased when a single observer measured tumour burden and when multiple lesions were measured. The single-observer results are consistent with those of previous studies, while the decrease in variability with multiple lesions
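One simple statistical model is consistent with the finding that variability decreases when multiple lesions are measured; it is offered as an illustrative assumption, not as the analysis performed in this meta-analysis. If each lesion's measurement error is independent, with a standard deviation equal to a fixed fraction of that lesion's diameter, the relative error of the summed diameter shrinks as lesions are added (by a factor of 1/√n for n equal-sized lesions). The function name and parameters are hypothetical, and real errors from the same observer may be correlated, which would reduce the benefit.

```python
import math


def relative_sd_of_sum(diameters_mm: list[float], per_lesion_relative_sd: float) -> float:
    """Relative SD of the summed diameters under a hypothetical model.

    Assumes independent errors whose SD is per_lesion_relative_sd * diameter.
    Var(sum) = sum((r * d_i)**2), so SD(sum) = r * sqrt(sum(d_i**2)).
    """
    if not diameters_mm or any(d <= 0 for d in diameters_mm):
        raise ValueError("need at least one positive diameter")
    total = sum(diameters_mm)
    sd_of_sum = per_lesion_relative_sd * math.sqrt(sum(d * d for d in diameters_mm))
    return sd_of_sum / total


# Illustrative use: equal 20 mm lesions, assumed 10% per-lesion relative SD.
for n in range(1, 6):
    print(n, round(relative_sd_of_sum([20.0] * n, 0.10), 3))
```

The benefit is largest when lesions are of similar size; a sum dominated by one large lesion behaves almost like a single-lesion measurement.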
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (no. 2010-0028631).
Conflict of interest statement
Seokyung Hahn has received a consulting honorarium from Roche and research funding from Novartis, unrelated to this study. The other authors have no conflicts of interest to declare.
References
- New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer (2009).
- Blinded independent central review of progression in cancer clinical trials: results from a meta-analysis. Eur J Cancer (2011).
- CT tumor volume measurement in advanced non-small-cell lung cancer: performance characteristics of an emerging clinical tool. Acad Radiol (2011).
- Semi-automatic software increases CT measurement accuracy but not response classification of colorectal liver metastases after chemotherapy. Eur J Radiol (2012).
- Inter-observer reproducibility of semi-automatic tumor diameter measurement and volumetric analysis in patients with lung cancer. Lung Cancer (2013).
- Exploring intra- and inter-reader variability in uni-dimensional, bi-dimensional, and volumetric measurements of solid tumors on CT scans reconstructed at different slice intervals. Eur J Radiol (2013).
- Volumetric response classification in metastatic solid tumors on MSCT: initial results in a whole-body setting. Eur J Radiol (2013).
- A theoretical approach to choosing the minimum number of multiple tumors required for assessing treatment response. J Clin Epidemiol (2005).
- A simulation study to evaluate the impact of the number of lesions measured on response assessment. Eur J Cancer (2009).
- Tumor response assessment by measuring the single largest lesion per organ in patients with advanced non-small cell lung cancer. Lung Cancer (2014).
- Relation between initial blood-pressure and its fall with treatment. Lancet.
- New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst.
- Overview: progression-free survival as an endpoint in clinical trials with solid tumors. Clin Cancer Res.
- Interobserver and intraobserver variability in the response evaluation of cancer therapy according to RECIST and WHO-criteria. Acta Oncol.
- Evaluation of blinded independent central review of tumor progression in oncology clinical trials: a meta-analysis. Ther Innov Regul Sci.
- Response rate accuracy in oncology trials: reasons for interobserver variability. Groupe Francais d'Immunotherapie of the Federation Nationale des Centres de Lutte Contre le Cancer. J Clin Oncol.
- Interobserver and intraobserver variability in measurement of non-small-cell carcinoma lung lesions: implications for assessment of tumor response. J Clin Oncol.
- Use and misuse of waterfall plots. J Natl Cancer Inst.
- Measurement error in the timing of events: effect on survival analyses in randomized clinical trials. Clin Trials.
- Attenuation of treatment effect due to measurement variability in assessment of progression-free survival. Pharm Stat.
- The imaging viewpoint: how imaging affects determination of progression-free survival. Clin Cancer Res.