Observer variability in RECIST-based tumour burden measurements: a meta-analysis

doi:10.1016/j.ejca.2015.10.014

European Journal of Cancer

Volume 53, January 2016, Pages 5-15

https://doi.org/10.1016/j.ejca.2015.10.014

Highlights

• Relative measurement difference may exceed 20% in single lesion between observers.
• Variability decreased when measured by a single observer or with multiple lesions.
• Studies on interval change for single observer or with multiple lesions were lacking.

Abstract

Background

Response Evaluation Criteria in Solid Tumours (RECIST)-based tumour burden measurements involve observer variability, the extent of which ought to be determined.

Methods

A literature search identified studies on observer variability during manual measurements of tumour burdens via computed tomography according to the RECIST guideline. The 95% limit of agreement (LOA) values of relative measurement difference (RMD) were pooled using a random-effects model.

Results

Twelve studies were included. Pooled 95% LOAs of RMD in measuring unidimensional longest diameters of single lesions ranged from −22.1% (95% confidence interval [CI], −30.3% to −14.0%) to 25.4% (95% CI, 17.2% to 33.5%) between observers and −17.8% (95% CI, −23.6% to −11.9%) to 16.1% (95% CI, 10.1% to 21.8%) for a single observer. Pooled 95% LOAs of RMD in measuring the sum of multiple lesions ranged from −19.2% (95% CI, −23.7% to −14.9%) to 19.5% (95% CI, 15.2% to 23.9%) between observers, and −9.8% (95% CI, −19.0% to −0.3%) to 13.1% (95% CI, 3.6% to 22.6%) for a single observer. Pooled 95% LOA of RMD in calculating the interval change of tumour burden with a single lesion ranged from −31.3% (95% CI, −46.0% to −16.5%) to 30.3% (95% CI, 15.3% to 44.8%) between observers. Studies on calculating the interval change of tumour burden for a single observer or with multiple lesions were lacking.

Conclusion

Interobserver RMD in measuring single tumour burden and calculating its interval change may exceed the 20% cut-off for progression. Variability decreased when tumour burden was measured by a single observer or assessed by the sum of multiple lesions.

Introduction

Assessment of tumour burden is essential in phase II and III oncology trials in order to evaluate the therapeutic effect of anti-cancer drugs. Since the Response Evaluation Criteria in Solid Tumours (RECIST) guideline was introduced in 2000 [1] and revised in 2009 [2] for the standardised assessment of tumour burden, it has been widely implemented in cancer trials. According to this guideline [2], unidimensional longest diameters of target lesions are measured and summed on cross-sectional imaging modalities, principally computed tomography (CT), to capture tumour burdens. The interval change of the measured tumour burden between pre- and post-treatment with an anti-cancer agent is then calculated to determine tumour response. There are four categories of response: complete remission, partial response (−30% or less), stable disease (−30% to 20%), and progressive disease (20% or more). Of these, precise determination of progressive disease is especially important as ‘progression-free survival’ is increasingly replacing ‘overall survival’ as a primary end-point in current cancer trials [3].
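As an illustration of how the interval change maps onto the response categories above, the following Python sketch applies the thresholds quoted in the text. The function name `response_category` and the example diameters are hypothetical. The sketch uses the pre-treatment sum as the only reference and ignores further conditions of the full guideline (such as new lesions), so it should not be read as a complete RECIST implementation.

```python
def response_category(baseline_mm: float, followup_mm: float) -> str:
    """Categorise the interval change of tumour burden.

    Simplified sketch: thresholds are those quoted in the text
    (-30% or less, -30% to 20%, 20% or more). Treating a follow-up
    sum of zero as complete remission is an assumption of this sketch.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline tumour burden must be positive")
    if followup_mm < 0:
        raise ValueError("follow-up tumour burden cannot be negative")
    if followup_mm == 0:
        return "complete remission"
    # Interval change as a percentage of the pre-treatment burden.
    change_pct = (followup_mm - baseline_mm) * 100.0 / baseline_mm
    if change_pct <= -30.0:
        return "partial response"
    if change_pct >= 20.0:
        return "progressive disease"
    return "stable disease"


if __name__ == "__main__":
    # Hypothetical readings of the same follow-up scan by two observers.
    for reading in (58.0, 61.0):
        print(reading, response_category(50.0, reading))
```

With a hypothetical baseline sum of 50 mm, a follow-up reading of 58 mm (+16%) is categorised as stable disease, whereas a reading of 61 mm (+22%) is categorised as progressive disease. A 3 mm difference between readings is therefore enough to change the category near the cut-off.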

The RECIST guidelines depend on measurements of tumour burden being reproducible [1], [2], and response categorisation has been regarded as relatively consistent [4], [5], [6]. However, as target lesions are selected and unidimensional longest diameters are measured manually, discrepancies between repeated readings or between different observers can cause inconsistency in response categorisation [7], [8]. The impact of measurement variability on response categorisation might be underappreciated in cases where the measured change in tumour burden is close to the 20% cut-off value for progressive disease (i.e. 10–30%) and is therefore vulnerable to miscategorisation [9]. Observer variability in measurements tends to attenuate estimates of treatment effect and increase type II errors; this reduces statistical power and requires greater sample sizes, impacting the design of clinical trials [10], [11]. Determining the extent of observer variability in manual measurements of tumours may be a prerequisite for defining cut-off values for semi- or fully automated measurements [12] and for the introduction of promising alternative continuous end-points such as time to tumour growth in clinical trials [13].

To assess the reproducibility and repeatability of tumour size measurements in relevant studies, the relative measurement difference (RMD) between two measurements is commonly computed as the difference between the two measurements divided either by the first measurement or by the average of the first and second measurements. Agreement between the two measurements is then shown visually using Bland–Altman plots with 95% limits of agreement (LOA), in which the RMDs are plotted against their denominators [14], [15], [16]. Quite a few studies have investigated observer variability in RECIST-based tumour burden measurement, mainly in terms of the RMD, yet no overall recommendation based on a comprehensive summary of those results has been made so far.
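The two definitions of the RMD and the 95% LOA can be sketched in Python as follows. The function names, the sign convention (second minus first measurement) and the example diameters are assumptions made for illustration, and the LOA are computed in the usual Bland–Altman form as the mean difference ± 1.96 standard deviations, which may differ in detail from the procedures of the individual studies.

```python
from statistics import mean, stdev


def relative_measurement_difference(first: float, second: float,
                                    denominator: str = "mean") -> float:
    """RMD in percent between two measurements of the same lesion.

    denominator="first" divides by the first measurement;
    denominator="mean" divides by the average of both measurements.
    The sign convention (second - first) is an assumption of this sketch.
    """
    if denominator == "first":
        base = first
    elif denominator == "mean":
        base = (first + second) / 2.0
    else:
        raise ValueError("denominator must be 'first' or 'mean'")
    if base <= 0:
        raise ValueError("measurements must be positive")
    return (second - first) * 100.0 / base


def limits_of_agreement(rmds: list[float]) -> tuple[float, float]:
    """95% limits of agreement: mean RMD +/- 1.96 sample standard deviations."""
    if len(rmds) < 2:
        raise ValueError("at least two RMD values are required")
    centre = mean(rmds)
    spread = 1.96 * stdev(rmds)
    return centre - spread, centre + spread


if __name__ == "__main__":
    # Hypothetical longest diameters (mm) measured by two observers.
    observer_a = [20.0, 35.0, 12.0, 48.0, 27.0]
    observer_b = [22.0, 33.0, 13.0, 50.0, 25.0]
    rmds = [relative_measurement_difference(a, b)
            for a, b in zip(observer_a, observer_b)]
    print(limits_of_agreement(rmds))
```

Dividing by the average of the two measurements treats both readings symmetrically, whereas dividing by the first measurement treats it as the reference, so the two definitions give slightly different values for the same pair of readings.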

Hence, we conducted a meta-analysis to investigate inter- and intra-observer variability in manual measurements of tumour burdens on CT scanning according to the RECIST guidelines.
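The abstract states that the 95% LOA values were pooled using a random-effects model but does not name the estimator. As a minimal sketch of one common choice, the code below implements DerSimonian–Laird random-effects pooling; the use of this particular estimator, the recovery of standard errors from 95% confidence intervals under a normal approximation, and all function names and inputs are assumptions for illustration, not a description of the authors' procedure.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error recovered from a 95% CI, assuming normality."""
    return (upper - lower) / (2.0 * z)


def dersimonian_laird(estimates: list[float], std_errors: list[float]):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, (lower, upper) 95% CI, tau^2).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need matching lists with at least two studies")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled), tau2


if __name__ == "__main__":
    # Hypothetical upper LOA values (%) and standard errors from three studies.
    print(dersimonian_laird([18.0, 25.0, 30.0], [2.0, 3.0, 4.0]))
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with a fixed-effect inverse-variance average; greater heterogeneity widens the pooled confidence interval.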

Section snippets

Search strategy

Two authors (S.H.Y. and K.W.K.) independently performed OVID/MEDLINE and EMBASE database literature searches to identify relevant publications on observer-related variability of RECIST measurements using CT scans. Search terms included keywords relating to ‘tumor’, ‘measurement’, ‘variability’, and ‘CT’. Searches were limited to English-language publications and human studies (Supplemental Table 1). The search was current as of March 2014 and supplemented by screening the bibliographies of

Literature search

Of 9231 references identified during our initial database search, 11 studies were ultimately included in our analysis [15], [16], [20], [21], [22], [23], [24], [25], [26], [27], [28] (Table 1, Fig. 1 and Supplemental Table 2).

There were nine studies (eight inter-observer and five intra-observer) [15], [16], [20], [21], [22], [23], [24], [25], [26] evaluating observer variability in measuring the unidimensional longest diameter of a single lesion and three studies (three inter-observer and two

Discussion

Our meta-analysis revealed that inter-observer variability in both measuring the tumour burden and calculating the interval change for single lesions may exceed the 20% RECIST cut-off value for progressive disease. Measurement variability decreased when a single observer measured tumour burden and when multiple lesions were measured. The single-observer results are consistent with those of previous studies, while the decrease in variability with multiple lesions

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (no. 2010-0028631).

Conflict of interest statement

Seokyung Hahn has received a consulting honorarium from Roche and research funding from Novartis, neither of which is related to this study. The other authors have no conflicts of interest to declare.

References (43)

E.A. Eisenhauer et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer (2009)
O. Amit et al. Blinded independent central review of progression in cancer clinical trials: results from a meta-analysis. Eur J Cancer (2011)
M. Nishino et al. CT tumor volume measurement in advanced non-small-cell lung cancer: performance characteristics of an emerging clinical tool. Acad Radiol (2011)
C.S. van Kessel et al. Semi-automatic software increases CT measurement accuracy but not response classification of colorectal liver metastases after chemotherapy. Eur J Radiol (2012)
J. Dinkel et al. Inter-observer reproducibility of semi-automatic tumor diameter measurement and volumetric analysis in patients with lung cancer. Lung Cancer (2013)
B. Zhao et al. Exploring intra- and inter-reader variability in uni-dimensional, bi-dimensional, and volumetric measurements of solid tumors on CT scans reconstructed at different slice intervals. Eur J Radiol (2013)
A.M. Wulff et al. Volumetric response classification in metastatic solid tumors on MSCT: initial results in a whole-body setting. Eur J Radiol (2013)
M. Mazumdar et al. A theoretical approach to choosing the minimum number of multiple tumors required for assessing treatment response. J Clin Epidemiol (2005)
C.S. Moskowitz et al. A simulation study to evaluate the impact of the number of lesions measured on response assessment. Eur J Cancer (2009)
H.S. Kim et al. Tumor response assessment by measuring the single largest lesion per organ in patients with advanced non-small cell lung cancer. Lung Cancer (2014)
J.S. Gill et al. Relation between initial blood-pressure and its fall with treatment. Lancet (1985)
P. Therasse et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst (2000)
R.L. Korn et al. Overview: progression-free survival as an endpoint in clinical trials with solid tumors. Clin Cancer Res (2013)
C. Suzuki et al. Interobserver and intraobserver variability in the response evaluation of cancer therapy according to RECIST and WHO-criteria. Acta Oncol (2010)
C. Huanyu et al. Evaluation of blinded independent central review of tumor progression in oncology clinical trials: a meta-analysis. Ther Innov Regul Sci (2013)
P. Thiesse et al. Response rate accuracy in oncology trials: reasons for interobserver variability. Groupe Francais d'Immunotherapie of the Federation Nationale des Centres de Lutte Contre le Cancer. J Clin Oncol (1997)
J.J. Erasmus et al. Interobserver and intraobserver variability in measurement of non-small-cell carcinoma lung lesions: implications for assessment of tumor response. J Clin Oncol (2003)
T. Shao et al. Use and misuse of waterfall plots. J Natl Cancer Inst (2014)
E.L. Korn et al. Measurement error in the timing of events: effect on survival analyses in randomized clinical trials. Clin Trials (2010)
S. Hong et al. Attenuation of treatment effect due to measurement variability in assessment of progression-free survival. Pharm Stat (2012)
D.C. Sullivan et al. The imaging viewpoint: how imaging affects determination of progression-free survival. Clin Cancer Res (2013)
