Main

Bladder cancer (BCa) is the most common malignancy of the urinary tract with an estimated 73 510 new cases and 14 880 deaths in 2012 in the United States (Siegel et al, 2013). At initial diagnosis, >70% of patients have non-muscle-invasive bladder cancer (NMIBC), which is managed by transurethral resection of the bladder (TURB) with or without intravesical therapy (Babjuk et al, 2011). Recurrence rates for NMIBC range from 50% to 70%, and 10–15% of patients experience disease progression to muscle-invasive disease over a 5-year period (Sylvester et al, 2006; Fernandez-Gomez et al, 2009; Babjuk et al, 2011). These statistics underscore the need for a continuous, costly follow-up, making BCa the most expensive malignancy per patient (Stenzl et al, 2008).

To predict the short-term and long-term probabilities of disease recurrence and progression, the European Organization for Research and Treatment of Cancer (EORTC) Genitourinary (GU) group has developed a scoring system and risk tables (Sylvester et al, 2006), built on data from 2596 patients diagnosed with Ta/T1 tumours, who were randomised in seven previous EORTC-GU group trials. The proposed scoring system was based on the six most relevant clinical and pathological predictors of outcomes, namely tumour stage and grade, number of tumours, size, carcinoma in situ (cis), and prior recurrence rate. The main limitation of this scoring system was the low number of patients treated with bacillus Calmette–Guérin (BCG). To overcome this limitation, the Club Urológico Español de Tratamiento Oncológico (CUETO) developed a scoring model, which predicts the short- and long-term probability of disease recurrence and progression in BCG-treated patients (n=1062) (Fernandez-Gomez et al, 2009). Despite their potential usefulness in daily practice, to date, few studies have externally validated these models (Fernandez-Gomez et al, 2011; Hernandez et al, 2011).

The aim of the present multicentre study was to externally validate the EORTC risk tables and the CUETO scoring model for predicting disease recurrence and progression in NMIBC patients with or without BCG therapy.

Patients and methods

Patients

The study was performed with the approval and oversight of the institutional review board at each institution, with all participating sites providing the necessary data-sharing agreements before initiation. Templates for NMIBC data collection were sent out to eight international centres. In total, 4784 patients underwent TURB for NMIBC between 2000 and 2007. Patients who had a pure pTis disease (n=75) were also excluded, as this group was too small for separate analyses, leaving a final cohort of 4689 patients. The data of these patients were frozen for analyses in January 2011.

TURB and instillation therapy

All patients had cystoscopically proven urothelial carcinoma of the bladder (UCB) and underwent complete TURB according to guideline recommendations (Babjuk et al, 2011). A re-resection was performed according to guideline recommendations and at surgeons’ discretion within 2–6 weeks after initial treatment based on pathologic and intraoperative findings. The first adjuvant instillation was given within 7–21 days of the diagnostic TURB and repeated once weekly for 6 weeks. In all, 51% of patients (n=2405) received a single immediate postoperative instillation of chemotherapy (essentially mitomycin C). All BCG patients were offered some form of maintenance therapy (at least 1 year). None of the patients had upper tract urothelial carcinoma (UTUC) at diagnosis.

Pathologic evaluation

All surgical specimens were processed according to standard pathologic procedures. Genitourinary pathologists assigned tumour grade according to the 1973 World Health Organisation grading system. Pathological stage was reassigned according to the 2002 American Joint Committee on Cancer TNM staging system. The presence of concomitant cis was defined as the presence of cis in conjunction with another tumour.

Follow-up regimen

Patients were followed every 3–6 months during the first 2 years after TURB, biannually up to 5 years, and annually thereafter (Babjuk et al, 2011; Burger et al, 2012). Follow-up consisted of a history, physical examination, urinary cytology, cystoscopy, and biopsy of suspicious lesions. Radiographic evaluation of the upper urinary tract to rule out UTUC was done at NMIBC diagnosis in every patient and yearly or in case of disease recurrence or suspicion, such as positive cytology during follow-up. When disease recurrence was detected, the tumour was resected. When no evidence of cancer was seen but urinary cytology was positive, bladder and prostatic urethra biopsies in addition to upper urinary tract work-up were performed. Disease recurrence was defined as first tumour relapse in the bladder regardless of tumour stage. Disease progression was defined as tumour relapse at tumour stage T2 or higher in the bladder or in the prostatic urethra. In case of death, cause of death was determined by treating physicians, by chart review corroborated by death certificates, or by death certificates alone (Rink et al, 2012). Tumour recurrence in the upper urinary tract was not considered as tumour recurrence but rather as a second primary tumour.

Statistical analyses

The previously published EORTC model incorporated the number of tumours (single, 2–7, or ≥8), tumour size (<3 cm or ≥3 cm), prior recurrence rate (primary, ≤1 recurrence per year, or >1 recurrence per year), T stage (Ta or T1), concomitant carcinoma in situ, and grade (G1, G2, or G3). Each of these variables received a scoring value as previously described (Sylvester et al, 2006). Patient scores were then categorised into risk groups, and probabilities of disease recurrence and progression were given for each group at 1 and 5 years. We applied this scoring methodology to our cohort to calculate each patient’s EORTC risk score. Using Kaplan–Meier methods, we then calculated each patient’s risk for disease recurrence and progression at the same time points of 1 and 5 years. To evaluate the discrimination of the EORTC risk tables, we created Cox proportional hazard regression models for time to recurrence and time to progression. We incorporated each patient’s calculated risk score as a predictor into both of these models and then calculated the concordance indexes. We compared the concordance index of our models with the concordance index reported for the 5-year EORTC model. Unfortunately, the methodology for calculating separate 1-year and 5-year concordance indexes was not reported in the EORTC paper, so we could not compare our models with both time points. Calibration plots were used to compare the EORTC predicted risks with the actual risks seen in our cohort. Calibration plots usually show the predicted versus actual risk among different risk groups of patients. Our calibration plots were slightly modified to show the EORTC predicted risk vs the actual risk for each individual risk score and not just by the risk score categories.
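The additive scoring step described above can be illustrated with a minimal Python sketch. It is not the code used in this study (the analyses were run in Stata). The point weights are transcribed from the published EORTC table as an assumption and should be checked against Sylvester et al (2006) before any use; the dictionary keys, level labels, and function name are hypothetical and chosen only for illustration.

```python
# Point weights per factor level as (recurrence points, progression points).
# ASSUMPTION: transcribed from the published EORTC table; verify against
# Sylvester et al (2006) before relying on them.
EORTC_WEIGHTS = {
    "n_tumours": {"single": (0, 0), "2-7": (3, 3), ">=8": (6, 3)},
    "size": {"<3cm": (0, 0), ">=3cm": (3, 3)},
    "prior_recurrence": {"primary": (0, 0), "<=1/yr": (2, 2), ">1/yr": (4, 2)},
    "t_stage": {"Ta": (0, 0), "T1": (1, 4)},
    "cis": {"no": (0, 0), "yes": (1, 6)},
    "grade": {"G1": (0, 0), "G2": (1, 0), "G3": (2, 5)},
}


def eortc_scores(patient):
    """Return (recurrence_score, progression_score) for one patient dict."""
    recurrence = 0
    progression = 0
    for factor, levels in EORTC_WEIGHTS.items():
        rec_points, prog_points = levels[patient[factor]]
        recurrence += rec_points
        progression += prog_points
    return recurrence, progression


# Hypothetical patient: 2-7 small primary T1 G2 tumours without cis.
example_patient = {
    "n_tumours": "2-7",
    "size": "<3cm",
    "prior_recurrence": "primary",
    "t_stage": "T1",
    "cis": "no",
    "grade": "G2",
}
print(eortc_scores(example_patient))  # (5, 7)
```

With these weights the recurrence score ranges from 0 to 17 and the progression score from 0 to 23, which is why the two end points are scored separately for each patient.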

We then repeated our analyses for the CUETO risk tables. The previously published CUETO model incorporated gender, age (<60, 60–70, >70), recurrent tumour, number of tumours (≤3 or >3), T stage (Ta or T1), concomitant carcinoma in situ, and grade (G1, G2, and G3). Similarly to the EORTC method, each of these variables received a risk scoring value, patients were categorised according to this value, and probabilities of recurrence and progression were given as previously described (Fernandez-Gomez et al, 2009). Concordance indexes and calibration plots were again utilised to evaluate the accuracy of the CUETO risk tables. Unfortunately, it was not reported how separate c-indexes were calculated at 1 and 5 years, so the 1-year c-index was used for our comparisons.
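For readers less familiar with the concordance index, the following Python sketch shows one common definition, Harrell's C for right-censored data. It is an illustration under stated assumptions, not the Stata routine used for the analyses: it ranks patients directly by their risk score (with a single predictor and a positive Cox coefficient, the ranking by the Cox linear predictor is the same), it counts score ties as half-concordant, and it ignores pairs with tied event times. The function name and the toy data are hypothetical.

```python
def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    scores : risk score, where a higher score means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # earlier event had the higher score
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs: c-index is undefined")
    return concordant / usable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
print(harrell_c_index(toy_times, toy_events, [9, 6, 3, 1]))  # 1.0
print(harrell_c_index(toy_times, toy_events, [1, 6, 3, 9]))  # 0.2
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the c-indexes in this study should be read.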

We repeated the above analyses in the following subgroups of patients to explore the robustness of both risk tables: treated with BCG, not treated with BCG, primary and recurrent tumours. All statistical analyses were performed using Stata 12.0 (StataCorp, College Station, TX, USA).

Results

Patients’ clinico-pathologic characteristics and oncological outcomes

Table 1 shows the clinico-pathologic characteristics of the 4689 patients included in the study. BCG immunotherapy was administered to 538 (11%) patients. In all, 2110 patients experienced disease recurrence and 591 patients experienced disease progression. Median follow-up was 46 months for patients who did not experience disease recurrence and 57 months for those who did not experience disease progression.

Table 1 Clinical characteristics and pathologic features of the 4689 patients with non-muscle-invasive urothelial carcinoma of the bladder

External validation of the EORTC and the CUETO models to predict disease recurrence and progression in the entire cohort

The application of the EORTC risk groupings to our cohort resulted in c-indexes for disease recurrence and progression of 0.597 and 0.662, respectively. For comparison, the published EORTC c-indexes for disease recurrence and progression at 1 year were 0.660 and 0.740, respectively (Table 2A). When the CUETO risk groupings were applied to our cohort, the c-indexes for disease recurrence and progression were 0.523 and 0.616, respectively. For comparison, the published CUETO c-indexes were 0.636 and 0.687 for disease recurrence and progression at 1 year, respectively (Table 2B).

Table 2A C-indexes for application of EORTC model in the entire cohort of 4689 patients with non-muscle-invasive urothelial carcinoma of the bladder
Table 2B C-indexes for application of CUETO model in the entire cohort of 4689 patients with non-muscle-invasive urothelial carcinoma of the bladder

Calibration plots were designed to compare the predicted (red) and the actual risk (black) for each individual risk score and for each end point (at 1 and 5 years, respectively). Modified calibration plots were used because of the predefined risk groupings used in both models. Calibration plots for disease recurrence are displayed in Figure 1A and B (EORTC) and Figure 2A and B (CUETO), respectively. With regard to disease recurrence, the EORTC model had fairly good calibration in low-risk patients (score=0). In intermediate-risk patients (score=1–9), the EORTC model was hindered by the heterogeneity of this risk group (underestimation and overestimation). For example, intermediate-risk patients with an EORTC risk score from 1 to 9 were included in the same group but had a 1-year risk of disease recurrence ranging from 20% to 50%. In high-risk patients (score>10), the EORTC model overestimated the risk of disease recurrence, both at 1 and at 5 years. The CUETO model had poor calibration for disease recurrence, with underestimation of the risk for low-risk patients (score 0–6) and overestimation for high-risk patients (score>10). The calibration for intermediate-risk patients (score=7–9) was satisfactory.
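The comparison underlying such a per-score calibration plot can be sketched in Python as follows. This is an illustration only, not the Stata code used for the figures: for each distinct risk score, the observed risk is taken as one minus the Kaplan–Meier estimate at the chosen horizon and is paired with the risk predicted by the table. The function names, the toy follow-up data, and the predicted risks in `toy_predicted` are hypothetical placeholders and are not values from the EORTC or CUETO publications.

```python
def km_event_probability(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    survival = 1.0
    # Distinct times at which an event was observed up to the horizon.
    event_times = sorted(
        {t for t, e in zip(times, events) if e == 1 and t <= horizon}
    )
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - n_events / at_risk
    return 1.0 - survival


def calibration_table(scores, times, events, predicted, horizon):
    """Return (score, predicted risk, observed KM risk) for each score."""
    rows = []
    for s in sorted(set(scores)):
        idx = [i for i, v in enumerate(scores) if v == s]
        observed = km_event_probability(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((s, predicted[s], observed))
    return rows


# Toy data (hypothetical): follow-up in months for two risk scores.
toy_scores = [0, 0, 0, 0, 10, 10, 10, 10]
toy_times = [6, 12, 24, 30, 3, 8, 20, 40]
toy_events = [1, 0, 0, 0, 1, 1, 0, 0]
toy_predicted = {0: 0.20, 10: 0.60}  # placeholder risks, not published values

for score, pred, obs in calibration_table(
    toy_scores, toy_times, toy_events, toy_predicted, horizon=12
):
    print(score, pred, round(obs, 2))
```

A predicted risk above the observed Kaplan–Meier risk for a given score corresponds to overestimation by the model at that score, and a predicted risk below it corresponds to underestimation.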

Figure 1

(A–D) Calibration plots for the EORTC model in the entire cohort of 4689 patients with non-muscle-invasive urothelial carcinoma of the bladder (Grey – EORTC risk; Black – actual risk). (A) One-year risk of recurrence. (B) Five-year risk of recurrence. (C) One-year risk of progression. (D) Five-year risk of progression.

Figure 2

(A–D) Calibration plots for the CUETO model in the entire cohort of 4689 patients with non-muscle-invasive urothelial carcinoma of the bladder (Grey – CUETO risk; Black – actual risk). (A) One-year risk of recurrence. (B) Five-year risk of recurrence. (C) One-year risk of progression. (D) Five-year risk of progression.

Calibration plots for the EORTC and CUETO models for disease progression are displayed in Figures 1C and D and 2C and D, respectively. Both models resulted in overestimation of the risk of disease progression in high-risk patients (EORTC score>7 and CUETO score>10) at both 1 and 5 years. Moreover, both models resulted in overestimation of the risk of disease progression in intermediate-risk patients (EORTC score=2–6 and CUETO score=7–9) at 5 years.

Application to BCG patients

When including only patients who were treated with BCG (n=538), similar results were found for both models. The c-indexes for the EORTC model when restricted to only BCG-treated patients were even lower than those from the development cohorts. The EORTC model had c-indexes of 0.554 and 0.576 for disease recurrence and progression, respectively (Table 3). Conversely, the discrimination of the CUETO model increased with c-indexes of 0.597 and 0.645 for disease recurrence and progression, respectively (Table 3).

Table 3 C-indexes of the EORTC and the CUETO model when applied to BCG only, no BCG, primary and recurrent tumours, respectively

The calibration plots for these patients indicate even poorer calibration than in the full cohort. The EORTC model overestimated both risks of disease recurrence and progression in high-risk patients at 1 year and in intermediate- and high-risk patients at 5 years (Figure 3A–D). The CUETO model was more accurate concerning prediction of disease recurrence at 1 year, while overestimating this risk at 5 years in high-risk patients (score>10) (Figure 4A and B). With regard to disease progression, it overestimated this risk in intermediate-risk (score=7–9) and high-risk patients (score>10), both at 1 and at 5 years (Figure 4C and D).

Figure 3

(A–D) Calibration plots for the EORTC model in 538 patients with non-muscle-invasive urothelial carcinoma of the bladder treated with BCG (Grey – EORTC risk; Black – actual risk). (A) One-year risk of recurrence. (B) Five-year risk of recurrence. (C) One-year risk of progression. (D) Five-year risk of progression.

Figure 4

(A–D) Calibration plots for the CUETO model in 538 patients with non-muscle-invasive urothelial carcinoma of the bladder treated with BCG (Grey – CUETO risk; Black – actual risk). (A) One-year risk of recurrence. (B) Five-year risk of recurrence. (C) One-year risk of progression. (D) Five-year risk of progression.

When including only patients who were not treated with BCG (n=4151), the discrimination of the EORTC tables improved for both disease recurrence and progression (c-indexes of 0.603 and 0.672, respectively; Table 3). The discrimination of the CUETO model decreased (c-indexes of 0.522 and 0.626 for disease recurrence and progression, respectively; Table 3).

Application to primary and recurrent tumour patients

Subgroup analyses in patients with primary tumours or recurrent tumours only revealed no large differences compared with the data in the combined group (Table 3).

Discussion

Non-muscle-invasive bladder cancer is a common disease with highly variable behaviour and outcomes. Many factors predicting disease recurrence and progression have been reported in this disease to allow evidence-based risk stratification with regard to administration of therapy and follow-up scheduling. The EORTC and CUETO models were born out of this clinical need to individualise and optimise therapy for NMIBC patients. These tables are indeed the best currently existing prognostic models, and their use, as recommended by various guidelines (Babjuk et al, 2011; Burger et al, 2012), has become the standard of care. External validation of a prognostic model on a new data set of more contemporaneous patients is crucial to assess its generalisability. The performance of a predictive model is typically overestimated in the original data used to develop it. Therefore, we set out to test the discrimination and calibration of these models in an external multicentre cohort. Although a prognostic model can improve the decision-making process, its validity depends both on the agreement between observed and predicted probabilities of an event (i.e., calibration) and on its ability to distinguish subjects with different prognoses (i.e., discrimination).

We found that the EORTC tables exhibited poor discrimination in predicting both disease recurrence and progression (Seo et al, 2010; Fernandez-Gomez et al, 2011). Similarly to two previous external validations (Fernandez-Gomez et al, 2011; Hernandez et al, 2011), we found that the EORTC tables overestimated these risks in high-risk patients. This finding could be explained by the major limitation of the EORTC series: the low number of patients who received adjuvant BCG instillations. Indeed, the model was constructed based on the data from EORTC trials testing the efficacy of various intravesical chemotherapy regimens in the post-TUR setting. Intravesical chemotherapies are considered to be inferior to BCG therapy for high-risk tumours (Malmstrom et al, 2009). Only a single EORTC trial, used for the EORTC tables, included patients managed with induction intravesical BCG, albeit without any form of maintenance therapy. As BCG therapy has been shown to be more effective than intravesical chemotherapy in preventing disease recurrence (and may delay disease progression), the overestimation by the EORTC tables of the risk of disease recurrence and progression was expected (Sylvester et al, 2002; Bohle and Bock, 2004; Malmstrom et al, 2009). Moreover, when we restricted the analyses to the patients who did not receive BCG immunotherapy, the discrimination of the EORTC tables increased for both disease recurrence and progression.

Due to the limitations in the design of the EORTC tables and to improve prediction of outcomes in BCG-treated patients, the CUETO group developed a separate scoring model. Based on calculations of concordance indices, the authors found that disease recurrence and progression predictions were lower than those reported by Sylvester et al (2006). We performed the first, to date, external validation of this scoring model. Although it was not designed for patients without BCG, we evaluated its performance in predicting disease recurrence and progression in our overall cohort, in which only 11% of patients were treated with BCG. Not surprisingly, the discriminative ability of this model was significantly lower than that of the original study (Sylvester et al, 2006). Moreover, this scoring model exhibited poor calibration for both disease recurrence and progression. To better evaluate its discrimination, we restricted our analyses to BCG-treated patients. While the discrimination of this score improved, we found that it still overestimated the risk of disease progression. Regarding the prediction of disease progression, there was no definite difference between the values presented by the EORTC risk tables and the CUETO scoring model. Both overestimated the progression rates in our cohort. Several studies comparing BCG therapy with intravesical chemotherapy failed to find a difference between the two in preventing disease progression (Lundholm et al, 1996; Shelley et al, 2004). Additional factors not included in the EORTC or the CUETO models could be added to new prognostic models to enhance their usefulness. Lymphovascular invasion has been proven to be associated with poorer outcomes (Kunju et al, 2008; Streeper et al, 2009). Moreover, molecular factors or grading could also be of interest (van Rhijn et al, 2010).

Our study is not devoid of limitations, such as its multicentre and retrospective design. We did not control for treatment delay, effect of repeat TURB, quality of the TURB, and prognostic factors such as lymphovascular invasion. In particular, given the heterogeneous treatment pattern across centres, it is difficult to ascertain whether, and which, patients received re-TURB, a treatment that is instrumental in the proper management of some high-risk NMIBC. Similarly, we could not ascertain the number of patients who completed adjuvant intravesical instillation therapy. These limitations could have impacted the occurrence of events such as disease recurrence and progression. In addition, we could not adjust for the number and experience of surgeons and pathologists at each institution; no central pathology review was performed, possibly leading to differences in patients’ treatment strategies. However, all surgeons and pathologists operated at tertiary care centres with experience in UCB. Comorbidities might also have influenced the decision making regarding surgical therapy, introducing a selection bias. However, this series comprises a large, contemporary cohort, which is a strength of the study.

Conclusion

The EORTC risk tables and the CUETO scoring system exhibit poor discrimination for both disease recurrence and progression in NMIBC patients. These models overestimated the risk of disease recurrence and progression in high-risk patients. These overestimations remained in BCG-treated patients, especially for the EORTC tables. These results underline the need for improving our current predictive tools. Our study is limited by its retrospective and multi-institutional design.