Main

Biliary tract cancer is a rare disease that includes intrahepatic and extrahepatic cholangiocarcinoma, gallbladder carcinoma, and ampullary carcinoma, and patients with unresectable or recurrent forms of these cancers are generally treated with palliative chemotherapy. The key anti-cancer drug for advanced biliary tract cancer (ABTC) is gemcitabine. Recently, gemcitabine+cisplatin (GP) has been recognised as a standard first-line treatment for ABTC following the results of a randomised phase III trial (ABC-02 trial) that compared GP with gemcitabine alone (Valle et al, 2010). Despite the favourable outcome of that trial, the prognosis of patients with ABTC remains poor, with a median survival time of ∼12 months. Accordingly, new drugs or combination therapies should be studied with the intent to improve survival.

Although overall survival (OS) is considered the gold-standard hard end point in trials of oncology drugs, a validated surrogate end point that can be assessed earlier than OS would reduce the sample size, cost, and duration of the trials needed to demonstrate the benefit of a novel drug. A previous pooled analysis of 104 clinical trials in ABTC, conducted before GP was established as the standard regimen, aimed to identify superior treatment regimens (Eckel and Schmid, 2007); as a secondary objective, it found that time to progression (TTP) was significantly associated with OS. However, that analysis consisted mostly of small, non-randomised trials. In the present study, we evaluated various efficacy end points as potential surrogates for OS in first-line chemotherapy for ABTC, restricting the analysis to randomised trials to allow accurate estimation of treatment effects.

Materials and methods

Registration

This study is registered in the PROSPERO database (CRD42014014526) and was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.

Selection of eligible studies

Eligible studies were randomised phase II and III trials involving patients with ABTC who were treated with first-line chemotherapy. Studies were excluded if they involved hepatic infusional chemotherapy or combination with a local control therapy such as radiotherapy, surgery, or photodynamic therapy, or if they were reported in a language other than English.

Search strategy

Trials involving ABTC that were published up to February 2015 were identified through a systematic search of the PubMed database, using the keywords ‘biliary tract neoplasms’ OR ‘bile duct neoplasms’ OR ‘gallbladder neoplasms’ OR ‘cholangiocarcinoma’ (all fields) AND ‘chemotherapy’ (all fields) AND ‘clinical trial’ (ptyp). A manual search was also performed for abstracts presented up to February 2015 at the annual meetings of the American Society of Clinical Oncology (ASCO), the Gastrointestinal Cancers Symposium, the European Society for Medical Oncology (ESMO), and the World Congress of Gastrointestinal Cancer.

Data extraction

Two authors (TM and YY) independently extracted information from the selected literature using predefined data abstraction forms. The following details were extracted: year of publication or presentation, number of enrolled patients, primary end point, chemotherapy regimen, and tumour location (intrahepatic cholangiocarcinoma, extrahepatic cholangiocarcinoma, gallbladder carcinoma, and ampullary carcinoma). The efficacy parameters assessed as potential surrogate end points and evaluated for correlation with OS were progression-free survival (PFS; defined as the time to initial progression or death from any cause), TTP (defined as the time to initial progression or cancer-related death), response rate (RR; defined as the proportion of patients with a complete or partial response), and disease control rate (DCR; defined as the proportion of patients with a complete response, partial response, or stable disease). The following efficacy data were collected: median OS, PFS, and TTP; hazard ratios for OS, PFS, and TTP; and RR and DCR. For randomised phase II studies with selection designs, in which all arms were considered experimental treatments, the control arm was determined by consensus of three investigators (TM, YY, and TY).
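As a minimal illustration of the RR and DCR definitions above, the Python sketch below computes both rates for a single arm from best-response counts. The `ArmResponses` class and its field names are hypothetical, and the denominator (patients evaluable for response) is an assumption; individual trials may have used all enrolled patients instead.

```python
from dataclasses import dataclass


@dataclass
class ArmResponses:
    """Best-response counts for one treatment arm (hypothetical structure)."""
    complete: int   # patients with a complete response (CR)
    partial: int    # patients with a partial response (PR)
    stable: int     # patients with stable disease (SD)
    evaluable: int  # denominator; assumed to be patients evaluable for response

    def response_rate(self) -> float:
        # RR: proportion of patients with CR or PR
        return (self.complete + self.partial) / self.evaluable

    def disease_control_rate(self) -> float:
        # DCR: proportion of patients with CR, PR, or SD
        return (self.complete + self.partial + self.stable) / self.evaluable


# Hypothetical arm: 2 CR, 18 PR, and 40 SD among 100 evaluable patients
arm = ArmResponses(complete=2, partial=18, stable=40, evaluable=100)
print(arm.response_rate())         # 0.2
print(arm.disease_control_rate())  # 0.6
```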

Statistical analysis

The coefficient of determination (r2) was used to evaluate correlations between treatment effects on the candidate surrogate end points and treatment effects on OS. Treatment effects on OS were analysed using a linear regression model weighted by the sample size of each trial, with the treatment effect on each surrogate end point as the explanatory variable. The precision of predictions from this model was expressed as 95% confidence intervals (CIs) around the regression line. The ratios for PFS and OS were log-transformed before analysis. Because hazard ratios (HRs) were not always reported, treatment effects on PFS, TTP, and OS were summarised as the ratio of the median in the control arm to that in the experimental arm of each trial. For RR and DCR, the ratio of the control arm to the experimental arm was likewise evaluated. TTP and failure-free survival were reclassified as PFS (Schinzari et al, 2009; Sasaki et al, 2013). A ratio of less than 1 therefore denotes a result favouring the experimental arm for PFS, RR, DCR, and OS.
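The Python sketch below illustrates the general approach described above under stated assumptions: per-trial ratios of control to experimental medians, log transformation, least squares weighted by trial sample size, and a weighted coefficient of determination. The function names and the four paired arms are hypothetical toy values, not data from the included trials, and the sketch is not the SAS code used in this study.

```python
import math


def median_ratio(control: float, experimental: float) -> float:
    # Control/experimental ratio: < 1 favours the experimental arm (longer median time)
    return control / experimental


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x. Returns (a, b, r2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    r2 = sxy ** 2 / (sxx * syy)  # weighted coefficient of determination
    return a, b, r2


# Hypothetical paired arms: (sample size, PFS control, PFS exp., OS control, OS exp.), months
pairs = [
    (60, 4.0, 5.0, 8.0, 9.0),
    (120, 5.0, 6.5, 9.0, 10.5),
    (200, 4.5, 4.4, 9.5, 9.3),
    (90, 6.0, 7.5, 10.0, 11.5),
]
x = [math.log(median_ratio(pc, pe)) for _, pc, pe, _, _ in pairs]  # log PFS ratios
y = [math.log(median_ratio(oc, oe)) for _, _, _, oc, oe in pairs]  # log OS ratios
w = [n for n, *_ in pairs]                                        # weights = sample sizes
a, b, r2 = weighted_fit(x, y, w)
print(f"log(OS ratio) = {a:.3f} + {b:.3f} * log(PFS ratio); r2 = {r2:.2f}")
```

Weighting by sample size lets large trials dominate the fit, which mirrors the weighting described in the text; an inverse-variance weighting would be an alternative choice that requires per-trial standard errors.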

Several additional analyses that were not pre-specified were performed. Correlations between treatment effects on the surrogate end points and on OS were analysed separately in trials of gemcitabine-containing therapies and in trials of targeted agents. Sensitivity analyses were also performed to test the robustness of the correlation observed between the median PFS ratios and median OS ratios across all trials. The surrogate threshold effect (STE) on PFS was calculated for all trials and for each subgroup. The STE is the median PFS ratio at which the upper 95% prediction limit of the regression line crosses a median OS ratio of 1; it represents the minimum effect on PFS that predicts a non-zero effect on OS (Burzykowski and Buyse, 2006). The median PFS ratio predicted to correspond to a 20% improvement in median OS was calculated from each fitted regression line, because an improvement in median OS of at least 20% has been generally agreed upon as a clinically meaningful improvement in cancer clinical trials (Ellis et al, 2014). We used SAS software version 9.4 (SAS Institute Inc., Cary, NC, USA) for the statistical analyses.
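A minimal sketch of how an STE can be obtained from a weighted regression of log median OS ratios on log median PFS ratios is shown below. It solves for the point at which the upper 95% prediction limit for the log OS ratio equals 0 (an OS ratio of 1). The prediction variance for a new trial requires a weight for that trial; the `w_new` parameter is our assumption, and the SAS settings used in this study may differ. All function names are hypothetical.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x, returning terms needed for prediction limits."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Weighted residual variance with n - 2 degrees of freedom
    s2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / (len(x) - 2)
    return a, b, xbar, sxx, sw, s2


def surrogate_threshold_effect(x, y, w, t_crit, w_new):
    """Median PFS ratio at which the upper prediction limit for log(OS ratio) equals 0.

    x, y: log median PFS and OS ratios (control/experimental); w: trial sample sizes.
    t_crit: two-sided 95% t quantile with n - 2 df, e.g. scipy.stats.t.ppf(0.975, n - 2).
    w_new: weight (sample size) assumed for a hypothetical new trial.
    Returns None when no STE exists.
    """
    a, b, xbar, sxx, sw, s2 = weighted_fit(x, y, w)
    if b <= 0:
        return None  # no positive association between PFS and OS effects
    k = t_crit ** 2 * s2
    c = 1.0 / w_new + 1.0 / sw
    # Upper limit: a + b*x + sqrt(k * (c + (x - xbar)**2 / sxx)) = 0.
    # Squaring gives qa*x**2 + qb*x + qc = 0.
    qa = b ** 2 - k / sxx
    if qa <= 0:
        return None  # band widens faster than the line falls: upper limit never reaches 0
    qb = 2 * a * b + 2 * k * xbar / sxx
    qc = a ** 2 - k * c - k * xbar ** 2 / sxx
    disc = max(qb ** 2 - 4 * qa * qc, 0.0)
    roots = [(-qb + s * math.sqrt(disc)) / (2 * qa) for s in (1.0, -1.0)]
    # The squared equation also has a root on the lower limit; keep the one where a + b*x < 0
    upper_roots = [r for r in roots if a + b * r < 0]
    return math.exp(upper_roots[0]) if upper_roots else None
```

With `x`, `y`, and `w` holding the log ratios and sample sizes of 19 paired arms (17 degrees of freedom), a call would look like `surrogate_threshold_effect(x, y, w, t_crit=2.110, w_new=100)`, where `w_new=100` is purely illustrative. The `None` return when the prediction band is too wide corresponds to the situation in which an STE cannot be calculated.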

Results

Trials included in the analysis

Among the 426 retrieved papers and abstracts, 17 randomised trials (14 phase II and 3 phase III) with 36 treatment arms were identified (Figure 1 and Table 1) (Kornek et al, 2004; Ducreux et al, 2005; Rao et al, 2005; Schinzari et al, 2009; Okusaka et al, 2010; Sharma et al, 2010; Valle et al, 2010; Lee et al, 2012; Kang et al, 2012; Morizane et al, 2013; Sasaki et al, 2013; Malka et al, 2014; Moehler et al, 2014; Valle et al, 2014; Chen et al, 2015; Leone et al, 2015; Santoro et al, 2015). A total of 2148 patients in 19 paired arms were analysed. PFS was reported in 14 trials and TTP in 2 trials. RR was reported in all trials, although the best supportive care arm in one trial reported no responses (Sharma et al, 2010). DCR was reported in 15 trials. Two trials were terminated early: a phase III trial comparing 5-fluorouracil (5-FU)+epirubicin+leucovorin with epirubicin+cisplatin+5-FU was closed because of poor accrual (Rao et al, 2005), and a phase II trial comparing GP+placebo with GP+cediranib was closed because development of the new drug was discontinued (Valle et al, 2014). Across trials, the median values of the reported outcomes tended to favour the experimental arms over the standard arms for PFS (5.8 vs 4.9 months), OS (10.1 vs 9.5 months), RR (26.1% vs 15.5%), and DCR (71.7% vs 64.9%). A forest plot of treatment effects on PFS and OS in trials reporting HRs is shown in Figure 2. The median HRs for PFS (0.80) and OS (0.82) also tended to favour the experimental arms. In trials with targeted agents, however, the HRs for PFS and OS tended to favour the control arms. The median PFS and OS in the gemcitabine-alone arms ranged widely, from 3.7 to 5.0 months and from 7.7 to 11.2 months, respectively. The median PFS and OS in the gemcitabine+platinum combination arms were also widely distributed, at 4.1–8.5 months and 9.5–12.4 months, respectively.

Figure 1

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of study selection. Abbreviations: ASCO=American Society of Clinical Oncology; ESMO=European Society for Medical Oncology.

Table 1 Characteristics of trials included in the analysis
Figure 2

A forest plot of treatment effects on progression-free survival (PFS) and overall survival (OS) in trials reporting hazard ratios (HRs). Abbreviations: BSC=best supportive care; C arm=control arm; CI=confidence interval; E arm=experimental arm; FUFA=5-FU+folic acid; GEM=gemcitabine; GEMOX=gemcitabine+oxaliplatin; GP=gemcitabine+cisplatin; mGEMOX=modified gemcitabine+oxaliplatin; SP=S-1+cisplatin.

Correlation between treatment effects

Correlations between the surrogate end points and OS are summarised in Table 2. The median PFS ratio and median OS ratio were moderately correlated (r2=0.66; 95% CI, 0.32–0.85, P<0.001; Figure 3A). The correlation between the RR ratio and median OS ratio in all 17 trials with 17 paired arms was weak (r2=0.29; 95% CI, 0.01–0.65, P=0.021; Figure 3B), as was the correlation between the DCR ratio and median OS ratio in 15 trials with 17 paired arms (r2=0.34; 95% CI, 0.02–0.69, P=0.011; Figure 3C).

Table 2 Weighted linear regression analyses of correlations between surrogate end points and OS
Figure 3

Correlations between treatment effects on surrogate end points and overall survival (OS) in all selected trials. (A) Correlation between median progression-free survival (PFS) ratios and median OS ratios. The point where the horizontal and vertical dotted lines cross indicates the surrogate threshold effect. (B) Correlation between median OS ratios and response rate (RR) ratios. (C) Correlation between median OS ratios and disease control rate (DCR) ratios. Circle size is proportional to sample size.

Fourteen trials of gemcitabine-containing therapies were identified, and correlations with the median OS ratio were analysed for the median PFS ratio in 15 paired arms, the RR ratio in 14 paired arms, and the DCR ratio in 14 paired arms. The median PFS ratio was strongly associated with the median OS ratio (r2=0.78; 95% CI, 0.46–0.92, P<0.001; Figure 4A), whereas the correlations of the RR and DCR ratios with the median OS ratio were weak (Table 2). Only six trials of targeted agents were identified, and correlations between each surrogate end point and the median OS ratio were analysed in seven paired arms. The median PFS ratio was strongly associated with the median OS ratio (r2=0.78; 95% CI, 0.14–0.96, P=0.004; Figure 4B), and this correlation was stronger than those observed for the RR and DCR ratios (Table 2).

Figure 4

Correlations between median progression-free survival (PFS) ratios and median overall survival (OS) ratios. (A) Trials with gemcitabine-containing therapies. (B) Trials with targeted agents. The point where the horizontal and vertical dotted lines cross indicates the surrogate threshold effect. Circle size is proportional to sample size.

Sensitivity analysis

Several sensitivity analyses were performed to test the robustness of the correlation between the median PFS ratio and median OS ratio across all trials (Table 2). When the 10 trials with total sample sizes of <100 patients were excluded, the r2-value between the median PFS ratio and median OS ratio in the remaining 7 trials with 8 paired arms was 0.60 (95% CI, 0.02–0.92, P=0.015). After excluding the 2 trials that ended early, the r2-value was 0.64 (95% CI, 0.27–0.86, P<0.001). Among the 8 trials with 9 paired arms that reported HRs for both PFS and OS, the r2-value was 0.63 (95% CI, 0.07–0.91, P=0.006). After excluding the 3 trials that reported TTP or failure-free survival, the r2-value between the median PFS ratio and median OS ratio in the remaining 14 trials with 16 paired arms was 0.62 (95% CI, 0.23–0.85, P<0.001).

Predicted treatment effect on PFS

The STE was 0.83 for all trials (Figure 3A) and 0.81 for trials of gemcitabine-containing therapies (Figure 4A); it could not be calculated for trials of targeted agents. The median PFS ratio predicted to correspond to a 20% improvement in median OS was 0.71 for all trials, 0.69 for trials of gemcitabine-containing therapies, and 0.40 for trials of targeted agents.

Discussion

Our analysis demonstrated that an improvement in PFS was moderately associated with an improvement in OS in randomised trials of first-line chemotherapy for ABTC. A strong correlation between PFS and OS was also observed in both subgroups, namely trials of gemcitabine-containing therapies and trials with targeted agents, which have been actively investigated in ABTC in recent years.

Among the trials we analysed, significant improvements in both PFS and OS were observed in only two paired arms: GP vs gemcitabine in the ABC-02 trial, and modified gemcitabine+oxaliplatin vs best supportive care (Sharma et al, 2010; Valle et al, 2010). Notably, the ABC-02 trial had the largest sample size. One reason for our positive results may be that most of the analysed trials showed trends similar to those of the ABC-02 trial, even though their slight improvements in OS were not statistically significant. Furthermore, the impact of post-progression survival on OS appeared weak compared with the association between PFS and OS. The validity of PFS as a surrogate end point for OS has been investigated in various types of cancer but remains controversial in some (Burzykowski et al, 2008; Paoletti et al, 2013; Blumenthal et al, 2015; Johnson et al, 2015). The increasing use of subsequent treatments and crossover, together with longer OS in recent cancer trials, might partly explain why strong correlations between PFS and OS have not been detected in first-line trials in those cancers. In ABTC, by contrast, the efficacy of treatment after first progression has not been established, and only a few active drugs are used as second-line chemotherapy. In a systematic review of second-line chemotherapy for ABTC, the correlation between PFS and OS was moderate and the evidence remained insufficient to recommend second-line chemotherapy (Lamarca et al, 2014), although 15–80% of patients received second-line chemotherapy in the trials we analysed. Thus, subsequent treatment may not have substantially weakened the close correlation between PFS and OS observed in the present study.

It is difficult to judge whether a high correlation between a candidate surrogate and a true end point is clinically meaningful (Burzykowski and Buyse, 2006). The STE was introduced to address this; it is defined as the minimum treatment effect on the surrogate necessary to predict a non-zero effect on the true end point. In several studies that demonstrated a strong correlation between PFS and OS, the STE on PFS exceeded 0.80 (Mauguen et al, 2013; Paoletti et al, 2013; Sidhu et al, 2013). Our results are similar: high STEs were obtained both in all trials (0.83) and in trials of gemcitabine-containing therapies (0.81). Therefore, PFS may be used as a surrogate end point for OS in first-line chemotherapy for ABTC. However, in trials of targeted agents, the STE on PFS could not be calculated, and PFS did not appear to be a valid surrogate end point for OS despite its strong correlation with OS. This discrepancy may be attributable to the small number of trials and the lack of a significant OS improvement in those trials.

According to recent discussions by working groups of the ASCO Research Committee, an improvement in median OS of at least 20% is generally agreed upon as a clinically meaningful improvement in the outcome of a cancer clinical trial (Ellis et al, 2014). Using this benchmark of a minimum 20% improvement in OS, the median PFS ratio calculated in our study was ∼0.70. This value could serve as the target median PFS ratio when planning a randomised superiority trial of ABTC treatment.
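A benchmark of this kind can be obtained by inverting a fitted regression of the form log(OS ratio) = a + b·log(PFS ratio). The sketch below assumes that a 20% improvement means an experimental median OS 1.2 times that of the control arm, so that the control/experimental OS ratio is 1/1.2; this convention, the function name, and the example coefficients are assumptions, not the fitted values from this analysis.

```python
import math


def pfs_ratio_for_os_target(a: float, b: float, os_gain: float = 0.20) -> float:
    """Invert log(OS ratio) = a + b*log(PFS ratio) for a target OS improvement.

    Ratios are control/experimental, so an experimental median OS that is
    (1 + os_gain) times the control median gives an OS ratio of 1 / (1 + os_gain).
    """
    if b <= 0:
        raise ValueError("slope must be positive for a meaningful inversion")
    target_log_os = math.log(1.0 / (1.0 + os_gain))
    return math.exp((target_log_os - a) / b)


# Hypothetical coefficients (intercept 0, slope 0.5), not the fitted values from this study
print(round(pfs_ratio_for_os_target(a=0.0, b=0.5), 3))  # 0.694
```

A smaller slope means that a larger PFS effect (a lower median PFS ratio) is needed to predict the same OS gain, which is one way to see why subgroups with different fitted lines yield different target ratios.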

This analysis did not reveal a strong correlation between RR and OS or DCR and OS. Recent meta-analyses of other cancers also failed to report strong correlations between these parameters (Burzykowski et al, 2008; Sidhu et al, 2013; Blumenthal et al, 2015). Some recent reports have described correlations between OS and novel parameters such as changes in the tumour volume, depth of response, and early tumour shrinkage in some types of cancers (Jain et al, 2012; Petrelli et al, 2015; Sahani et al, 2015; Sharma et al, 2015; Venook and Tabernero, 2015). However, the usefulness of those parameters as surrogate end points for OS remains unclear.

Our study has several limitations. First, most of the trials included in this analysis were phase II trials. Only three randomised phase III trials were included, one of which was closed early because of poor accrual. Estimates of treatment effects are less precise in trials with small sample sizes. Therefore, we repeated the analysis restricted to trials with a total sample size of ⩾100 patients and after excluding the trials closed early, and confirmed that the correlations between PFS and OS were similar to those in all trials. Second, we analysed correlations between the median PFS and OS ratios mainly because HRs for PFS and OS were reported in only a few trials. However, the correlation between the reported HRs for PFS and the reported HRs for OS was similar to the result of the primary analysis. In addition, HRs for PFS and OS have been found to be reasonably well represented by the ratios of the medians (Redman et al, 2013). Third, tumour locations varied among the biliary tract cancer cases, and poorly visible perihilar tumours, in which progression is difficult to assess, were often included. This might explain the differences in efficacy observed between trials: the reported median PFS and OS varied among the gemcitabine-alone arms and among the gemcitabine+platinum combination arms. Future analyses that account for the proportions of tumour locations in each trial may be needed. Finally, this study was based on published data, and analyses of individual patient data are needed to confirm the strength of the correlation in ABTC (Buyse et al, 2010). To validate our prediction externally, we searched for eligible studies from March 2015 to January 2016. Only two small randomised phase II trials, both presented as abstracts at the 2015 ASCO annual meeting, were identified (Jensen et al, 2015; Vogel et al, 2015). These trials included both gemcitabine-containing regimens and targeted agents. The reported median OS ratios were 1.39 and 0.60, whereas the OS ratios calculated with our model were 1.24 and 0.91, respectively. Because only two trials were available, further external validation is needed.

In conclusion, we found a moderate correlation between PFS and OS and a relatively high STE. Our regression model could provide benchmarks for the PFS effect required to predict a clinically meaningful improvement in OS. In patients with ABTC, OS remains a feasible primary end point for phase III trials because post-progression survival is short. For phase II trials of newly developed drugs, however, our results indicate that PFS is an appropriate end point.