Background
Multiple myeloma (MM) is the second most common hematologic malignancy, after non-Hodgkin lymphoma [
1,
2]. In the US, it was estimated that over 20,500 new cases of MM and more than 10,600 deaths occurred in 2011 [
3]. Despite improved survival over the past decades, MM remains an incurable disease, with research focused on finding more effective treatments [
4]. Although improving overall survival (OS) has been the gold standard outcome for new anticancer treatments, large, costly trials with long follow-up periods are required to document an impact on OS [
5,
6]. Furthermore, OS can be influenced by trial design characteristics, such as crossover and sequential treatments [
7,
8]. Therefore, surrogate endpoints that can be measured sooner and more frequently during the course of a clinical trial are being used to provide an earlier indication of efficacy [
9].
A surrogate endpoint is a measurement that can be substituted for the final endpoint (e.g., improvement in OS) to successfully measure the effect of an intervention [
10]. Common surrogate endpoints for OS used in clinical oncology trials include: response rate; time to disease progression (TTP); progression-free survival (PFS); and event-free survival (EFS) [
6]. For study conclusions to be valid, differences or changes observed in the surrogate endpoints must accurately reflect changes in the final endpoint [
11]. There is ongoing debate about the use of these time-dependent endpoints (TDEs) as intermediate endpoints for OS in clinical trials [
12,
13], as well as their value to health authorities when evaluating drug approvals and the costs of drug therapy [
7,
14,
15].
In 1992, the US Food and Drug Administration (FDA) instituted the accelerated approval process to allow earlier marketing of drugs that treat serious, life-threatening diseases [
16]. Recently, the FDA ruled that both TTP and PFS are valid and clinically relevant TDEs that can be used in the accelerated approval process for MM agents [
17]. Although these endpoints are generally thought to be reliable in MM, their predictive value for OS is unknown. Our objective was to estimate the quantitative relationship between median TDEs and median OS in prospective published MM studies, in order to determine the expected median OS given the observed effect on the median TDE.
Discussion
There is a sound body of evidence suggesting that TDEs such as PFS, TTP, and EFS are appropriate surrogate endpoints for OS in several types of cancer [
31,
60‐
71]. However, some conflicting evidence [
61,
72,
73] and some methodological [
11], regulatory [
74], and conceptual/practical [
75] arguments fuel the ongoing discussion about surrogate endpoints in the cancer literature [
76‐
78] and challenge the establishment of TDEs in oncology clinical development [
79].
Our study is the first to highlight the value of TDEs in predicting OS in MM and to confirm the recommendations of the American Society of Hematology/US FDA Workshop on Clinical Endpoints in both newly diagnosed and relapsed/refractory MM [
17]. We focused our research on estimating the absolute effect of TDEs on OS rather than using a relative measure. We are aware that other assessments of potential surrogate endpoints require a two-step validation process, which involves: 1) establishing that the surrogate endpoint predicts the final endpoint accurately; 2) demonstrating that the effects of treatment on the surrogate endpoint and on the final endpoint are closely correlated [
11]. Our methodology, while inherently considering these two criteria, follows a less formal [
80] approach by using regression modeling methods to show that the effect on median OS is captured by the TDE (validation criterion 1) and that adding treatment to the linear predictor does not improve the fit (validation criterion 2), suggesting that the causal link between treatment and endpoint has been captured by the TDE predictor.
To assess our model’s ability to predict OS in different settings, we compared our estimates with data from studies in relapsed/refractory MM led by Dimopoulos et al. [
50] and Richardson et al. [
51] (Figure
3). The study by Dimopoulos et al. comparing Len plus Dex with Dex alone reported a hazard ratio (HR) for progression of 0.31 (median 13.4 vs 4.6 months) [
81], and for death of 0.71 (median 38.0 vs 31.6 months) [
50]. In the study by Richardson et al., which compared Bort to Dex, the reported HR for progression was 0.55 (median 6.2 vs 3.5 months) [
82] and for death was 0.77 (median 29.8 vs 23.7 months) [
51].
Estimates of median OS using our model suggest an HR for death of 0.34 for Len plus Dex vs Dex alone (median OS 33 vs 11 months), and 0.55 for Bort vs Dex (median OS 15 vs 9 months), assuming event times are exponentially distributed [
83]. In this case, the treatment effect on TTP would explain more than 90% of the treatment effect on OS for both Len plus Dex and for Bort.
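The exponential assumption behind these hazard ratios can be illustrated with a minimal sketch. If event times in each arm are exponentially distributed, the hazard rate equals ln(2) divided by the median, so the HR reduces to the ratio of the control median to the treatment median. The function name below is hypothetical and not taken from the study; the inputs are the rounded model-based medians quoted above, so the outputs (approximately 0.33 and 0.60) only approximate the reported values of 0.34 and 0.55, which were presumably derived from unrounded estimates.

```python
import math


def exponential_hazard_ratio(median_treatment, median_control):
    """Hazard ratio (treatment vs control), assuming exponential event times.

    For an exponential distribution, hazard = ln(2) / median, so the ratio of
    hazards equals median_control / median_treatment.
    """
    hazard_treatment = math.log(2) / median_treatment
    hazard_control = math.log(2) / median_control
    return hazard_treatment / hazard_control


# Rounded model-based median OS values (months) quoted in the text.
hr_len_dex = exponential_hazard_ratio(median_treatment=33, median_control=11)
hr_bort = exponential_hazard_ratio(median_treatment=15, median_control=9)

print(f"Len plus Dex vs Dex: HR = {hr_len_dex:.2f}")
print(f"Bort vs Dex:         HR = {hr_bort:.2f}")
```

Because the ln(2) terms cancel, the result depends only on the ratio of the two medians, which is why small rounding differences in the medians translate directly into differences in the HR.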
It has been argued that OS is not a realistic endpoint in this setting [
84], especially considering the ever increasing availability of new, effective drugs that can be used as salvage therapies [
8,
85], which may mask the real survival differences between treatment arms. Statistical methods to correct for bias resulting from non-informative censoring (crossover and subsequent treatment options) in survival analysis are increasingly popular [
86,
87]. In a recent paper by Ishak et al. [
88], information from trials conducted by the Medical Research Council (United Kingdom) was used to calibrate survival regression analyses in order to reproduce survival estimates corrected for patient crossover in clinical trials. These authors present a median OS of 11.6 months (95% CI, 9.5–14.2) for patients with > 1 prior therapy randomized to Dex [
50], which is similar to our estimate of 11.3 months. Furthermore, in a survival analysis adjusted for crossover in the APEX trial, Pacou et al. report an OS HR of 0.59 for Bort relative to Dex [
89], which is also very similar to the value derived from our model (HR 0.55), suggesting that our model performs accurately in trials with substantial crossover. Nonetheless, caution is recommended when extrapolating outside the context of our sample; more mature data from more recent clinical trials and further research on this topic are clearly needed.
We provide a more straightforward way of calculating the expected effect of treatment on median OS (prior to the observation of mature OS data), by estimating an absolute rather than a relative measure for the quantitative relationship between the median TDE and median OS. This regression model recognizes the influence of subsequent therapies because it estimates a mean effect of median TDE on median OS using OS data published in the literature, which is uncorrected for the effect of non-randomized subsequent treatment options. We estimated an average increment of 2.45 months in median OS for each additional month of median TDE. As previously highlighted, these estimates are valuable to assess the expected impact of treatments on median OS, for example in trials of newly diagnosed MM where median OS may not be reached for several years.
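As a rough illustration of how this slope might be applied, the sketch below converts a difference in median TDE between two arms into an expected difference in median OS. Only the slope of 2.45 comes from the text; the function name is hypothetical, and the calculation assumes that the intercept and any other covariates (such as publication year) are identical for both arms and therefore cancel in the difference. The TTP medians are those quoted above for the Dimopoulos et al. and Richardson et al. studies.

```python
# Estimated increment in median OS (months) per additional month of median TDE,
# as reported in the text.
SLOPE_OS_PER_TDE_MONTH = 2.45


def expected_median_os_gain(median_tde_treatment, median_tde_control,
                            slope=SLOPE_OS_PER_TDE_MONTH):
    """Expected difference in median OS (months) between two arms.

    Assumes a linear relationship in which the intercept and all other
    covariates are shared by both arms, so only the slope term remains.
    """
    return slope * (median_tde_treatment - median_tde_control)


# Median TTP values (months) quoted in the text.
gain_len_dex = expected_median_os_gain(13.4, 4.6)  # Len plus Dex vs Dex
gain_bort = expected_median_os_gain(6.2, 3.5)      # Bort vs Dex

print(f"Len plus Dex vs Dex: {gain_len_dex:.1f} months")
print(f"Bort vs Dex:         {gain_bort:.1f} months")
```

These differences (about 21.6 and 6.6 months) are broadly in line with the model-based median OS estimates quoted earlier (33 vs 11 months and 15 vs 9 months), although the full model may include terms that this simplified difference ignores.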
Information on survival is essential for clinical trial design [
90], accelerated approvals for new drugs [
91], indirect drug comparisons, and economic considerations (e.g. formulary inclusion and other reimbursement decisions), particularly in the absence of head-to-head comparative clinical trials. Such information may help clinicians select the most suitable treatment options for MM patients.
Other studies examining the relationship between TDEs and OS have been reported in metastatic colorectal cancer (mCRC) [
31,
62] and in metastatic breast cancer [
72]. In mCRC, there was a strong association between PFS and OS [
31,
62], with correlation coefficients similar to those obtained in our analysis of MM patients [
62]. In metastatic breast cancer, no particular endpoint was determined to be an adequate surrogate for OS [
72]. The different conclusions from studies in breast cancer, mCRC, and MM emphasize the fact that appropriate TDEs cannot be generalized in oncology, and their validity depends on tumor type.
The following caveats should be considered when interpreting our results. Although it seems reasonable to question the endogeneity of TDEs as an explanatory variable for OS, this issue has not been addressed in the MM literature. The methodology presented here attempts to solve the endogeneity problem, but its applicability depends on the availability of valid instruments.
In this analysis, TDEs include three distinct surrogate endpoints: TTP, PFS, and EFS. The estimated relationship between the TDE and OS represents the relationship between an “average” TDE and OS. Although no statistically significant differences have been found in modeled OS by the type of TDE, the value of the information is limited. Further studies are necessary, particularly to clarify the data from studies using TTP, both because of the competing risk estimation problems [
92] and the arguments against the use of TTP [
7]. Testing could be performed by either modeling each of the subsamples or by including an interaction term between the TDE and type of surrogate endpoint marker in the regression. In the current analysis, no testing could be performed due to the sample size and need for additional (valid) instruments.
Our analysis includes therapies available over a period of 40 years that demonstrated a wide range of efficacy levels. We attempted to control for these differences by using publication year as a covariate. In addition, our censored analysis omitted treatment arms with proportionally longer median OS and therefore may not adequately reflect the impact of newer, more effective therapies. Finally, the majority of the studies did not report whether data for OS included patients who were allowed to cross over between treatment arms. Study designs that include automatic treatment crossover can obscure differences in OS, due to the benefit achieved from subsequent treatments [
8].
Competing interests
The authors declare they have no competing interests.
Authors’ contributions
JF coordinated the research project, participated in the conception and design of the study, participated in results interpretation and discussion, and drafted the manuscript. FA participated in design of the study, performed the statistical analysis, and participated in results interpretation and discussion. JMA participated in design of the study, coordinated the literature review, and participated in results interpretation and discussion. FJMC participated in design of the study and coordinated the literature review. DF participated in the literature review and in data extraction. ABSP participated in the conception and design of the study and participated in results interpretation and discussion. RR participated in the literature review and in data extraction. JFRR participated in the literature review and in data extraction. All authors read and approved the final manuscript.