Introduction
Multiple sclerosis (MS) is a chronic, neurodegenerative disease in which putative auto-inflammatory responses attack myelinated axons of the central nervous system (CNS), causing the formation of scar tissue and disruption of nerve impulses traveling to and from the brain. This damage can result in a wide range of possible physical and mental symptoms [1]. Relapsing–remitting MS (RRMS), the type of MS that is the first diagnosis in 80–85% of patients, is characterized by episodes of neurological dysfunction, known as relapses, followed by periods of remission. Disease-modifying therapies (DMTs) form the mainstay of first-line treatment for RRMS. Until recently, most approved DMTs required administration by injection (interferon beta and glatiramer acetate) or intravenous infusion (natalizumab). Injectable agents are, however, associated with injection site reactions, as well as other tolerability issues (such as influenza-like symptoms), poor patient adherence and moderate efficacy [2, 3]. Three new oral therapies with different mechanisms of action have recently been approved for the treatment of MS. Fingolimod was the first oral therapy approved for the treatment of relapsing MS. It was approved as a first-line treatment in the USA in September 2010, and was recommended in the EU in March 2011 for the treatment of patients with high disease activity despite previous treatment with at least one other DMT and individuals with rapidly evolving severe RRMS [4]. Subsequently, teriflunomide was approved in the USA in September 2012 and in Europe in March 2013 [5, 6]. Dimethyl fumarate (DMF; BG-12) was approved in the USA in March 2013 and recently in Europe as well [7, 8].
DMTs aim to reduce the frequency and severity of relapses, extend the time intervals between relapses and slow progression to permanent disability [2]. To assess these treatment goals, annualized relapse rates (ARRs) or time to first relapse and disability progression, as measured by the expanded disability status scale (EDSS), are the primary clinical endpoints of phase 3 studies of therapies for RRMS, with magnetic resonance imaging (MRI) measures of disease activity and burden (CNS lesions) as secondary endpoints. Oral therapies have been shown to offer benefits with regard to these clinical and MRI outcomes when compared with placebo in phase 3 trials [9–13]. The clinical efficacy of these therapies over traditional injectable DMTs has been demonstrated for fingolimod in the trial assessing injectable interferon versus FTY720 oral in RRMS (TRANSFORMS) [14], and for the 7 mg dose (but not the 14 mg dose) of teriflunomide in the teriflunomide and Rebif (TENERE) trial [15]. Findings of these phase 3 trials indicate that most doses of oral therapies may represent an advance in the treatment of MS because they offer effective treatment options that are often better tolerated and more convenient than the traditional injectable DMTs.
In response to this therapeutic progress, treatment expectations and goals have evolved to encompass potential remission from the progressive symptoms of MS, known as freedom from disease activity or no evidence of disease activity (NEDA) [16]. Several exploratory analyses have investigated the efficacy of oral DMTs versus placebo on achieving NEDA status, defined as the absence of relapses, of disability progression lasting at least 3 months and of new MRI lesions [17–22]. Post hoc analyses of the 2-year, placebo-controlled, phase 3 FTY720 research evaluating effects of daily oral therapy in multiple sclerosis (FREEDOMS) trial demonstrated that a significantly higher proportion of patients treated with fingolimod 0.5 mg achieved NEDA status than those treated with placebo (33% vs. 13%; P < 0.001) [21]. In an integrated post hoc analysis of the phase 3 determination of the efficacy and safety of oral fumarate in relapsing–remitting multiple sclerosis (DEFINE) and comparator and an oral fumarate in relapsing–remitting multiple sclerosis (CONFIRM) trials, the proportion of individuals free from disease activity over 2 years was higher for the DMF 240 mg twice daily group than for the placebo group (23% vs. 11%; P < 0.0001) [20]. In a post hoc analysis of teriflunomide multiple sclerosis oral (TEMSO), a greater proportion of patients treated with teriflunomide 7 or 14 mg were free from disease activity than individuals receiving placebo (18% and 23% vs. 14%; P = 0.0293 and P = 0.0002, respectively) [23].
There are no head-to-head controlled trials comparing the efficacy of the different oral DMTs. This is an area of much interest to neurologists and healthcare decision makers; therefore, several indirect treatment comparisons have recently been performed. Of these, two studies have compared fingolimod with teriflunomide [24, 25]. A network meta-analysis (NMA) found a significantly lower ARR with fingolimod than with teriflunomide 14 mg, but no significant difference in the proportion of patients with 3-month confirmed disability progression [24]. A separate NMA study found no statistically significant differences between fingolimod and teriflunomide 7 or 14 mg on measures of freedom from relapse and disease progression [25]. A recent study has additionally compared fingolimod with DMF using an NMA approach and found no significant differences in ARR or in the proportion of patients with disability progression lasting at least 3 months [26].
Standard NMA methods may be susceptible to bias because of differences in trial populations and methodologies. The placebo-controlled trials of these oral MS therapies are not sufficiently similar, and differences between the trials, including differences in patient populations, endpoint definitions and methods for dealing with non-completers, have not been taken into account in any of the NMAs of these therapies performed to date. Subgroup and post hoc analyses of the phase 3 trials of DMTs have demonstrated that differences in patient baseline characteristics influence the observed effect of DMTs on ARRs and disability progression [14, 27], and that the application of different definitions of disability progression has a large impact on disability outcomes [28]. It is therefore important to adjust for these potentially confounding factors when assessing the comparative efficacy of these oral DMTs. Limited methodology exists to perform this type of adjusted comparison. We therefore developed a statistical modeling approach to compare treatment effects that adjusted for differences in patient characteristics and methodologies across the MS trials and allowed for the use of a combination of individual patient- and population-level data, thus permitting the utilization of all available data for these treatments [29–32]. Here, we have compared the effectiveness of oral therapies for MS (fingolimod 0.5 mg, DMF 240 mg twice daily and teriflunomide 7 or 14 mg) for achieving NEDA status. Our modeling approach uses all publicly available data for oral therapies and individual patient-level data from the phase 3 placebo-controlled trials of fingolimod.
Discussion
It is often useful for neurologists, health policy makers and patients to compare the efficacy of therapies for MS, and with the recent introduction of these oral therapies, there is much interest in their comparative effectiveness. This study compared the efficacy of oral DMTs using a statistical modeling approach that accounts for differences between the individual, placebo-controlled, phase 3 trials conducted in patients with RRMS. The approach estimated the RR of achieving NEDA status between two treatments, A and C, by combining the comparison of A with B and the comparison of C with B, where B is a common comparator (here, placebo). In comparisons without covariate adjustment, the RR of achieving NEDA status was higher for fingolimod versus placebo than for DMF and teriflunomide versus placebo for all three composite measures of NEDA. These results remained similar when the models adjusted for differences between the phase 3 trial patient populations. In addition, the indirect comparisons of oral DMTs estimated that fingolimod was more efficacious than both DMF and teriflunomide (i.e., RRs >1) in their respective trial populations for all three composite measures of NEDA, and in most cases these results were statistically significant.
Randomized head-to-head trials are the best method for evaluating the efficacy of different treatments. There is, however, a lack of head-to-head clinical trials, so indirect comparisons provide a means to assess the treatments. The method proposed by Bucher et al. [33], in which an indirect comparison of two therapies is adjusted according to the results of their direct comparisons with placebo, is valid only if differences in the patient populations do not affect the treatment effect and endpoints are equally defined. Given that the FREEDOMS trials were not sufficiently similar to the DEFINE, CONFIRM and TEMSO trials, use of the Bucher methodology without any adaptation may not have provided a valid comparison. In addition, we sought to use individual patient-level data, which were available for the FREEDOMS trials but not for the DEFINE, CONFIRM and TEMSO trials. We therefore developed a modeling approach for indirect comparisons, built upon the Bucher method, that adjusted for differences in patient characteristics and methodologies across the trials and allowed individual- and population-level data to be combined. The model was created by expressing key outcomes from the pooled FREEDOMS trials as a function of baseline characteristics, and then applying this model to an average patient in the pooled DEFINE and CONFIRM trials, as well as to an average patient in TEMSO, to predict the efficacy of fingolimod versus placebo on three composite measures of NEDA.
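The adjusted indirect comparison of Bucher et al. can be illustrated with a minimal sketch. It assumes that each active treatment has been compared with a common comparator (placebo), that relative effects are combined on the log scale and that the two estimates are independent. The function name and all input values are hypothetical and are not results from the trials discussed here.

```python
import math


def bucher_indirect_rr(rr_a_vs_b, se_log_a_vs_b, rr_c_vs_b, se_log_c_vs_b, z=1.96):
    """Indirect relative risk of A versus C through a common comparator B.

    Inputs are relative risks versus B and the standard errors of their
    logarithms. Returns (rr, lower, upper) with a normal-approximation CI.
    """
    # Effects are differenced on the log scale:
    # log RR(A:C) = log RR(A:B) - log RR(C:B).
    log_rr = math.log(rr_a_vs_b) - math.log(rr_c_vs_b)
    # The two sets of trials are assumed independent, so their variances add.
    se = math.sqrt(se_log_a_vs_b ** 2 + se_log_c_vs_b ** 2)
    return (math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se))


# Hypothetical inputs for illustration only (not trial results).
rr, lower, upper = bucher_indirect_rr(2.4, 0.15, 2.0, 0.20)
print(f"Indirect RR (A vs C): {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the variances add, the indirect estimate is less precise than either of the direct comparisons with placebo on which it is based.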
While alternative modeling approaches are possible (see Table 2), these methods are less suitable because they do not allow for all of the following to be appropriately achieved: (1) controlling for differences in patient populations; (2) accounting for differences in endpoint definitions; (3) accounting for the way in which non-completers are dealt with; and (4) using individual patient data where they are available. For example, a Bayesian mixed treatment comparison has been used to compare the efficacy of teriflunomide with other approved DMTs in the treatment of MS [24]. Mixed treatment comparisons using Poisson, mixed-log binomial, time-to-event and continuous models have been used to compare the efficacy and safety of DMF with other approved DMTs including fingolimod. However, these analyses could not adjust for differences in trial methodology or endpoint definitions across trials [26], and although this could be achieved by performing sub-analyses, these methods require data to be available from several studies to enable reasonable estimation of the random effects. Meta-analysis methods are also available to synthesize individual patient and aggregate data, and enable adjustment for patient baseline characteristics [37]. Such methods would also allow differences in treatment effect due to differences in patient population to be accounted for, using a treatment–covariate interaction, but again these methods would be hindered by not having enough studies in the network to enable reasonable estimation of the random effects. The small number of studies and the need to account for endpoint definitions by performing additional sub-analyses (which would reduce the number of studies even further) made this method inappropriate in our case. An alternative method that could have been applied is the propensity score method of Signorovitch et al. [31]. This method adjusts for a predefined set of patient baseline characteristics, whereas our approach selects from such a set the characteristics that best predict the treatment effect. In the case of MS, in which studies have largely deduced potential treatment modifiers, our approach avoids over-parameterization of the model and enables selection of a parsimonious model.
Table 2
Modeling methods for indirect treatment comparisons
Method | Comments |
Mixed treatment comparison using summary level data | Does not take into account differences in patient population, endpoint definitions and ways of dealing with non-completers between trials and does not make use of individual patient-level data. Differences in patient populations could be accounted for using meta-regression by including study-level treatment–covariate interactions [45], but adjustments at the study level can be susceptible to the ecological fallacy, where the relationship between outcome and covariate may not be the same at the study and individual level. Differences in trial methodology could be accounted for using sub-analyses but this requires a larger number of studies than is available in the present case to enable estimation of the random effects assuming that there is heterogeneity in treatment effect between studies [46] |
Mixed treatment comparison using individual and summary level patient data [47] | Enables the use of individual patient data and adjustment for patient populations, but it does not take into account differences in endpoint definitions or the different ways of dealing with non-completers. This methodology can also be susceptible to the ecological fallacy, and requires a random effects model and a separate analysis to adjust for endpoint definitions or the different ways of dealing with non-completers |
Bucher pair-wise indirect comparison [33] | Enables endpoint definitions or the different ways of dealing with non-completers to be adjusted for, but does not make use of individual patient data or adjust for patient populations. This methodology can be built on to adjust for patient characteristics and use individual patient data as demonstrated in our study |
Matching-adjusted indirect comparison [31] | Enables the use of individual patient data, adjustment for patient populations and trial methodology. This methodology uses individual patient data from trials of one treatment to match baseline summary statistics reported from trials of another treatment. This method adjusts for a predefined set of patient baseline characteristics and may over-fit the prediction model. This approach may not have sufficient power for all treatments being assessed |
In this analysis, our modeling approach suggests that differences in average patient characteristics between the populations of the clinical trials of the oral therapies have a marginal impact on indirect comparisons of NEDA outcomes, because model outputs before adjustment for baseline covariates are similar to the outputs after adjustment. Taking previous DMT use as an example, the pooled FREEDOMS population had a higher rate of previous DMT use than the other trial populations. A smaller effect on achieving NEDA status might therefore be expected in this population than in one with less previous DMT use, and this was observed. Thus, adjusting for previous DMT use is likely to improve the comparative effectiveness of fingolimod relative to other therapies studied in a population with lower rates of DMT use. However, other differences in trial populations might lead to a greater effect on achieving NEDA status and the effects of different variables may eventually cancel each other out. Our methodology is indeed designed to improve on simply comparing raw event rates across studies. Our modeling approach showed that differences in trial methodologies had a greater impact on NEDA outcomes than differences in patient characteristics, thus highlighting the importance of adjusting for these methodological differences. The impact of these differences was exemplified by the RR predicted when using the DEFINE and CONFIRM approach of dealing with non-completers compared with using the TEMSO method.
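The effect of adjusting to a different trial population can be sketched as follows. This is a simplified illustration and not the model used in this study: it assumes a linear probability model with a single baseline covariate (the proportion of patients with previous DMT use) and a treatment-by-covariate interaction. Every coefficient and population value below is hypothetical.

```python
def predicted_neda_probability(treated, prior_dmt_share, coef):
    """Predict the probability of NEDA from a linear probability model.

    `treated` is 1 for active treatment and 0 for placebo; `prior_dmt_share`
    is the proportion of the population with previous DMT use.
    """
    return (
        coef["intercept"]
        + coef["treatment"] * treated
        + coef["prior_dmt"] * prior_dmt_share
        + coef["treatment_x_prior_dmt"] * treated * prior_dmt_share
    )


def adjusted_rr(prior_dmt_share, coef):
    """Ratio of predicted NEDA probabilities (active vs placebo) for an average patient."""
    p_active = predicted_neda_probability(1, prior_dmt_share, coef)
    p_placebo = predicted_neda_probability(0, prior_dmt_share, coef)
    return p_active / p_placebo


# Hypothetical coefficients, standing in for a model fitted to patient-level data.
coef = {
    "intercept": 0.15,
    "treatment": 0.20,
    "prior_dmt": -0.04,
    "treatment_x_prior_dmt": -0.06,
}

rr_source_population = adjusted_rr(0.40, coef)  # more previous DMT use
rr_target_population = adjusted_rr(0.20, coef)  # less previous DMT use
print(f"RR at 40% previous DMT use: {rr_source_population:.3f}")
print(f"RR at 20% previous DMT use: {rr_target_population:.3f}")
```

With these made-up coefficients, the predicted RR is slightly higher in the population with less previous DMT use, which mirrors the direction of the adjustment described above.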
This study assessed treatment efficacy using three composite measures of NEDA that were based on the absence of relapses, disability progression, Gd-enhancing T1 lesions, and new or newly enlarged T2 lesions. These individual component measures are well-established indicators of disease activity and are commonly assessed in clinical trials [17–22]. As the effectiveness of treatments for MS increases, the composite endpoint of NEDA is becoming an important measure for clinicians and patients [16]. The use of these composite endpoints, however, does have some limitations because they do not take into account other potentially important indicators of disease activity, such as brain volume loss or cognitive function. In addition, some analytical adjustment to account for the dominance of one component measure may potentially be required. For example, one analysis has shown that the overall composite endpoint is driven to a large extent by MRI outcomes, with minimal contribution from clinical measures [32]. Finally, the number and timing of MRI scans were identical for the FREEDOMS trials and DEFINE and CONFIRM, but different for TEMSO. Imbalances in the timing or scheduling of scans could have an impact on MRI outcomes and the extent to which these outcomes contribute to the overall NEDA. Further research is needed to define the best combination of criteria that represents NEDA in MS and the best population in which to adjust the results, but this study provides a valuable exploration into the concepts.
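A composite NEDA endpoint can be expressed as a conjunction of its component criteria. The sketch below assumes one record per patient with hypothetical field names. Which components enter each of the three composite measures is not restated here, so the component sets used in the example are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientOutcome:
    """Outcomes for one patient over the study period (field names are hypothetical)."""
    relapses: int
    disability_progression: bool
    gd_enhancing_t1_lesions: int
    new_or_enlarged_t2_lesions: int


# Each component is met when the corresponding sign of disease activity is absent.
COMPONENT_MET = {
    "relapse": lambda p: p.relapses == 0,
    "disability": lambda p: not p.disability_progression,
    "gd_t1": lambda p: p.gd_enhancing_t1_lesions == 0,
    "t2": lambda p: p.new_or_enlarged_t2_lesions == 0,
}


def has_neda(patient, components):
    """Return True if the patient meets every component in the composite."""
    return all(COMPONENT_MET[name](patient) for name in components)


def neda_proportion(patients, components):
    """Proportion of patients achieving NEDA under the given composite."""
    return sum(has_neda(p, components) for p in patients) / len(patients)


# Made-up patients for illustration.
patients = [
    PatientOutcome(0, False, 0, 0),  # free of all measured activity
    PatientOutcome(0, False, 0, 2),  # clinically stable, new T2 lesions
    PatientOutcome(1, False, 0, 0),  # one relapse
    PatientOutcome(0, True, 1, 1),   # progression and MRI activity
]
clinical_only = neda_proportion(patients, ["relapse", "disability"])
clinical_and_mri = neda_proportion(patients, ["relapse", "disability", "gd_t1", "t2"])
print(clinical_only, clinical_and_mri)
```

Adding MRI criteria to a composite can only keep the proportion of patients classified as disease free the same or lower it, which is consistent with the observation that MRI outcomes can dominate the overall endpoint.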
Endpoint definitions also impact the results. In an analysis of the CombiRx trial, which evaluated interferon beta-1a and glatiramer acetate in patients with RRMS, using a 1.0-point increase in EDSS score as the definition of progression, 15% of individuals whose screening EDSS score was greater than baseline “progressed” by month 3; that is, many went back to their screening value, leading to false positive progressions and diminishing the treatment effect. When a 1.5-point definition of progression was used instead, the false positive progressions were reduced, enhancing the treatment effect [38]. A similar impact on treatment effect was observed in the FREEDOMS trials, where the treatment effect with respect to 3- and 6-month confirmed disability progression was numerically greater when requiring a 1.5-point change [28, 39]. Thus, in our study, treatment effect may be lower in the teriflunomide comparisons using the FREEDOMS and TEMSO definition of disability progression (1.0-point increase in patients with a baseline EDSS score of 0), compared with the DMF comparisons, which used the DEFINE and CONFIRM definition (1.5-point increase in patients with a baseline EDSS score of 0).
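The sensitivity of progression counts to the endpoint definition can be shown with a minimal sketch. It encodes only the thresholds stated above for patients with a baseline EDSS score of 0 (1.0 point for the FREEDOMS and TEMSO definition, 1.5 points for the DEFINE and CONFIRM definition). The confirmation period and the rules for other baseline scores are not modeled, and the function name and example scores are hypothetical.

```python
# Minimum EDSS increase counted as progression for patients with a baseline
# score of 0, as stated in the text. Rules for other baseline scores and the
# 3- or 6-month confirmation requirement are deliberately left out.
THRESHOLD_AT_BASELINE_ZERO = {
    "FREEDOMS/TEMSO": 1.0,
    "DEFINE/CONFIRM": 1.5,
}


def progressed_from_zero(follow_up_edss, definition):
    """Return True if a patient with baseline EDSS 0 meets the increase threshold."""
    return follow_up_edss >= THRESHOLD_AT_BASELINE_ZERO[definition]


# Hypothetical follow-up scores for five patients, all with baseline EDSS 0.
follow_up_scores = [0.0, 1.0, 1.0, 1.5, 2.0]

progression_counts = {
    definition: sum(progressed_from_zero(s, definition) for s in follow_up_scores)
    for definition in THRESHOLD_AT_BASELINE_ZERO
}
for definition, n in progression_counts.items():
    print(f"{definition}: {n} of {len(follow_up_scores)} patients counted as progressed")
```

The same made-up patients yield different progression counts under the two definitions, so event rates from trials that use different definitions are not directly comparable without adjustment.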
As with all statistical modeling, limitations exist based on assumptions that are necessary to make the modeling feasible. Firstly, indirect treatment comparisons are a type of observational research, owing to the non-randomized selection of studies for inclusion in these analyses, and are subject to confounding. Our modeling approach, in contrast to several alternative methodologies, reduces this confounding by controlling for differences in patient populations. In addition, our approach is based on the Bucher method and is therefore subject to the same assumptions as this methodology, for example, transitivity of the treatment effects, which assumes that we can learn about the effect of A versus C via B [40]. Furthermore, it was assumed that the outcomes of the trials were influenced by a specific set of covariates, but it is possible and indeed likely that results are affected by additional variables not included in the models, such as the treatment environment at the time these studies were conducted and/or the countries or practices involved. We adjusted for known baseline variables, but we could not account for subtle unmeasured selection criteria as sources of influence or bias. Controlled trials in MS have demonstrated the relevance of such hidden selection biases because identical selection criteria have resulted in similar baseline characteristics, but widely different responses to placebo across studies [41]. In addition, we had to make several assumptions about the methodology used in the TEMSO trial, because this information was not available at the time of planning the analysis. We assumed that the TEMSO trial used the same method of dealing with non-completers as the FREEDOMS trials, but it is possible that an alternative method was used that should have been controlled for in the models. There may also have been additional differences in study methodologies that could affect the results, which we did not account for, such as differences between trials in the use of unscheduled visits for assessing suspected relapses or disability progression. For example, if unscheduled visits (in contrast to scheduled visits) were used to confirm disability progression, an impact on the overall disability progression rate could occur. There is also uncertainty regarding the standard population chosen in which to adjust the results. Statistical analyses usually assume that all patients are at a similar risk of disease activity, but if the adjusted covariate is a key variable, the results could differ considerably in different populations. We also assumed that non-completers had the same likelihood of being disease free as those who completed a trial. This might have caused the efficacy results of two therapies to appear more similar than they are in reality if the less effective DMT was associated with higher dropout rates but the number of completers was similar to that for the more effective therapy. Lastly, we assumed that the probability of achieving NEDA status could be reasonably predicted using a linear model. The goodness-of-fit assessment demonstrated that the predicted probability of achieving NEDA status was similar to the observed probability of achieving NEDA status, suggesting that this was an appropriate choice of model. Our conclusions must be interpreted with caution because of the assumptions inherent in any indirect comparison.
Acknowledgments
Novartis Pharma AG provided funding for this project and the article processing charges. All authors had full access to all of the data in this study and take complete responsibility for the integrity of the data and accuracy of the data analysis. Richard Nixon and Niklas Bergvall conceived and designed the study, analyzed the data, interpreted the results, and critically reviewed and approved the manuscript. Davorka Tomic, Nikolaos Sfikas, Gary Cutter and Gavin Giovannoni designed the study, interpreted the results, and critically reviewed and approved the manuscript. All named authors meet the ICMJE criteria for authorship for this manuscript, take responsibility for the integrity of the work as a whole, and have given final approval to the version to be published. The authors take full responsibility for the content of the paper. They thank Dr Gemma Carter and Hilary Phelps (Oxford PharmaGenesis Ltd, Oxford, UK) for medical writing support, editorial assistance, and collation and incorporation of comments from all authors. Support for this assistance was funded by Novartis Pharma AG.
Conflict of interest
Richard Nixon is a paid employee of Novartis Pharma AG. Niklas Bergvall is a paid employee of Novartis Pharma AG. Davorka Tomic is a paid employee of Novartis Pharma AG. Nikolaos Sfikas is a paid employee of Novartis Pharma AG. Gary Cutter has received personal compensation for participation in Data and Safety Monitoring Committees for Sanofi-Aventis, Cleveland Clinic, Daiichi Sankyo, GlaxoSmithKline Pharmaceuticals, Genmab, Eli Lilly, Medivation, Modigenetech, Ono Pharmaceutical, PTC Therapeutics, Teva Pharmaceuticals, Vivus, University of Pennsylvania, National Heart, Lung, and Blood Institute, National Institute of Neurological Disorders and Stroke, and National Multiple Sclerosis Society. He has also received consulting and speaking fees, and served on advisory boards for Alexion Pharmaceuticals, Bayhill Therapeutics, Bayer Pharmaceuticals, Celgene, Novartis, Consortium of Multiple Sclerosis Centers (grant), Genzyme, Klein Buendel Inc., Nuron Biotech, Peptimmune, Somnus Therapeutics, Sandoz, Teva Pharmaceuticals, University of Texas Southwestern, and Visioneering Technologies Inc. He is President of Pythagoras Inc. and has received Consortium of Multiple Sclerosis Centers task orders that involve research for various pharmaceutical organizations. Gavin Giovannoni has received honoraria from Bayer HealthCare, Biogen Idec, Canbex, Genzyme, GlaxoSmithKline, Merck Serono, Novartis, Protein Discovery Laboratories, Roche, Synthon, Teva Neuroscience, and UCB; research support from Biogen Idec, Ironwood, Merck Serono, Merz, and Novartis; and compensation from Elsevier as co-chief editor of MS and Related Disorders.