Introduction
Delayed-release dimethyl fumarate (DMF; also known as gastro-resistant DMF) has been shown to be effective for the treatment of relapsing–remitting multiple sclerosis (RRMS) in randomised clinical trials (RCTs), post hoc analyses, and real-world effectiveness studies [1–5]. The CONFIRM study and subsequent post hoc analyses found DMF had greater efficacy in terms of reduction in annualised relapse rate (ARR) versus placebo or glatiramer acetate (GA) [1, 4]. Post hoc analyses of clinical trial data using mixed treatment comparisons or indirect matching-adjusted methods also found DMF was associated with improved efficacy, as mainly assessed by ARR, compared with interferons (IFN) or teriflunomide (TERI), and with similar efficacy compared with fingolimod (FTY) [2, 3, 5].
Randomised clinical trials and other controlled trials are necessary for satisfying regulatory agencies regarding efficacy and safety of new medications, but provide limited information relevant to clinical settings and health policy decision-making [6]. Controlled trials also are costly to perform and have time constraints. As a result, comparative effectiveness (CE) research has emerged as a means of incorporating real-world clinical data, which may include multiple comparators and patients generally excluded by the strict methodology of RCTs. A key objective of CE research methodologies is to reduce the inherent bias associated with treatment selection and decision. In propensity score matching (PSM), potential confounders measured at baseline (i.e. index therapy initiation), including treatment-related confounders, are matched to ensure that treatment groups are comparable at baseline, for example in a retrospective observational study or when patients are drawn from a prospectively designed registry. To further improve the reliability of real-world evidence, analysis-censoring procedures need to be implemented and communicated.
Dimethyl fumarate has been compared with a number of alternative treatments using CE research methods, including claims-based analyses [7, 8], a cross-sectional study [9], and a number of PSM analyses based on registries [10–14]. In the PSM analyses performed to date, DMF showed improved clinical effectiveness in terms of reductions in ARR and time to first relapse (TTFR) events versus IFN, GA, and TERI [13, 14], and a similar rate of relapse as FTY [10–12, 14]. To support this growing body of real-world evidence, this CE analysis based on the German NeuroTransData (NTD) multiple sclerosis (MS) registry was conducted to assess real-world CE of DMF compared with IFN, GA, TERI, and FTY in PSM cohorts of patients with RRMS.
Methods
German NTD registry
The NTD is a Germany-wide network of physicians founded in 2008 in the fields of neurology and psychiatry. Currently, 78 neurologists in 153 offices work in NTD practices serving about 600,000 outpatients per year. Each practice is certified according to network-specific and ISO 9001 criteria. Compliance with these criteria is audited annually by an external certified audit organisation. The NTD MS registry includes about 25,000 patients with MS. In the database, demographic, clinical history, and clinical variables are captured in real time during an average of 3.7 visits and Expanded Disability Status Scale (EDSS) assessments per year per patient. A unique relapse definition is applied within NTD. Standardised clinical assessments of functional system scores and EDSS calculation are performed by certified raters (http://www.neurostatus.net/). All personnel undergo regular training to ensure quality of data in the database. Both automatic and manually executed queries are implemented to further ensure data quality. All data are pseudonymised and pooled to form the MS registry database. This data acquisition protocol was approved by the ethical committee of the Bavarian Medical Board (Bayerische Landesärztekammer; June 14, 2012). On average, nearly four EDSS assessments per year were obtained per individual patient.
Study population
Dimethyl fumarate populations were compared with the following first-line treatment populations: IFN, GA, TERI, and FTY all-comer populations. DMF also was compared with a FTY (European) label population, which includes patients who have either highly active disease that has not responded to other disease-modifying therapies (DMTs) or rapidly progressive disease [15, 16].
Inclusion criteria for all treatment comparisons were: RRMS (10th revision of the International Statistical Classification of Diseases and Related Health Problems codes G35.0, G35.9, G35.10, or G35.11); age at least 18 years; and, to ensure a minimal follow-up time within the study, a valid EDSS measurement and/or a relapse after index therapy initiation. Median follow-up frequency between visits, including EDSS assessment within NTD, was approximately 3 months. The relapse criterion was introduced to ensure that patients with an early relapse after index therapy initiation were not excluded from the analysis population. For treatment comparisons with IFN, GA, or TERI, patients had to be either treatment naive or have received pre-treatment with other first-line therapy (e.g. GA or TERI in the case of IFN). In addition, patients treated with injectable therapies (IFN, GA) required therapy initiation from January 1, 2010 onwards to better reflect the current treatment landscape, given that the introduction of oral therapies occurred at this time. For comparisons with the FTY label population, patients had to have been pre-treated with IFN, GA, or TERI, with an on-therapy relapse within the last 12 months (to reflect the European label), and to have switched from pre-treatment, indicating treatment failure on first-line therapy, with a treatment gap of up to 6 months. For comparisons with the FTY all-comer population, patients were either treatment naive or had switched from pre-treatment with IFN, GA, or TERI, with a treatment gap of up to 6 months.
The exclusion criterion was pre-treatment with a DMT other than those allowed or specified as part of the inclusion criteria. Specifically, this meant pre-treatment with any DMT other than: GA or TERI for comparisons with IFN; IFN or TERI for comparisons with GA; GA or IFN for comparisons with TERI; and IFN, GA, or TERI for comparisons with FTY all-comer or label populations.
The analysis population included all patients who satisfied the inclusion/exclusion criteria and started index therapy (dosed at least once) with a relapse measurement and/or EDSS measurement post-index therapy initiation. For each index therapy comparison, any single patient could only participate once, but each patient could contribute to more than one index therapy comparison.
Study outcome measurements
The primary outcome measurement was TTFR. Secondary outcome measurements were ARR, proportion of relapse-free patients at 12 and 24 months, time to index therapy discontinuation (TTD), and reasons for discontinuation. Time to EDSS confirmed disability progression (CDP) at 3 and 6 months was included as an exploratory outcome measurement. CDP events were defined as at least 0.5-point EDSS score increases for patients with baseline EDSS score greater than 5.5, and at least 1.0-point EDSS score increases for patients with baseline EDSS score 0‒5.5.
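The baseline-dependent progression threshold defined above can be written out as a short sketch. The function names are hypothetical and chosen for illustration only; the 3- or 6-month confirmation step is not shown because its exact procedure is not specified here.

```python
def progression_threshold(baseline_edss: float) -> float:
    """Minimum EDSS increase counted as progression.

    At least 0.5 points if baseline EDSS is greater than 5.5,
    otherwise (baseline 0 to 5.5) at least 1.0 point.
    """
    return 0.5 if baseline_edss > 5.5 else 1.0


def is_progression(baseline_edss: float, followup_edss: float) -> bool:
    """True if the follow-up score meets the baseline-dependent threshold."""
    increase = followup_edss - baseline_edss
    return increase >= progression_threshold(baseline_edss)


# Illustrative values only (not registry data)
print(is_progression(2.0, 3.0))  # baseline <= 5.5, increase of 1.0
print(is_progression(6.0, 6.5))  # baseline > 5.5, increase of 0.5
```

An event flagged this way would still need to be sustained at a later EDSS assessment to count as CDP at 3 or 6 months.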
PSM and statistical analysis
No formal sample size was pre-calculated because available data already captured within the NTD registry were used. A 1:1 PSM (5:1 greedy matching algorithm [17]) was used to match measured baseline characteristics of DMF populations to comparator populations for each treatment comparison [18, 19]. Propensity scores were calculated using multiple logistic regression with the treatment cohort as the dependent variable and the following confounders at index therapy initiation as independent variables: age, sex, disease duration (from first clinical symptoms to start of index therapy), treatment history (number of previous therapies), baseline EDSS score, and total number of relapses in the past 12 and 24 months (based on actual follow-up period before index therapy initiation). For all comparisons other than with the FTY label population, treatment history was categorised and matched as 0 (treatment naive), 1, 2, 3, and 4+ (representing number of previous DMTs). For the FTY label population, treatment history was categorised as 1, 2, 3, and 4+ because treatment-naive patients were not included. Wilcoxon rank-sum and Chi-square tests were used to compare unmatched baseline characteristics by cohort, whereas in the matched data, Wilcoxon signed-rank and McNemar or Stuart–Maxwell tests for marginal homogeneity were used to compare baseline characteristics for continuous variables and proportions, respectively. Pre- and post-matching balance in baseline covariates was assessed using standardised mean differences (threshold 0.10) and the C-statistic [20]. The C-statistic is a measure of balance in matched data and ranges from 0.5 to 1.0, with the minimum value indicating the propensity score model is perfectly balanced and has no ability to discriminate between cohorts after matching.
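The matching and balance checks described above can be illustrated with a minimal sketch. It is not the algorithm cited in [17]: it assumes propensity scores have already been estimated, substitutes a simple greedy nearest-neighbour match with a fixed caliper (the value 0.05 is an arbitrary assumption), and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def greedy_match(treated, controls, caliper=0.05):
    """1:1 greedy nearest-neighbour matching without replacement.

    treated / controls: dict mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_ps in treated.items():
        if not available:
            break
        # Closest still-unmatched control on the propensity score
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def standardised_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def c_statistic(ps_treated, ps_control):
    """Concordance: share of treated/control pairs in which the treated
    patient has the higher propensity score (ties count as one half)."""
    concordant = sum(
        (t > c) + 0.5 * (t == c) for t in ps_treated for c in ps_control
    )
    return concordant / (len(ps_treated) * len(ps_control))


# Illustrative values only (not registry data)
treated = {"t1": 0.30, "t2": 0.62}
controls = {"c1": 0.31, "c2": 0.60, "c3": 0.90}
print(greedy_match(treated, controls))
print(standardised_mean_difference([32, 42, 52], [30, 40, 50]))
```

After matching, an absolute standardised mean difference below the 0.10 threshold for every covariate and a C-statistic close to 0.5 would both indicate good balance.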
Time to first relapse, TTD, and time to CDP at 3 and 6 months were all analysed using a Cox marginal regression model taking into account the clustered nature of the matched design. For the confirmation of CDP at 3 and 6 months, EDSS scores recorded within 30 days after the onset of a relapse were excluded. Treatment effects were reported as hazard ratios (HRs) together with 95% confidence intervals (CIs), and Kaplan–Meier methods were applied to obtain estimates at pre-defined time points. ARR was calculated as total number of relapses divided by total exposure (years), with treatment effect for ARR estimated using a generalised estimating equations (GEE) Poisson regression model. ARRs (95% CIs) for each cohort were presented and treatment effects reported as rate ratios (RRs), along with 95% robust CIs.
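As an illustration of two of the quantities above, the following sketch computes a cohort ARR as total relapses divided by total exposure in years, and a Kaplan–Meier estimate of the relapse-free proportion at a given time point. It gives only crude point estimates: the Cox marginal model, the GEE Poisson model, and the robust CIs are not reproduced, and the function names and numbers are hypothetical.

```python
def annualised_relapse_rate(relapses, exposure_years):
    """Total number of relapses divided by total exposure (years)."""
    return sum(relapses) / sum(exposure_years)


def kaplan_meier(times, events, at):
    """Kaplan-Meier estimate of the event-free proportion at time `at`.

    times:  follow-up time per patient (time to event or to censoring)
    events: True if the patient had the event, False if censored
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= at})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - failed / at_risk
    return survival


# Illustrative values only (not registry data)
arr_a = annualised_relapse_rate([1, 0, 2], [2.0, 1.0, 1.0])
arr_b = annualised_relapse_rate([2, 1, 1], [2.0, 1.0, 1.0])
print(arr_a, arr_b, arr_a / arr_b)  # crude rate ratio, no adjustment

# Months to first relapse (True) or to censoring (False)
print(kaplan_meier([6, 12, 18, 24], [True, False, True, False], at=24))
```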
Non-pairwise censoring was the primary analysis method for all major outcome measurements. However, pairwise censoring was performed as a sensitivity analysis to account for potential differences between exposure times and to assess the robustness of the results.
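The idea behind the sensitivity analysis can be sketched as follows. The exact procedure is not spelled out here, so this example assumes that pairwise censoring means truncating the follow-up of both patients in a matched pair at the shorter of their two follow-up times; the function name and data layout are hypothetical.

```python
def pairwise_censor(pair):
    """Apply a common follow-up cutoff to one matched pair.

    pair: two (followup_time, event_time) tuples, one per patient;
          event_time is None if no event was observed.
    Returns two (time, event_observed) tuples.
    Assumption: the cutoff is the shorter of the two follow-up times.
    """
    cutoff = min(followup for followup, _ in pair)
    result = []
    for _, event_time in pair:
        if event_time is not None and event_time <= cutoff:
            # Event occurred inside the shared observation window
            result.append((event_time, True))
        else:
            # No event inside the window: censor at the shared cutoff
            result.append((cutoff, False))
    return result


# Illustrative values only (months; not registry data)
print(pairwise_censor([(24.0, 18.0), (12.0, None)]))
print(pairwise_censor([(24.0, 6.0), (12.0, None)]))
```

In the first pair, the relapse at 18 months falls after the shared 12-month cutoff and is therefore not counted, which is how this approach equalises exposure time within pairs.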
Role of the funding source
Biogen was involved in study design, data analysis, and manuscript preparation. Biogen did not have access to patient-level data.
Discussion
Despite the primary role of RCTs for establishing efficacy and safety of new interventions or treatments compared with placebo or active control, several inherent shortcomings make their results difficult to generalise to real-world practice. Firstly, RCTs are unable to provide comparative information regarding all available treatment options. Secondly, patients enrolled in RCTs are selected against strict criteria and thereby do not reflect the broad range of patient characteristics, treatment history, comorbidities, and other factors seen in real-world cohorts. Moreover, patients, doctors, and payers now expect more reliable and transparent information to guide treatment decisions and resource allocation. National and multinational MS registries enable high-quality data acquisition based on modern technology. Advanced statistical methods such as PSM to ensure comparability of the treatment cohorts provide a scientifically sound and statistically rigorous basis for robust results to support the shared decision process between doctors and patients when selecting DMTs in daily practice.
This analysis of the German NTD MS registry implemented best practice in the analysis of non-randomised studies and real-world data to minimise the risk of the most critical biases in the following ways [21–23]. (A) PSM methods were used to minimise the risk of selection bias when comparing matched patients treated with DMF and comparator DMTs [24]. As for any other propensity score (PS)-based analysis, the PS are based only on measured confounders and cannot account for unmeasured confounders; for example, MRI or cognition could potentially be unmeasured confounders in this study. No evidence of a significant difference between DMF and FTY was detected in a sensitivity analysis (Supplementary Table 6). (B) A clear and unique relapse definition and certified raters for the EDSS assessment were used across NTD to minimise the risk of detection bias, which may arise if the outcome measurement of interest (either relapse or EDSS in this study) is assessed differently between cohorts. (C) The approximately 3-monthly visit schedule (including relapse and EDSS assessment) across all cohorts (DMF and comparator cohorts) may mitigate the risk of performance bias. Detailed information on median follow-up times is located in Supplementary Table 5. (D) A sensitivity analysis based on pairwise censoring was implemented to account for different follow-up times between the cohorts and therefore to mitigate the risk of attrition bias.
In this study, DMF therapy following previous relapse either on or off therapy proved to be superior to IFN, GA, and TERI regarding relapse activity. This is in line with results from other studies. Patients with high disease activity require a different perspective and, among the DMTs under investigation in this study, only FTY is labelled specifically for the treatment of such patients. In this study, DMF showed no evidence of significant difference with FTY in relapse outcome measurements. Importantly, this study had both a relatively large sample size and lengthy and frequent follow-up, while previous reports often have only large sample size or extensive, high-quality follow-up, but not both [3, 7, 10–14]. Median follow-up frequency between visits, including EDSS assessment within NTD, is approximately 3 months. Of particular importance is the subgroup analysis demonstrating that the high efficacy of DMF, similar to FTY, also is seen in patients with RRMS with high disease activity and previous DMT failure. The results of our primary analysis extend those of previous studies, reinforcing the greater effectiveness of DMF relative to IFN, GA, and TERI, and similar effectiveness to FTY; however, our study also employed a sensitivity analysis using pairwise censoring. This sensitivity analysis, which accounts for differences in treatment follow-up time, yielded results that were consistent with the primary analysis based on non-pairwise censoring across all comparisons, supporting the robustness of these results.
TTD was found to be similar between DMF and IFN, GA, or TERI, while patients treated with FTY had a longer TTD. This suggests that robust and comparable patient adherence can be achieved with DMF if recommendations to mitigate gastrointestinal adverse events (AEs) during initiation of DMF are followed routinely, as they are throughout the NTD network. Such recommendations include patient coaching, taking DMF with food, slow dose titration, dose modification, and use of symptomatic therapies. Other common reasons for discontinuation seen consistently throughout treatment populations were patient decision for non-medical reasons and lack of efficacy.
We expect that longer-term observation will provide additional results and clarify the relative effect of DMF and comparators on disability progression in the future.
Previous CE studies have compared effectiveness of DMF with other DMTs, including direct and indirect comparisons of clinical trial or real-world data, some having incorporated PSM analysis methods. Overall, this PSM registry analysis is well supported by findings of previous CE studies. Data from clinical trials using comparisons based on post hoc direct, mixed treatment, or matching-adjusted indirect methods consistently support the results of this analysis, especially in terms of ARR absolute values and RRs [1–5]. For example, the ARR RRs for DMF versus IFN, GA, and TERI noted in this real-world evidence from PSM data are highly consistent with those noted in direct and indirect comparisons of clinical trial data [1, 5]. Following clinical trial data analysis, real-world evidence from analysis of insurance claims databases and patient data from academic medical centres emerged [7, 8, 10–13]. Studies from these sources also found that treatment with DMF provided efficacy/effectiveness greater than IFN, GA, or TERI, and similar to that of FTY regardless of whether the comparison method was direct or involved propensity-adjusted cohorts. For example, a retrospective analysis of US claims data found that ARRs for DMF were lower than those for IFN, GA, and TERI and similar to those for FTY [7]. Finally, real-world evidence has progressed to studies based on data from large multinational registries, including this analysis of the NTD registry [14]. A previous PSM analysis of the MSBase registry (http://www.msbase.org) closely reflects this NTD registry analysis with respect to both methods, as well as TTFR, ARR, and discontinuation results [14].
Although the existing body of evidence that supports CE of DMF is generally based on well-conducted studies, this PSM analysis of NTD registry data can be differentiated from previous studies in several respects, including data sources and method of analysis, cohort types, and the nature of results (e.g. outcome measurements considered, length of observation or follow-up). The present PSM analysis is based on routinely collected data from outpatients seen in clinical practice and under real-world treatment conditions. The clinical practice data follow-up frequency was approximately every 3 months. This contrasts with clinical trial data, which may not properly represent real-world conditions. Insurance claims databases can provide real-world data but, unlike the current PSM analysis, generally give limited information on diagnostic criteria, disease severity, rate of progression, or EDSS status. Data from retrospective studies (i.e. chart review) of patients from academic medical centres can provide such clinical data, but patient numbers are often less robust than with claims or registry data sources. Regarding cohort types, the present PSM analysis is based on outpatients seen in routine clinical practice and, to our knowledge, is the first to include patients who meet the more stringent requirements of the European label for FTY. In terms of methodology, this analysis employed best-practice PSM methods and confirmed the robustness of results using sensitivity analyses. In terms of specific results, the present PSM analysis includes a greater number of outcome measurements than typically used in previous studies. For example, data on discontinuation have not been consistently included in previous real-world effectiveness studies or post hoc analyses of clinical trials [1, 5, 7, 8, 13]. Finally, this PSM analysis includes results based on up to 2 years of observation, whereas results from previous studies are often based on observation periods of up to only 1 year in duration [7, 11, 12]. Hence, while most CE studies have been well conducted according to certain criteria, the present PSM analysis of NTD registry data is distinct in fulfilling these criteria most comprehensively.
Following previous relapse either on or off therapy, DMF was superior to IFN, GA, and TERI on relapse outcome measurements; in addition, there was no evidence of significant difference in efficacy between DMF and FTY, including among patients with high disease activity who met the European criteria for FTY. These results confirm those from previous CE studies and also provide additional support based on use of state-of-the-art PSM practices, careful cohort selection, and comprehensive inclusion of outcome measurements, as well as longer observation than several previous studies. The present PSM provides useful data for clinical decision making based on patient-relevant outcome measurements and further insight into comparative efficacy of commonly used agents for RRMS in a real-world treatment setting.