Introduction
Treatment guidelines for rheumatoid arthritis (RA) recognize the importance of attaining clinical improvement within 3 months and remission or low disease activity within 6 months of treatment initiation [1, 2]. Biologic disease-modifying anti-rheumatic drugs (bDMARDs) are recommended in the presence of poor prognostic factors or if response to initial treatment is inadequate. The choice of bDMARD is often based on physician experience, patient preference, and cost, and is complicated by the variety of available agents [3, 4]. Nevertheless, there is a remarkably similar plateau in responder rates for patients achieving 20% (ACR20), 50% (ACR50), and 70% (ACR70) responses based on American College of Rheumatology criteria, irrespective of the bDMARD or targeted synthetic DMARD (tsDMARD) studied [5–7].
Given the importance of rapid response to treatment in preventing irreversible joint damage and improving symptom control, a personalized approach to treatment selection would be preferable to a prolonged, iterative trial-and-error process [8]. However, only a few biomarkers have been identified as candidates for treatment optimization in RA, with at best modest associations with treatment response and inconsistent applicability in current clinical practice. For example, the presence of autoantibodies to rheumatoid factor or cyclic citrullinated peptide (CCP) may predict response to rituximab, and genetic factors such as the HLA-DRB1 shared epitope may predict response to tumor necrosis factor inhibitors (TNFi) and tocilizumab, an inhibitor of the interleukin-6 receptor (IL-6R) [3, 9]. It is possible that treatment response would be better predicted by combinations of biomarkers and clinical characteristics [3, 8, 10–13]. However, the multitude of potential clinical and biomarker-based predictors, in combination with their thresholds and inherent constraints on clinical availability, poses a significant conceptual and computational challenge.
Artificial intelligence techniques such as machine learning are increasingly being used to identify individuals at risk for disease, predict outcomes, and optimize treatments [4, 14, 15]. In machine learning, computers apply hypothesis-free algorithms that enable the development of data-based mathematical models [4]. To develop a machine learning model, a randomly selected subset of data, such as that obtained from patients in clinical trials, is used to select, from a predefined set of parameters (e.g., clinical or blood biomarkers), those factors that are associated with a predefined outcome (e.g., ACR20). Once the parameters are set so that the error in predicting the outcome is minimized, the resulting parameter values (i.e., the “rule”) are validated using the remaining data or new external data sources [4]. This approach allows the identification of hidden patterns and rules in large datasets while reducing both the risk of overfitting and the need to correctly specify hypotheses a priori [4]. Machine learning has been applied to electronic health records to prognosticate RA disease activity [14, 16] and to define disease phenotypes in RA [17]. However, to the best of our knowledge, it has not yet yielded a robust, clinically feasible rule to predict treatment response to biologic therapies in patients with RA.
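The train-then-validate workflow described above can be sketched in a few lines of code. The following is a minimal illustration using synthetic data and a hypothetical single-threshold rule on one biomarker; it is not the GUIDE-based analysis used in this study, and all values are invented for illustration.

```python
import random

random.seed(0)

# Synthetic cohort: each patient has a baseline CRP value (mg/l) and a
# binary outcome (1 = responder). Purely illustrative data in which
# higher CRP makes response more likely.
def simulate_patient():
    crp = random.uniform(1.0, 50.0)
    p_response = 0.7 if crp > 12.0 else 0.3
    return crp, 1 if random.random() < p_response else 0

cohort = [simulate_patient() for _ in range(400)]

# Randomly split into a training subset and a held-out validation subset.
random.shuffle(cohort)
train, test = cohort[:300], cohort[300:]

def accuracy(data, cutpoint):
    # Candidate "rule": predict responder if baseline CRP exceeds the cutpoint.
    return sum((crp > cutpoint) == bool(y) for crp, y in data) / len(data)

# Set the parameter (cutpoint) so that training error is minimized...
best_cut = max((c / 2 for c in range(2, 100)), key=lambda c: accuracy(train, c))

# ...then validate the frozen rule on the remaining, unseen data.
print(f"selected cutpoint: {best_cut:.1f} mg/l")
print(f"validation accuracy: {accuracy(test, best_cut):.2f}")
```

In the actual analysis, this parameter search was carried out over many clinical and biomarker variables simultaneously by a decision-tree algorithm rather than over a single threshold.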
Sarilumab is a human monoclonal antibody to IL-6R approved for the treatment of moderate-to-severe RA [18, 19]. As with other bDMARDs, the characteristics of patients most likely to benefit from sarilumab treatment remain poorly understood. In this post hoc analysis, we used machine learning to identify a simple, clinically feasible rule that could predict favorable response to sarilumab and, in one trial, an incremental response compared with adalimumab.
Discussion
In this study, we used machine learning to identify a combination of baseline patient characteristics that predicts treatment response to sarilumab and adalimumab. The method found that the combined presence of anti-CCP antibodies and a CRP level above a selected cutpoint of 12.3 mg/l was predictive of a better response to sarilumab and, in the one trial where adalimumab data were also available, of an incrementally larger response to sarilumab compared with adalimumab. This approach could facilitate the choice of treatment in patients with RA.
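The resulting rule is simple enough to express directly. The sketch below applies the two-part criterion reported here (anti-CCP positivity and baseline CRP > 12.3 mg/l) to a patient record; the record structure and field names are illustrative, not taken from the study data.

```python
def rule_positive(patient):
    """Apply the two-part rule identified by the analysis:
    anti-CCP antibody positive AND baseline CRP > 12.3 mg/l.
    `patient` is a dict with illustrative field names."""
    return patient["anti_ccp_positive"] and patient["crp_mg_per_l"] > 12.3

print(rule_positive({"anti_ccp_positive": True, "crp_mg_per_l": 25.0}))  # True (rule-positive)
print(rule_positive({"anti_ccp_positive": True, "crp_mg_per_l": 8.0}))   # False (rule-negative)
```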
Our algorithm identified a simple, clinically applicable rule while considering the large number of combinatorial possibilities among 42 variables and their values or thresholds. Our study therefore demonstrates the potential of machine learning as a tool for systematic, fast, and deep analysis of data that can yield rules applicable in clinical practice.
Previously, anti-CCP has been identified as a predictor of response to rituximab and abatacept, and high CRP has been identified as a predictor of response to TNF inhibitors and tocilizumab [26–29]. Our study confirms the predictive potential of the combined presence of these two parameters with data from four independent studies. Of note, biologically plausible parameters that have been identified as predictors of response to sarilumab, such as IL-6 concentration [30], were not included in the rule. This is not unexpected: in patients with RA, IL-6 and CRP levels are highly correlated [31, 32], and machine learning algorithms, which approach data in a non-biased fashion, are set to prefer one of the correlated parameters based on its ability to maximize the predefined outcome (in our case, ACR20). CRP was probably selected because IL-6 varies more between individuals [33] and has a more variable diurnal profile than CRP [34]. The more stable levels of CRP would make it a preferred choice as a predictor of response, especially when there is only a single biomarker measurement per visit.
It could be argued that the use of a composite endpoint such as ACR response, which includes the acute phase reactants CRP or ESR, may have influenced the algorithm to select CRP > 12.3 mg/l as one component of the rule for an IL-6R inhibitor. However, the rule also predicted CRP-independent endpoints, such as the CDAI and HAQ-DI. ACR scores are based on relative changes and are therefore unaffected by a potential selection bias. For the other scores associated with low disease activity and remission (e.g., DAS28-CRP remission and LDA), which require fixed, relatively low CRP thresholds, selecting patients with high baseline CRP values actually increases the treatment response needed to achieve those thresholds.
Among the decision tree methods we considered, the GUIDE algorithm was the only one that provided a simple, clinically feasible rule. It also showed the highest precision and competitive accuracy compared with the other methods assessed, as well as higher transparency and better interpretability, albeit with lower recall.
Because the algorithm selected responders regardless of treatment during model training, patients treated with placebo (MOBILITY and TARGET) and adalimumab (MONARCH) were important controls during the testing phase. We found that rule-positive patients had higher levels of baseline factors associated with poor prognosis and a reduced response to adalimumab treatment or placebo compared with rule-negative patients. In MOBILITY, the predictive power of the rule was greater for the placebo-adjusted than for the non-adjusted response. Absolute disease states such as CDAI or DAS28-CRP remission/LDA had a relatively low prevalence in these data, and placebo adjustment increases that prevalence, thereby improving the performance of the rule. In addition, in settings with an active rather than placebo control (e.g., the MONARCH trial), our data suggest that the choice of treatment can be improved significantly using this methodology. In the clinical scenario we explored, based on the results of the MONARCH trial, sarilumab was clearly favored in rule-positive patients, whereas rule-negative patients could be treated with either adalimumab or sarilumab based on other priorities (e.g., patient preferences, erosion score, cost).
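The placebo adjustment referred to above amounts to subtracting the placebo responder rate from the active-treatment responder rate within a stratum. A minimal sketch with invented (non-trial) outcome vectors:

```python
def responder_rate(outcomes):
    # Proportion of patients achieving the endpoint (1 = achieved).
    return sum(outcomes) / len(outcomes)

def placebo_adjusted(active_outcomes, placebo_outcomes):
    # Placebo-adjusted response: active responder rate minus placebo rate.
    return responder_rate(active_outcomes) - responder_rate(placebo_outcomes)

# Illustrative outcomes within a rule-positive stratum (not trial data).
active_rule_pos  = [1, 1, 0, 1, 1, 0, 1, 1]  # 6/8 = 0.750
placebo_rule_pos = [0, 0, 1, 0, 0, 0, 0, 0]  # 1/8 = 0.125
print(f"placebo-adjusted response: "
      f"{placebo_adjusted(active_rule_pos, placebo_rule_pos):.3f}")  # 0.625
```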
As noted in the Results section, the rule applied less consistently to patients from TARGET, who had poor tolerance for, or an inadequate response to, TNF inhibitors. Patients with RA who have failed treatment with one drug class are generally less likely to respond to subsequent treatments [26], which may account for some of the inconsistent rule applicability observed in our analysis. However, the overall percentages of TARGET patients achieving remission or low disease activity endpoints were particularly low, making it difficult to demonstrate differences between rule-positive and rule-negative patients for these disease scores.
The less consistent verification in the TARGET data suggests limits to the rule’s generalizability: it may not apply to patients who had an inadequate response to TNFi. Also, because all data in the training and validation phases came from randomized, controlled clinical trials with stringent enrollment criteria, the rule may not apply in the same way to a real-world population of patients with RA. For example, all patients in these trials had to have elevated CRP, and the selection of this variable by the machine learning algorithm, as well as the exact CRP cutoff value chosen, may have been influenced by the cutoffs required by the trials’ inclusion criteria. In addition, radiographic endpoints were available only in the MOBILITY trial, and in the absence of further validation data we excluded this important assessment from the model training. Including radiographic scores in the rule might further increase the accuracy of patient stratification, albeit at the expense of simplicity. Finally, the number of patients in the training set was relatively small; a larger set may have yielded an even more robust rule.
Acknowledgements
The authors and Sanofi thank the patients for their participation in the trials, as well as the MOBILITY, MONARCH, TARGET, and ASCERTAIN Steering Committees and Investigators.
Disclosures
Markus Rehberg, Clemens Giegerich, Amy Praestgaard, Hubert van Hoogstraten, Melitza Iglesias-Rodriguez: Employees of Sanofi and may hold stock and/or stock options in the company. Jeffrey R. Curtis: Research grants and consulting fees (AbbVie, Amgen, Bristol-Myers Squibb, Corrona, Janssen, Lilly, Myriad, Pfizer, Regeneron, Roche, and UCB). Jacques-Eric Gottenberg: Research grants (Bristol-Myers Squibb, Pfizer, Roche); speaking/consulting fees (AbbVie, Bristol-Myers Squibb, Gilead, Janssen, Lilly, MSD, Pfizer, Roche, UCB). Andreas Schwarting: Grant/research support and honoraria from GSK, Pfizer, Janssen, Roche, BMS, MSD. Santos Castañeda: Grants/research support from MSD and Pfizer, consultation fees/participation in company-sponsored speaker’s bureau from Amgen, Celgene, Lilly, MSD, Pfizer, Sanofi, SOBI, and travel aids for congresses from BMS, MSD, Lilly, Pfizer, Roche. Andrea Rubbert-Roth: Speakers’ bureaus for AbbVie, Bristol-Myers Squibb, Chugai, Hexal/Novartis, Janssen, Lilly, Merck Sharp & Dohme, Pfizer, Roche, and Sanofi, and has provided consultancy for Chugai, Lilly, Roche, and Sanofi. Ernest HS Choy: Received research grants and/or consulting fees from AbbVie, Amgen, AstraZeneca, Biogen, Bio-Cancer, Boehringer Ingelheim, Bristol-Myers Squibb, Celgene, Chelsea Therapeutics, Chugai Pharma, Daiichi Sankyo, Eli Lilly, Ferring Pharmaceuticals, GlaxoSmithKline, Hospira, Ionis, Janssen, Jazz Pharmaceuticals, MedImmune, Merck Sharp & Dohme, Merrimack Pharmaceutical, Napp, Novartis, Novimmune, ObsEva, Pfizer, R-Pharm, Regeneron Pharmaceuticals, Inc., Roche, Sanofi, SynAct Pharma, Tonix and UCB, and participated in speakers’ bureaus for Amgen, Boehringer Ingelheim, Bristol-Myers Squibb, Chugai Pharma, Eli Lilly, Hospira, Merck Sharp & Dohme, Novartis, Pfizer, Regeneron Pharmaceuticals, Inc., Roche, Sanofi-Aventis, and UCB.