Background
Biologic disease-modifying antirheumatic drugs (bDMARDs) play a pivotal role in the treatment of various rheumatologic diseases, such as rheumatoid arthritis (RA) and ankylosing spondylitis (AS), particularly those resistant to conventional synthetic disease-modifying antirheumatic drugs (csDMARDs). However, approximately 30% of RA patients [1, 2] and 20% of AS patients [3–5] do not respond well to their initial bDMARD therapy. A few months are required to determine the efficacy of the medications. During this period, non-responders may incur high drug costs, see no improvement in their disease, and experience side effects [6–9]. Therefore, methods to predict the response before starting bDMARDs are garnering substantial interest.
Several studies have identified and compared clinical factors, such as sex, age, disease duration, and disease activity, that influence the treatment response to bDMARDs in both diseases [10, 11], but they did not build predictive models. Because the relationship between clinical variables and phenotypes is complex, machine learning methods outperform conventional statistical models in predicting clinical outcomes in various circumstances [12–15]. Recently, a study using machine learning to predict responses to tumor necrosis factor inhibitors (TNFi) in RA patients was published [16], based on the largest dataset among machine learning studies conducted to date in RA. However, that study included little feature importance analysis. In AS, although machine learning has been used to predict early TNFi users [17], no machine learning model has been developed to predict the responses to bDMARDs.
This study aims to examine whether machine learning can predict the treatment responses to bDMARDs better than conventional statistical methods. In addition, it aims to identify important clinical factors that affect these responses through machine learning. Machine learning models, including random forest (RF-method), extreme gradient boosting (XGBoost), artificial neural network (ANN), and support vector machine (SVM), are presented to predict bDMARD responses in patients with RA and in patients with AS separately. The prediction performance of the machine learning methods was compared with one another and with that of a conventional statistical method, logistic regression. Next, feature importance analysis was performed on the generated machine learning models to delineate the factors that were important in training them.
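As an illustration of how the prediction performance of two classifiers can be compared, the following minimal sketch computes the area under the ROC curve (AUROC) from predicted scores. The choice of AUROC as the metric, the function name `auroc`, and the toy labels and scores are assumptions made for illustration only; they are not taken from this study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: iterable of 0/1 (1 = responder, an illustrative convention)
    scores: predicted probabilities or scores, higher = more likely 1
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs where the positive is ranked higher;
    # tied pairs count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not study data)
y_true = [1, 0, 1, 1, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.6, 0.4, 0.1]  # ranks every responder first
model_b = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6]  # makes some ranking mistakes

auc_a = auroc(y_true, model_a)
auc_b = auroc(y_true, model_b)
print(f"model A AUROC = {auc_a:.3f}, model B AUROC = {auc_b:.3f}")
```

A higher AUROC on the same held-out patients would indicate better discrimination between responders and non-responders; a formal comparison would additionally need a confidence interval or a significance test.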
Discussion
Various machine learning models were presented to classify the treatment responses to bDMARDs in RA and AS patients. In RA patients, RF-method was the most suitable method and predicted treatment responses more accurately than the conventional statistical method, logistic regression. In AS patients, however, the machine learning models were not superior to logistic regression. According to the feature importance analysis, patient self-reporting scales were the most important input features in both diseases. Only a few previous studies have attempted to predict treatment responses to biologic agents in RA patients [16, 34]. The present study includes a more detailed feature importance analysis than those studies. Furthermore, this is the first attempt to predict the treatment responses to bDMARDs in AS patients.
We implemented various machine learning methods to predict treatment responses, including RF-method, XGBoost, ANN, and SVM. Both RF-method and XGBoost are ensemble models that consist of numerous small decision trees. RF-method is based on a bagging algorithm, and XGBoost is based on a gradient boosting algorithm. Although SVM is a relatively older method, it performs satisfactorily in simple image classification with little computational burden. ANNs are gradually gaining popularity as they obtain successful results in various fields, such as image classification. However, decision-tree-based algorithms show better performance in certain circumstances, such as with small, tabular data [35]. RF-method showed better prediction performance than ANN in RA patients in this study. In addition, the optimal ANN prediction model had only one (RA) or two (AS) hidden layers, which is too shallow to exploit the advantages of ANN. Therefore, our input data seemed unsuitable for the ANN, possibly because of their relatively small size.
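The bagging idea mentioned above can be sketched in a few lines. The example below is a deliberately simplified illustration using one-split decision trees (stumps) and a majority vote; it is not the random forest implementation used in this study, and all function names and the toy data are hypothetical. A gradient boosting method would instead fit its trees sequentially, each one correcting the errors of the ensemble built so far.

```python
import random


def fit_stump(X, y):
    """Return (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Majority label on each side (0 on ties or when a side is empty)
            l_lab = int(sum(left) * 2 > len(left)) if left else 0
            r_lab = int(sum(right) * 2 > len(right)) if right else 0
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_bagging(X, y, n_trees=25, seed=0):
    """Bagging: every stump is trained on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over all stumps in the ensemble."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return int(votes * 2 > len(stumps))


# Toy one-feature dataset (made-up values): low scores -> 0, high scores -> 1
X_toy = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y_toy = [0, 0, 0, 1, 1, 1]
ensemble = fit_bagging(X_toy, y_toy)
low_pred = predict_bagging(ensemble, [0.5])
high_pred = predict_bagging(ensemble, [9.5])
```

A real random forest additionally grows deeper trees and considers only a random subset of features at each split, which this sketch omits for brevity.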
RF-method showed better prediction performance than logistic regression in patients with RA but not in those with AS. In addition, the prediction performance of the various models was lower in AS patients. Determining the exact reason requires further research and is beyond the scope of this study, although some speculation can be made. The number of data points was slightly smaller in AS; however, the difference was only 5–10% of all patients. The number of input features was higher in AS than in RA. RA had a more unbalanced responder/non-responder proportion, which generally has a negative effect on machine learning results. Thus, the difference in prediction performance was unlikely to be due to the structure of the input dataset. If so, we could assume that the input features were insufficient to predict the treatment response to bDMARDs in patients with AS. Heritability analyses imply that AS has a stronger genetic component than RA, with a heritability of approximately 80–90% in AS [36–39] vs. 50–60% in RA [40, 41]. Previous studies have shown that genetic features could affect the response to bDMARDs in patients with AS [42, 43]. In addition, pilot studies have used transcriptome analysis [44, 45] to predict the responses to bDMARDs in patients with AS. Therefore, multi-omics data, including genetics and transcriptomics, may improve prediction performance.
Feature importance analysis can provide insights into clinical factors. In this study, machine learning models revealed that the patient self-reporting scales, PtGA in RA patients and BASFI in AS patients, were the most important factors for predicting treatment responses. This is quite surprising because they were more important than more objective clinical features, such as laboratory results (ESR and CRP) and physical examination findings (SJC and TJC). Previous studies reported patient self-reporting scales, such as RAPID3 [46] or BASFI [47], as predictors of bDMARD treatment response. However, their importance relative to objective disease activity or functional measures has not been studied. In addition, the feature importance results were similar across models, except for ANN in RA patients, which had inferior performance; this suggests that the feature importance analysis was robust.
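One model-agnostic way to rank input features, shown here only as an illustration, is permutation importance: shuffle one feature column and measure how much the model's accuracy drops. This passage does not state which importance measure was used, so the technique, the function names, the stand-in classifier `toy_model`, and the toy data below are all assumptions.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained classifier: it uses only feature 0.
def toy_model(row):
    return int(row[0] > 5.0)


# Toy rows: [self-reported score, unrelated lab value] (made-up numbers)
X_toy = [[1.0, 30.0], [2.0, 10.0], [3.0, 25.0], [4.0, 12.0],
         [6.0, 11.0], [7.0, 28.0], [8.0, 14.0], [9.0, 27.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

importances = permutation_importance(toy_model, X_toy, y_toy)
```

Because `toy_model` ignores the second column, shuffling it leaves accuracy unchanged and its importance is zero, while shuffling the first column degrades accuracy. Comparing such rankings across different models is one way to check whether a feature importance result is robust.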
The prediction models were also trained separately for each medication. However, the performance of the RF-method models was not superior to that of the logistic regression models in the individual medication datasets. Only the RF-method model for adalimumab users with AS had a borderline superior result to the logistic regression model. The results of the feature importance analysis for each medication were not consistent. Again, only the model for adalimumab users with AS showed results similar to those of the entire cohort in the feature importance analysis. Adalimumab users with AS formed the largest patient group, with 253 individuals, while the other cohorts comprised fewer than 200 patients. Therefore, the size of the patient group appears to be an important factor in generating a proper predictive model, and approximately 250 patients could be the lower limit.
However, our approach had some limitations. First, although we set aside part of the dataset by hospital region as an independent test dataset that did not participate in any part of training the machine learning models, the validation cohort was not retrieved from a completely different cohort. Nevertheless, forty-five hospitals were involved in the KOBIO cohorts, and each hospital had an independent enrollment process, assessing physician, and laboratory institution. Thus, we expect that the pre-divided test dataset represents an independent cohort. Second, all participants were Korean; therefore, we cannot assure that the models we generated would show similar results in other populations. When the models are applied to other populations, new patient data or feature selection may be required in advance.
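The region-based hold-out described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the record fields (`hospital`, `region`, `responder`), the region names, and the choice of held-out region are all hypothetical.

```python
def split_by_region(records, test_regions):
    """Hold out every record from the given regions as the test set.

    Because the split is made on region rather than on individual patients,
    no hospital contributes data to both the training and the test set.
    """
    test_regions = set(test_regions)
    train = [r for r in records if r["region"] not in test_regions]
    test = [r for r in records if r["region"] in test_regions]
    return train, test


# Toy records (made-up; field names are assumptions)
records = [
    {"hospital": "H1", "region": "North", "responder": 1},
    {"hospital": "H1", "region": "North", "responder": 0},
    {"hospital": "H2", "region": "South", "responder": 1},
    {"hospital": "H3", "region": "South", "responder": 0},
    {"hospital": "H4", "region": "East", "responder": 1},
]

train_set, test_set = split_by_region(records, test_regions=["South"])
train_hospitals = {r["hospital"] for r in train_set}
test_hospitals = {r["hospital"] for r in test_set}
```

Splitting at the group level avoids leakage from hospital-specific practices into the test set, although, as noted above, it does not replace validation in a separate cohort.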
Conclusions
In conclusion, we developed several machine learning models that could predict the treatment responses to biologic agents in patients with RA and AS. The best-performing model was trained using RF-method in patients with RA, and it performed better than the conventional statistical method, logistic regression. Given the input clinical features, machine learning models had no advantage over a logistic regression model in patients with AS. Feature importance analysis showed that patient self-reporting scales, PtGA in RA patients and BASFI in AS patients, were the most important input features for the machine learning prediction models.