Published online Oct 01, 2013.
https://doi.org/10.3349/ymj.2013.54.6.1321
Osteoporosis Risk Prediction for Bone Mineral Density Assessment of Postmenopausal Women Using Machine Learning
Abstract
Purpose
A number of clinical decision tools for osteoporosis risk assessment have been developed to select postmenopausal women for the measurement of bone mineral density. We developed and validated machine learning models with the aim of more accurately identifying the risk of osteoporosis in postmenopausal women compared to the ability of conventional clinical decision tools.
Materials and Methods
We collected medical records from Korean postmenopausal women based on the Korea National Health and Nutrition Examination Surveys. The training data set was used to construct models based on popular machine learning algorithms such as support vector machines (SVM), random forests, artificial neural networks (ANN), and logistic regression (LR) based on simple surveys. The machine learning models were compared to four conventional clinical decision tools: osteoporosis self-assessment tool (OST), osteoporosis risk assessment instrument (ORAI), simple calculated osteoporosis risk estimation (SCORE), and osteoporosis index of risk (OSIRIS).
Results
SVM had significantly better area under the curve (AUC) of the receiver operating characteristic than ANN, LR, OST, ORAI, SCORE, and OSIRIS for the training set. SVM predicted osteoporosis risk with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0% at total hip, femoral neck, or lumbar spine for the testing set. The significant factors selected by SVM were age, height, weight, body mass index, duration of menopause, duration of breast feeding, estrogen therapy, hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus.
Conclusion
Considering various predictors associated with low bone density, the machine learning methods may be effective tools for identifying postmenopausal women at high risk for osteoporosis.
INTRODUCTION
Fracture due to osteoporosis is one of the major causes of disability and death in elderly persons.1 Osteoporosis is common in postmenopausal women but is asymptomatic until a fracture occurs. The World Health Organization (WHO) estimates that 30% of all postmenopausal women have osteoporosis, which is defined as bone mineral density (BMD) at least 2.5 standard deviations below the young healthy adult mean (T-score ≤-2.5).2 Dual-energy X-ray absorptiometry (DEXA) of total hip, femoral neck, and lumbar spine is the most widely used tool for diagnosing osteoporosis. However, mass screening using DEXA is not widely recommended as it is a high-cost method of evaluating BMD.3 Current research shows that too few DEXA scans are obtained among high-risk patients,4 while too many DEXA scans are obtained among low-risk postmenopausal women.5
Although the WHO provides FRAX® on their website, which was developed for fracture risk assessment, recent studies show that FRAX® does not have a better sensitivity for fracture prediction than low BMD (T-score ≤-2.5).6 Some reports and guidelines have proposed that women over the age of 65 years should be screened by DEXA.5 However, the diagnosis rate has been reported to be lower than one-third among postmenopausal women in Korea.7 The prevalence of osteoporosis is high in Korea compared to Western countries.8 Moreover, Koreans are increasingly at high risk of osteoporosis due to a deficiency of vitamin D, nutritional imbalance, and lifestyle factors.9 Therefore, an effective prescreening tool is necessary for Korean postmenopausal women to increase the possibility of early treatment.
The risk factors of osteoporosis are well-known and include history of fracture, older age, low body weight, estrogen deficiency at an early age, low calcium intake, and vitamin D deficiency.10 There has been a great deal of research assessing the combination of risk factors that would be of most help to physicians. A number of epidemiological studies have developed clinical decision tools for osteoporosis risk assessment to select postmenopausal women for the measurement of BMD. The purpose of these clinical decision tools is to help estimate the risk for osteoporosis, not to diagnose osteoporosis.
The osteoporosis self-assessment tool (OST) is a simple formula based on age and body weight.11 Although OST uses only two factors to predict osteoporosis risk, it has been shown to have good sensitivity with an appropriate cutoff value.12 The osteoporosis risk assessment instrument (ORAI), simple calculated osteoporosis risk estimation (SCORE), and osteoporosis index of risk (OSIRIS) are more complex decision tools using other risk factors.13 They include not only age and body weight, but also estrogen therapy, history of fracture, and rheumatoid arthritis. However, ORAI, SCORE, and OSIRIS have not shown significantly better performance than OST in predicting osteoporosis risk.13 All of these decision tools have the limitation of low accuracy for clinical use.5 In recent years, new additional risk factors of osteoporosis have been investigated based on individual conditions and risk profile for osteoporosis to enhance sensitivity and specificity.14
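As an illustration of how simple such a two-factor tool is, the following minimal sketch computes an OST-style score. It assumes the commonly published form of the index, 0.2 × (weight in kg − age in years) truncated to an integer; this article does not print the formula, so the exact form and the function names are assumptions. The default cut-off of <-1 is the one reported for OST in this study.

```python
import math


def ost_score(weight_kg: float, age_years: float) -> int:
    """OST-style score, assuming the commonly published form
    0.2 * (weight - age), truncated toward zero.
    Dividing by 5 is the same as multiplying by 0.2 but avoids
    floating-point surprises for integer inputs."""
    return math.trunc((weight_kg - age_years) / 5)


def ost_high_risk(weight_kg: float, age_years: float, cutoff: int = -1) -> bool:
    """Flag a woman as a DEXA candidate when the score falls below the cut-off."""
    return ost_score(weight_kg, age_years) < cutoff


# Hypothetical example: a 70-year-old woman weighing 50 kg.
print(ost_score(50, 70), ost_high_risk(50, 70))
```

Lower scores indicate higher risk, which is why the cut-off is expressed as a "less than" threshold.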
Machine learning is an area of artificial intelligence research which uses statistical methods for data classification. Several machine learning techniques have been applied in clinical settings to predict disease and have shown higher accuracy for diagnosis than classical methods.15 These mathematical algorithms have the ability to classify large amounts of data into a useful format.16 The classifiers take the medical data of each patient and predict the presence of diseases based on underlying patterns. Support vector machines (SVM), random forests (RF), and artificial neural networks (ANN) have been widely used approaches in machine learning.15 They are the most frequently used supervised learning methods for analyzing complex medical data.
The SVM is based on mapping data to a higher dimensional space through a kernel function, and choosing the maximum-margin hyper-plane that separates training data.17 Thus, the goal of the SVM is to improve accuracy by the optimization of space separation. RF grows many classification trees built from a random subset of predictors and bootstrap samples.18 RF can deal with high dimensional data in training faster than other methods. ANN comprises several layers and connections which mimic biological neural networks to construct complex classifiers.19 ANN has been applied to many problems of non-linear pattern classification. Logistic regression (LR) is another machine learning technique. LR is the gold standard method for analyzing binary medical data because it provides not only a predictive result, but also yields additional information such as a diagnostic odds ratio.20 SVM, RF, ANN, and LR are the models of choice in many tasks in medicine and bioinformatics for selecting informative variables or genes and predicting diseases more accurately.
Several studies have shown that SVM, RF, and ANN could help predict low BMD using diet and lifestyle habit data.21-24 Although these studies considered risk factors, they did not select informative variables that could contribute to osteoporosis. Moreover, previous studies had no objective comparisons of the performance of osteoporosis prediction developed by epidemiological data among the machine learning methods and clinical decision tools. Therefore, a structural design is needed for constructing the models along with a comparative study of various analytical methods for predicting osteoporosis risk.
In this study, we developed and validated machine learning models with the aim of identifying the risk of osteoporosis in postmenopausal women. The objective of this study was to select patients who were candidates for DEXA in order to increase the effectiveness of screening for osteoporosis. We developed the prediction models for osteoporosis using various machine learning methods including SVM, RF, ANN, and LR. The performance of machine learning methods and conventional clinical decision making tools including OST, ORAI, SCORE, and OSIRIS was compared in respect to accuracy and area under the curve (AUC) of the receiver operating characteristic (ROC).
MATERIALS AND METHODS
Data source
We collected data from Korean postmenopausal women, based on the Korea National Health and Nutrition Examination Surveys (KNHANES V-1) conducted in 2010.25 The KNHANES V-1 was a cross-sectional survey conducted by the Division of Chronic Disease Surveillance, Korea Centers for Disease Control and Prevention. The survey is divided into a health interview survey, a nutrition survey, and a health examination survey. Each data set contains BMD measurements at total hip, femoral neck, and lumbar spine as well as medical characteristics. BMD was measured by DEXA using Hologic Discovery (Hologic Inc., Bedford, MA, USA). Patients who were determined to have postmenopausal status were included in this study. We categorized the postmenopausal women into a control group and an osteoporotic group with low BMD (T-score ≤-2.5) at any site among total hip, femoral neck, or lumbar spine measurements. Several modifications were made for data analysis. If the answer to a question in the KNHANES V-1 was 'don't know,' we regarded it as missing data and estimated the answer using a nearest neighbor algorithm.26 This algorithm identifies the samples most similar to the incomplete sample, based on the values that are present, and uses them to estimate the missing values. The KNHANES received ethical approval from the Institutional Review Board of the Korea Centers for Disease Control and Prevention (IRB No: 2010-02CON-21-C).
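The nearest neighbor imputation described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the number of neighbors `k`, the Euclidean distance over shared observed variables, and averaging the neighbors' values are all assumptions, and `knn_impute` is a hypothetical helper name.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None entry is replaced by the mean
    of that column over the k most similar rows that have the value.
    Similarity is the root-mean-square difference over the columns observed
    in both rows (variables should be on comparable scales beforehand)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical records: the last one has a 'don't know' answer in column 2.
records = [[60, 50.0, 1.0], [61, 51.0, 3.0], [80, 40.0, 9.0], [60.5, 50.5, None]]
print(knn_impute(records, k=2)[3])
```

In this toy example the incomplete record borrows its value from the two records that resemble it most, rather than from the dissimilar third record.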
Data analysis
The data were separated randomly into two independent data sets: training and testing sets. The training set, comprised of 60% (1000 patients) of the entire dataset, was used to construct models based on SVM, RF, ANN, and LR. The scores of the clinical decision tools for screening osteoporosis including OST, ORAI, SCORE, and OSIRIS were calculated according to each formula. These four conventional clinical decision tools are the most widely used indices for predicting osteoporosis risk.12 Because the KNHANES V-1 did not have specific information concerning fracture type but did indicate simple fracture histories at various sites, the fracture histories were used for the scoring of non-traumatic fracture in SCORE and the history of low impact fracture in OSIRIS. The prediction models were internally validated using 10-fold cross validation.27 We designed the 10-fold cross validation not only to assess performance, but also to optimize prediction models using machine learning techniques. We used 10-fold cross validation on the training set, and the performance was measured on the testing set. The testing set, comprised of 40% (674 patients) of the entire dataset, was used to assess ability to predict osteoporosis in postmenopausal women.
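The partitioning scheme described in this section can be sketched as below: a random split into 1000 training and 674 testing records, followed by 10-fold cross validation inside the training set only. The seed, the function names, and the absence of stratification by class are assumptions made for illustration; the article does not specify them.

```python
import random


def split_train_test(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Sample sizes reported in the text: 1674 women, 1000 for training.
train_idx, test_idx = split_train_test(1674, 1000)
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

Keeping the testing indices out of `k_fold_indices` mirrors the design in the text: model tuning uses only the training set, and the testing set is reserved for the final assessment.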
Model selection and validation
We used the 10-fold cross validation scheme to construct machine learning models. The purpose of the machine learning models was to predict osteoporosis risk using the health interview surveys concerning demographic characteristics and past histories listed in Table 1. Due to high dimensionality, variable selection was a necessary technique to make an effective prediction model and to improve prediction performance.28 We also obtained an insight into factors related to osteoporosis through the variables that were entered into the classifiers. Eighty-one variables in the data of the characteristics including alcohol, smoking, stress status, and physical activity were initially selected to design the model to predict osteoporosis risk. We adopted a feature selection method of wrapper-based feature subset evaluation for SVM, RF, and ANN,15, 29 and also determined the order of the variables with the embedded method of each machine learning method and decreased the number of variables to determine the best subset using backward elimination.28 The remaining features that indicated the highest accuracy in 10-fold cross validation were the selected subset for prediction. For LR, we used the backward stepwise method for variable selection.
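The backward elimination step can be sketched as a loop that repeatedly drops the lowest-ranked variable and keeps the subset with the best cross-validated accuracy. This is a simplified illustration under stated assumptions: `ranked_features` stands for the ordering produced by the embedded method, `cv_accuracy` stands for the 10-fold cross validation accuracy of the classifier, and the toy scoring function below is invented purely to make the example runnable.

```python
def backward_elimination(ranked_features, cv_accuracy):
    """ranked_features: variables ordered from most to least important.
    cv_accuracy: callable mapping a list of variables to a score.
    Returns the best-scoring subset and its score."""
    current = list(ranked_features)
    best_subset, best_score = list(current), cv_accuracy(current)
    while len(current) > 1:
        current = current[:-1]          # drop the least important variable
        score = cv_accuracy(current)
        if score >= best_score:         # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical scorer: informative variables help, the others add noise.
INFORMATIVE = {"age", "weight"}


def toy_cv_accuracy(subset):
    useful = len(INFORMATIVE & set(subset))
    noise = len(set(subset) - INFORMATIVE)
    return 0.6 + 0.1 * useful - 0.01 * noise


ranking = ["age", "weight", "height", "noise1", "noise2"]
print(backward_elimination(ranking, toy_cv_accuracy))
```

In practice each call to the scoring function retrains the classifier ten times, so the cost of this wrapper approach grows with the number of candidate variables.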
Table 1
Characteristics of Postmenopausal Women
Data sets in this study were class-imbalanced because the control group contained significantly more samples than the osteoporotic group. Applying a classifier to imbalanced data could produce undesirably low performance.30 Therefore, it was important to improve prediction models for the imbalanced data. To obtain the optimal result, we adopted a grid search in which a range of parameter values was tested using the 10-fold cross validation strategy. Due to the imbalanced data problem in this study, prediction accuracy might not be a good criterion for assessing performance since the minor class has less influence on accuracy than the major class.31 Therefore, we evaluated diagnostic abilities including not only accuracy, sensitivity, and specificity, but also AUC. The AUC is known as a strong predictor of performance, especially with regard to imbalanced problems.30 To compare the performance of models, we generated the ROC curves and selected cut-off points as the points on the ROC curve closest to the upper left corner. This method maximized Youden's index, giving equal weight to sensitivity and specificity.32 ROC curve analysis is the most commonly used method in clinical analysis for establishing the optimal cut-off point. The cut-offs of the OST, ORAI, SCORE, and OSIRIS were calculated using ROC curve analysis. To discriminate osteoporosis, the following cut-offs were used: <-1 for OST, >16 for ORAI, >15 for SCORE, and <-1 for OSIRIS, respectively. We used MATLAB 2010a (Mathworks Inc., Natick, MA, USA) for the machine learning analysis, SPSS 18.0 (SPSS Inc., Chicago, IL, USA) for LR and statistical analysis, and MedCalc 12.3 (MedCalc Inc., Mariakerke, Belgium) for ROC analysis.
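The ROC-based quantities used in this section can be illustrated with a short sketch that computes the AUC and picks a cut-off by two criteria: the point closest to the upper left corner and the point with the largest Youden's index (sensitivity + specificity − 1). The function names and the toy scores are hypothetical. The two criteria agree on this example, although in general they can select different points.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.
    A sample is called positive when its score is >= the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def closest_to_corner(points):
    """Point minimising the distance to (sensitivity, specificity) = (1, 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


def max_youden(points):
    """Point maximising Youden's index J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Hypothetical risk scores; label 1 marks osteoporosis.
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
pts = roc_points(scores, labels)
print(auc(scores, labels), closest_to_corner(pts), max_youden(pts))
```

Because the AUC is computed from the ranking of positives against negatives, it does not depend on the class proportions, which is the property that makes it attractive for imbalanced data.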
RESULTS
Five hundred eighty-three (34.8%) of 1674 postmenopausal women had combined osteoporosis at any site including total hip, femoral neck, or lumbar spine. Among 583 women with osteoporosis, 95 had osteoporosis at the total hip, 331 at the femoral neck, and 473 at the lumbar spine. Table 1 shows the characteristics of postmenopausal women categorized by the presence of osteoporosis. By comparison with women in the control group, women with osteoporosis were of higher age and lower height, weight, body mass index, and waist circumference. Women with osteoporosis were also less likely to take estrogen therapy. The number of pregnancies, duration of menopause, and duration of breast feeding were higher in the osteoporotic group. Women in this group were more likely to have hypertension and history of fracture, and less likely to have hyperlipidemia.
Table 2 describes the final multivariate LR derived from the training set using backward selection. Variables selected by LR were age, weight, duration of menopause, diabetes mellitus, hyperlipidemia, and osteoarthritis. Table 3 summarizes the results of variable selection for the various machine learning and conventional methods. While the conventional methods selected two to five variables to obtain simplicity, the machine learning methods except LR selected more than 10 variables for better performance. The predictors of osteoporosis selected by SVM included age, height, weight, body mass index, duration of menopause, duration of breast feeding, estrogen therapy, hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus. RF and ANN showed similar results. The optimal model of SVM was found using a Gaussian kernel function with a penalty parameter C of 100 and scaling factor σ of 30. In RF, the optimal number of trees was 100, and the number of predictors for each node was 3. The optimal ANN was set with 3 nodes of a hidden layer and learning rate of 0.1.
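To make the role of the scaling factor σ concrete, the sketch below evaluates a Gaussian kernel with σ = 30. It assumes the common convention K(x, y) = exp(−‖x − y‖² / (2σ²)); the article does not state the exact kernel formula, so this convention and the function name are assumptions. Under the same convention, libraries that parametrize the kernel as exp(−γ‖x − y‖²) would use γ = 1 / (2σ²).

```python
import math


def gaussian_kernel(x, y, sigma=30.0):
    """Gaussian (RBF) kernel, assuming K = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Equivalent gamma for the reported sigma of 30 under the assumed convention.
gamma = 1.0 / (2.0 * 30.0 ** 2)

# Hypothetical two-variable records (for example, age and weight).
print(gaussian_kernel([65.0, 52.0], [70.0, 48.0]), gamma)
```

A larger σ makes the kernel decay more slowly with distance, giving a smoother decision boundary, while the penalty parameter C sets how strongly training errors are penalized.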
Table 2
Odds Ratios for Predicting Osteoporosis Risk Using the Multivariate Logistic Regression with Backward Selection Models
Table 3
Variable Selection in Machine Learning and Conventional Methods for Osteoporosis Risk of Total Hip, Femoral Neck, or Lumbar Spine
Fig. 1 shows the prediction performance of 10-fold cross validation of the machine learning and conventional methods. The mean and standard deviation of AUCs were calculated from 10 validation results. We obtained the AUCs of SVM, RF, ANN and LR of 0.822, 0.808, 0.794, and 0.793, respectively. The predictors of the machine learning methods came from selected variables shown in Table 3. The AUCs of OST, ORAI, SCORE, and OSIRIS were 0.794, 0.791, 0.766, and 0.787, respectively. In Fig. 1, we found that more complex discriminating functions such as SVM and RF showed better performance than simple linear functions such as LR, OST, ORAI, SCORE, and OSIRIS. For the AUCs, the SVM performed better than ANN (p=0.028), LR (p=0.037), OST (p=0.037), ORAI (p=0.022), SCORE (p=0.005), and OSIRIS (p=0.009) in 10-fold cross validation using a Wilcoxon signed rank test.
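The paired comparison of per-fold AUCs can be sketched with an exact two-sided Wilcoxon signed rank test, which is feasible here because ten folds give only 2^10 sign patterns. This is an illustrative implementation, not the statistical software used in the study; zero differences are dropped and tied absolute differences receive average ranks, both of which are assumptions about the variant of the test. The per-fold values in the example are invented.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed rank test for paired samples.
    Returns (W+, p-value). Intended for small n such as 10 folds."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1                # average rank for ties
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    observed = min(w_plus, total - w_plus)
    # Enumerate every assignment of signs to the ranks under the null.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical per-fold AUCs (x1000) for two models over 10 folds.
model_a = [830, 815, 842, 808, 825, 819, 836, 811, 828, 806]
model_b = [801, 790, 805, 812, 796, 788, 810, 795, 799, 786]
print(wilcoxon_signed_rank(model_a, model_b))
```

With ten folds the smallest attainable two-sided p-value is 2/1024, reached only when one model wins every fold.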
Fig. 1
Performance results (AUC) of the machine learning and conventional methods using 10-fold cross validation. Error bars indicate the standard deviation of the mean. AUC, area under the curve; SVM, support vector machines; RF, random forests; ANN, artificial neural networks; LR, logistic regression; OST, osteoporosis self-assessment tool; ORAI, osteoporosis risk assessment instrument; SCORE, simple calculated osteoporosis risk estimation; OSIRIS, osteoporosis index of risk.
Additionally, to assess the ability of the models to predict osteoporosis, we applied our methods to a testing set composed of independent data. Table 4 shows the results of classifying the testing set for selecting women at risk of osteoporosis at various BMD measurement sites. The SVM model was the best discriminator between controls and women with osteoporosis. Considering osteoporosis at any site, SVM predicted osteoporosis risk with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0%. SVM also predicted osteoporosis at total hip, femoral neck, and lumbar spine with the highest AUC and accuracy. SVM predicted osteoporosis at the total hip with an AUC of 0.921. On the other hand, osteoporosis at the lumbar spine was difficult to predict, with an AUC of 0.778. The ROC analysis demonstrated that the SVM had statistically significantly better accuracy than OST at total hip, femoral neck, and lumbar spine. Fig. 2 shows the ROC curves of SVM, LR, and OST in predicting osteoporosis at any site. Because SVM and OST had the highest AUC among the machine learning methods and conventional methods, respectively, we compared their ROC curves. LR was also included for comparison with SVM and OST. The AUCs of SVM, LR, and OST were 0.827, 0.809, and 0.806, respectively (Table 4).
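For reference, the accuracy, sensitivity, and specificity reported for the testing set are all derived from the same four confusion-matrix counts. The sketch below shows the definitions on invented labels; the function name and the data are hypothetical and are not taken from Table 4.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = osteoporosis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # share of true cases that are flagged
        "specificity": tn / (tn + fp),   # share of controls that are cleared
    }


# Hypothetical testing-set labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(diagnostic_metrics(y_true, y_pred))
```

Because accuracy pools both classes, it is dominated by the larger control group, which is why sensitivity and specificity are reported alongside it.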
Fig. 2
Receiver operating characteristic curves (ROC) of support vector machines (SVM), logistic regression (LR), and osteoporosis self-assessment tool (OST) in predicting osteoporosis risk at any site among total hip, femoral neck, or lumbar spine.
Table 4
Diagnostic Performance of Osteoporosis Risk Assessment Methods for the Testing Set
DISCUSSION
Based on various machine learning techniques, we investigated a new approach for predicting osteoporosis risk in postmenopausal women using data from the KNHANES V-1. To the best of our knowledge, this is the first report on the application of conventional decision tools for BMD assessment in a Korean population. Among the machine learning and conventional methods, our SVM model discriminated most accurately between women with osteoporosis and control women. The 10-fold cross validation and ROC analysis indicated that the SVM provided a statistically significant improvement in predicting osteoporosis. In other words, SVM was more effective in analyzing the underlying epidemiological patterns of osteoporosis compared with the other methods. This finding is consistent with a previous study on the comparison of machine learning methods in various complex discriminating problems for predicting disease.33 Most experts have used conventional methods, including OST, ORAI, SCORE, and OSIRIS, because of their simplicity.34 In this study, we applied more complicated mathematical methods for more accurate prediction. Despite its complexity, our method is useful because computerized diagnostic decision support has become increasingly easy to access due to the advancement of information systems for many medical problems.35
If our prediction model retains good performance after validation in a larger population, it will be possible to use this technique as a cost-effective prescreening tool to determine candidates for evaluation with DEXA and also to prevent osteoporotic fracture in postmenopausal women at high risk. The patients in the high risk group categorized by this method should receive DEXA screening at the hospital. However, patients in the low risk group could postpone receiving a DEXA scan. Women experience menopause at 50 years old on average.36 Accordingly, when we regard Korean women who are over 50 years old as the potential postmenopausal population, postmenopausal women account for 31.8% of all women in Korea. This 31.8% corresponds to approximately 8.5 million women according to the Korean National Statistical Office 2010. Although our SVM showed a small improvement of 2.7% in accuracy compared to OST, this 2.7% corresponds to approximately 230,000 women, which is not a small population.
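The population estimate in this paragraph follows from a single multiplication, taking the 8.5 million figure and the 2.7% accuracy difference exactly as stated in the text:

```latex
0.027 \times 8.5 \times 10^{6} = 229{,}500 \approx 230{,}000
```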
Our proposed SVM model included age, height, weight, body mass index, duration of menopause, duration of breast feeding, estrogen therapy, hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus as predictors (Table 3). Similar to earlier studies concerning prediction for osteoporosis, our results suggest that age is most closely associated with the development of osteoporosis, and weight is also an important factor. However, our findings also demonstrated different factors involved in osteoporosis such as height, duration of menopause, duration of breast feeding, and presence of chronic diseases such as hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus. Although many clinical studies have shown that height correlates with vertebral bone fractures, there has been no study using height as a predictor for osteoporosis at the total hip or femoral neck.37 In our study, height showed a significant association with osteoporosis at any site and could enhance a decision tool for osteoporosis risk assessment. Estrogen therapy, duration of menopause, and duration of breast feeding were selected due to their association with exposure to estrogen in the endocrine system. Traditionally, estrogen is thought to be the most important factor in bone growth and maturation in women.38 Breast feeding might cause low BMD due to the removal of estrogen through breast milk and reduction of the cumulative exposure to estrogen.39
Our SVM model also used hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus for more accurate prediction. Hyperlipidemia and diabetes mellitus were also selected in LR with significant odds ratios in Table 2, although they would be counterintuitive to many clinicians. There have been several studies indicating the influence of chronic diseases on the BMD of patients.14, 40, 41 However, these results also showed that the effect of chronic diseases on BMD was minor. Our prediction model was able to consider these chronic diseases in combination using a SVM characterized by nonlinearity and high dimension. Because the SVM model delicately handled a separating space composed of these factors in high dimension, it was possible to consider all factors for the improvement of sensitivity and specificity in predicting osteoporosis.
We found in the present study that the optimal SVM adopted a penalty parameter C of 100 and scaling factor σ of 30 for osteoporosis risk prediction. The parameter C controlled over-fitting and σ controlled the degree of nonlinearity of the SVM model.42 Our SVM had parameters similar to those of several previous studies.33, 43 However, most studies obtained the optimal parameters empirically, and did not report the specific optimal parameters.15, 21, 42 Since there is no clear guideline on selecting the most effective parameters for certain classification problems,44 our results may serve as a guide for future epidemiologic research on predicting osteoporosis.
There are several limitations to this study. First, the study was based on a cross-sectional survey, which has several shortcomings from a medical point of view. For example, the prevalence of disease was based on a health interview survey taken on one occasion. Weight, height, body mass index, and hormone therapy status, as well as BMD, could differ according to the time of measurement. Second, it was difficult to consider drug effects. For example, treatment with steroids for rheumatoid arthritis, asthma, dermatitis, or autoimmune diseases is known to cause glucocorticoid-induced osteoporosis.45 Systematic approaches are warranted to consider the long-term effects of drugs in further studies. Third, our prediction models were developed for Korean women. Several studies have indicated the possibility that Korean people have physiologic differences in bone metabolism.46, 47 To validate our findings, further study including large heterogeneous samples is needed. Fourth, our study was characterized by imbalanced class distribution. Traditional classifiers have generally shown poor performance on imbalanced data sets because they are designed to find the best classification for the majority.31 The class imbalance remained a critical issue even though we adopted a dense grid search to decide the optimal prediction models in order to overcome this problem. Therefore, if more patient data associated with osteoporosis were collected, the performance would be expected to improve.
In conclusion, the most important finding of this study is the identification of postmenopausal women at high risk of osteoporosis to increase the possibility of appropriate treatment before fracture occurs. Machine learning methods might contribute to the advancement of clinical decision tools and understanding about the risk factors for osteoporosis. Further studies should be targeted at constructing an extended prediction model for progressive osteoporosis through the collection of prospective data, and the simultaneous prediction of osteopenia and osteoporosis using multi-category classification. We hope that this study enables women to reduce the risk of osteoporosis, which is the major cause of fracture.
The authors have no financial conflicts of interest.
ACKNOWLEDGEMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (NRF-2012R1A2A2A03045612).
References
Trémollieres FA, Pouillès JM, Drewniak N, Laparra J, Ribot CA, Dargent-Molina P. Fracture risk prediction using BMD and clinical risk factors in early postmenopausal women: sensitivity of the WHO FRAX tool. J Bone Miner Res 2010;25:1002–1009.
Choi YJ, Oh HJ, Kim DJ, Lee Y, Chung YS. The prevalence of osteoporosis in Korean adults aged 50 years or older and the higher diagnosis rates in women who were beneficiaries of a national screening program: the Korea National Health and Nutrition Examination Survey 2008-2009. J Bone Miner Res 2012;27:1879–1886.
Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, et al. Machine learning in bioinformatics. Brief Bioinform 2006;7:86–112.
Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–297.
Breiman L. Random forests. Mach Learn 2001;45:5–32.
Bishop CM. In: Pattern Recognition and Machine Learning. 4th ed. New York: Springer; 2006.
Ordóñez C, Matías JM, de Cos Juez JF, García PJ. Machine learning techniques applied to the determination of osteoporosis incidence in post-menopausal women. Math Comput Model 2009;50:673–679.
De Cos Juez FJ, Suárez-Suárez MA, Sánchez Lasheras F, Murcia-Mazón A. Application of neural networks to the study of the influence of diet and lifestyle on the value of bone mineral density in post-menopausal women. Math Comput Model 2011;54:1665–1670.
Walid M, Ahmad S, Fadi C, Dima R. Intelligent predictive osteoporosis system. Int J Comput Appl 2011;32:28–30.
Ministry for Health, Welfare and Family Affairs. The Fifth Korea National Health and Nutrition Examination Survey (KNHANES V), 2010-Health Examination. [accessed on 2012, December 26]. Available at: http://knhanes.cdc.go.kr/.
Batista GEAPA, Monard MC. An analysis of four missing data treatment methods for supervised learning. Appl Artif Intell 2003;17:519–533.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 1995;14:1137–1143.
Dash M, Liu H. Consistency-based search in feature selection. Artif Intell 2003;151:155–176.
Kotsiantis S, Kanellopoulos D, Pintelas P. Handling imbalanced datasets: a review. GESTS Int T Comput Sci Eng 2006;30:25–36.
Sun Y, Kamel MS, Wong AKC, Wang Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 2007;40:3358–3378.
Yeun EJ. A study on the health promoting lifestyle practices of middle-aged women in Korea. J Korean Soc Health Educ Promot 2000;17:41–59.