Methods
This study followed the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) model as its methodology [22].
Business understanding
Several recent studies point out the need to improve the diagnosis of neonatal jaundice in order to prevent severe hyperbilirubinemia and kernicterus. It is therefore important to explore new methodologies, such as data mining, that may provide better results than traditional methods.
After examining several data mining tools, we chose WEKA version 3.6, mainly because of its characteristics: it is a user-friendly tool for health professionals and, as a free application, entails no additional cost [23].
Based on the studies identified in the literature, data mining techniques were expected to yield predictions with greater accuracy than the known traditional methods.
Data understanding
The study was performed at the Obstetrics Department of the Centro Hospitalar Tâmega e Sousa, E.P.E., North Portugal, from February to March 2011.
Healthy newborn infants with a gestational age of 35 weeks or more were included in the study. Of the 231 cases in the initial sample, 4 did not meet this criterion and were excluded, leaving 227.
All data in the newborn's original paper-based record, collected by doctors and nurses, were transcribed into a Microsoft Access database built for this purpose.
The collected data included: mother and father information, sibling information, gestational information, delivery information, the physical examination of the newborn and clinical information on the complete hospital stay. In total, 72 variables were collected and analyzed. The complete table with all the variables is presented in Additional file 1.
In addition, transcutaneous bilirubin (TcB) levels were measured from birth to hospital discharge, with a maximum interval of 8 hours between measurements, using a noninvasive bilirubinometer (JM-103 Jaundice Meter, Konica Minolta) according to the manufacturer’s instructions. Once hyperbilirubinemia was diagnosed and phototherapy was started, no further bilirubinometer measurements were performed.
Data preparation
A preliminary statistical analysis was carried out to better understand the dataset.
During this analysis we also performed the data preparation, which included the elimination, integration, recoding and calculation of variables. All these transformations are presented in detail in Additional file 1.
Eliminated variables – only variables with all values missing were eliminated, that is, variables whose information was never recorded by doctors and nurses.
Integrated variables – in the newborn paper record, several variables recorded the same information; we therefore merged these variables into new, single variables.
Recoded variables – to facilitate the statistical analysis, some variables were recoded (transformed).
Calculated variables – some variables, such as the dates of admission and discharge, were used to calculate new variables (e.g., length of hospital stay).
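The four preparation steps can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes each record is a dict with missing values stored as `None`, and the variable names (`mother_abo_record`, `admission_date`, etc.), the recoding map and the rule of keeping the first non-missing value when merging duplicated variables are all hypothetical.

```python
from datetime import date


def prepare(records, merge_groups, recode_maps):
    """Eliminate, integrate, recode and calculate variables (hypothetical schema)."""
    # Elimination: drop variables that are missing (None) in every record.
    all_vars = {name for r in records for name in r}
    empty = {v for v in all_vars if all(r.get(v) is None for r in records)}

    prepared = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in empty}

        # Integration: several source variables repeat the same information;
        # keep the first non-missing value under a single new name.
        for new_name, sources in merge_groups.items():
            values = [row.pop(s, None) for s in sources]
            row[new_name] = next((v for v in values if v is not None), None)

        # Recoding: map raw codes to analysis categories; unknown codes are kept as-is.
        for var, mapping in recode_maps.items():
            if row.get(var) is not None:
                row[var] = mapping.get(row[var], row[var])

        # Calculation: derive the length of hospital stay (days) from two dates.
        if row.get("admission_date") and row.get("discharge_date"):
            row["length_of_stay_days"] = (row["discharge_date"] - row["admission_date"]).days

        prepared.append(row)
    return prepared


# Illustrative records only (not study data).
records = [
    {"unused_field": None, "mother_abo_record": "A", "mother_abo_form": None,
     "delivery": "cs", "admission_date": date(2011, 2, 1), "discharge_date": date(2011, 2, 3)},
    {"unused_field": None, "mother_abo_record": None, "mother_abo_form": "O",
     "delivery": "vag", "admission_date": date(2011, 2, 2), "discharge_date": date(2011, 2, 5)},
]
prepared = prepare(
    records,
    merge_groups={"mother_abo": ["mother_abo_record", "mother_abo_form"]},
    recode_maps={"delivery": {"cs": "caesarean", "vag": "vaginal"}},
)
```

Running the steps in this order means recoding and calculation operate on the already merged variables, so each derived value is computed once per record.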
After data preparation, 60 of the 72 variables remained, in addition to the TcB levels. The final dataset was then converted into a format that could be modeled in WEKA.
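The text does not state which file format was used; WEKA's native format is ARFF, so the conversion might look like the sketch below. This is an assumption for illustration: the attribute names are hypothetical, and names and nominal values are assumed to contain no spaces, commas or quotes, so no ARFF quoting is implemented.

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to the ARFF text format read by WEKA.

    attributes: list of (name, kind); kind is "NUMERIC" or a list of nominal values.
    rows: list of dicts; a missing value (None) is written as "?".
    Names and nominal values are assumed to need no quoting.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {spec}")
    lines += ["", "@DATA"]
    for row in rows:
        cells = ["?" if row.get(name) is None else str(row[name]) for name, _ in attributes]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and values, for illustration only.
arff_text = to_arff(
    "jaundice",
    [("gestational_age_weeks", "NUMERIC"), ("tcb_8_16h", "NUMERIC"),
     ("phototherapy", ["yes", "no"])],
    [{"gestational_age_weeks": 38, "tcb_8_16h": 5.2, "phototherapy": "no"},
     {"gestational_age_weeks": 36, "tcb_8_16h": None, "phototherapy": "yes"}],
)
```

Placing the outcome (phototherapy) as the last attribute matches the WEKA Explorer's default choice of class attribute.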
Modeling
To perform data modeling, we chose several classification algorithms implemented in WEKA and often applied to medical datasets: J48 (an implementation of the C4.5 algorithm for generating pruned or unpruned decision trees), simple CART (a decision tree learner implementing minimal cost-complexity pruning), naïve Bayes (a naïve Bayes classifier using estimator classes), Bayes net (a classifier that learns a Bayesian network), multilayer perceptron (a neural network classifier trained with backpropagation), SMO (an implementation of John Platt’s sequential minimal optimization algorithm for training a support vector classifier) and simple logistic (a classifier for building linear logistic regression models). Other similar methods were also tested but did not yield better results and are therefore not reported in this study.
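To make the naïve Bayes idea concrete, the sketch below is a minimal from-scratch Gaussian naïve Bayes for numeric features. It is not WEKA's `NaiveBayes` code; it only assumes, as WEKA does by default for numeric attributes, that each feature follows a normal distribution within each class, and it ignores nominal attributes for brevity.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class mean/variance of each numeric feature."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for j in range(len(rows[0])):
            column = [r[j] for r in rows]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, 1e-9)))  # variance floor avoids division by zero
        model[label] = (len(rows) / len(y), stats)
    return model


def predict_proba_nb(model, x):
    """Posterior P(class | x), treating features as independent given the class."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

Working in log space keeps the product of many small likelihoods from underflowing when many variables are used.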
The tests were performed using internal 10-fold cross-validation. Cross-validation estimates how well a learning algorithm will perform on data it was not trained on: the dataset is split into 10 folds, and each fold is used once as the test set while the classifier is trained on the remaining nine. The average performance over the test folds provides an estimate of the performance of the classifier built from the entire dataset [20, 24, 25].
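A minimal sketch of 10-fold cross-validation follows. Stratifying the folds (keeping the class proportions of each fold close to those of the whole dataset) is an assumption here, as is pooling the out-of-fold predictions so that any metric can be computed on them; averaging per-fold metrics is an equally common variant. `fit` and `predict` stand for any classifier's training and prediction functions.

```python
import random


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled instances of each class round-robin across the folds.
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validate(X, y, fit, predict, k=10, seed=1):
    """Return an out-of-fold prediction for every instance."""
    predictions = [None] * len(y)
    for test_indices in stratified_folds(y, k, seed):
        held_out = set(test_indices)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test_indices:
            predictions[i] = predict(model, X[i])
    return predictions
```

With 35 positive cases among 227, this dealing scheme places 3 or 4 positive cases in each test fold, so no fold is left without cases of the outcome.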
All classification algorithms were tested on different subsets of variables and compared in terms of accuracy, measured as the area under the ROC curve (AUC), sensitivity and specificity. For all subsets, we fixed the sensitivity at 90% and calculated the corresponding specificity, given the importance of high sensitivity in medical decision-making. The standard error of each AUC was estimated using the method proposed by Hanley and McNeil [26].
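The three quantities used for comparison can be sketched as below, assuming each classifier outputs a score where higher means "more likely to need phototherapy" and labels are 1 (phototherapy) or 0. The AUC is computed as the probability that a random positive case outscores a random negative case; "fixing sensitivity at 90%" is interpreted here, as an assumption, as the best specificity among thresholds reaching at least 90% sensitivity. The last function is the Hanley and McNeil standard error, with `n_pos` and `n_neg` the numbers of positive and negative cases (35 and 192 in this study).

```python
import math


def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target=0.90):
    """Highest specificity among thresholds whose sensitivity reaches the target."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        # A case is predicted positive when its score is at or above the threshold.
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < t)
        if tp / n_pos >= target:
            best = max(best, tn / n_neg)
    return best


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC value `a` (Hanley & McNeil, 1982)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)
```

Because a sensitivity of exactly 90% may not be reachable with a small number of positive cases, the threshold search accepts the first achievable sensitivity at or above the target.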
The subsets corresponded to three different time points. First, we used only the risk factors available immediately after the newborn’s birth: Mother age; Father age; Head circumference; Mother pathologies; Mother usual medication; Gestational age; Physical exam report; Type of delivery; Newborn blood group (Rh); Newborn blood group (ABO) and Mother blood group (ABO).
Second, we tested the algorithms using only the TcB levels obtained up to 24 hours of life, without any other risk factors.
Finally, we tested the combination of the risk factors and the TcB levels obtained up to 24 hours of life.
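The Discussion refers to TcB levels grouped in 8-hour intervals of life (0 to 8, 8 to 16 and 16 to 24 hours). Assuming that representation, the three subsets could be assembled as sketched below. The snake_case names for the eleven risk factors, and the choice of keeping the highest reading within each interval, are hypothetical; the text does not say how several readings in one interval were summarized.

```python
# Hypothetical snake_case names for the eleven risk factors listed above.
RISK_FACTORS = [
    "mother_age", "father_age", "head_circumference", "mother_pathologies",
    "mother_usual_medication", "gestational_age", "physical_exam_report",
    "type_of_delivery", "newborn_rh", "newborn_abo", "mother_abo",
]


def tcb_window_features(measurements, window_hours=8, horizon_hours=24):
    """Summarize (age_in_hours, tcb) pairs as one value per window before the horizon.

    Each window keeps its highest reading (a hypothetical choice); windows
    without any reading are None, i.e. missing.
    """
    features = {}
    for start in range(0, horizon_hours, window_hours):
        end = start + window_hours
        values = [tcb for age, tcb in measurements if start <= age < end]
        features[f"tcb_{start}_{end}h"] = max(values) if values else None
    return features


def build_subset(record, tcb_measurements, subset):
    """Return the feature dict for one of the three evaluated subsets."""
    risk = {name: record.get(name) for name in RISK_FACTORS}
    if subset == "risk_factors":
        return risk
    tcb = tcb_window_features(tcb_measurements)
    if subset == "tcb_24h":
        return tcb
    if subset == "risk_factors_tcb_24h":
        return {**risk, **tcb}
    raise ValueError(f"unknown subset: {subset}")
```

Readings taken at 24 hours or later are excluded by the half-open intervals, so every feature in the second and third subsets is available by the end of the first day of life.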
Approval was obtained from the Ethics Committee of the Centro Hospitalar Tâmega e Sousa, E.P.E. (reference number 0568/2011).
Results
Of the 227 newborn infants included in the study, 35 (15.4%) were diagnosed with hyperbilirubinemia and treated with phototherapy, which was the outcome to be predicted.
These 35 newborn infants started phototherapy at a median age of 45.5 hours; early jaundice, detected before 24 hours of life, was present in 4 of them (11.4%).
In the first step, applying the algorithms to the clinical risk factors, the highest accuracy was obtained with the Bayes net algorithm (AUC = 0.74), followed by naïve Bayes and simple logistic (AUC = 0.72 each).
Using only the TcB levels obtained before 24 hours of life, the highest accuracy was obtained with the multilayer perceptron, WEKA's artificial neural network algorithm (AUC = 0.84), followed by naïve Bayes (AUC = 0.82) and simple logistic (AUC = 0.80).
When combining clinical risk factors with TcB levels at 24 hours of life, the highest accuracy was obtained with the simple logistic algorithm (AUC = 0.89), followed by naïve Bayes (AUC = 0.88) and Bayes net (AUC = 0.87).
In all algorithms except J48 and the multilayer perceptron, combining clinical risk factors with TcB levels improved the accuracy of prediction compared with either TcB levels or clinical risk factors alone.
Table 2 presents the results of the comparison of the different algorithms applied to the data subsets.
Table 2
Comparison of the application of different algorithms to the data subsets in terms of accuracy (AUC, with confidence interval) and specificity (for a sensitivity of 90%)

| Algorithm | Risk factors: AUC (CI) | Risk factors: Specificity | TcB (<24 h): AUC (CI) | TcB (<24 h): Specificity | Risk factors + TcB: AUC (CI) | Risk factors + TcB: Specificity |
|---|---|---|---|---|---|---|
| J48 | 0.47 (0.42-0.52) | 0.09 | 0.79 (0.74-0.84) | 0.43 | 0.75 (0.70-0.80) | 0.33 |
| Simple CART | 0.46 (0.41-0.51) | 0.10 | 0.76 (0.71-0.81) | 0.42 | 0.77 (0.72-0.82) | 0.41 |
| Naïve Bayes | 0.72 (0.67-0.77) | 0.38 | 0.82 (0.77-0.87) | 0.54 | 0.88 (0.84-0.92) | 0.56 |
| Bayes Net | 0.74 (0.69-0.79) | 0.42 | 0.73 (0.68-0.78) | 0.35 | 0.87 (0.83-0.91) | 0.60 |
| MP | 0.70 (0.65-0.75) | 0.35 | 0.84 (0.80-0.88) | 0.53 | 0.81 (0.76-0.86) | 0.50 |
| SMO | 0.53 (0.48-0.58) | 0.15 | 0.50 (0.45-0.55) | 0.12 | 0.72 (0.67-0.77) | 0.54 |
| Simple Logistic | 0.72 (0.67-0.77) | 0.39 | 0.80 (0.75-0.85) | 0.41 | 0.89 (0.85-0.93) | 0.56 |

MP, multilayer perceptron; TcB, transcutaneous bilirubin.
Discussion
Compared with traditional methods, prediction using data mining techniques yielded promising results.
Regarding the literature, and specifically the study by Chou et al. [14], which also sought to inform the indication for phototherapy, our study obtained a higher AUC using clinical risk factors alone (0.74, compared with 0.69 in that study), although the difference is not statistically significant (the confidence intervals overlap). However, compared with other studies, particularly the study by Newman et al. [16], which sought to predict bilirubin levels above 25 mg/dl, and bearing in mind the differences between the studies, our result falls short of the AUC of 0.83 reported there.
Although they did not perform as well, decision tree models, such as those generated by J48 or Simple CART, have the advantage of being more easily interpretable, especially when compared with opaque models, usually called black-box models, such as artificial neural networks. This advantage makes decision trees more readily accepted by the medical community [24, 27].
Regarding bilirubin assessment, the identified studies sought to predict the risk of subsequent hyperbilirubinemia using predischarge total serum bilirubin (TSB) values. In the present study, we used the first-day TcB levels to predict the need for phototherapy.
Using the multilayer perceptron algorithm, we obtained a slightly higher accuracy than Keren and Bhutani [17] (AUC of 0.84, compared with 0.83); however, this difference is not statistically significant, because our result falls within the confidence interval reported in their study.
In practice, however, pediatricians base their assessment on the combination of clinical risk factors and the newborn's bilirubin levels, because this combination yields more accurate predictions. This is also the approach supported by the international guidelines of the American Academy of Pediatrics (AAP) and NICE.
Applied to our dataset, the simple logistic algorithm returned better results than those reported by Newman et al. [16]: we obtained an accuracy of 0.89, compared with 0.86 in their study. Again, this difference is not statistically significant, since the confidence intervals overlap.
Beyond comparing accuracy, it is also important to interpret the generated models and compare them with clinical rules of thumb, that is, with what actually prevails in practice.
Taking as an example the simple logistic algorithm, one of the best-performing models across all feature subsets, we found that, when it was applied to the subset containing risk factors and TcB levels, the most influential variables were, in descending order: TcB between 8 and 16 hours, TcB between 16 and 24 hours, gestational age and newborn blood group (ABO).
Interestingly, among the TcB levels, the interval between 8 and 16 hours has a greater influence than the subsequent interval, between 16 and 24 hours. It should also be noted that the first interval, between 0 and 8 hours of life, is not part of the generated model. This may be due to the small number of TcB values recorded in the first 8 hours. Nevertheless, it also reflects the importance of measuring and recording TcB as early as possible, as supported by several studies.
Concerning risk factors, the algorithm used only gestational age and newborn blood group (ABO) to build the model, whereas in daily practice each risk factor described in the guidelines, such as cephalohematoma or a previous sibling treated with phototherapy, is considered to increase the risk of subsequent hyperbilirubinemia equally.
These results agree with studies identifying gestational age as the most determinant variable in the prognosis of neonatal jaundice [28]. However, the newborn blood group (ABO) also holds a prominent position in the generated model, possibly because it is related to cases of jaundice caused by blood group incompatibility.
In summary, and bearing in mind the differences between studies, data mining techniques allowed us to build high-accuracy models, with results no worse than those of the traditional methods reported in the literature.
As mentioned, the median age of the newborns at the start of treatment was 45.5 hours of life, close to the time at which newborns may be discharged from hospital. We therefore believe that an early and correct assessment, such as the one enabled by the proposed data mining methods, could effectively reduce the length of hospital stay, prevent incorrect diagnoses and reduce readmissions after hospital discharge.
Limitations
The predicted outcome, hyperbilirubinemia, is defined differently across the compared studies, which may be an important source of bias.
Using data mining software other than WEKA, with different implementations of the algorithms, could lead to different results.
A larger sample could also improve the results.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally to the research. All authors read and approved the final manuscript.