Introduction
Since the mid-1980s, many predictive models for the assessment of cardiac postoperative mortality have gained popularity in the medical community [1]. Because of recent advances in cardiac surgery, mortality is now low, and morbidity has been suggested as both a valid end point and a more attractive target for developing operative risk models [2]. General severity-of-illness models can be inaccurate when applied to specific groups of patients, even if they are valid for comparing outcomes in large numbers of patients [3], and the inaccuracy of these models makes them inappropriate for predicting individual outcome [4,5]. Predictive models, therefore, provide significant advantages in clinical decision-making only if they are customized to the specific population of patients to be investigated. Moreover, although most risk-stratification variables are derived from preoperative patient characteristics [6–10], there are several intraoperative and postoperative physiological variables that can influence morbidity and mortality [11,12].
Higgins and colleagues previously evaluated the relative contribution of preoperative conditions, operating theater events and physiological parameters on admission to the intensive care unit (ICU) to outcome, describing a sequential model derived from the patient's status on admission to the ICU [11]. This model is complementary to the preoperative score of the same study group [13]. Higgins' models, similar to certain other models, use univariate and multivariate logistic regression to quantify prognosis by a numerical scoring system, but caution is needed in applying scores to individuals [14–16].
Algorithms for classification derived from the Bayes theorem can be valid alternatives to logistic regression in discrimination problems. The measured set of individual features serves as input to a decision rule by which the patient is assigned to a morbidity risk class. A key characteristic of this approach is that, given complete knowledge of the statistics of the patterns to be classified, the Bayes rule defines the optimum classifier that minimizes the probability of classification error or the expected cost of an incorrect decision [17]. A Bayes linear classifier is the simplest approach, but, in the Bayes sense, it is optimal only for normal distributions with equal covariance matrices of the classification groups. However, in many cases, the simplicity and robustness of the linear classifier compensate for the loss of performance occasioned by non-normality or heteroscedasticity [17–19]. In clinical decision-making it is easy to implement and locally customize, because the statistics of the patterns to be classified only require knowledge of the group means and the pooled within-sample covariance matrix, which can be estimated from a training set of correctly classified cases [18]. The simplicity of a linear classifier, which enables it to be easily tailored and updated to the patient population of a given institution, is a significant advantage of this approach in clinical practice, with respect to multiple logistic regression. The Bayes approach also provides a decision rule for prognosis derived from the whole set of measured predictor variables rather than from scores obtained with logistic regression from group characteristics [20]. These aspects have led to widespread use of the Bayes decision rule in discrimination problems instead of logistic regression [21].
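As an illustration of why only the group means and the pooled within-sample covariance matrix are needed, the following is a minimal sketch of a two-class Bayes linear classifier, assuming normal class distributions with equal covariance matrices. It is not the implementation used in this study: the function names, the use of class frequencies as default priors and the two-feature toy data are assumptions made purely for illustration.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(group0, group1):
    """Pooled within-group covariance matrix (n0 + n1 - 2 degrees of freedom)."""
    d = len(group0[0])
    s = [[0.0] * d for _ in range(d)]
    for rows in (group0, group1):
        m = mean_vector(rows)
        for x in rows:
            dev = [xi - mi for xi, mi in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += dev[i] * dev[j]
    dof = len(group0) + len(group1) - 2
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear_classifier(group0, group1, prior1=None):
    """Return weights w and bias b of the linear discriminant w.x + b."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    if prior1 is None:  # assumption: prior taken from class frequencies
        prior1 = len(group1) / (len(group0) + len(group1))
    w = solve(pooled_covariance(group0, group1), [a - b for a, b in zip(m1, m0)])
    bias = -0.5 * sum(wi * (a + b) for wi, a, b in zip(w, m1, m0))
    bias += math.log(prior1 / (1.0 - prior1))
    return w, bias

def posterior_class1(x, w, bias):
    """Posterior probability of class 1 under the equal-covariance normal model."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical two-feature training data, for illustration only
normal_course = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
morbidity = [(4.0, 5.0), (5.0, 4.0), (4.5, 5.5), (5.5, 4.5)]
w, bias = fit_linear_classifier(normal_course, morbidity)
print(posterior_class1((3.0, 3.0), w, bias))
```

A patient would be assigned to the morbidity class when the posterior probability exceeds a chosen threshold; with equal misclassification costs that threshold is 0.5.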
The aims of this study were as follows:
1) to develop an ICU–Bayes model to select the preoperative, intraoperative and postoperative risk factors that best predict postoperative morbidity for coronary artery bypass graft (CABG) patients;
2) to evaluate the reliability of score models in our population of patients;
3) to compare these different models as predictors of morbidity risk.
Discussion
A model that predicts the outcome of patients in the ICU with good discrimination can be useful because the risk prediction enables, for example, better allocation of resources and can aid decisions about the appropriateness of continuing treatment [35]. Most studies have concentrated on short-term mortality, and there is a lack of easy-to-use models predicting risk of complications (morbidity). However, mortality by itself might not be an adequate indicator of quality of care or resource use [2,14]. In contrast, morbidity might be more informative, being a more frequent event than mortality and enabling statistical inferences to be drawn from smaller populations. Finally, morbidity can be measured in terms of postoperative complications and length of stay in the ICU [1,14]. Several authors have developed predictive indices of stay in the ICU after heart surgery. Most of these studies included preoperative variables and, generally, did not take into account events affecting patient outcome in the operative or immediate postoperative period. Quantifying risk and assessing outcome in the ICU after cardiac surgery according to preoperative variables alone could lead to incorrect conclusions about the true morbidity risk [1,11,14]. We chose to consider the contribution of preoperative conditions, operating theater events and physiological measurements on admission to the ICU and selected an optimal subset of predictor variables using a stepwise technique. The aim of the study was to compare two approaches for risk discrimination in ICU patients after heart surgery: a Bayes linear classifier developed in our specialized ICU, and score models designed in our training set using the method proposed by Higgins and colleagues [11].
Both approaches have strengths and weaknesses. The greatest benefit of a score model is that it only requires the sum of integer factors and is, therefore, very simple to apply in routine clinical practice. However, the Higgins approach first requires the development of a logistic regression model. Although continuous and categorical predictor variables can be mixed, the model development can be problematic because logistic regression is very sensitive to correlations between predictors in the model [16]. If the predictor variables are highly correlated during local application, the result is a loss of information. To overcome this problem, we used a stepwise procedure, similar to that employed with the Bayes model, to select variables to enter in the logistic regression model. A weakness of the scoring system is the difficulty of locally customizing this type of model if training sets planned in a different institution are used. The design of the scoring system requires a complex process, which can have low interobserver reproducibility. In particular, to refit the logistic model using all predictors as categorical variables, Higgins and colleagues used a locally weighted smoothing scatterplot procedure, which involves subjective choices, to identify cut-off points. Similar difficulties might also be encountered when the model is updated with new data, such as improved outcomes resulting from technological advances. Easy updating is a crucial feature. In fact, acquisition of correctly classified new patients enables the training set to be increased day by day, with corresponding improvement in discrimination performance of the model. Progress in medical techniques also makes it necessary to be able to change decision-making models continuously. For example, the dramatic decrease in cardiac postoperative mortality means that morbidity is now used as the new end point for developing operative risk models. Bayes linear discrimination provides much more flexible models because tuning them to new data sets is a rapid and objective procedure that only requires calculation of predictor variable means in the two risk classes and pooled variances and covariances in the whole training set. A weakness of this approach is that it is optimal only if the conditional probability density functions (CPDFs) of the two classes can be assumed normal and with equal covariance matrices. However, this type of classifier is used in a wide range of clinical applications because its simplicity and robustness compensate for the loss of performance resulting from incomplete observance of the above statistical hypotheses [17–19]. Our results show that the Bayes linear classifier can predict all types of complications, especially infection and renal failure. Discrimination increases with the number of complications. In particular, the model correctly identified all patients with more than three complications.
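The updating step described above, in which only class means and pooled variances and covariances are recalculated, can be sketched with running sums that are extended as each correctly classified patient is added. This is a minimal sketch under stated assumptions, not the software used in the study: the class name `ClassStats`, the function `pooled_covariance` and the toy values are hypothetical.

```python
class ClassStats:
    """Running count, sum and sum of cross-products for one risk class."""

    def __init__(self, dim):
        self.n = 0
        self.total = [0.0] * dim
        self.cross = [[0.0] * dim for _ in range(dim)]

    def add(self, x):
        """Add one correctly classified patient with feature vector x."""
        self.n += 1
        for i, xi in enumerate(x):
            self.total[i] += xi
            for j, xj in enumerate(x):
                self.cross[i][j] += xi * xj

    def mean(self):
        return [t / self.n for t in self.total]

    def scatter(self):
        """Sum of squared deviations and cross-deviations about the class mean."""
        m = self.mean()
        d = len(m)
        return [[self.cross[i][j] - self.n * m[i] * m[j] for j in range(d)]
                for i in range(d)]

def pooled_covariance(stats0, stats1):
    """Pooled within-class covariance from the two sets of running sums."""
    s0, s1 = stats0.scatter(), stats1.scatter()
    dof = stats0.n + stats1.n - 2
    d = len(s0)
    return [[(s0[i][j] + s1[i][j]) / dof for j in range(d)] for i in range(d)]

# Hypothetical patients (two features each), labelled 0 = normal course, 1 = morbidity
stats = {0: ClassStats(2), 1: ClassStats(2)}
for x, label in [((1.0, 2.0), 0), ((3.0, 4.0), 0), ((5.0, 1.0), 1), ((7.0, 3.0), 1)]:
    stats[label].add(x)
cov = pooled_covariance(stats[0], stats[1])
```

Because only sums are stored, adding a new patient does not require access to earlier records. The sum-of-products form is chosen here for brevity; a numerically more stable running update could be preferable for large training sets.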
The area under the ROC curve (AUC), estimated by a maximum-likelihood procedure assuming a binormal distribution of the data, was significantly higher for the Bayes linear model. Similar results were obtained by evaluating the empirical ROC curves obtained from the testing set. According to the Hosmer and Lemeshow criterion [15], all locally customized models had acceptable discrimination capacities in the testing data set, because their AUCs were between 0.7 and 0.8. In contrast, the AUC of Higgins' standard scoring system calculated with the testing data set did not reach 0.7, indicating poor discrimination capacity for this model in our patients. With regard to calibration, the Hosmer-Lemeshow test showed good fit for all models, except Higgins' scoring system. Table 5 summarizes the discrimination and calibration performances of the Bayes linear classifier and the FC, PC and Higgins' standard scoring models. It shows that the two locally customized score models had significantly lower discrimination capacities than the Bayes linear classifier. The statistical significance of the difference in AUCs between the Bayes linear classifier and the score models increased when passing from the FC to the PC approach, indicating that the score model performance considerably worsened when using the set of variables identified as optimal by the Bayes classifier as predictors. Furthermore, Table 5 shows the weak points of Higgins' standard scoring system applied in our specialized ICU, confirming that any comparison of a locally customized model with a previously published model is unfair, regardless of the method by which the model was developed.
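For readers who wish to reproduce the empirical part of this comparison, the area under an empirical ROC curve equals the proportion of (complicated, uncomplicated) patient pairs in which the complicated patient received the higher predicted risk, with ties counted as one half. The sketch below assumes this Mann-Whitney formulation; it does not reproduce the binormal maximum-likelihood estimate. The function names, the handling of values exactly at 0.7 or 0.8 and the risk values are assumptions for illustration.

```python
def empirical_auc(risks_with_event, risks_without_event):
    """Empirical AUC as the Mann-Whitney statistic, ties counted as one half."""
    wins = 0.0
    for r_event in risks_with_event:
        for r_none in risks_without_event:
            if r_event > r_none:
                wins += 1.0
            elif r_event == r_none:
                wins += 0.5
    return wins / (len(risks_with_event) * len(risks_without_event))

def discrimination_label(auc):
    """Labels restricted to the two thresholds quoted in the text (0.7 and 0.8)."""
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    return "better than acceptable"

# Hypothetical predicted risks, for illustration only
morbidity_risks = [0.9, 0.8, 0.4]
no_morbidity_risks = [0.7, 0.3, 0.2]
auc = empirical_auc(morbidity_risks, no_morbidity_risks)
print(auc, discrimination_label(auc))
```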
In our data set, model performance dropped sharply when logistic regression models were converted to scoring systems using the procedure suggested by Higgins et al. [11]. In fact, when we customized logistic models without transforming regression coefficients into integer scores, we obtained discrimination performance only slightly worse than that of the Bayesian model, and statistical comparison of ROC areas did not indicate significant differences. This fully agrees with the results obtained in previous studies [21,36] and suggests that attempts to obtain a very simple clinical model that reduces computational difficulties could lead to significant loss of performance. Despite the immediacy and simplicity of scoring systems derived from weighted variables, sequential summing of integer factors can distort the multivariate characteristics of outcome prediction. The Bayesian model does not use a weighted scoring system; it uses a decision rule that enables the probability of morbidity in patients who have undergone CABG surgery to be assessed according to multivariate statistics of the predictor variables used for discrimination (12 variables were selected in our model).
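The loss of information that can accompany integer scoring can be illustrated with a toy example. The sketch below is not the procedure of Higgins and colleagues: it assumes a simple and commonly used conversion in which each logistic coefficient is divided by the smallest absolute coefficient and rounded to the nearest integer. The coefficient values and variable names are hypothetical and the intercept is omitted.

```python
# Hypothetical logistic coefficients (log-odds), for illustration only
coefficients = {"emergency": 1.2, "age_over_70": 0.5, "low_cardiac_index": 0.8}

def to_integer_points(coefs):
    """Divide by the smallest absolute coefficient and round to integers."""
    unit = min(abs(c) for c in coefs.values())
    return {name: round(c / unit) for name, c in coefs.items()}

def log_odds(present, coefs):
    """Sum of the exact coefficients of the risk factors present."""
    return sum(coefs[name] for name in present)

def score(present, points):
    """Additive integer score of the risk factors present."""
    return sum(points[name] for name in present)

points = to_integer_points(coefficients)
for patient in (["emergency"], ["low_cardiac_index"]):
    print(patient, "score:", score(patient, points),
          "log-odds:", log_odds(patient, coefficients))
```

In this toy case two risk factors with different coefficients receive the same number of points, so two patients with different estimated log-odds obtain identical scores.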
Many papers have tested the validity of the preoperative scoring system [37–42], but to our knowledge, no study on validation of Higgins' ICU-admission score has been published. The present study is the first to locally customize this ICU scoring system and to test its validity using external data. In the original version of the ICU-admission morbidity model, Higgins and associates used an additive scoring system comprising 13 weighted predictors that were graded from 1 to 7, giving a maximum total score of 44 points [11]. In the FC version, the same method of model development led to a different choice of 13 weighted predictors. Most risk predictors in Higgins' score are the same as those in other North American and European mortality risk models (such as the Parsonnet and EuroSCORE models) [43–45]. Similar risk factors were revealed by Higgins and colleagues and in our Bayes model (Table 2): emergency procedure, age, elevated serum creatinine levels, prior heart operation, history of vascular disease, weight, CPB time, use of IABP after CPB, and low postoperative flow state (low cardiac index and low DO2I). Although a low preoperative ejection fraction is a known predictor of poor immediate postoperative outcome after cardiac surgery, it was not a risk factor in our study. This is in line with the findings of Zaroff and colleagues [46], who showed that in some high-risk cases there could be great improvement in left ventricular function after operation because of successful revascularization. Not all patients with a low preoperative ejection fraction required inotropic support, and a low ejection fraction was not a risk factor for outcome for the whole population [46]. However, we found that morbidity was associated with the need for preoperative and postoperative IABP and use of inotropes after the operation, and these variables are strongly correlated with poor cardiac function.
The idea of developing a risk model derived from the Bayes rule is not new. In 1985, Edwards and colleagues began to use a Bayesian model of operative mortality associated with CABG procedures [20]. The Society of Thoracic Surgeons National Cardiac Surgery Database model, developed by Edwards and colleagues, incorporates 23 risk factors and is the most widely used model in the USA [47]. The Society of Cardiothoracic Surgeons of Great Britain and Ireland also proposed a Bayesian model for CABG patients in the UK [48,49]. However, both these models focused on postoperative mortality, not morbidity. In the present study, we developed and tested a Bayesian discrimination model for assessing morbidity risk after coronary artery surgery. Some practical aspects need to be considered when this discrimination technique is chosen as support for clinical decision-making. First of all, this approach requires the use of a computer. Moreover, an initial retrospective study for deriving the model might be time consuming and tedious. If detailed records are not available, it might not be possible to obtain the whole set of variables for each patient. Finally, many groups have found it necessary to establish physician training programs to ensure that all users of the model have the same interpretation of terminology and results [20,25,26].
Some limitations of this study must also be noted. Firstly, the model was developed and validated in a single institution with a relatively low surgical volume, and it might not reflect the experience of hospitals performing a different number of CABG operations, because outcomes can be related to surgical volume [1,14,50]. Moreover, patients were treated by a small, experienced team, and this decreased the variability resulting from perioperative factors. Secondly, we created a multivariate model with 12 predictor variables, not a simple risk score. Not all physicians will find it easy to use, because it requires a special software program to estimate the risk of morbidity. However, our results show that transforming complex statistical models into simple score systems might lead to a significant loss of discrimination performance. On the other hand, personal computers are widely used for managing patient data in ICUs, so introduction of software for estimating the risk of morbidity would not be unduly onerous. Thirdly, the model is derived from preoperative, intraoperative and postoperative variables and only allows prediction of morbidity after ICU admission. Because the model does not assess risk solely on the patient's preoperative status, it cannot be used to enhance patient counseling. Another preoperative risk model needs to be used to define the risk and planning of surgical procedures and type of anesthesia before the operation. Finally, because the duration of CPB was an intraoperative risk factor for morbidity in our model, we might expect the risk of morbidity to be incorrectly estimated in off-pump patients using this risk model.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
All authors participated in study planning and co-ordination. SS collected clinical data. SS, BB and PG were concerned with all clinical aspects of the study. GC, EB and PB designed the Bayesian classifier and locally customized score models and performed the statistical analysis. All authors read and approved the final manuscript.