Background
Globally, stroke is the second leading cause of death after ischemic heart disease and the third leading cause of disability [1, 2]. In 2013, there were 6.5 million deaths from stroke (51% from ischemic stroke), 113 million disability-adjusted life years were lost because of stroke (58% due to ischemic stroke), and 10.3 million people had a new stroke (67% ischemic) [1]. In 2015, the prevalence of stroke was 42.4 million people, including 24.9 million with ischemic stroke. There were 6.3 million stroke deaths worldwide, of which 3.0 million were due to ischemic stroke [2].
Minimizing the time to treatment is key to improving the chances of an excellent outcome after stroke (time lost is brain lost) [3]. It is also important to be able to predict the outcomes of diseases and treatments. Most physicians rely on their own clinical experience to predict their patients' outcomes when making decisions about patient care, and the accuracy of these informal predictions is unclear. Care management might be improved if physicians combined their clinical judgment with formal predictions from statistical models, which may be more accurate than clinical experience alone. Prognostic models are statistical tools that assist physicians in making decisions that may affect their patients' outcomes [4].
Accurate prognostic models of complete functional recovery in patients after ischemic stroke could benefit neurological care practice for several reasons. First, a well-developed prognostic model could inform the selection of appropriate treatments and action plans in individual patient management, including patient counseling. Second, such models could be used to improve rehabilitation and discharge planning. Finally, in light of a weakening economy, prognostic models could help clinicians make the best choices for patients in specific clinical scenarios, which may reduce health care costs [5].
To date, several studies have developed prognostic models to predict functional outcomes after ischemic stroke, and each model has different strengths and weaknesses. Because models do not always work well in practice, it is recommended that a model's performance be properly evaluated before it is used clinically. This process is known as model validation and involves an assessment of calibration (the agreement between observed and predicted outcomes) and discrimination (the model's ability to distinguish between patients who are likely and unlikely to experience a particular prognostic event). Poor calibration usually reflects over-fitting of the model to the development sample. At a minimum, a model's internal validity should be assessed (for example, using bootstrap sampling) to estimate its performance in the setting from which the development data originated. A further aspect is external validity (assessed using patient data not used to develop the model), which addresses generalizability [6, 7].
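To make the two performance concepts concrete, the following Python sketch computes a concordance (C) statistic and a Hosmer-Lemeshow calibration statistic from observed binary outcomes and predicted probabilities. It is an illustration under stated assumptions, not the procedure used by any of the reviewed studies: the function names are hypothetical, subjects are split into equal-sized groups ordered by predicted risk (one common variant of the test), and the closed-form chi-square tail used for the p-value is exact only for even degrees of freedom, so the sketch requires an even number of groups.

```python
import math
from typing import Sequence, Tuple


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance (C) statistic for a binary outcome (equal to the AUC).

    Over every pair of one patient with the event (y = 1) and one without
    (y = 0), score 1 if the patient with the event has the higher predicted
    probability, 0.5 for a tie and 0 otherwise, then average the scores.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive, even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value with df = groups - 2."""
    # Order subjects by predicted risk (stable sort keeps input order for ties).
    ordered = sorted(zip(p, y), key=lambda t: t[0])
    n = len(ordered)
    stat = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        m = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        # Skip empty groups and groups whose binomial variance would be zero.
        if m == 0 or expected <= 0 or expected >= m:
            continue
        stat += (observed - expected) ** 2 / (expected * (1 - expected / m))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

For example, `c_statistic([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])` returns 0.75, because three of the four event/non-event pairs are ranked correctly. A large Hosmer-Lemeshow p-value (a non-significant test) is conventionally read as no evidence of poor calibration.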
There may be danger in adopting these models too quickly without appropriate validation and an understanding of their limitations. The purpose of this study was to systematically review and synthesize the performance of existing prognostic models used to predict the probability of complete recovery after ischemic stroke, and to assess their quality.
Methods
Selection criteria
We included studies that predicted complete recovery after ischemic stroke, with complete recovery assessed on at least one of the following instruments: the Barthel Index (BI) ≥ 95/100 or 19/20, the Glasgow Outcome Scale (GOS) score = 1, the Oxford Handicap Scale (OHS) score ≤ 2, or the Modified Rankin Scale (mRS) score ≤ 1. Studies also had to report model performance using the concordance statistic, the area under the receiver operating characteristic curve (AUC), or a measure of calibration. There were no restrictions on the timing of outcome evaluation, patient age, or type or severity of ischemic stroke.
Search strategy
We searched PUBMED, SCOPUS, CENTRAL, ISI Web of Science and OVID MEDLINE for prognostic models published from inception to 4 December 2017, using the search terms listed in Additional file 1, without restrictions on publication language. We also reviewed the reference lists of relevant studies.
Study selection and data extraction
Two reviewers (NJ and SR) independently screened study titles and abstracts against the selection criteria. If a decision could not be made from the abstract, the full text was assessed. Disagreements were resolved through discussion with a third reviewer (ML). We extracted the performance measures (concordance statistics, AUCs and calibration) for both development and validation models. We also extracted study characteristics: author(s), publication year, setting, study design, definition of outcome, number of subjects, number of outcome events, age, ischemic stroke severity and duration of follow-up.
Quality assessment
We assessed study quality using an adaptation of the tool developed by D'Amico et al. [8], and report how each study met each of the major methodological requirements for prognostic research studies. The assessment items were as follows:
-
Did the prognostic study use a cohort design?
-
Were the predictors clearly defined and details provided of how they were measured?
-
Were the missing data handled appropriately with statistical imputation?
-
Was some form of stepwise analysis used for selecting predictors in a multivariable analysis?
-
Was the sample size adequate as defined by an events-per-variable ratio of 10 or more?
-
Was the final model validated on the patients who were used to generate the model (internal validation)?
-
Was the final model validated on the patients who were not used to generate the model (external validation)?
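The sample-size item above reduces to simple arithmetic. The sketch below uses hypothetical helper names and is not part of the D'Amico et al. tool. Following the item as worded, "events" are patients with the outcome of interest (here, complete recovery), and the denominator is assumed to be the number of candidate predictor parameters considered, not only those retained in the final model.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Events-per-variable (EPV) ratio."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def sample_size_adequate(n_events: int, n_candidate_predictors: int,
                         threshold: float = 10.0) -> bool:
    """True when the EPV ratio is 10 or more (the review's criterion)."""
    return events_per_variable(n_events, n_candidate_predictors) >= threshold


# Hypothetical example: 120 patients with complete recovery, 8 candidate predictors.
print(events_per_variable(120, 8))   # 15.0
print(sample_size_adequate(120, 8))  # True
print(sample_size_adequate(60, 8))   # False (EPV = 7.5)
```

A common convention counts the less frequent of the two outcome categories as "events"; when complete recovery is the more common outcome, the number of patients without complete recovery would be used instead.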
Statistical analysis
We synthesized model performance qualitatively because each model had a different combination of predictor variables. We used frequencies and medians with 95% confidence intervals to describe model performance, which comprised calibration (how closely predicted values agree with observed values) and discrimination (the model's ability to distinguish between patients who do and do not develop an outcome event, e.g., patients with and without complete recovery after ischemic stroke). Calibration was assessed using either the Hosmer-Lemeshow chi-square test or a calibration curve. Discrimination was assessed using either the AUC or the concordance statistic (C-statistic), with its 95% CI. Following the suggestions of Hosmer and Lemeshow, discrimination was classified as excellent (AUC ≥ 0.90), good (0.80 ≤ AUC < 0.90), fair (0.70 ≤ AUC < 0.80) or poor (AUC < 0.70). Calibration was judged as good when the calibration curve closely followed the line of perfect calibration (the pre-specified acceptable absolute mean error for the calibration curve was < 0.4) or when the Hosmer-Lemeshow chi-square test was non-significant [9, 10]. For studies that presented only AUCs, we estimated 95% CIs using Hanley's method, which requires three quantities: the total sample size, the number of events and the AUC [11]. When the discrimination of a model had been assessed in two or more validations, we performed a random-effects inverse-variance meta-analysis using Stata version 10.1 [12].
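The calculations described above can be sketched in Python as follows. This is an illustration, not the authors' Stata analysis: the function names are hypothetical, the standard error follows Hanley and McNeil's formula, the pooling uses the DerSimonian-Laird estimate of between-study variance (one common random-effects inverse-variance method), AUCs are pooled on their raw scale (some analysts transform them first), and the grading thresholds are those listed above.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def hanley_se(auc: float, n_total: int, n_events: int) -> float:
    """Standard error of an AUC from the AUC and the two group sizes."""
    n1 = n_events            # patients with the outcome
    n0 = n_total - n_events  # patients without the outcome
    if n1 < 1 or n0 < 1:
        raise ValueError("need at least one patient in each outcome group")
    q1 = auc / (2.0 - auc)             # two positives vs one negative
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # one positive vs two negatives
    var = (auc * (1.0 - auc)
           + (n1 - 1) * (q1 - auc ** 2)
           + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0)
    return math.sqrt(var)


def hanley_ci(auc: float, n_total: int, n_events: int) -> Tuple[float, float]:
    """Approximate 95% CI for an AUC, truncated to [0, 1]."""
    se = hanley_se(auc, n_total, n_events)
    return max(0.0, auc - Z_95 * se), min(1.0, auc + Z_95 * se)


def dersimonian_laird(estimates: Sequence[float], ses: Sequence[float]):
    """Random-effects inverse-variance pooling; returns (pooled, CI, tau^2)."""
    k = len(estimates)
    if k < 2 or k != len(ses):
        raise ValueError("need two or more estimates with matching SEs")
    w = [1.0 / s ** 2 for s in ses]  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - Z_95 * se, pooled + Z_95 * se), tau2


def grade_auc(auc: float) -> str:
    """Discrimination category using the thresholds stated in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"
```

For a hypothetical validation cohort of 200 patients, 60 of whom recovered completely, with an AUC of 0.75, `hanley_ci(0.75, 200, 60)` gives an approximate 95% CI and `grade_auc(0.75)` returns `"fair"`.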
Discussion
This systematic review identified 23 prognostic models from ten studies for complete recovery after ischemic stroke. None of these models had complete performance information covering both internal and external validation. While most prognostic models (18/23) were validated and about half (12/23) reported fair to good discrimination on internal validation, only one model showed good calibration. Nine of the 23 models (39%) were externally validated and reported fair to good discrimination, but only about a quarter of the models (6/23) reported nearly perfect calibration. Only two models were validated both internally and externally, and neither had a complete assessment of model performance. A meta-analysis could be performed for only one model, and its pooled AUC was fair.
The models were developed and validated mainly in elderly patients with moderately severe stroke, mostly in high-income countries. In addition, most of the developed models were not externally validated. These factors are likely to limit the application of the models to other populations and settings.
In our review, we conducted a systematic search of several electronic databases. All of the included studies used a cohort design. For half of the identified models, data were missing for more than 10% of subjects, and model performance was analyzed after excluding subjects with missing data. This strategy could lead to biased conclusions if the reasons for missing data were related to important prognostic indicators or outcomes.
There are some issues related to the quality assessment of the prognostic models. First, our search was performed up to 4 December 2017, and no attempt was made to search for unpublished studies. Studies were selected and data extracted independently by two reviewers. We did not assess publication bias with statistical tests or funnel plot asymmetry because of insufficient data; however, we assessed and presented the quality of all 10 selected studies for each of the quality features listed in our Methods section. Second, about 70% (16/23) of the models used the full model approach to predictor selection, in which all candidate predictors are included in the multivariable analysis [13, 14, 16]. This approach can reduce the risk of predictor selection bias and over-fitting, but it is difficult to apply when the number of events is limited [6]. In our review, all of the models using the full model approach fulfilled the requirement of more than 10 patients with complete recovery per predictor. Finally, model performance was incompletely reported for all included models. The 95% confidence intervals of the estimated performance indices were rarely reported. The 95% confidence intervals for AUCs that we estimated using Hanley's method may be slightly inaccurate, but this approach is accepted for estimating the precision of AUCs when their standard errors are not reported. In addition, calibration performance was often ignored, although calibration is an important performance measure for applying a model in practice. Poor calibration reflects over-fitting of a model and can also be interpreted as indicating a need to shrink the regression coefficients of a prognostic model [10].
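As a hedged illustration of what shrinking regression coefficients can mean, the sketch below applies one well-known option, the heuristic uniform shrinkage factor of van Houwelingen and le Cessie, s = (model χ² − df) / model χ², where model χ² is the likelihood-ratio chi-square of the fitted model and df its number of predictor degrees of freedom. It is not presented as the approach of any study in this review (bootstrap-based shrinkage is a common alternative); the function names, the model chi-square and the coefficient values are hypothetical. In practice the intercept is usually re-estimated after shrinking, which this sketch does not do.

```python
from typing import Dict


def heuristic_shrinkage(model_lr_chi2: float, df: int) -> float:
    """Uniform shrinkage factor s = (chi2 - df) / chi2, floored at 0."""
    if model_lr_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return max(0.0, (model_lr_chi2 - df) / model_lr_chi2)


def shrink_coefficients(coefs: Dict[str, float], s: float) -> Dict[str, float]:
    """Multiply every slope coefficient by s (intercept handled separately)."""
    return {name: s * beta for name, beta in coefs.items()}


# Hypothetical model: likelihood-ratio chi-square of 50 on 8 degrees of freedom.
s = heuristic_shrinkage(50.0, 8)  # (50 - 8) / 50 = 0.84
shrunk = shrink_coefficients({"nihss": -0.20, "age": -0.04}, s)
```

A factor well below 1 signals that predictions from the unshrunk model are likely to be too extreme in new patients, which is the over-fitting pattern that poor calibration reveals.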
To our knowledge, there are three previous systematic reviews of prognostic models in stroke, but their outcomes of interest differed from ours: mortality in hemorrhagic stroke, recurrent stroke, and survival of stroke patients. Therefore, our results cannot be compared directly with theirs. In those reviews, the discrimination of the prognostic models ranged from poor to good, but calibration performance was not considered. The first was a systematic review of prognostic tools for early mortality in hemorrhagic stroke [24]. The authors selected 11 articles (12 prognostic tools), but validation data were reported for only one of the tools. The Hemphill intracerebral hemorrhage (ICH) model had the largest number of validation cohorts (nine articles) and showed good performance, with a pooled AUC of 0.80 (95% CI 0.77 to 0.85). The second was a systematic review of prognostic models to predict survival in patients with acute stroke. The authors found 83 models, but only three were externally validated, and these showed fair to good discrimination [25]. The third was a systematic review of prediction models for recurrent stroke and myocardial infarction after stroke. Its authors showed that the models for recurrent stroke discriminated poorly between patients with and without a recurrent stroke, with pooled AUCs of 0.60 (95% CI 0.59 to 0.62) for the Essen Stroke Risk Score (ESRS) and 0.62 (95% CI 0.60 to 0.64) for the Stroke Prognosis Instrument II (SPI-II) [26].
Our findings suggest that some current prognostic models for predicting complete recovery from ischemic stroke may be clinically useful when applied to patients from high-income countries who have had a moderately severe ischemic stroke. For Model No. 9, developed by Johnston et al. [14], the results suggest that the model was not over-fitted to the data set and is likely to be useful for predicting complete recovery from ischemic stroke in a similar population. Models No. 3, 6, 13 and 15 together involve eight predictors: NIHSS score, age, infarct volume, history of diabetes mellitus, history of stroke, prestroke disability, small-vessel stroke and tissue-type plasminogen activator (t-PA) use, with some predictors shared among the models, as shown in Table 2. These models fulfilled most of the methodological requirements and showed acceptable performance on external validation for both discrimination and calibration. We recommend that these models be used in other settings.
Conclusions
This systematic review has shown that, while many prognostic models have been published, they are rarely validated in external populations, and most were developed in elderly patients with moderately severe ischemic stroke, mainly in high-income countries. There is a need to develop models in other settings, especially in low- and middle-income populations. All models should be validated, and reported performance measures should address the two key issues of discrimination and calibration.
Acknowledgments
The authors wish to thank a native English language speaker, Peter Bradshaw, for line and copy editing drafts of the manuscript. The funding organization had no role in this research.