Setting and data
Our cohorts of unplanned medical admissions are from two acute hospitals approximately 65 km apart in the Yorkshire and the Humber region of England – Scarborough Hospital (SH) (n ~ 300 beds) and York Hospital (YH) (n ~ 700 beds), both managed by York Teaching Hospitals NHS Foundation Trust. We selected these hospitals because they record NEWS2 scores electronically as an integral part of the patient’s process of care and because they agreed to take part in the study. Since NEWS is a subset of NEWS2 and is still in widespread use, we developed models based on both NEWS and NEWS2.
We included all consecutive adult (age ≥ 18 years) unplanned medical admissions discharged over a 3-month period (11 March 2020 to 13 June 2020) with electronic NEWS2 data. For each admission, we obtained a pseudonymised patient identifier, age (years), sex (male/female), discharge status (alive/dead), admission and discharge date and time, diagnosis codes based on the 10th revision of the International Statistical Classification of Diseases (ICD-10), and NEWS2, including its subcomponents: respiratory rate, temperature, systolic blood pressure, pulse rate, oxygen saturation, oxygen supplementation, oxygen scale 2 (yes/no), and alertness including confusion. Diastolic blood pressure was recorded at the same time as systolic blood pressure; it has historically been a routinely collected physiological variable on vital sign charts and is still collected where electronic observations are in place. Since NEWS is a subset of NEWS2, we derived NEWS from NEWS2. NEWS and NEWS2 are integer scores ranging from 0 (indicating the lowest severity of illness) to 20 (the maximum NEWS2 value possible) (see Supplemental Digital Content - Tables S1 and S2). The index NEWS/NEWS2 was defined as the first electronically recorded NEWS/NEWS2 within ± 24 h of the admission time. We excluded records where the index NEWS/NEWS2 was missing or not recorded within ± 24 h (see Supplemental Digital Content - Table S3). We searched primary and secondary ICD-10 codes for ‘U071’ to identify COVID-19; this code shows 95 % agreement with polymerase chain reaction (PCR) swab test results.
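The index-score selection and exclusion rule described above can be sketched as follows (the function and field names are illustrative, not the study's actual data schema):

```python
from datetime import datetime, timedelta

def index_news2(admission_time, observations, window_hours=24):
    """Return the index NEWS2: the first electronically recorded score
    within +/- 24 h of the admission time, or None if the admission
    should be excluded (no score inside the window)."""
    window = timedelta(hours=window_hours)
    eligible = [(t, score) for t, score in observations
                if abs(t - admission_time) <= window]
    if not eligible:
        return None  # excluded: no NEWS2 within +/- 24 h of admission
    # the earliest qualifying observation is the index score
    return min(eligible, key=lambda obs: obs[0])[1]
```

For example, a score recorded 26 h after admission is ignored, while one recorded 2 h before admission qualifies as the index score.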
Statistical analyses
We began with exploratory analyses, including box plots showing the relationship between covariates and risk of COVID-19 and line plots showing the relationship between age, vital signs, NEWS2 and risk of COVID-19. We developed three logistic regression models each, based on NEWS and NEWS2 separately, for predicting the risk of COVID-19. The NEWS2-based models (M0’, M1’, M2’) use the index, or first electronically recorded, NEWS2 within ± 24 h of admission. Model M0’ uses NEWS2 alone; Model M1’ extends M0’ with age and sex; and Model M2’ extends M1’ with all the subcomponents of NEWS2 plus diastolic blood pressure. Equivalent models (M0, M1, M2) using NEWS were also developed, except that model M2 excluded two parameters that are in NEWS2 but not in NEWS: oxygen flow rate and scale 2 (yes/no). A log-transformation was used for variables with right-skewed distributions, i.e. respiratory rate, pulse rate, and systolic and diastolic blood pressure.
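As a rough illustration of the model form only (not the authors' fitted models), an M1'-style logistic model maps NEWS2, age and sex to a predicted risk via the inverse logit; every coefficient below is hypothetical:

```python
import math

# Hypothetical coefficients for an M1'-style model (NEWS2 + age + sex).
# The actual fitted values are reported in the paper; these are made up
# purely to illustrate the model structure.
COEF = {"intercept": -4.2, "news2": 0.25, "age": 0.01, "male": 0.30}

def predicted_risk(news2, age, male):
    """Predicted probability of COVID-19 from a logistic regression model."""
    lp = (COEF["intercept"]
          + COEF["news2"] * news2
          + COEF["age"] * age
          + COEF["male"] * (1 if male else 0))
    return 1.0 / (1.0 + math.exp(-lp))  # inverse logit

# In M2/M2'-style models, right-skewed vitals enter on the log scale,
# e.g. a term of the form coef_pulse * math.log(pulse_rate).
```

The predicted risk is always between 0 and 1 and increases with the linear predictor, which is why a higher NEWS2 yields a higher predicted risk here.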
We developed all models using YH data (as development dataset) and externally validated their performance on SH data (as validation dataset).
We report discrimination and calibration statistics as performance measures for these models [7].
Discrimination relates to how well a model can separate (or discriminate) between patients with and without COVID-19, and is given by the area under the Receiver Operating Characteristic (ROC) curve (AUC), or c-statistic. The ROC curve is a plot of sensitivity (true positive rate) versus 1 − specificity (false positive rate) across consecutive predicted risks. A c-statistic of 0.5 is no better than tossing a coin, whilst a perfect model has a c-statistic of 1. In general, values less than 0.7 are considered to show poor discrimination, values of 0.7 to 0.8 reasonable discrimination, and values above 0.8 good discrimination [8]. The 95 % confidence interval for the c-statistic was derived using DeLong’s method as implemented in the pROC library [9] in R [10]. Calibration is the relationship between the observed and predicted risk of COVID-19 and can be readily seen on a scatter plot (y-axis: observed risk, x-axis: predicted risk). Perfect predictions lie on the 45° line.
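Both measures can be sketched in a few lines. This is a minimal illustration of the definitions above (a pairwise c-statistic and equal-width risk bins for the calibration plot), not the pROC/DeLong implementation the study used:

```python
def c_statistic(y, p):
    """Concordance (c-statistic / AUC): the probability that a randomly
    chosen case gets a higher predicted risk than a randomly chosen
    non-case, counting ties as one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    noncases = [pi for yi, pi in zip(y, p) if yi == 0]
    pairs = [(c, nc) for c in cases for nc in noncases]
    concordant = sum(1.0 if c > nc else 0.5 if c == nc else 0.0
                     for c, nc in pairs)
    return concordant / len(pairs)

def calibration_points(y, p, bins=10):
    """Mean predicted vs observed risk within equal-width risk bins --
    the points one would plot against the 45-degree line."""
    pts = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        grp = [(yi, pi) for yi, pi in zip(y, p)
               if lo <= pi < hi or (pi == 1.0 and hi == 1.0)]
        if grp:
            pts.append((sum(pi for _, pi in grp) / len(grp),   # predicted
                        sum(yi for yi, _ in grp) / len(grp)))  # observed
    return pts
```

The O(n²) pairwise loop is fine for a sketch; rank-based formulas are used for large datasets.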
The predictive performance of a model is usually overestimated if the same data are used to develop and test it. Several internal validation methods aim to provide a more accurate estimate of performance; we used bootstrapping as an internal validation approach to assess the discrimination and calibration of all the models [11, 12]. Overall statistical performance was assessed using the scaled Brier score, which simultaneously incorporates discrimination and calibration [7]. The Brier score is the mean squared difference between actual outcomes and the predicted risk of COVID-19; scaling by the maximum Brier score gives a scaled Brier score ranging from 0 to 100 %, with higher values indicating better models. We further assessed discrimination, calibration-in-the-large and calibration slopes in the validation data.
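A minimal sketch of the scaled Brier score as described above, assuming (as is conventional) that the maximum Brier score is that of a non-informative model predicting the event prevalence for every patient:

```python
def scaled_brier(y, p):
    """Scaled Brier score as a percentage: 100 * (1 - Brier / Brier_max),
    where Brier_max is the Brier score of a model that predicts the
    observed prevalence for everyone. 100 % is perfect; 0 % is no better
    than predicting the prevalence."""
    n = len(y)
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    prev = sum(y) / n
    brier_max = prev * (1 - prev)  # Brier of the prevalence-only model
    return 100.0 * (1.0 - brier / brier_max)
```

Unlike the raw Brier score (where lower is better), this scaling makes larger values better, which matches how the score is reported in the text.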
The clinical cut-off for NEWS and NEWS2 is 5+ (Supplemental Digital Content - Figure S1). This is the recommended threshold for detecting deteriorating patients and sepsis [13, 14]. Therefore, we assessed the sensitivity, specificity, positive and negative predictive values and likelihood ratios for these models at a NEWS/NEWS2 threshold of 5+ [15]. We further compared the net benefit of all models, which may inform their utility in routine clinical practice [16]. The net benefit is calculated at a particular threshold probability
\({p}_{t}\), with total sample size \(N\), as follows:
$$\text{Net benefit}= \frac{\text{True positives}}{N}-\frac{\text{False positives}}{N}\times \frac{{p}_{t}}{1-{p}_{t}}$$
The model with the highest net benefit metric has the highest clinical value.
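The threshold metrics and the net benefit formula above can be sketched for a binary rule such as NEWS2 ≥ 5 (the function and key names are illustrative, not the study's code):

```python
def threshold_metrics(y, flagged, p_t):
    """Test characteristics and net benefit for a binary decision rule
    (e.g. NEWS2 >= 5). `y` is the observed outcome (1 = COVID-19),
    `flagged` is whether the rule fired, and `p_t` is the threshold
    probability used in the net benefit formula."""
    tp = sum(1 for yi, fi in zip(y, flagged) if yi == 1 and fi)
    fp = sum(1 for yi, fi in zip(y, flagged) if yi == 0 and fi)
    fn = sum(1 for yi, fi in zip(y, flagged) if yi == 1 and not fi)
    tn = sum(1 for yi, fi in zip(y, flagged) if yi == 0 and not fi)
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_pos": sens / (1 - spec),      # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,      # negative likelihood ratio
        # net benefit at threshold probability p_t (formula above)
        "net_benefit": tp / n - (fp / n) * (p_t / (1 - p_t)),
    }
```

The weight \(p_t/(1-p_t)\) converts false positives into the equivalent number of true positives a clinician would trade them for, so models can be compared on a single clinically interpretable scale.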
We calculated the minimum sample size using the R package pmsampsize [17]. The minimum required sample size was 930 (93 events), with number of predictors = 21, R2 = 0.182, prevalence = 0.10, shrinkage > 0.9 and mean absolute prediction error (MAPE) = 0.05 [18]. We followed the TRIPOD guidelines for reporting model development and validation [19]. We have deployed our best performing models, M2’ and M2, as a calculator for predicting the risk of COVID-19 (https://covidcalc.shinyapps.io/calc/). We used Stata [20] for data cleaning and R [10] for statistical analysis.