Background
Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has spread worldwide [1]. On March 12, 2020, the World Health Organization (WHO) declared the disease a pandemic. It had affected more than 200 countries, with about 10,000,000 confirmed cases as of July 1, 2020 [2]. COVID-19 has therefore become a global public health crisis.
COVID-19 presents with different clinical patterns: mild, moderate, severe, and critical. Although most COVID-19 patients have mild or moderate symptoms and signs, findings from China indicated that about 14% of patients were of the severe type and 5% were of the critical type [3]. Previous studies and clinical practice showed that the degree of severity was associated with the clinical treatment and prognosis of the disease [3–6]. The overall case-fatality rate of confirmed COVID-19 patients was 2.3%, but it reached 49.0% in critical patients [3]. Missed diagnoses delay appropriate clinical treatment and increase the likelihood of a poor prognosis. On the other hand, treating a severe or critical COVID-19 patient requires vast medical resources, and overdiagnosis of severe disease wastes those resources and increases the medical burden. Therefore, early identification of patients who are likely to develop severe or critical COVID-19 is especially important for clinical practice and epidemic control.
In clinical practice, the severity of COVID-19 is categorised into four levels (mild, moderate, severe, and critical) according to the Seventh Edition of the Guide to Diagnosis and Treatment of New Coronary Pneumonia [7]. This classification is performed mainly on the basis of clinical symptoms, oxygen saturation (SaO2), and imaging evidence from computed tomography (CT). However, no evidence from laboratory markers is included. Previous studies have found that lymphopenia, organ dysfunction, coagulopathy, and elevated D-dimer levels were associated with severity [3–6, 8].
In this study, we aimed to screen severity-associated markers and construct an assessment model for predicting the severity of patients with COVID-19 based on the data from two hospitals in Hangzhou, Zhejiang province, China.
Methods
Study population
This study enrolled 172 patients with confirmed COVID-19 from January 20, 2020 to April 1, 2020 in Hangzhou, Zhejiang Province, China. Among these patients, 104 from Hangzhou Xixi Hospital formed the training set, which was used to screen the severity-associated markers and construct the assessment model. Some of these 104 patients had been included in previously published studies [9, 10]. The other 68 patients, from the First Affiliated Hospital, School of Medicine, Zhejiang University (FAHZJU), formed the validation set used to validate the model. These patients were part of a sample that had been published previously [10]. COVID-19 was diagnosed according to the interim guidance from the WHO [11]. The severity of COVID-19 was categorised into four levels according to the Seventh Edition of the Guide to Diagnosis and Treatment of New Coronary Pneumonia [7]. The mild type was defined as mild clinical symptoms and normal imaging on CT. The moderate type was defined as fever, respiratory symptoms, or other symptoms, together with imaging evidence of pneumonia. The severe type was defined as at least one of the following: shortness of breath (breathing rate ≥ 30/min), SaO2 at rest ≤ 93%, partial pressure of oxygen in arterial blood (PaO2)/inspired oxygen fraction (FiO2) ≤ 300 mmHg, or lung infiltrates > 50% within 24 to 48 h. The critical type was defined as any of the following: respiratory failure requiring mechanical ventilation, shock, or other organ failure requiring ICU monitoring and treatment.
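The four-level definition above can be written as a small decision function. The following is a minimal sketch, not the guideline's or the authors' implementation: the `Presentation` container and its field names are hypothetical, the checks are assumed to run from most to least severe, and a missing measurement is assumed not to meet its criterion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Presentation:
    """Hypothetical container for the findings named in the severity definitions."""
    pneumonia_on_ct: bool = False                   # imaging evidence of pneumonia
    respiratory_rate: Optional[float] = None        # breaths per minute
    sao2_rest: Optional[float] = None               # SaO2 at rest, in percent
    pao2_fio2: Optional[float] = None               # PaO2/FiO2, in mmHg
    infiltrates_24_48h: Optional[float] = None      # lung infiltrates within 24 to 48 h, in percent
    mechanical_ventilation: bool = False            # respiratory failure requiring ventilation
    shock: bool = False
    organ_failure_icu: bool = False                 # other organ failure requiring ICU care


def classify_severity(p: Presentation) -> str:
    """Return 'mild', 'moderate', 'severe', or 'critical' (assumed check order)."""
    if p.mechanical_ventilation or p.shock or p.organ_failure_icu:
        return "critical"
    severe = (
        (p.respiratory_rate is not None and p.respiratory_rate >= 30)
        or (p.sao2_rest is not None and p.sao2_rest <= 93)
        or (p.pao2_fio2 is not None and p.pao2_fio2 <= 300)
        or (p.infiltrates_24_48h is not None and p.infiltrates_24_48h > 50)
    )
    if severe:
        return "severe"
    # Symptoms with pneumonia on imaging are moderate; normal imaging is mild.
    return "moderate" if p.pneumonia_on_ct else "mild"
```

For example, `classify_severity(Presentation(pneumonia_on_ct=True, sao2_rest=93))` falls in the severe type because the SaO2 threshold is inclusive.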
This was a retrospective study and the protocol was approved by the Ethics Committee of Xixi Hospital and FAHZJU.
Data collection
Data at admission, including demographic information, comorbidities, clinical symptoms and laboratory tests, were extracted from electronic medical records. Collected data were reviewed by a trained team of clinical physicians. Demographic information included age, sex and body mass index (BMI). Comorbidity was defined as having at least one of the following diseases: diabetes, hypertension, cardiovascular disease, severe congenital disease, cancer, and chronic diseases of the liver, kidney, or respiratory system. Clinical symptoms included fever, fatigue, cough, expectoration, shortness of breath, diarrhoea and myalgia. Laboratory markers fell into eight categories: inflammation, electrolytes, nutritional metabolism, and liver, renal, cardiac, respiratory, and coagulation function.
Statistical analysis
Continuous variables were presented as median (interquartile range [IQR]), and categorical variables were presented as numbers (percentage). Continuous laboratory markers were dichotomised (normal versus abnormal) according to their clinical reference values. Severity-associated markers of COVID-19 were screened using ordinal logistic regression.
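The dichotomisation step might look like the sketch below. It is illustrative only: the marker names and the reference ranges in `REFERENCE_RANGES` are placeholder assumptions rather than the values used in the study, and markers that were not measured are assumed to be skipped.

```python
from typing import Dict, Optional, Tuple

# Placeholder reference ranges (lower, upper); None means no bound on that side.
# These numbers are assumptions for illustration, not the study's criteria.
REFERENCE_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "albumin_g_per_l": (35.0, None),
    "crp_mg_per_l": (None, 10.0),
    "potassium_mmol_per_l": (3.5, 5.5),
}


def is_abnormal(value: float, low: Optional[float], high: Optional[float]) -> bool:
    """A value is abnormal when it falls outside the closed interval [low, high]."""
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


def dichotomise(labs: Dict[str, Optional[float]]) -> Dict[str, int]:
    """Map each measured marker to 0 (normal) or 1 (abnormal)."""
    flags: Dict[str, int] = {}
    for marker, value in labs.items():
        if value is None or marker not in REFERENCE_RANGES:
            continue  # not measured, or no reference range available
        low, high = REFERENCE_RANGES[marker]
        flags[marker] = int(is_abnormal(value, low, high))
    return flags
```

The resulting 0/1 flags are the kind of input an ordinal regression on severity level would take as predictors.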
To construct an assessment model, two criteria were set for selecting markers: P value < 0.05 in the ordinal logistic regression, and an abnormality in the marker in at least half of severe or critical patients. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for further feature selection. The optimal regularization parameter (λ) was estimated by fivefold cross-validation. To increase the stability of feature selection, we used bootstrap with 1000 resamples and built a LASSO regression model for each bootstrap set. Markers that were selected in more than half of all bootstrap sets were included in the final model.
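The bootstrap selection-frequency rule can be sketched independently of the model fitted in each resample. In the sketch below, the `select` callback is where a cross-validated LASSO fit would go; `toy_selector` is only a hypothetical stand-in (it keeps markers whose abnormality rate differs between outcome groups) so that the example runs without a modelling library. All function names and the row layout are assumptions.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

Row = Dict[str, int]  # marker name -> 0/1 flag, plus "severe" -> 0/1 outcome


def bootstrap_selection_frequency(
    rows: Sequence[Row],
    select: Callable[[List[Row]], Set[str]],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Fraction of bootstrap resamples in which each marker is selected."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    n = len(rows)
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping the original sample size.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(select(sample))
    return {marker: c / n_resamples for marker, c in counts.items()}


def stable_markers(freq: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep markers selected in more than `threshold` of the resamples."""
    return sorted(m for m, f in freq.items() if f > threshold)


def toy_selector(sample: List[Row], min_gap: float = 0.3) -> Set[str]:
    """Stand-in for a LASSO fit: keep markers whose abnormality rate differs
    by more than `min_gap` between the two outcome groups."""
    severe = [r for r in sample if r["severe"] == 1]
    other = [r for r in sample if r["severe"] == 0]
    if not severe or not other:
        return set()
    chosen: Set[str] = set()
    for m in (k for k in sample[0] if k != "severe"):
        rate_severe = sum(r[m] for r in severe) / len(severe)
        rate_other = sum(r[m] for r in other) / len(other)
        if abs(rate_severe - rate_other) > min_gap:
            chosen.add(m)
    return chosen
```

Passing a selector as a callback keeps the resampling logic separate from the model, so the same loop works whether the per-resample selection comes from LASSO or from any other sparse method.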
Assessment models were constructed using logistic regression, ridge regression, support vector machine, and random forest in the training set. The performance of the models was evaluated by the area under the receiver operating characteristic curve (AUROC). For the internal validation, we used bootstrap with 500 resamples to reduce over-fitting. For the external validation, each of the four models was assessed in the validation set. A risk score was established according to the result of the best model. The performance of the risk score in all patients was evaluated using the AUROC and a calibration curve. The optimal cutoff value was the one that maximised the Youden index. A web-based assessment system was developed based on the risk score.
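The two evaluation quantities named above have compact definitions. The AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half), and the Youden index at a cutoff is sensitivity + specificity − 1. The sketch below computes both in plain Python; the function names are illustrative, and a score at or above the cutoff is assumed to be called positive.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    Ties are resolved in favour of the lowest cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

With scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, three of the four positive-negative pairs are ordered correctly, so the AUROC is 0.75.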
All statistical analyses were conducted using R software, version 3.6.2 (R Foundation for Statistical Computing). A two-sided P value < 0.05 was considered statistically significant.
Discussion
Early identification of patients who are likely to develop severe or critical COVID-19 would help reduce the case-fatality rate and make efficient use of limited medical resources. In this study, we identified a panel of clinical markers associated with the severity of COVID-19 and constructed different severity-prediction models. The ridge regression model performed best, with AUROCs of 0.930 (95% CI, 0.914–0.943) in the internal validation and 0.827 (95% CI, 0.716–0.921) in the external validation. Furthermore, we established a risk score and a web-based assessment system to help clinicians detect, at admission, the patients who are likely to develop severe or critical COVID-19.
Previous studies showed that severe or critical COVID-19 patients were older, had more comorbidities, had higher levels of LDH, D-dimer, and CRP, and had lower ALB levels and lymphocyte counts [3–6, 8]. Our findings were consistent with these reports. Moreover, using data from 208 patients in Fuyang, Anhui Province, Ji et al. [12] established a scoring model named CALL to predict the severity of COVID-19. Dong et al. [13] also developed a scoring system with data from 147 patients in Wuhan, Hubei Province. The AUROCs of their models were 0.910 and 0.843, slightly lower than the AUROC of our assessment model in the internal validation (0.930). However, their models were not validated in an external dataset, which limits their generalizability. In contrast, our model was validated in an independent dataset and obtained a satisfactory AUROC of 0.827.
Among the eight markers in our model, LDH, CRP, ALB, and lymphocyte count are well-recognized predictors of COVID-19 severity [16]. For eosinophil count, Zhu et al. [14] demonstrated that decreased eosinophils could induce acute lung injury in a mouse model. Liu et al. [15] also found that an increased eosinophil count predicted improvement in COVID-19 progression. Several studies reported that severe or critical COVID-19 patients often experienced electrolyte disturbances [3, 4, 16]. In our study, we used the number of abnormal values among potassium, calcium, sodium, phosphorus, and chloride to comprehensively evaluate the degree of electrolyte disturbance. D-dimer and FIB are indicators of coagulation function. Chen et al. [8] reported that patients infected with SARS-CoV-2 had abnormal coagulation function (hypercoagulation). We combined the two indicators to increase the sensitivity for detecting abnormal coagulation and to avoid collinearity between the two markers. Unlike other studies, our final model did not include age. This might be owing to the high correlation between age and comorbidity in the training set, with the LASSO regression identifying comorbidity as the more important marker.
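The two composite markers discussed above could be derived from per-test abnormality flags roughly as follows. This is a sketch under stated assumptions: the text does not spell out how D-dimer and FIB were combined, so the "abnormal if either is abnormal" rule is an assumed reading of "increase the sensitivity", and the function and key names are hypothetical.

```python
from typing import Dict

ELECTROLYTES = ("potassium", "calcium", "sodium", "phosphorus", "chloride")


def electrolyte_disturbance_score(abnormal: Dict[str, bool]) -> int:
    """Count how many of the five electrolytes are flagged abnormal (0 to 5).

    Electrolytes missing from the input are treated as not abnormal (assumption).
    """
    return sum(1 for name in ELECTROLYTES if abnormal.get(name, False))


def coagulation_abnormal(d_dimer_abnormal: bool, fib_abnormal: bool) -> bool:
    """Assumed combination rule: abnormal if either indicator is abnormal."""
    return d_dimer_abnormal or fib_abnormal
```

Collapsing the two coagulation tests into one flag means the model sees a single predictor instead of two correlated ones, which is one way to sidestep the collinearity mentioned above.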
There were several limitations in our study. First, the distribution of COVID-19 severity differed between the training and validation sets: there were no critical cases in the training set and no mild cases in the validation set. This difference arose from the government rules on COVID-19 prevention and control in Zhejiang Province, China. Xixi Hospital (a municipal-level hospital for infectious diseases) mainly receives and treats patients with mild, moderate, and severe COVID-19 (no critical patients), while FAHZJU (a provincial-level hospital) is mainly responsible for moderate, severe, and critical patients (no mild patients). The different severity distributions might have influenced model construction and validation. However, despite these differences, the model still performed well in the validation stage, which suggests relatively high generalizability. Second, the subjects were mainly recruited from Hangzhou and the sample size was relatively small, which limits the generalizability of our model. Additional validation in areas outside Zhejiang should be conducted in the future. Third, because of the retrospective study design, some laboratory tests were not performed in some patients. Therefore, their associations with the severity of COVID-19 might be misestimated. Fourth, the clinical data of the subjects were not comprehensive. Adding other specific markers such as cytokines might improve the performance of our model. Finally, because of the low prevalence of comorbidity, the risks associated with different types of comorbidity were not considered separately in the assessment model.
Conclusions
In this study, we identified eight severity-associated clinical markers in COVID-19 patients: lactate dehydrogenase, C-reactive protein, albumin, comorbidity, electrolyte disturbance, coagulation function, eosinophil count, and lymphocyte count. Based on these eight markers, we constructed an assessment model to help clinicians evaluate, at admission, the likelihood that a patient will develop severe or critical COVID-19 and take early measures in clinical treatment.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.