Background
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread rapidly across the world and caused a global pandemic [1]. By 1 November 2021, 246,594,191 confirmed cases and 4,998,784 deaths had been reported in 223 countries, areas, or territories [1]. It is difficult to predict how long the pandemic will last, owing to the unequal distribution of vaccines and the changing effectiveness of COVID-19 vaccines against the constantly emerging variant strains [2‐5].
Non-pharmaceutical approaches, including case isolation, contact tracing, quarantine, and social distancing, remain and will continue to be the main interventions for controlling this pandemic [6‐8]. Accurate estimation of the length of the incubation period is crucial for making these non-pharmaceutical interventions highly efficient.
So far, estimates of the median incubation period and the corresponding 95% confidence interval have been inconsistent, largely because of the heterogeneity of the populations and epidemic phases studied [9, 10]. Although most studies reported a median or mean incubation period between 2 and 12 days, some have shown prolonged incubation periods of more than 2 weeks, with an extreme incubation period of 38 days reported [9‐13]. However, almost all previous studies were based on limited sample sizes, mostly from case reports or single case clusters recorded in the early epidemic phase. No study has focused on case series with a prolonged incubation period of > 14 days to investigate their epidemiological features or to explore the association with clinical severity and transmissibility. This raises the concern of which kinds of patients are likely to transmit the disease beyond the required quarantine duration. These questions need to be addressed when developing prevention and control strategies to contain the spread of the disease.
Here we extensively reviewed the available data on patients with known dates of exposure and symptom onset in the Chinese mainland to explore the features of patients with a prolonged incubation period. We also evaluated, for the first time, the determinants of a longer incubation period and its impact on the clinical severity and transmissibility of COVID-19.
Methods
Data sources and data extraction
Data on individual COVID-19 cases and the source transmission clusters were obtained from publicly available data, mainly from the websites of provincial and municipal health commissions in China and the Chinese Center for Disease Control and Prevention (China CDC), or through internet searches using the Chinese keywords (“coronavirus” OR “pneumonia”) AND (province and city names). For each identified COVID-19 case with clear epidemiological survey information, we extracted basic demographic characteristics (age, sex, type of residence, city of residence), the starting and ending dates of probable exposure, the date of symptom onset (fever, respiratory symptoms, myalgia, etc.), the date of diagnosis, the date of discharge, and the infection route (case contact in a public place or workplace, travel to Hubei Province, and/or household contact). The related epidemiological clusters were identified, and their epidemiological data were likewise extracted where available. Two researchers independently reviewed the information on each case and entered the data into a standardized reporting sheet to establish a database. Discrepancies were resolved by discussion between the two researchers, facilitated by a third senior researcher, to reach a consensus [14, 15].
Individual data on occupation, underlying diseases, and clinical severity were additionally collected from epidemiological investigation reports of COVID-19 cases provided by China CDC. These data were matched to the publicly available dataset by city, age, sex, reported date, and other overlapping variables. The pooled de-identified data were used for the subsequent analysis. Suspected cases and asymptomatic cases were excluded from the current study. Cases reported in Hubei Province and cases imported from abroad were also excluded because detailed exposure information was lacking.
Definitions of key variables
(1) Incubation period: for each case i, let \({T}_i^E\) and \({T}_i^S\) be the exposure (infection) and symptom onset dates, respectively. The incubation period is then \({V}_i^{Inc}={T}_i^S-{T}_i^E\). The exact exposure date is usually not directly observed but rather bounded by an interval, i.e., \({L}_i\le {T}_i^E\le {U}_i\), and the incubation period is thus bounded by \({T}_i^S-{U}_i\le {V}_i^{Inc}\le {T}_i^S-{L}_i\). We considered the date of the earliest clinical symptoms as the date of symptom onset. The exposure dates were defined in one of two ways. First, if a patient had a history of travel to Hubei Province before symptom onset, the starting and ending dates of exposure (\({L}_i\) and \({U}_i\)) were set as the dates of arriving at and departing from Hubei Province, respectively. Second, if a patient was exposed to (a) a confirmed COVID-19 patient, (b) a person who resided in or had traveled to Hubei Province, or (c) a person with known contact with a confirmed COVID-19 case, the starting and ending dates of exposure (\({L}_i\) and \({U}_i\)) were set as the first and last contact dates, respectively.
(2) Cluster: the case clusters in our crowdsourced data were obtained by contact tracing. All cases that were determined to be in close contact with each other were defined as a case cluster.
(3) Primary and secondary cases: we used the earliest symptom onset date in each case cluster as the baseline (day 0). A local case with symptom onset on day 0 or 1, or an imported case with symptom onset on days 0–3, was considered a primary case; otherwise, the case was considered a secondary case. All cases that did not belong to any cluster were treated as primary cases.
(4) Transmissibility: the number of secondary cases infected by the primary case within a cluster was used as a measure of the transmissibility of the primary case. If there were multiple primary cases within a cluster, they were treated as having jointly infected the secondary cases in the cluster.
(5) Type of residence: the permanent residence was classified as an urban area or a rural area.
(6) Epidemic phase: cases were assigned to the phase “Before Level I response was employed” or “After Level I response was employed” based on their onset date; the date of the Level I response differed across provinces.
(7) Geographical location: Northern, Central, and Southern China were defined according to the latitude of the city where the case was reported. Northern China refers to areas north of 35°N, Central China to areas between 30°N and 35°N, and Southern China to areas south of 30°N.
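The definitions of the incubation interval and of primary and secondary cases can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses were run in R); the function names, the dictionary keys `onset` and `imported`, and the clamping of the lower bound at zero are assumptions made only for illustration.

```python
from datetime import date


def incubation_bounds(exposure_start, exposure_end, onset):
    """Return (lower, upper) bounds in days on the incubation period.

    exposure_start is L_i, exposure_end is U_i and onset is T_i^S, so the
    bounds are T_i^S - U_i and T_i^S - L_i.
    """
    if exposure_start > exposure_end:
        raise ValueError("exposure_start must not be later than exposure_end")
    # Assumption (not stated in the text): a negative lower bound, which
    # arises when contact continued after symptom onset, is clamped to 0.
    lower = max(0, (onset - exposure_end).days)
    upper = (onset - exposure_start).days
    return lower, upper


def classify_cluster(cases):
    """Label each case in one cluster as 'primary' or 'secondary'.

    cases: list of dicts with hypothetical keys 'onset' (date) and
    'imported' (bool). Day 0 is the earliest onset date in the cluster.
    """
    day0 = min(case["onset"] for case in cases)
    labels = []
    for case in cases:
        offset = (case["onset"] - day0).days
        # Imported cases are primary up to day 3, local cases up to day 1.
        window = 3 if case["imported"] else 1
        labels.append("primary" if offset <= window else "secondary")
    return labels


if __name__ == "__main__":
    print(incubation_bounds(date(2020, 1, 10), date(2020, 1, 15), date(2020, 1, 20)))
    cluster = [
        {"onset": date(2020, 1, 20), "imported": False},
        {"onset": date(2020, 1, 22), "imported": True},
        {"onset": date(2020, 1, 22), "imported": False},
    ]
    print(classify_cluster(cluster))
```

A case that belongs to no cluster would be passed as a one-element list and is therefore labeled primary, in line with definition (3).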
Estimation of incubation period
For local cases and imported cases for whom both the lower and upper bounds of the incubation period were available, we separately fitted four commonly used distributions for the incubation period (Weibull, Gamma, Loglogistic, and Lognormal) using the R package ‘fitdistrplus’. In addition, the cases were stratified by age, duration from onset to discharge, and infection route, and the incubation period was estimated for each stratum. The best-fitting distribution was selected by Akaike’s Information Criterion (AIC) and was used to calculate the median incubation period and its 95% confidence interval. Based on this distribution, we estimated the conditional probability that the incubation period of each case exceeded 14 days given the lower and upper bounds of its incubation period, \(P\left(t>14\mid {t}_{lower}<t<{t}_{upper}\right)\). We used this probability to randomly classify each case (including interval-censored and right-censored cases) into a prolonged incubation period group (> 14 days) or a normal incubation period group (≤ 14 days). We repeated this process 10,000 times; a case classified into the prolonged group more than 5000 times was assigned to the prolonged incubation period group, and otherwise to the normal incubation period group.
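The conditional probability and the repeated random classification described above can be sketched as follows. This is a hedged Python illustration rather than the authors' R implementation: a Lognormal distribution is assumed purely for the example, and the parameter values `mu` and `sigma`, like all function names, are hypothetical rather than the fitted estimates.

```python
import math
import random


def lognormal_cdf(t, mu, sigma):
    """CDF of a Lognormal(mu, sigma) distribution at time t (in days)."""
    if t <= 0:
        return 0.0
    if math.isinf(t):
        return 1.0
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def prob_prolonged(t_lower, t_upper, mu, sigma, cutoff=14.0):
    """P(t > cutoff | t_lower < t < t_upper).

    Pass t_upper = math.inf for a right-censored case.
    """
    if t_upper <= t_lower:
        raise ValueError("t_upper must be greater than t_lower")
    if t_upper <= cutoff:
        return 0.0  # the whole interval lies at or below the cutoff
    if t_lower >= cutoff:
        return 1.0  # the whole interval lies at or above the cutoff
    f_lower = lognormal_cdf(t_lower, mu, sigma)
    f_upper = lognormal_cdf(t_upper, mu, sigma)
    f_cutoff = lognormal_cdf(cutoff, mu, sigma)
    return (f_upper - f_cutoff) / (f_upper - f_lower)


def classify_by_majority(p, n_rep=10_000, seed=0):
    """Draw n_rep Bernoulli(p) classifications and apply the majority rule."""
    rng = random.Random(seed)
    prolonged_count = sum(rng.random() < p for _ in range(n_rep))
    return "prolonged" if prolonged_count > n_rep // 2 else "normal"


if __name__ == "__main__":
    mu, sigma = 1.6, 0.5  # hypothetical values, not the paper's estimates
    p = prob_prolonged(10, 20, mu, sigma)
    print(round(p, 3), classify_by_majority(p))
```

With 10,000 repetitions the majority rule behaves almost deterministically: unless the conditional probability is very close to 0.5, a case is assigned to the prolonged group essentially whenever that probability exceeds 0.5.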
Statistical analysis
The baseline characteristics, epidemiological information, and clinical phenotype were compared between local COVID-19 cases with an incubation period of ≤ 14 days and those with an incubation period of > 14 days. Pearson’s Chi-square test was used for categorical variables, and Fisher’s exact test was used when more than 20% of the cells of the R × C contingency table had expected frequencies < 5. The Wilcoxon rank-sum test was used to compare continuous variables between the two groups of patients. The changing patterns of the incubation period were profiled over four epidemic periods, by different case characteristics.
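The rule for choosing between Pearson’s Chi-square test and Fisher’s exact test depends only on the expected cell frequencies, which the following Python sketch computes. The function names and the example tables are illustrative assumptions, not data from the study.

```python
def expected_counts(table):
    """Expected cell frequencies of an R x C table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def choose_test(table, min_expected=5.0, max_share=0.20):
    """Return the test suggested by the 'more than 20% of cells < 5' rule."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < min_expected for e in cells) / len(cells)
    return "fisher_exact" if share_small > max_share else "pearson_chi_square"


def chi_square_statistic(table):
    """Pearson's Chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


if __name__ == "__main__":
    large = [[50, 30], [40, 60]]   # made-up counts with large expected cells
    sparse = [[3, 1], [2, 40]]     # made-up counts with small expected cells
    print(choose_test(large), chi_square_statistic(large))
    print(choose_test(sparse))
```

The sketch stops at the test statistic; obtaining a p-value requires the Chi-square distribution (or the exact hypergeometric calculation for Fisher’s test), which statistical packages provide.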
An accelerated failure time (AFT) model assuming a Gamma distribution for the incubation period was applied to evaluate the impact of patients’ characteristics on the length of the incubation period. The AFT model was implemented using the “survreg” function in the R package “survival” [16]. In the “survreg” function, the parameters (baseline shape and scale parameters and covariate coefficients) were estimated by maximum likelihood. This model allowed us to analyze the associations between an interval-censored response variable and explanatory variables.
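To make concrete how an AFT model handles interval-censored incubation periods, the sketch below evaluates the log-likelihood in which each case contributes the probability mass between its lower and upper bounds. It is an illustration only: it swaps in a Lognormal error distribution in place of the Gamma distribution used in the paper because the Lognormal CDF is available in closed form, it is written in Python rather than R, and all names and parameter values are hypothetical.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_interval_loglik(beta, sigma, covariates, lowers, uppers):
    """Log-likelihood of a Lognormal AFT model with interval-censored times.

    Model: log(T_i) = x_i . beta + sigma * eps_i with eps_i ~ N(0, 1).
    Case i contributes log[F_i(upper_i) - F_i(lower_i)]; use
    upper_i = math.inf for right-censored cases.
    """
    total = 0.0
    for x, lo, hi in zip(covariates, lowers, uppers):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        linpred = sum(b * v for b, v in zip(beta, x))
        f_hi = 1.0 if math.isinf(hi) else std_normal_cdf((math.log(hi) - linpred) / sigma)
        f_lo = 0.0 if lo <= 0 else std_normal_cdf((math.log(lo) - linpred) / sigma)
        total += math.log(f_hi - f_lo)
    return total


if __name__ == "__main__":
    # Intercept plus one binary covariate (for example, age > 60); made-up data.
    covariates = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    lowers = [2.0, 5.0, 10.0]
    uppers = [6.0, 12.0, math.inf]
    beta = [math.log(5.0), 0.3]  # hypothetical coefficients
    print(aft_interval_loglik(beta, 0.5, covariates, lowers, uppers))
```

In an AFT model the exponentiated coefficient \(\exp\left({\beta}_j\right)\) acts as a time ratio: a value above 1 means that the covariate multiplies, and so lengthens, the incubation period. Maximizing this function over the coefficients and the scale parameter, for instance with a numerical optimizer, would give the maximum likelihood estimates.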
A logistic regression model was used to evaluate the association between incubation period and clinical severity, with sex, age, geographical location, occupation, type of residence, and underlying diseases included as covariates. Attribute value frequency (AVF) and Z-scores were used to identify outliers, and the influence of these outliers on the model was evaluated [17‐19].
To evaluate the impact of the primary case’s characteristics on the transmissibility of COVID-19, we fitted a modified Poisson regression model with the epidemiological cluster as the unit of analysis. The number of secondary cases in a cluster was used as the dependent variable, and the characteristics of the primary cases, mainly incubation period (normal or prolonged), sex, age, geographical location, type of residence, underlying diseases, epidemic phase, and clinical severity, were used as explanatory variables. If a case did not report an epidemiological association with any other confirmed case, a transmissibility of “0” was assigned. To reduce the bias caused by incomplete epidemiological survey information, we assigned a transmissibility of “0” only to cases reported from cities with high-quality epidemiological surveys (defined here as cities in which over 40% of the total cases had a documented association with other confirmed cases).
Since there could be more than one primary case in a cluster, we modified the ordinary Poisson regression model so that multiple primary cases jointly infect the secondary cases in a cluster, and constructed a new log-likelihood function as follows:
$$\ln L\left(\beta \right)=\sum_{k=1}^n\left[{y}_k\ln \left(\sum_{t=1}^m\exp \left({X}_{kt}^T\beta \right)\right)-\sum_{t=1}^m\exp \left({X}_{kt}^T\beta \right)-\ln \left({y}_k!\right)\right]$$
where n is the number of clusters, m is the number of primary cases in cluster k, \({X}_{kt}\) is the vector of characteristics of primary case t in cluster k, \({y}_k\) is the number of secondary cases in cluster k, and β is the vector of regression coefficients. Maximum likelihood estimation was used to estimate the regression coefficients of this model.
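The log-likelihood above can be evaluated directly, as in the following minimal Python sketch. It is not the authors' R code; the data layout (a list of pairs holding the secondary case count and the covariate vectors of the primary cases) and the function name are assumptions made for illustration.

```python
import math


def modified_poisson_loglik(beta, clusters):
    """Log-likelihood of the modified Poisson model.

    clusters: list of (y_k, primaries) pairs, where y_k is the number of
    secondary cases in cluster k and primaries is a list of covariate
    vectors X_kt, one per primary case in that cluster.
    """
    total = 0.0
    for y_k, primaries in clusters:
        # The cluster-level mean is the sum of exp(X_kt^T beta) over all
        # primary cases, reflecting joint infection of the secondary cases.
        rate = sum(
            math.exp(sum(b * v for b, v in zip(beta, x))) for x in primaries
        )
        # lgamma(y + 1) equals ln(y!).
        total += y_k * math.log(rate) - rate - math.lgamma(y_k + 1)
    return total


if __name__ == "__main__":
    # Intercept plus a prolonged-incubation indicator; made-up clusters.
    clusters = [
        (2, [[1.0, 0.0]]),
        (0, [[1.0, 1.0]]),
        (3, [[1.0, 0.0], [1.0, 1.0]]),
    ]
    print(modified_poisson_loglik([0.2, -0.5], clusters))
```

When every cluster has a single primary case, the inner sum has one term and the function reduces to the ordinary Poisson regression log-likelihood with a log link.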
To assess multicollinearity among the model predictors, the variance inflation factor (VIF) of each variable was calculated; all VIFs in our models were lower than 2, indicating very low multicollinearity (Supplementary Table 1). All analyses were performed using R software (version 3.6.3, R Foundation for Statistical Computing, Vienna, Austria).
Discussion
Based on the largest individual-level dataset with detailed epidemiological information, we determined that over 10% of COVID-19 patients had a prolonged incubation period of > 14 days. The longer incubation periods seen in our patients might occur for a variety of reasons, but older age and less severe disease were seen more frequently among these patients, in both local and imported cases. To our knowledge, this is the first study to find that the proportion of patients with a prolonged incubation period increased over the course of the epidemic: among local cases, it rose from 4.0 to 10.8% after Level I response measures were implemented. This was accompanied by an increase in the median incubation period from 3.52 days (95% CI: 3.11–3.98) to 6.83 days (95% CI: 6.59–7.07). These results indicate that the 14-day medical observation period adopted by most countries and recommended by WHO [20] was insufficient. The 2-week isolation and medical quarantine policies currently in place may miss patients with a longer incubation period, for whom additional effective management should be adopted. In China, a “14 + 7 + 7” quarantine strategy has been applied to those returning from abroad since late December 2020. It comprises a 14-day intensive isolation and medical quarantine at the port of entry, a 7-day medical observation at home, and a further 7-day health surveillance in the community, or locally stricter medical quarantine measures. Still, it is important to weigh the potential health benefits of reducing transmission, and thus case numbers, against the high economic and social costs, which differ among countries.
Compared with those with an incubation period within 14 days, COVID-19 patients with a prolonged incubation period exceeding 14 days had significantly less severe disease and a shorter duration from symptom onset to discharge. This finding agrees with the notion that a shorter incubation period of SARS-CoV could be related to a more severe condition because of more aggressive and damaging inflammatory responses [21‐24]. Consistent with previous studies [25, 26], sex did not affect the length of the incubation period. Notably, however, patients older than 60 years tended to have a longer incubation period than younger age groups. Aging can compromise the immune response, including the immune response to respiratory viruses, which is often related to a longer incubation period [27, 28]; a similar finding was reported for SARS-CoV-1 in 2003 [29].
A higher proportion of patients with a prolonged incubation period was observed in Southern and Central China than in other parts of the country, which might be influenced by meteorological factors, e.g., temperature [30‐32]. The function of the human immune system could be weakened in a comparatively colder environment such as Northern China, which may raise the risk of infection and lead to a shorter incubation period [31]. This was also supported by laboratory findings indicating that lower environmental temperature might decrease the infection capacity and viral loads of SARS-CoV-2 [32].
The lower transmissibility of patients with longer incubation periods was also proposed to be related to lower viral loads, which might decrease during intergenerational transmission.
The incubation period of COVID-19 also depended on how the disease was acquired. Cases with a longer incubation period were more likely to have been infected through contact in public places and workplaces than through household contact. This is understandable, because public places and workplaces have wider or more open spaces, which is related to a lower exposure intensity than household contact, which more likely occurred in closed settings.
In this study, we analyzed imported cases and locally exposed cases separately, because the self-reported delay from exposure to symptom onset was longer among cases imported from Hubei Province than among locally exposed cases. First, the difference could result from inherent heterogeneity in exposure and immunity between travelers and residents, as travelers were usually younger than residents in general. In addition, recall bias could also differ, as travelers likely had more frequent close contacts in a variety of settings.
The study had limitations. Firstly, classifying a case as primary or secondary based only on the time of symptom onset might lead to misclassification, since the date of symptom onset for some primary cases might be later than that of secondary cases. Secondly, we included only confirmed cases with apparent illness, while asymptomatic cases were excluded because their disease onset date could not be defined and their viral shedding data were inaccessible. Thus, our conclusions cannot be extrapolated to asymptomatic cases. Nevertheless, this is the first study to focus on the prolonged incubation of COVID-19, revealing a wide variation in the incubation period of SARS-CoV-2 infection, which could be explained by the biological heterogeneity of the population and by differences in the control measures of certain regions or periods.