Background
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus is easily transmissible between humans, with a basic reproduction number around 2–4 depending on the setting [1, 2]. To date, no vaccine or highly effective pharmaceutical treatment exists against COVID-19. Countries have used a range of non-pharmaceutical interventions (NPIs) such as testing suspected cases followed by isolation of confirmed cases and quarantine of their contacts, physical distancing measures such as school and workplace closures, income support for households affected by COVID-19 and associated interventions, and domestic and international travel restrictions [3]. These interventions aim to prevent infection introduction, contain outbreaks, and reduce peak epidemic size so that healthcare systems do not become overwhelmed. However, these interventions come at a cost. Testing and contact tracing require laboratory and public health resources to be successful at scale, government subsidies affect national budgets, and physical distancing disrupts economic activities and daily life [4]. Hence, the psychological, social, and economic cost of interventions needs to be balanced against their potential effectiveness in reducing SARS-CoV-2 spread.
Modelling studies suggest that travel restrictions [5, 6], contact tracing and quarantine [7, 8], and physical distancing [9, 10] may delay SARS-CoV-2 spread, based on assumptions about how they may change transmission between individuals in populations. However, the effectiveness of such interventions depends on factors such as societal compliance (e.g. the extent to which people reduce their daily contacts following government restrictions) that are difficult to prospectively measure. Empirical evidence about the effectiveness of specific policy interventions has been limited (see Additional file 1: Table S8 for a review) [11–37]. While several countries have seen disease incidence peak and fall [38], ascribing changes in transmission to particular interventions is difficult since countries tend to impose combinations of policy changes at different levels of stringency in close temporal sequence.
Several global databases of COVID-19-related policy interventions have been compiled [39]. Here, we used the regularly updated Oxford COVID-19 Government Response Tracker (OxCGRT) [3] and conducted panel analysis to understand the association between policy interventions and time-varying reproduction numbers (Rt), a measure of the rate of transmission of an infectious disease in a population. We also explored whether this relationship is modulated by definitions of policy interventions, temporal lags, and population characteristics in different countries.
Methods
Data on NPIs and Rt
Data on COVID-19-related NPI intensity from 1 January to 22 June 2020 were extracted on 5 July 2020 from version 5 of OxCGRT, based on the codebook version 2.2 (22 April 2020) [3]. This version contains publicly available information from 178 countries and territories on 18 NPI categories. We further divided these countries and territories into seven regions according to the World Bank classification [40]. Note that these 18 NPI categories are broad, so many specific policy interventions (e.g. facial covering mandates) are not independently coded in the database. See Additional file 1: Table S1 for further metadata.
From this database, we removed (i) “miscellaneous” policies as they contained no data at the time of our data extraction; (ii) “giving international support” and “investment in vaccines” policies as they did not on face validity have a causal pathway to influence local SARS-CoV-2 transmission within the timescale of the analysis; (iii) “fiscal measures” and “emergency investment in healthcare” policies as both the start and the duration of their effect are often unclear (e.g. an announced investment may be implemented weeks later; funding that is allocated may be spent over a long time); and (iv) data after 22 June 2020 because > 10% of countries and territories had missing data after this date (see Additional file 1: Figure S1) [3]. Missing data fields on or before 22 June 2020 were imputed by (a) carrying forward or backwards the next or last non-missing observation when missingness occurred at the two tails of the time series or (b) linearly interpolating using non-missing observations when missingness did not occur at the two tails of the time series. We divided the remaining 13 policy interventions into four policy groups roughly consistent with the original database (Table 1).
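The imputation rule described above can be illustrated with a minimal sketch. It assumes each NPI time series is a list in which missing days are marked `None`; the function name `impute_series` and the toy values are hypothetical and are not taken from the study's code.

```python
from typing import List, Optional


def impute_series(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps: constant fill at the two tails, linear interpolation inside."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        return list(values)  # nothing to anchor the imputation on

    filled = list(values)
    first, last = observed[0], observed[-1]

    # (a) tails: carry the first observation backwards and the last one forwards
    for i in range(first):
        filled[i] = values[first]
    for i in range(last + 1, len(values)):
        filled[i] = values[last]

    # (b) interior gaps: interpolate linearly between neighbouring observations
    for left, right in zip(observed, observed[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Toy series: one missing day at each tail and a two-day interior gap
print(impute_series([None, 1.0, None, None, 2.5, None]))
```

Note that interpolating an ordinal NPI level can yield non-integer values; how such values are handled before the binary coding step is not specified here.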
Table 1
Thirteen types of NPIs from OxCGRT, their general categorisations, and the coding schema used in our analysis to quantify their intensity
Policy group | NPIs | Coding schema
Internal containment and closure | School closure; workplace closure; cancellation of public events; limits on gathering sizes; closure of public transport; stay-at-home requirement; internal movement restriction | Any effort scenario: NPIs are binary variables, considered “present” as long as any (non-zero) effort is made. Maximum effort scenario: NPIs are binary variables, considered “present” only if the maximum effort is made. For example, an intervention X has levels 0–3. A record at level 2 is converted to 1 under any effort and 0 under maximum effort scenarios. |
International travel restrictions | International movement restriction |
Economic policies | Income support; debt/contract relief for households |
Health systems policies | Public information campaign; testing policy; contact tracing |
Most NPIs in the database are measured on ordinal scales that capture intensity (e.g. 0 = no contact tracing; 1 = limited contact tracing; 2 = comprehensive contact tracing). Since the intervals between categories are not necessarily equally spaced, we converted NPI history into binary variables under two scenarios: (i) any effort scenario: all zero records were converted to 0, and non-zero records were converted to 1, and (ii) maximum effort scenario: all non-maximum records were converted to 0, and all records at maximum levels were converted to 1 (see Table 1).
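The two coding scenarios amount to a simple recoding rule, sketched below. The function name `to_binary` and the scenario labels `"any"` and `"maximum"` are illustrative choices, not identifiers from OxCGRT or from the study's code.

```python
def to_binary(level: int, max_level: int, scenario: str) -> int:
    """Recode an ordinal NPI level as 0/1 under one of the two scenarios."""
    if scenario == "any":
        # any effort: every non-zero level counts as the NPI being present
        return 1 if level > 0 else 0
    if scenario == "maximum":
        # maximum effort: only the top level counts as the NPI being present
        return 1 if level == max_level else 0
    raise ValueError(f"unknown scenario: {scenario!r}")


# The worked example from Table 1: intervention X has levels 0-3, record at level 2
print(to_binary(2, 3, "any"))      # 1
print(to_binary(2, 3, "maximum"))  # 0
```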
Transmission of SARS-CoV-2 is routinely measured using the time-varying reproduction number (Rt), a metric which represents the mean number of secondary cases that arise from one index case. We used the median Rt estimates available through EpiForecasts [https://epiforecasts.io/], a publicly available repository of Rt estimates for many countries. The estimation process is based on reported incidence while accounting for a range of uncertainties surrounding the incubation period and the delays between symptom onset and reporting [41]. The underlying method has been detailed in Cori et al. [42]. In short, the transmission rate of an infectious disease is approximated by the ratio between new infections at time t and the infectious individuals at time t − w, where w is the associated time window. In EpiForecasts, a weekly time window is used. This measure is expected to fall when effective NPIs reduce the rate of SARS-CoV-2 transmission. Since the effects of some NPIs may take time to become evident, we explored a range of temporal lag effects between NPI implementation and Rt changes.
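To make the ratio idea concrete, the following is a deliberately simplified sketch in the spirit of the renewal-equation approach of Cori et al. It is not the EpiForecasts pipeline: it ignores reporting delays and uncertainty, and the generation-interval weights, function names, and toy incidence values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def infectiousness(incidence: Sequence[float], weights: Sequence[float], t: int) -> float:
    """Past incidence weighted by an assumed generation-interval distribution."""
    return sum(w * incidence[t - s]
               for s, w in enumerate(weights, start=1) if t - s >= 0)


def estimate_rt(incidence: Sequence[float], weights: Sequence[float],
                window: int = 7) -> List[Optional[float]]:
    """Ratio of new cases to total infectiousness, pooled over a sliding window."""
    estimates: List[Optional[float]] = []
    for t in range(len(incidence)):
        start = t - window + 1
        if start < len(weights):
            estimates.append(None)  # not enough history for a full window
            continue
        cases = sum(incidence[start:t + 1])
        pressure = sum(infectiousness(incidence, weights, u)
                       for u in range(start, t + 1))
        estimates.append(cases / pressure if pressure > 0 else None)
    return estimates


# Toy checks: flat incidence gives a ratio near 1; daily doubling with a
# one-day generation interval gives a ratio near 2.
flat = estimate_rt([10.0] * 20, weights=[0.2, 0.5, 0.3], window=7)
doubling = estimate_rt([float(2 ** t) for t in range(12)], weights=[1.0], window=3)
print(flat[-1], doubling[-1])
```

The weekly window (`window=7`) mirrors the window length stated above; everything else is a toy configuration.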
Between 1 January and 22 June 2020, data on NPIs and Rt are simultaneously available for 130 countries and territories, all of which are used in the panel analysis described below.
Understanding the temporal patterns
The effect of an NPI on Rt may vary over time as a result of the evolving epidemic dynamics (e.g. decreasing number of susceptibles) or time-varying factors such as public compliance (e.g. the proportion of shoppers wearing facial coverings after government mandates). To examine this effect, we split up the time series of NPIs and Rt values into two parts: before and after peak NPI intensity. This was a sensitivity analysis to examine the robustness of NPIs’ effectiveness in reducing COVID-19 transmission over time.
We used OxCGRT’s stringency index (SI), a combined metric of several behaviour-related NPI measures, to determine peak NPI intensity. We then fitted a Gaussian generalised additive model (GAM) with cubic splines, using SI as the response variable and date as the sole explanatory variable for each World Bank region (i.e. the predicted regional SI is informed by all stringency index time-series within it). The peak of the predicted SI splines for each region was then examined to derive an average peak across all the regions. We then constructed two time-series: (i) the full time series and (ii) the truncated time series up to the time of peak SI.
We examined temporal clustering among different NPIs to identify potential structural confounding. If two effective NPIs are temporally clustered, one may be removed due to multicollinearity, which should not by any means be interpreted as that NPI being ineffective. Similarly, if one effective and one ineffective NPI are temporally clustered, the statistical association between the effective NPI and reductions in Rt may create a statistical artefact whereby the ineffective NPI may also appear to be associated with reductions in Rt. Either way, the existence of temporal clustering could cause misinterpretation of the regression results unless it is accounted for.
To investigate the temporal clustering patterns, we conducted hierarchical cluster analysis using Ward’s method [43], which minimises within-cluster variance, under the any effort scenario and the maximum effort scenario. The inputs of the hierarchical clustering process were 13 vectors (one for each NPI under consideration), with each vector element corresponding to the NPI status aligned by a unique time and location. Euclidean distance was used as the distance function between each pair of NPIs, using all available data (i.e. the full time series for each NPI). We chose this method to compare the entire time series of the NPIs, without having to select time-series summary metrics (e.g. the timing when an NPI was implemented) a priori. We then used multi-scale bootstrapping (n = 10,000) to test the statistical significance of the identified clusters, defined using approximate unbiased p values less than 0.05 [44]. The complete implementation of this method can be found in the GitHub repository at [https://github.com/yangclaraliu/COVID19_NPIs_vs_Rt].
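As a rough illustration of the clustering step, the sketch below runs a naive Ward agglomeration on a handful of toy binary NPI vectors. It is not the study's implementation (see the repository above for that), it omits the multi-scale bootstrap entirely, and the NPI names, vectors, and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a cluster's member vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ward_cost(a: List[List[float]], b: List[List[float]]) -> float:
    """Increase in within-cluster sum of squares caused by merging a and b."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clustering(series: Dict[str, List[float]]) -> List[Tuple[tuple, tuple, float]]:
    """Repeatedly merge the pair of clusters with the smallest Ward cost."""
    clusters = {(name,): [vector] for name, vector in series.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        cost = ward_cost(clusters[a], clusters[b])
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, cost))
    return merges


# Toy input: each vector is an NPI's 0/1 status over six location-days
toy = {
    "school": [0, 0, 1, 1, 1, 1],
    "workplace": [0, 0, 1, 1, 1, 1],
    "testing": [1, 1, 1, 1, 1, 1],
    "contact_tracing": [1, 1, 1, 1, 1, 0],
}
merges = ward_clustering(toy)
for left, right, cost in merges:
    print(left, "+", right, "cost:", round(cost, 3))
```

In this toy input the two NPIs with identical histories merge first at zero cost, which is the kind of temporal clustering that makes their individual effects impossible to separate.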
Panel analyses
We used panel (or longitudinal) regression to study the association between NPI intensity and Rt, treating the time series of NPI intensity and Rt in each country as observations of an individual in a panel. We used a linear fixed effects model:
$$ {R}_{it}={\alpha}_i+\sum \beta {X}_{it}+{u}_{it} $$
where Rit is the time-varying reproduction number of location i at time t, αi is a location-specific intercept (assumed to remain constant over the timescale of the analysis), βXit represents the 13 NPIs and their corresponding coefficients, and uit is the error term. The decision to use a fixed-effects model with individual intercept (as opposed to a random-effects model) was based on the results of the Durbin-Wu-Hausman test [45]. In other words, there is insufficient evidence to support a random-effects model based on global data, and the effects of each NPI on Rt can be characterised by fixed estimators.
We investigated the appropriate temporal lag between NPI intensity and Rt. To do this, we calculated the deviance (natural logarithm of the sum of squared residuals divided by the number of data points), assuming normally distributed errors, for temporal lags of 1 to 21 days. Smaller deviances indicate temporal lags that provide better model fits. A temporal lag of k days regresses Rt on a particular day against the NPIs implemented k days before (i.e. Xi(t − k)). This analysis was carried out at both the regional and global levels. Data from North America and South Asia were excluded from region-specific temporal lag analyses due to small sample sizes.
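The lag search can be sketched for a single NPI as follows. The sketch assumes the deviance is read as ln(SSR/n), estimates the fixed-effects coefficient by demeaning within each country (the standard within transformation), and uses an invented two-country panel in which Rt responds exactly 3 days after the NPI starts. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Panel = Dict[str, Tuple[List[float], List[float]]]  # country -> (NPI status, Rt)


def within_fit(panel: Panel, lag: int) -> Tuple[float, float]:
    """Return (deviance, beta) for Rt regressed on the NPI status `lag` days earlier."""
    xs: List[float] = []
    ys: List[float] = []
    for npi, rt in panel.values():
        pairs = [(npi[t - lag], rt[t]) for t in range(lag, len(rt))]
        if not pairs:
            continue
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        # demeaning within each country absorbs the country-specific intercept
        xs.extend(x - mean_x for x, _ in pairs)
        ys.extend(y - mean_y for _, y in pairs)
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    ssr = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return math.log(ssr / len(ys)), beta  # assumes ssr > 0, i.e. noisy data


def toy_rt(npi: List[float], intercept: float, true_lag: int = 3) -> List[float]:
    """Invented Rt series: drops by 1 `true_lag` days after the NPI, plus small noise."""
    noise = [0.05 * ((7 * t) % 5 - 2) for t in range(len(npi))]
    return [intercept - (npi[t - true_lag] if t >= true_lag else 0.0) + noise[t]
            for t in range(len(npi))]


npi_a = [0.0] * 25 + [1.0] * 35
npi_b = [0.0] * 30 + [1.0] * 30
panel: Panel = {"A": (npi_a, toy_rt(npi_a, 2.0)), "B": (npi_b, toy_rt(npi_b, 1.5))}

best_lag = min(range(1, 22), key=lambda k: within_fit(panel, k)[0])
deviance, beta = within_fit(panel, best_lag)
print(best_lag, round(beta, 3))
```

With all 13 NPIs the same idea applies, but the single-regressor formula for `beta` would need to be replaced by a multivariable least-squares fit.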
Stepwise backwards variable selection based on the Akaike or Bayesian Information Criterion (AIC or BIC) was then used to choose the most parsimonious model. Beginning with the full model (13 independent variables, one for each NPI), independent variables were removed one at a time sequentially. We also validated our results using univariable analyses and a forward variable selection algorithm.
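A backward elimination loop of this kind can be sketched as below. For brevity the sketch uses ordinary least squares with a single intercept rather than the fixed-effects panel model, computes the Gaussian AIC/BIC up to an additive constant, and runs on invented data with two candidate variables; the helper names are hypothetical and the small linear solver assumes the regressors are not perfectly collinear.

```python
import math
from typing import Dict, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ssr(columns: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    design = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def criterion(columns: List[List[float]], y: List[float], kind: str = "aic") -> float:
    """Gaussian AIC or BIC, up to an additive constant."""
    n = len(y)
    penalty = 2.0 if kind == "aic" else math.log(n)
    return n * math.log(ssr(columns, y) / n) + penalty * (len(columns) + 1)


def backward_select(data: Dict[str, List[float]], y: List[float],
                    kind: str = "aic") -> List[str]:
    """Drop one variable at a time while doing so lowers the criterion."""
    kept = list(data)
    current = criterion([data[k] for k in kept], y, kind)
    while kept:
        trials = {name: criterion([data[k] for k in kept if k != name], y, kind)
                  for name in kept}
        drop = min(trials, key=trials.get)
        if trials[drop] >= current:
            break
        kept.remove(drop)
        current = trials[drop]
    return kept


# Invented data: Rt responds to school closure only; the second variable is noise
days = range(40)
data = {
    "school_closure": [1.0 if t >= 20 else 0.0 for t in days],
    "public_info": [1.0 if t % 3 == 0 else 0.0 for t in days],
}
y = [2.0 - data["school_closure"][t] + 0.1 * ((7 * t) % 5 - 2) for t in days]
selected = backward_select(data, y)
print(selected)
```

Passing `kind="bic"` swaps the penalty of 2 per parameter for ln(n), which generally favours smaller models.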
Statistical interpretation
For both the any effort and the maximum effort scenarios, we examined a range of model specifications including (i) different variable selection criteria: AIC and BIC, (ii) different temporal lags between the timing of NPIs and changes in Rt (selected based on deviance from the analysis of temporal patterns), and (iii) different time series lengths: one ending on 22 June 2020 and the other truncated to 13 April 2020, when NPI intensity peaked (on average). We then defined categories of “evidence strength” behind each association according to Table 2. For example, if an NPI has significantly negative effects on Rt in all but one model set-up (i.e. one of model selection criteria, temporal lags, and time-series length mentioned above), that NPI is considered to have moderate strength evidence, as long as no other NPI in the same temporal cluster has significantly positive effects on Rt. Allocating each NPI to an evidence category was done independently by two authors (YL and MJ), with differences resolved by discussion.
Table 2
Expert interpretation of evidence from the statistical associations of each NPI with reductions in Rt
Strong | Selected and significant with intended effect signs (i.e. negative) regardless of model specifications (i.e. variable selection criteria, temporal lags, and time-series lengths). | Not in a temporal cluster with any NPI with significantly positive effect estimates. |
Moderate | Selected and significant with intended effect sign (i.e. negative) in two of three model specification dimensions (i.e. variable selection criteria, temporal lags, and time-series lengths), and non-selected or non-significant in the remaining dimension.* |
Weak | Not strong or moderate |
Discussion
Our study used panel regression to examine the temporal association between NPIs that countries introduced in response to the COVID-19 pandemic, and its rate of transmission in populations, represented by Rt. We explored how the association is modified according to the following model specifications: (i) level of NPI intensity (i.e. any vs maximum scenarios), (ii) model selection criteria (i.e. AIC vs BIC), (iii) varying lag effects, and (iv) different time-series lengths (i.e. truncated vs. full time-series).
We found the strength of evidence behind an association between NPIs and Rt depended on these model specifications. Only two NPIs (school closure, internal movement restrictions) showed unequivocal evidence of being associated with a decrease in Rt regardless of the assumptions made. Whether schools should stay closed has attracted debate. Keeping schools closed could potentially hurt children’s educational development and general wellbeing. Reopening schools, on the other hand, may increase COVID-19 transmission risks for both students and teachers. Our findings are consistent with much existing literature: although school closures cannot single-handedly suppress an outbreak, they are generally effective in terms of reducing transmission [49, 50].
We found evidence that internal movement restrictions reduced Rt, but no evidence of a similar effect for international travel restrictions. The latter is consistent with Russell et al., who show that international movement restrictions have a limited impact on the epidemic dynamics of COVID-19 for most countries [51]. This difference may be explained by the types of movement interrupted: internal movement restrictions interrupt trips of all lengths whereas international movement restrictions only disrupt longer trips, which are much less common. Additionally, internal and international movement restrictions were likely used in different epidemic contexts: internal movement restrictions tend to be used more often to prevent outbreaks from escalating whereas international travel restrictions make more sense in preventing infection introduction [52]. The latter effect is not well represented in our data since Rt can only be estimated in settings with existing COVID-19 outbreaks (i.e. after introduction).
There are differences in the strength and direction of the effects of some NPIs (such as public transport closure and stay-at-home requirements) depending on whether the whole time series of data was used, or only data up to the date of peak NPI stringency (13 April 2020). This may indicate that these NPIs might have different effects at the start of the pandemic compared to later on, so when the NPIs were removed (likely after the peak), Rt did not return to its original level before the introduction of the NPIs.
The best-fitting models also support a considerable delay between NPIs and their effect on transmission. This delay is about a week on average but differs widely between regions. It could reflect delays between policies being put in place and actual behaviour change. It could also reflect delays in reporting, although these are explicitly accounted for in the Rt estimation in EpiForecasts; the same onset-to-delay distribution is applied in all countries [41] and hence may not reflect differences between settings. Delays of up to 3 weeks between policy changes and changes in reported cases have been documented [53].
We were not able to find evidence that supports the effectiveness of contact tracing and testing policies. This may be because both contact tracing and testing policies could lead to more cases being reported, as well as interrupting onward transmission, so the overall effect is the combination of these two opposing effects. When calculating Rt, EpiForecasts does not explicitly account for changes in reporting rate [41]. Another potential explanation is the way NPIs are reported in the OxCGRT, which largely relies on publicly available data sources, such as news articles. Contact tracing and testing policies are both well-known public health intervention tools and have minimal impacts on the lives of those who are not potentially infected. Thus, they may be less likely to receive media coverage, compared to more disruptive NPIs such as workplace closures.
We focused our discussion on the direction and relative magnitude of the estimated effect of different NPIs, within the context of their temporal clusters during the on-going COVID-19 pandemic. The actual values of NPI-specific effect sizes, which were found to be greater for “School Closures” and “Workplace Closures” under the any effort scenario and for “Cancellation of Public Events” and “Income Support” under the maximum effort scenario, should be interpreted with caution. Given the statistical approach and the ecological design of the study, these numerical values are difficult to interpret due to structural confounding. For example, when a temporal cluster was effective in reducing Rt, we were not able to confidently attribute the effects to particular NPIs within the cluster. As the pandemic progresses, data on more diverse NPI implementation profiles and outcomes may become available, enabling more precise determination of effect sizes.
Many other papers have explored the impact of physical distancing measures on SARS-CoV-2 transmission. Prospective mechanistic transmission models have explicitly modelled contacts relevant to viral transmission between individuals in different subgroups (e.g. ages), as well as the impact that NPIs may have on these contacts. Such studies mainly use data from a single location only such as Wuhan [9], Hong Kong [54], the USA [55], and the UK [50]. They suggest that physical distancing interventions can have a large impact on transmission. While the impact of income-related interventions has been less well studied, country reports suggest that they often play an important role in ensuring adherence to distancing measures [56].
Another group of studies have used empirical data to retrospectively examine whether NPIs have been effective in reducing transmission, using either statistical methods or mechanistic epidemiological models. Many such studies look at single interventions such as travel restrictions [25] or “lockdowns” [22, 27]. Therefore, they are less useful to policy-makers wanting to establish which of a basket of NPIs are most effective.
Only a small number of studies have looked at multiple interventions across multiple countries (see Additional file 1: Table S8 for a review). These relate NPIs from databases to proxies of transmission such as Rt estimated from cases and/or deaths [18, 21, 37], or the rate of change in cases directly [13, 16, 57]. Our work demonstrates the major challenge that all such studies (including ours) face: NPI introductions are highly correlated in time, so it is difficult to independently identify the effect of each NPI due to structural confounding. A few studies partially account for this using techniques such as examining whether the number of NPIs that had already been implemented affects the impact of subsequent NPIs [11] or excluding statistically non-significant variables after all NPIs are included initially [12].
Our study extends previous work to address this problem in several ways. First, we use data across a larger number of countries and territories and longer time series (January–June 2020), enhancing the power to detect independent effects even when there is partial collinearity. Second, instead of assuming that all NPIs tested have an effect, as in previous work, we conduct variable selection to identify only those NPIs that are retained in parsimonious models. Third, we conduct cluster analysis to explicitly identify temporal correlations, and use this in our interpretation of the strength of evidence behind each intervention. Fourth, we have conducted sensitivity analyses across a range of model specifications around the variable selection criteria, temporal lag between NPIs and change in transmission, temporal truncation, and the way NPI intensity is coded.
Nonetheless, our study also has several limitations. First, besides the information bias in the NPIs database discussed above, the coding scheme may also introduce potential bias. For example, NPIs coded as “comprehensive contact tracing for all identified cases” may have different implications in different countries. Effectiveness of contact tracing in places like Singapore [58] may be masked by seemingly similar but realistically non-comparable contact tracing programmes elsewhere. Second, compared to daily incidence, Rt estimates are much more suitable for cross-country comparisons and thus are used as the metric for COVID-19 transmission in this study. However, these estimates are based on a series of assumptions that may not always be appropriate. For example, the underlying methods assume constant case ascertainment rates over the 12-week time window (March–June 2020) over which our analysis takes place. Consequently, declines in Rt over time may have been obscured by improvements in case ascertainment, leading to some effective NPIs appearing ineffective in our analyses. We have partially adjusted for this by giving weight in our interpretation only to NPIs whose effect direction is robust to changes in the time-series length. Another limitation is that our model does not propagate uncertainty around Rt estimates. Third, although we examined a wide range of NPIs, we did not include any potential interactions in the current model. Such interactions are possible, e.g. more people may comply with workplace closures when receiving income support. Future research should look into these relationships. Last but not least, although OxCGRT is one of the most comprehensive databases of COVID-19-related NPIs to our knowledge, it does not capture individual behaviour such as face-covering use in public spaces. Thus, we were not able to assess the effectiveness of such measures in reducing COVID-19 transmission. Such behavioural measures may prove crucial to controlling COVID-19 epidemics, so analyses of datasets that capture adherence to these measures (e.g. surveys of public behaviours [59]) may yield important insights in the future.
Acknowledgements
Funding information for the Centre for Mathematical Modelling of Infectious Disease COVID-19 Working Group: James Munday (Wellcome Trust: 210758/Z/18/Z); Hamish Gibbs (UK DHSC/UK Aid/NIHR: ITCRZ 03010); Carl A B Pearson (BMGF: NTD Modelling Consortium OPP1184344, DFID/Wellcome Trust: 221303/Z/20/Z); Kiesha Prem (BMGF: INV-003174, European Commission: 101003688); Quentin J Leclerc (UK MRC: LID DTP MR/N013638/1); Sophie R Meakin (Wellcome Trust: 210758/Z/18/Z); W John Edmunds (European Commission: 101003688, UK MRC: MC_PC_19065, NIHR: PR-OD-1017-20002); Christopher I Jarvis (Global Challenges Research Fund: ES/P010873/1); Amy Gimma (Global Challenges Research Fund: ES/P010873/1, UK MRC: MC_PC_19065); Sebastian Funk (Wellcome Trust: 210758/Z/18/Z); Matthew Quaife (ERC Starting Grant: #757699, BMGF: INV-001754); Timothy W Russell (Wellcome Trust: 206250/Z/17/Z); Jon C Emery (ERC Starting Grant: #757699); Sam Abbott (Wellcome Trust: 210758/Z/18/Z); Joel Hellewell (Wellcome Trust: 210758/Z/18/Z); Rein M G J Houben (ERC Starting Grant: #757699); Kathleen O’Reilly (BMGF: OPP1191821); Georgia R Gore-Langton (UK MRC: LID DTP MR/N013638/1); Adam J Kucharski (Wellcome Trust: 206250/Z/17/Z); Megan Auzenbergs (BMGF: OPP1191821); Billy J Quilty (NIHR: 16/137/109, NIHR: 16/136/46); Thibaut Jombart (Global Challenges Research Fund: ES/P010873/1, UK Public Health Rapid Support Team, NIHR: Health Protection Research Unit for Modelling Methodology HPRU-2012-10096, UK MRC: MC_PC_19065); Alicia Rosello (NIHR: PR-OD-1017-20002); Oliver Brady (Wellcome Trust: 206471/Z/17/Z); Kevin van Zandvoort (Elrha R2HC/UK DFID/Wellcome Trust/NIHR, DFID/Wellcome Trust: Epidemic Preparedness Coronavirus research programme 221303/Z/20/Z); James W Rudge (DTRA: HDTRA1-18-1-0051); Akira Endo (Nakajima Foundation, Alan Turing Institute); Kaja Abbas (BMGF: OPP1157270); Fiona Yueqian Sun (NIHR: 16/137/109); Simon R Procter (BMGF: OPP1180644); Samuel Clifford (Wellcome Trust: 208812/Z/17/Z, UK MRC: 
MC_PC_19065); Nicholas G. Davies (NIHR: Health Protection Research Unit for Immunisation NIHR200929, UK MRC: MC_PC_19065); Charlie Diamond (NIHR: 16/137/109); Rosanna C Barnard (European Commission: 101003688); Rosalind M Eggo (HDR UK: MR/S003975/1, UK MRC: MC_PC_19065); Emily S Nightingale (BMGF: OPP1183986); David Simons (BBSRC LIDP: BB/M009513/1); Katharine Sherratt (Wellcome Trust: 210758/Z/18/Z); Graham Medley (BMGF: NTD Modelling Consortium OPP1184344); Gwenan M Knight (UK MRC: MR/P014658/1); Stefan Flasche (Wellcome Trust: 208812/Z/17/Z); Nikos I Bosse (Wellcome Trust: 210758/Z/18/Z); Petra Klepac (Royal Society: RP\EA\180004, European Commission: 101003688).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.