Background
On May 7th, 2022, a case of monkeypox (mpox) with recent travel to Nigeria was reported in England [
1,
2]. Shortly after, the Centers for Disease Control and Prevention (CDC) identified a case of monkeypox in Massachusetts on May 15th, 2022 [
3]. Since then, multiple nations, including the U.S., have reported a surge in monkeypox cases, mostly among males within the communities of gay, bisexual, and other men who have sex with men (MSM) [
4‐
6]. The World Health Organization (WHO) declared monkeypox a global health emergency on July 23rd, 2022 [
7]. As of November 23rd, 2022, over 79,900 monkeypox cases have been reported in non-endemic countries, especially in the USA, Spain, and Brazil, during the ongoing outbreak [
8]. Given that this is an emerging infection in non-endemic countries, with little historical information about how outbreaks might unfold, mathematical models can help generate real-time forecasts of the trajectory of the epidemics and guide public health measures appropriate for a given geographic setting.
Monkeypox is an endemic zoonotic virus in Africa, most similar in clinical presentation to the Variola (Smallpox) virus [
9]. Both are part of the
Orthopoxvirus genus, which includes other viruses such as cowpox and Vaccinia virus—used in smallpox vaccines [
9,
10]. Monkeypox symptoms include, but are not limited to, flu-like symptoms followed by a raised rash on the face and extremities. The incubation period is usually 6–13 days, with a symptomatic period ranging from 2 to 4 weeks [
9]. Fortunately, monkeypox is not an airborne pathogen. Instead, transmission is mainly driven by prolonged close contact with infected individuals or direct contact with skin lesions, respiratory secretions, or recently contaminated objects—a feature that may facilitate control through basic public health measures [
9].
The ability to forecast country-specific epidemic trajectories is particularly useful in an outbreak like this, which paints a unique epidemiological picture compared to past outbreaks within both endemic and non-endemic countries [
4,
6,
11,
12]. For instance, sexual and intimate contact, specifically between men, has driven the great majority of infections [
5,
13,
14]. Likewise, over 98% of cases in the USA and Spain are male, and most have identified as MSM [
13,
15]. Cases of the ongoing outbreak are less likely to report prodromal symptoms, and rash occurs most frequently in the genital region [
4,
13]. The scale of community spread is unprecedented [
16].
While much has been learned about the epidemiology of this emerging outbreak during the last few months, substantial uncertainties remain about the effect of several variables on the epidemic trajectory, including the frequency and role of asymptomatic individuals, the role of pre-existing immunity from previous smallpox immunization campaigns, and the efficacy of available vaccines [
17]. In this context, semi-mechanistic growth models are especially suitable for conducting short-term forecasts to guide response efforts and evaluate the impact of control measures, including behavior changes that mitigate transmission rates, contact tracing, and vaccination, on growth trends [
18]. Previous real-time forecasting studies have employed a variety of mathematical models in the context of influenza, SARS, Ebola, and COVID-19 [
19‐
23]. Here, we employ an ensemble sub-epidemic modeling framework to characterize epidemic trajectories that result from sub-epidemics aggregation through an optimization process [
19]. This framework has yielded competitive performance in short-term forecasts of various infectious disease outbreaks [
19,
23]. In this study, we generate 4-week ahead forecasts of laboratory-confirmed cases of monkeypox in near real-time at the global level and for nations that have reported the great majority of the cases: Brazil, Canada, England, France, Germany, Spain, and the USA. We also evaluate the model fit and performance of the forecasts based on the mean absolute error (MAE), mean square error (MSE), 95% prediction interval coverage (PI), and weighted interval score (WIS).
Methods
Data
We obtained weekly updates of the daily confirmed monkeypox cases by reporting date from publicly available sources maintained by the CDC and the Our World in Data (OWID) GitHub repository [24, 25]. At the global level and for the countries that have reported the great majority of the cases (Brazil, Canada, England, France, Germany, Spain, and the USA), we retrieved daily case series from the OWID GitHub repository [8, 25]. For the USA, we reported forecasts based on both CDC and OWID data. The CDC and OWID data sources define a confirmed case as a person with a laboratory-confirmed case of monkeypox [26, 27]. Data were downloaded every Wednesday evening from the CDC and every Friday afternoon from the OWID GitHub repository from the week of July 28th, 2022, through the week of October 13th, 2022. For the week of July 28th, 2022, data posted by the OWID team on August 9th, 2022, were used to produce the forecast, as these were the earliest data available.
The n-sub-epidemic modeling framework
A detailed description of our modeling framework is given in ref. [
19]. In this n-sub-epidemic modeling framework, epidemic trajectories are modeled as the aggregation of overlapping and asynchronous sub-epidemics. A sub-epidemic follows the 3-parameter generalized-logistic growth model (GLM), which has displayed competitive performance [
28‐
30]. This model is given by the following differential equation:
$$\frac{dC(t)}{dt}={C}^{\prime }(t)=r{C}^p(t)\left(1-\frac{C(t)}{K_0}\right),$$
where \(C(t)\) denotes the cumulative number of cases at time \(t\) and \(\frac{dC(t)}{dt}\) describes the curve of daily cases over time \(t\). The parameter \(r\) is positive, denoting the growth rate per unit of time, \(K_0\) is the final outbreak size, and \(p \in [0, 1]\) is the “scaling of growth” parameter, which allows the model to capture early sub-exponential and exponential growth patterns. If \(p = 0\), this equation describes a constant number of new cases over time, while \(p = 1\) indicates that the early growth phase is exponential. Intermediate values of \(p\) (\(0 < p < 1\)) describe early sub-exponential (e.g., polynomial) growth dynamics.
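As a minimal sketch of how the generalized-logistic growth model behaves, the following Python snippet integrates the differential equation above with a classical fourth-order Runge-Kutta scheme. The function names, the step sizes, and the parameter values are illustrative assumptions, not values or code from this study.

```python
def glm_rhs(c, r, p, k0):
    """Right-hand side of the model: r * C^p * (1 - C / K0)."""
    return r * (c ** p) * (1.0 - c / k0)


def simulate_glm(r, p, k0, c0, n_steps, dt=1.0, substeps=100):
    """Integrate C'(t) with RK4; return C at t = 0, dt, ..., n_steps * dt."""
    h = dt / substeps
    c = float(c0)
    curve = [c]
    for _ in range(n_steps):
        for _ in range(substeps):
            k1 = glm_rhs(c, r, p, k0)
            k2 = glm_rhs(c + 0.5 * h * k1, r, p, k0)
            k3 = glm_rhs(c + 0.5 * h * k2, r, p, k0)
            k4 = glm_rhs(c + h * k3, r, p, k0)
            c += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        curve.append(c)
    return curve


# Hypothetical parameter values chosen only for illustration.
cumulative = simulate_glm(r=0.5, p=0.9, k0=1000.0, c0=1.0, n_steps=60)
# New cases per time unit, obtained as differences of the cumulative curve.
new_cases = [b - a for a, b in zip(cumulative, cumulative[1:])]
```

With \(p = 1\) and a very large \(K_0\), the same routine reproduces near-exponential early growth, while \(p = 0\) yields an approximately constant number of new cases per unit of time.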
An n-sub-epidemic trajectory comprises n overlapping sub-epidemics and is given by the following system of coupled differential equations:
$$\frac{d{C}_i(t)}{dt}={C_i}^{\prime }(t)={A}_i(t){r}_i{C_i}^{p_i}(t)\left(1-\frac{C_i(t)}{{K_0}_i}\right),$$
where \(C_i(t)\) tracks the cumulative number of cases for sub-epidemic \(i\), and the parameters that characterize the shape of the \(i\)th sub-epidemic are given by \((r_i, p_i, {K_0}_i)\), for \(i = 1, \dots, n\). Thus, the 1-sub-epidemic model is equivalent to the generalized-logistic growth model described above. When \(n > 1\), we model the onset timing of the \((i+1)\)th sub-epidemic, where \((i+1) \le n\), by employing the indicator variables \(A_i(t)\), so that the \((i+1)\)th sub-epidemic is triggered when the cumulative curve of the \(i\)th sub-epidemic exceeds \(C_{thr}\).
The \((i+1)\)th sub-epidemic is only triggered when \(C_{thr} \le {K_0}_i\). Then, we have:
$${A}_i(t)=\left\{\begin{array}{ll}1, & {C}_{i-1}(t)>{C}_{thr}\\ 0, & \text{otherwise}\end{array}\right.\qquad i=2,\dots, n,$$
where \(A_1(t) = 1\) for the first sub-epidemic. Hence, the total number of parameters needed to model an \(n\)-sub-epidemic trajectory is \(3n + 1\). The initial number of cases is given by \(C_1(0) = I_0\), where \(I_0\) is the initial number of cases in the observed data. The cumulative curve of the \(n\)-sub-epidemic trajectory is given by:
$$C_{tot}(t)=\sum_{i=1}^nC_i(t).$$
Hence, this modeling framework is suitable for diverse epidemic patterns including those characterized by multiple peaks.
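The coupled system above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this study: it uses a simple forward Euler scheme, and it assumes that every sub-epidemic after the first starts from a hypothetical seed value `seed` (the text only specifies \(C_1(0) = I_0\)). All parameter values are made up.

```python
def simulate_sub_epidemics(params, c_thr, i0, n_steps, dt=1.0, substeps=200, seed=1.0):
    """Simulate n overlapping sub-epidemics and return C_tot at t = 0, dt, ...

    params: list of (r_i, p_i, K0_i) tuples, one per sub-epidemic.
    seed:   ASSUMED initial value C_i(0) for i > 1 (not given in the text).
    """
    h = dt / substeps
    c = [float(i0)] + [float(seed)] * (len(params) - 1)
    total = [sum(c)]
    for _ in range(n_steps):
        for _ in range(substeps):
            new_c = []
            for i, (r, p, k0) in enumerate(params):
                # A_1(t) = 1; A_i(t) = 1 once C_{i-1}(t) exceeds C_thr.
                active = 1.0 if i == 0 or c[i - 1] > c_thr else 0.0
                rate = active * r * (c[i] ** p) * (1.0 - c[i] / k0)
                new_c.append(c[i] + h * rate)  # forward Euler step
            c = new_c
        total.append(sum(c))
    return total


# Two hypothetical sub-epidemics; the second starts once the first exceeds 800 cases.
params = [(0.5, 0.9, 1000.0), (0.4, 0.9, 600.0)]
c_tot = simulate_sub_epidemics(params, c_thr=800.0, i0=1.0, n_steps=120)
```

If `c_thr` is set above the final size of the first sub-epidemic, the second one is never triggered, which mirrors the condition \(C_{thr} \le {K_0}_i\) stated above.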
Parameter estimation for the n-sub-epidemic model
The time series of new weekly monkeypox cases is denoted by:
\({y}_{t_j}={y}_{t_1},{y}_{t_2},\dots, {y}_{t_{n_d}}\), where \(j = 1, 2, \dots, n_d\).
Here, \(t_j\) are the time points of the time series data and \(n_d\) is the number of observations. Using these case series, we estimate a total of \(3n + 1\) model parameters, namely \(\varTheta = (C_{thr}, r_1, p_1, {K_0}_1, \dots, r_n, p_n, {K_0}_n)\). Let \(f(t, \varTheta)\) denote the expected curve of new monkeypox cases of the epidemic’s trajectory. We can estimate model parameters by fitting the model solution to the observed data via nonlinear least squares [31] or via maximum likelihood estimation assuming a specific error structure [32]. For nonlinear least squares, this is achieved by searching for the set of parameters \(\hat{\varTheta}\) that minimizes the sum of squared differences between the observed data \({y}_{t_j}\) and the model mean \(f(t, \varTheta)\). That is, \(\varTheta\) is estimated by
\(\hat{\varTheta}=\arg\min_{\varTheta}\ {\sum}_{j=1}^{n_d}{\left(f\left({t}_j,\varTheta \right)-{y}_{t_j}\right)}^2\).
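To make the least-squares objective concrete, the sketch below fits a single sub-epidemic (\(n = 1\), so no \(C_{thr}\) is needed) to synthetic, noise-free data. A coarse grid search stands in for a proper nonlinear least-squares optimizer purely for illustration. The function names, the grids, the Euler integration, and the "true" parameter values are all assumptions, not part of this study.

```python
import itertools


def glm_incidence(r, p, k0, c0, n_points, substeps=50):
    """Return C'(t) at t = 0, 1, ..., n_points - 1 using forward Euler steps."""
    h = 1.0 / substeps
    c = float(c0)
    out = []
    for _ in range(n_points):
        out.append(r * c ** p * (1.0 - c / k0))
        for _ in range(substeps):
            c += h * r * c ** p * (1.0 - c / k0)
    return out


def sse(theta, y, c0):
    """Sum of squared differences between the model curve f(t_j, theta) and y."""
    r, p, k0 = theta
    f = glm_incidence(r, p, k0, c0, len(y))
    return sum((fj - yj) ** 2 for fj, yj in zip(f, y))


def grid_search(y, c0, r_grid, p_grid, k0_grid):
    """Return the grid point (r, p, K0) that minimizes the SSE."""
    candidates = itertools.product(r_grid, p_grid, k0_grid)
    return min(candidates, key=lambda theta: sse(theta, y, c0))


# Synthetic data generated from hypothetical "true" parameters.
true_theta = (0.6, 0.8, 500.0)
y = glm_incidence(*true_theta, c0=5.0, n_points=10)
theta_hat = grid_search(
    y, 5.0,
    r_grid=[0.4, 0.5, 0.6, 0.7],
    p_grid=[0.6, 0.7, 0.8, 0.9, 1.0],
    k0_grid=[300.0, 500.0, 700.0],
)
```

Because the synthetic data contain no noise and the grid includes the generating parameters, the search recovers them exactly. With real case counts, a gradient-based or multi-start optimizer would normally replace the grid.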
We quantify parameter uncertainty using a bootstrapping approach described in [33], which allows the computation of standard errors and related statistics in the absence of closed-form solutions. To that end, we use the best-fit model \(f\left(t,\hat{\varTheta}\right)\) to generate \(B\) replicated simulated datasets of size \(n_d\), where the observation at time \(t_j\) is sampled from a normal distribution with mean \(f\left({t}_j,\hat{\varTheta}\right)\) and variance \(\frac{\sum_{j=1}^{n_d}{\left(f\left({t}_j,\hat{\varTheta}\right)-{y}_{t_j}\right)}^2}{n_d-\left(3n+1\right)}\). Then, we refit the model to each of the \(B\) simulated datasets to re-estimate each parameter. The new parameter estimates for each realization are denoted by \({\hat{\varTheta}}_b\), where \(b = 1, 2, \dots, B\). Using the sets of re-estimated parameters \(\left({\hat{\varTheta}}_b\right)\), the empirical distribution of each estimate can be characterized, and the resulting uncertainty around the model fit can similarly be obtained from \(f\left(t,{\hat{\varTheta}}_1\right), f\left(t,{\hat{\varTheta}}_2\right),\dots, f\left(t,{\hat{\varTheta}}_B\right)\). We run the calibrated model forward in time to generate short-term forecasts with quantified uncertainty.
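The data-generation step of this parametric bootstrap can be sketched as below. The snippet only produces the \(B\) simulated datasets; refitting the model to each of them would reuse whatever least-squares routine produced the original fit. The function names and the example numbers are hypothetical, and the sketch does not truncate negative draws, which the text does not address.

```python
import random


def residual_variance(fitted, observed, n_params):
    """SSE divided by (n_d - number of parameters), as in the variance above."""
    sse = sum((f - y) ** 2 for f, y in zip(fitted, observed))
    return sse / (len(observed) - n_params)


def simulate_bootstrap_datasets(fitted, observed, n_params, n_boot, rng):
    """Draw n_boot datasets with y*_j ~ Normal(f(t_j, theta_hat), sigma^2)."""
    sigma = residual_variance(fitted, observed, n_params) ** 0.5
    return [[rng.gauss(f, sigma) for f in fitted] for _ in range(n_boot)]


# Made-up best-fit curve and observations for a 1-sub-epidemic model (3n + 1 = 4).
fitted = [10.0, 22.0, 41.0, 60.0, 52.0, 33.0, 20.0, 12.0, 8.0, 5.0]
observed = [12.0, 20.0, 45.0, 57.0, 50.0, 36.0, 18.0, 13.0, 7.0, 6.0]
rng = random.Random(2022)  # fixed seed so the sketch is reproducible
datasets = simulate_bootstrap_datasets(fitted, observed, n_params=4, n_boot=300, rng=rng)
```

Each element of `datasets` has the same length as the observed series and would be passed back to the fitting routine to obtain one bootstrap estimate \({\hat{\varTheta}}_b\).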
Selecting the top-ranked sub-epidemic models
We used the \(AIC_c\) values of the set of best-fit models based on one and two sub-epidemics to select the top-ranked sub-epidemic models. We ranked the models from best to worst according to their \(AIC_c\) values, which are given by [34, 35]:
$${AIC}_c={n}_d\log (SSE)+2m+\frac{2m\left(m+1\right)}{n_d-m-1}$$
where \(SSE={\sum}_{j=1}^{n_d}{\left(f\left({t}_j,\hat{\varTheta}\right)-{y}_{t_j}\right)}^2\), \(m = 3n + 1\) is the number of model parameters, and \(n_d\) is the number of data points. The parameters entering the above formula for \(AIC_c\) are estimated from the nonlinear least-squares fit, which implicitly assumes normally distributed errors.
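The formula translates directly into a few lines of Python. The function name is hypothetical, and the natural logarithm is assumed, since the text does not state the base.

```python
import math


def aicc(sse, n_d, n_sub):
    """Corrected AIC for an n-sub-epidemic model with m = 3 * n_sub + 1 parameters."""
    m = 3 * n_sub + 1
    if n_d - m - 1 <= 0:
        raise ValueError("AICc is undefined unless n_d > m + 1")
    return n_d * math.log(sse) + 2 * m + (2 * m * (m + 1)) / (n_d - m - 1)


# Illustrative comparison with 10 data points and equal SSE for both models:
# the two-sub-epidemic model pays a much larger small-sample penalty.
aicc_one = aicc(sse=1.0, n_d=10, n_sub=1)
aicc_two = aicc(sse=1.0, n_d=10, n_sub=2)
```

With only 10 observations, the correction term grows quickly with the number of parameters, so a two-sub-epidemic model must reduce the SSE substantially to be ranked first.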
Constructing ensemble n-sub-epidemic models
We generate ensemble models from the weighted combination of the highest-ranking sub-epidemic models, ranked by \({AIC}_{c_i}\) for the \(i\)th-ranked model, where \({AIC}_{c_1}\le \dots \le {AIC}_{c_I}\) and \(i = 1, \dots, I\). An ensemble derived from the top-ranking \(I\) models is denoted by Ensemble(\(I\)). Thus, Ensemble(2) refers to the ensemble model generated from the weighted combination of the top-ranking 2 sub-epidemic models. We compute the weight \(w_i\) for the \(i\)th model, \(i = 1, \dots, I\), where \(\sum_i w_i = 1\), as follows:
$$w_i=\frac{l_i}{l_1+l_2+\dots+l_I}\quad \text{for all}\ i=1,2,\dots,I,$$
where \(l_i\) is the relative likelihood of model \(i\), which is given by \({l}_i={e}^{\left({AIC}_{min}-{AIC}_i\right)/2}\) [36], and hence \(w_I \le \dots \le w_1\). The prediction intervals based on the ensemble model can be obtained using a bootstrap approach similar to the one described above. We employed the first-ranked and second-ranked models to derive the ensemble forecasts.
The \(AIC_c\) values of the top models for the most recent forecast can be found in figure 1s (Additional file 1) [24, 25].
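A short sketch of the weight calculation follows, assuming that the \(AIC_c\) values from the model-selection step are the quantities plugged into the relative likelihood. The function name and the example values are illustrative only.

```python
import math


def ensemble_weights(aic_values):
    """Return weights w_i = l_i / sum(l), with l_i = exp((AIC_min - AIC_i) / 2)."""
    best = min(aic_values)
    rel_likelihood = [math.exp((best - a) / 2.0) for a in aic_values]
    total = sum(rel_likelihood)
    return [l / total for l in rel_likelihood]


# Hypothetical AICc values for the first- and second-ranked models.
weights = ensemble_weights([100.0, 102.0])
```

A difference of 2 units between the two models gives the top-ranked model roughly 73% of the weight, while equal values split the weight evenly.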
Forecasting strategy
Using a 10-week calibration period for each model, we have conducted 324 real-time weekly sequential 4-week ahead forecasts across studied areas and models (week of July 28th–week of October 13th, 2022) thus far. At the national and global levels, we also report forecasting performance metrics for 8 sequential forecasting periods covering the weeks of July 28th, 2022, through September 15, 2022, for which data was available to assess the 4-week ahead forecasts. We also compare the predicted cumulative cases for the 4-week forecasts across models for a given setting. Cumulative cases for a given model were calculated as the sum of median number of new cases predicted during the 4-week forecast. Forecasts were evaluated using data reported during the week of October 13th, 2022.
Across geographic areas, we assessed the quality of our model fit and performance of the short-term forecasts for each model by using four standard performance metrics: the mean squared error (MSE) [
37], the mean absolute error (MAE) [
38], the coverage of the 95% prediction interval (PI) [
37], and the weighted interval score (WIS) [
20,
39]. While MSE and MAE assess the average deviations of the mean model fit to the observed data, the coverage of the 95% PI and the weighted interval score consider the uncertainty of the forecasts.
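The four metrics can be sketched as follows. The MSE, MAE, and coverage functions follow their usual definitions, and the weighted interval score uses the commonly cited form that combines the absolute error of the median with interval scores at several nominal levels. The exact quantile levels and implementation used in this study are not given here, so the function names, the chosen levels, and the numbers below are assumptions.

```python
def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def coverage(lower, upper, obs):
    """Percentage of observations falling inside [lower, upper]."""
    hits = sum(1 for l, u, o in zip(lower, upper, obs) if l <= o <= u)
    return 100.0 * hits / len(obs)


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    penalty_low = (2.0 / alpha) * max(lower - y, 0.0)
    penalty_high = (2.0 / alpha) * max(y - upper, 0.0)
    return (upper - lower) + penalty_low + penalty_high


def wis(median, intervals, y):
    """Weighted interval score; intervals maps alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast for one time point: median 10, 50% and 95% intervals.
score = wis(10.0, {0.5: (8.0, 12.0), 0.05: (5.0, 15.0)}, y=10.0)
```

Lower values are better for MSE, MAE, and WIS, whereas coverage is judged by how close it is to the nominal 95% level.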
Discussion
We report results from short-term (4 weeks ahead) forecasts of monkeypox cases using a sub-epidemic modeling framework for the world and seven countries that, at the time this study began, had reported the great majority of cases. Our forecasts continue to support an overall declining trend in the number of new cases of monkeypox at the global and country-specific levels. Based on the top-ranked model and the weighted ensemble model, we predict that during the next 4 weeks (the week of October 20th, 2022, through the week of November 10th, 2022), a total of 6232 cases of monkeypox (95% PI 487.8, 12,468.0 and 95% PI 492.8, 12,463.1, respectively) could be added globally. At the country level, our top-ranked model indicates that the highest number of new cases will be reported in the USA (OWID data) (median 1806, 95% PI 0, 5545), followed by Brazil (median 879, 95% PI 25, 1952) and Spain (median 43, 95% PI 0, 316). Overall, our models have performed reasonably well across study areas. The top-ranked and weighted ensemble models outperformed other models on average in forecasting performance.
Our results offer valuable information to policymakers to guide the continued allocation of resources and inform mitigation efforts. More broadly, findings suggest that the epidemic could be brought under near-complete control in some regions should public health measures continue to be sustained, especially among the high-risk groups [
42‐
44]. Indeed, a core group of higher-risk people is thought to disproportionately contribute to transmission and thereby sustain sexually transmitted infection (STI) epidemics. Monkeypox is inherently different from other STIs like HIV, which has a lifelong duration, or bacterial STIs, which can be acquired repeatedly. Cases may decline rapidly as immunity increases among core group members, either due to infection or vaccination. Without a core group driving the epidemic, monkeypox may become endemic with low transmission levels [
45‐
47]. The current monkeypox outbreak is unprecedented in size and geographic scope. As of November 23rd, 2022, a total of 80,899 monkeypox cases have been reported across 110 countries globally [
8]. Only seven countries have historically reported monkeypox cases, indicating that more than 93% of the countries reporting cases are non-endemic for monkeypox [
8]. Our latest short-term forecasts from top-ranked models and weighted ensemble models conducted in near real-time indicate a clear, continued slow down in the number of new cases globally and in each country included in the study. This mirrors the recent continental declines in cases reported for Europe and the Americas [
43,
48,
49].
Findings support the significant impact of current measures to contain the outbreak in different areas. For example, in the USA, the primary strategy has been a combination of increasing education around monkeypox (e.g., symptoms, transmission), encouraging practices that reduce potential close contacts and increasing access to vaccination and testing for high-risk groups [
50‐
52]. Although supply-chain shortages impacted early vaccine access, as of August 26th, 2022, the availability of monkeypox vaccines had increased to sufficient levels to combat the outbreak. The racial disparities in access to vaccines that arose in late August persist [53‐55], though some progress has been made in improving monkeypox vaccination among racial and ethnic minority groups in the USA. For example, according to a recent Morbidity and Mortality Weekly Report (MMWR), between May 22–June 25 and July 31–October 10, 2022, the proportion of monkeypox vaccine recipients increased from 15 to 23% among Hispanic persons and from 6 to 13% among Black persons [
56]. In addition, although vaccines are more recently available, behavioral modification within high-risk groups appears to be driving declines in cases. Continuing these behaviors (e.g., limiting one-time sexual encounters) is crucial in slowing the transmission of monkeypox [
5,
42,
57]. Based on the early evidence from Europe, the World Health Organization (WHO) is quite optimistic that the current outbreak of monkeypox can be contained with the improvement in vaccine supply chains in addition to early detection of the cases and educational interventions which lead to behavioral modifications in the high-risk groups [
58]. Moreover, recent research indicates that the majority of monkeypox cases resulting in severe disease or death have been among MSM with compromised immune systems (e.g., due to untreated HIV infection). Specifically, an increased burden has been noted among Black populations and those experiencing mental health challenges or housing insecurity, reflecting existing inequities in access to resources for the diagnosis, treatment, and prevention of monkeypox [
59].
Our study is not exempt from limitations. First, our analysis relied on weekly time series data of lab-confirmed monkeypox cases from two sources, which display irregular daily reporting patterns [
24,
25]. These sources use different approaches in compiling data and addressing data issues, which affect the characterization of the epidemic curve. For example, the CDC data uses cases with reporting data that includes either the positive laboratory test report date, CDC call center reporting date, or case data entry date into CDC’s emergency response common operating platform [
24]. The OWID team uses laboratory-confirmed case data reported to the World Health Organization via WHO Member States [
25,
60]. In addition, the data used for forecasting could also be underestimated due to delays between the date of testing and the date of reporting. Also, the weekly data used in our real-time forecasts has exhibited revisions that retrospectively adjusted the time series. Hence, significant increases or decreases may be observed in reported cases for the same date between forecasting periods. Indeed, the CDC acknowledged the presence of data adjustments within their posted data [
61]. Similar issues have been noted in COVID-19 forecasting studies since ground truth data adjustments occurred during the pandemic [
20]. Nevertheless, in the case of our study and the COVID-19 forecasting study, forecasts are conducted in real time using ground truth data. Therefore, each weekly forecast is generated using the latest time series available on each prediction date, whereas forecasts were scored using the most up-to-date data at the time of the study (week of October 13th, 2022). Because we are dealing with limited epidemic data in this study, we often examined forecasts derived from the second-ranked sub-epidemic model even when it yielded substantially diminished statistical support relative to the top-ranked model. The models employed in this study are semi-mechanistic in that they give insight into the nature of the process that generated the epidemic trends in terms of the aggregation of sub-epidemic trajectories. However, the models are not intended to quantify the effects of different factors, such as behavior change and vaccination, on the declining trend. Finally, it should be noted that our short-term forecasts are based on the inherent assumption that current behavioral practices will not change substantially, at least over short time horizons. Further, over longer forecasting horizons, our models are not sensitive to episodic risk behaviors that are seasonal or event-specific (e.g., LGBTQ Pride festivals). For example, a previous study has reported episodic risk behaviors among MSM, such as condomless anal sex with new male sex partners while vacationing [
62].
In future work, we plan to systematically assess the forecasting performance of the models against other competing models, such as the Autoregressive Integrated Moving Average (ARIMA), which has been broadly applied to forecast time series of epidemics and various other phenomena such as the weather and the stock market [
19,
63,
64]. During the COVID-19 pandemic, this sub-epidemic modeling framework demonstrated reliable forecasting performance in 10- to 30-day ahead forecasts of daily deaths, outperforming ARIMA models in weekly short-term forecasts covering the U.S. trajectory of the pandemic from the early phase of spring 2020 to the Omicron-dominated wave [
19].