Data processing and statistical model
We divided the study region (Fig. 1) into a grid at a 1 km × 1 km resolution and aggregated daily data to a weekly resolution. The weekly number of newly admitted dengue patients in each cell was computed from the number of patients admitted to the hospital during each week of the recorded time period, with each patient assigned to a cell by home location. This is our ‘case’ variable.
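The gridding and weekly aggregation described above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used in the study; the record layout, the projected coordinates in kilometres, the dates, and all function names are assumptions made for the example.

```python
from collections import Counter
from datetime import date

CELL_KM = 1.0  # grid resolution in km (assumes projected coordinates in km)

def cell_index(x_km, y_km, cell_km=CELL_KM):
    """Map a projected home location to integer (column, row) grid indices."""
    return (int(x_km // cell_km), int(y_km // cell_km))

def week_index(day, start):
    """Zero-based week number of `day`, counted from the study start date."""
    return (day - start).days // 7

def weekly_case_counts(admissions, start, cell_km=CELL_KM):
    """Count newly admitted patients per (cell, week) by home location."""
    counts = Counter()
    for x_km, y_km, admitted in admissions:
        key = (cell_index(x_km, y_km, cell_km), week_index(admitted, start))
        counts[key] += 1
    return counts

# Hypothetical records: (home x in km, home y in km, admission date).
admissions = [
    (0.4, 0.7, date(2012, 1, 2)),
    (0.9, 0.1, date(2012, 1, 5)),
    (3.2, 1.5, date(2012, 1, 10)),
]
counts = weekly_case_counts(admissions, start=date(2012, 1, 1))
```

Passing `cell_km=5.0` gives the corresponding counts on the coarser grid.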
To incorporate the role of mobility into the model, we used the travel itineraries provided by the patients to generate a time-dependent connectivity matrix, which represents the total number of trips made by dengue-infected patients between each pair of cells for each week of the study period. For each patient, the travel data included all origins and destinations visited each day during the 10 days preceding hospital admission (the time interval during which the patient is assumed to be able to spread the disease). The number of daily trips between each pair of cells was summed over all patients to provide bidirectional daily trip volumes between cells, and then aggregated to the weekly level. For each cell, the incoming weekly trips were summed to define our ‘trip’ variable; because these trips were made by surveyed dengue patients, the travelers are assumed to be infectious. The outgoing trips of a cell are likewise counted as incoming trips at their respective destinations. Critically, we excluded all trips with a destination of ‘home’ when computing the trip variable, in order to remove the inherent dependence between the ‘case’ variable, i.e., the home location of infected individuals, and the ‘trip’ variable (the explanatory variable). Thus, the total number of trips (excluding trips home) made by infected dengue patients entering a given cell i in a given week t, \( {V}_t^i \), was used as a spatial-temporal explanatory variable in the model. The same method was used for the 5 km × 5 km analysis.
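As an illustration of how the connectivity matrix and the trip variable relate, the sketch below builds both from a flat list of trips. It is written in Python under stated assumptions: the tuple layout, the cell labels, and the function name are hypothetical, and trips that start and end in the same cell are counted like any other trip, which the text does not specify.

```python
from collections import Counter

def weekly_trip_tables(trips):
    """Build the weekly connectivity counts and the incoming-trip variable.

    `trips` holds (week, origin_cell, destination_cell, destination_is_home)
    tuples, one per trip reported by a surveyed patient.
    """
    connectivity = Counter()  # (week, origin, destination) -> number of trips
    incoming = Counter()      # (week, destination) -> V for that cell and week
    for week, origin, dest, dest_is_home in trips:
        connectivity[(week, origin, dest)] += 1
        # Trips that end at the patient's home are left out of the trip
        # variable so that it does not simply mirror the case variable.
        if not dest_is_home:
            incoming[(week, dest)] += 1
    return connectivity, incoming

# Hypothetical trips between three cells labelled "A", "B", and "C".
trips = [
    (0, "A", "B", False),
    (0, "C", "B", False),
    (0, "B", "A", True),   # return trip home: excluded from the trip variable
    (1, "A", "B", False),
]
connectivity, incoming = weekly_trip_tables(trips)
```

Here cell "B" receives two trips in week 0 and one in week 1, while the trip home into cell "A" appears in the connectivity counts but not in the trip variable.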
Climate variables were averaged or aggregated temporally to a weekly resolution, including weekly average Tavg, Tmin, Tmax, DTR, and RH, and weekly total Pre and RD. Land-use data were aggregated spatially to match the target grid resolution: the percentage of land occupied by each land-use type was determined for each 1 km × 1 km grid cell. The population data were originally at a resolution that matched the 1 km × 1 km grid. Both the land-use and the population data were subsequently aggregated to a 5 km × 5 km grid.
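A minimal Python sketch of these two aggregation steps is given below. The aggregation rules are assumptions for illustration only: incomplete trailing weeks are dropped, population is summed over fine cells, and land-use fractions are averaged over equal-area fine cells. The text does not state which rules were actually applied, and all names and values are hypothetical.

```python
from statistics import mean

def weekly_climate(daily, how):
    """Collapse a list of daily values into consecutive complete 7-day weeks."""
    n_complete = len(daily) - len(daily) % 7  # drop an incomplete final week
    return [how(daily[k:k + 7]) for k in range(0, n_complete, 7)]

def coarsen(cells, factor, how):
    """Aggregate {(col, row): value} from fine cells into factor x factor blocks."""
    blocks = {}
    for (col, row), value in cells.items():
        blocks.setdefault((col // factor, row // factor), []).append(value)
    return {key: how(values) for key, values in blocks.items()}

# Hypothetical daily series covering two weeks.
tavg_daily = [26] * 7 + [28] * 7
pre_daily = [0, 5, 0, 0, 10, 0, 0, 1, 1, 1, 1, 1, 1, 1]
tavg_weekly = weekly_climate(tavg_daily, mean)  # weekly averages
pre_weekly = weekly_climate(pre_daily, sum)     # weekly totals

# Hypothetical 1 km cells aggregated to 5 km blocks.
population_1km = {(0, 0): 100, (4, 4): 50, (5, 0): 30}
population_5km = coarsen(population_1km, 5, sum)
builtup_1km = {(0, 0): 0.2, (1, 0): 0.4}
builtup_5km = coarsen(builtup_1km, 5, mean)
```

Averaging fractions is only equivalent to recomputing the occupied share of the coarse cell when every fine cell within the block is present and has the same area.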
A linear mixed-effects model combined with backward elimination of non-significant fixed effects (p-value > 0.05, two-tailed test) was applied to investigate the spatial-temporal dynamics of the dengue outbreak in relation to the potential explanatory variables at a weekly time step and 1 km × 1 km spatial resolution. In building the model, we first conducted a sensitivity analysis to identify the optimal set of climatic variables to include in the model and the corresponding time lag for each of them.
Along with the chosen climate variables, the remaining set of potential explanatory variables (Table 1) was normalized and entered into the initial mixed-effects model, with population included in the spatial random effects. Population density was incorporated using random effects in the model because population is likely to have spatially heterogeneous effects on dengue outbreaks [47, 52]. For example, highly populated areas may have access to tap water and better living conditions, which could restrict dengue transmission [53], while a higher population density facilitates disease spread. Furthermore, there could be spatial variance in the distribution of people living in a particular area. In addition to the mobility, climate, and land-use variables, the numbers of new cases in a given cell in the prior weeks were added as explanatory variables to account for autocorrelation in the case data. Subsequently, the variable with the least significant fixed-effects coefficient was eliminated at each iteration, until only variables with significant coefficients (at the 5% significance level) remained in the model. A range of lead times for \( {V}_t^i \) prior to the week of admission was also tested. A separate analogous process was conducted at a 5 km × 5 km resolution to test the sensitivity of the model results to spatial resolution and the robustness of the modeling framework and findings.
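The backward-elimination loop can be sketched as follows. This Python fragment shows only the control flow: the model fit itself is hidden behind `fit_pvalues`, a hypothetical callable standing in for the mixed-effects fit, and the p-values in `toy_fit` are invented solely to exercise the loop.

```python
def backward_eliminate(candidates, fit_pvalues, alpha=0.05):
    """Drop the least significant fixed effect until all p-values are <= alpha.

    `fit_pvalues(variables)` is a hypothetical callable that refits the model
    with the given fixed effects and returns {variable: two-tailed p-value}.
    """
    kept = list(candidates)
    while kept:
        pvalues = fit_pvalues(kept)  # refit after every removal
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining coefficient is significant
        kept.remove(worst)
    return kept

def toy_fit(variables):
    """Stand-in for a model fit; the p-values are made up for illustration."""
    p = {"V_lag1": 0.001, "N_lag1": 0.003, "Tmin": 0.20, "Paddy": 0.40}
    if "Paddy" not in variables:
        p["Tmin"] = 0.03  # pretend Tmin sharpens once Paddy is removed
    return {name: p[name] for name in variables}

selected = backward_eliminate(["V_lag1", "N_lag1", "Tmin", "Paddy"], toy_fit)
```

Refitting after each removal matters because p-values can change when a variable leaves the model: in this toy run, `Tmin` would have been discarded had all variables above the threshold been removed in a single pass.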
Thus, the mathematical representation of the model is given by:
$$ {N}_t^i=\sum \limits_{l\in \overline{L}}{\alpha}_l{f}_l^i+\sum \limits_{c\in \overline{C}}{\beta}_c{c}_{t,{d}_c}+\sum \limits_u{\gamma}_u{V}_{t-u}^i+\sum \limits_w{\delta}_w{N}_{t-w}^i+{a}^i+{b}^i{P}^i+{\varepsilon}_t^i $$
where:
i is the cell index; i = 1, 2, …
l is the land-use variable, which belongs to the land-use group set \( \overline{L} \), where \( \overline{L} \) includes Sea, StWtr, FlwWtr, Coconut, Marsh, Paddy, BuiltUp, Scrubland, Homesteads, Forest, Rubber, RockS, OthAg, and Other.
\( {f}_l^i \) is the fraction of cell i occupied by land-use group l (time-invariant).
\( {P}^i \) is the population in cell i (time-invariant).
t is the time index at weekly resolution; t = 1, 2, …
\( {N}_t^i \) is the number of patients admitted to the hospital during week t whose home locations are in cell i.
\( {N}_{t-w}^i \) is the number of patients admitted to the hospital during week t-w whose home locations are in cell i, where w is measured in weeks; w = 1, 2, …
\( {V}_{t-u}^i \) is the total number of trips made into cell i during week t-u, where u is measured in weeks; u = 1, 2, …
c is the climate variable which belongs to the climate variable set \( \overline{C} \). \( \overline{C} \) includes Tavg, Tmax, Tmin, DTR, Pre, RD, and RH.
\( {c}_{t,{d}_c} \) is the value of climate variable c during the week that begins \( {d}_c \) days prior to the start of week t.
\( {d}_c \) ranges from 7 to 17 days and can differ between climate variables (Figure S2). Multiple climate variables can be included in the model.
\( {\varepsilon}_t^i \) is the model residual associated with cell i and week t.
\( {\alpha}_l \) is the estimated fixed-effects coefficient for l.
\( {\beta}_c \) is the estimated fixed-effects coefficient for c.
\( {\gamma}_u \) is the estimated fixed-effects coefficient for \( {V}_{t-u}^i \).
\( {\delta}_w \) is the estimated fixed-effects coefficient for \( {N}_{t-w}^i \).
\( {a}^i \) is the intercept associated with cell i.
\( {b}^i \) is the estimated spatial random-effects coefficient for \( {P}^i \).
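To make the notation concrete, the Python sketch below evaluates the lagged climate term and the right-hand side of the model (without the residual) for a single cell and week. It is an illustration only: the function names, the dictionary layout, and every coefficient and input value are hypothetical, and the inputs are not normalized as they were in the study.

```python
from statistics import mean

def lagged_weekly_mean(daily, week_start, d_c):
    """Mean of the 7 daily values in the week beginning d_c days before week t.

    `daily` is indexed by day; `week_start` is the index of the first day of week t.
    """
    first = week_start - d_c
    if first < 0 or first + 7 > len(daily):
        raise ValueError("lagged week falls outside the daily series")
    return mean(daily[first:first + 7])

def linear_predictor(alpha, f, beta, c, gamma, v_lags, delta, n_lags, a_i, b_i, p_i):
    """Sum the model terms for one cell i and week t, excluding the residual."""
    land = sum(alpha[l] * f[l] for l in alpha)           # land-use terms
    climate = sum(beta[k] * c[k] for k in beta)          # lagged climate terms
    trips = sum(gamma[u] * v_lags[u] for u in gamma)     # lagged incoming trips
    cases = sum(delta[w] * n_lags[w] for w in delta)     # lagged case counts
    return land + climate + trips + cases + a_i + b_i * p_i

# Week t starts on day 14; a 10-day lag selects days 4 to 10 of the series.
tmin_lagged = lagged_weekly_mean(list(range(30)), week_start=14, d_c=10)

# Hypothetical coefficients and inputs for one cell-week.
expected_cases = linear_predictor(
    alpha={"BuiltUp": 0.5}, f={"BuiltUp": 0.4},
    beta={"Tmin": 0.1}, c={"Tmin": 2.0},
    gamma={1: 0.3}, v_lags={1: 10},
    delta={1: 0.5}, n_lags={1: 4},
    a_i=1.0, b_i=0.01, p_i=100,
)  # about 7.4
```

With \( {d}_c = 7 \) the lagged window is exactly the preceding week; larger values shift the window further back without changing its 7-day length.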
The data processing and modeling were performed using MATLAB R2017a.