Background
Malaria is one of the most severe mosquito-borne infectious diseases threatening human health around the globe [1]. Because the primary vectors of malaria are Anopheles mosquitoes, the dynamics and distribution of malaria are closely correlated with meteorological conditions [2]. From a biological perspective, climatic factors have a profound effect on malaria incidence because they affect the development of both mosquitoes and the malaria parasites within them [3, 4]. For instance, plentiful rainfall and increased humidity offer mosquitoes suitable breeding sites, resulting in an increasing number of mosquitoes. Moreover, appropriate temperatures promote malaria prevalence by enhancing the mosquito’s growth and biting rate, as well as increasing the viability and development rate of parasites within vectors [5].
When exploring the meteorological effect on malaria incidence, special attention should be paid to two issues: the lag pattern and the non-linear pattern [6, 7]. Some previous studies assumed a single fixed lag time [8–11]. This is unreasonable, especially when describing the relationship between climatic variables and malaria risk at the population level. Biologically, at least two stages contribute to the lag effect: the development of mosquitoes and the incubation of parasites within the mosquito. The lag time of each stage may vary continuously with climatic conditions, generating a smoothly varying lag distribution at the population level. The non-linear correlation between rainfall and malaria incidence has been acknowledged and validated both experimentally and epidemiologically in a series of existing studies [12, 13]. It has been proposed that a similar non-linear effect may also exist for temperature [14–16]. Therefore, both lag and non-linear patterns should be considered in the model. For this purpose, the distributed lag non-linear model has proved a valuable and effective method [17].
However, most previous epidemiological studies of the relationship between meteorological factors and malaria incidence have overlooked the effect of between-lag interaction. An interaction between different exposure variables involves several climatic factors that act in the same time period and simultaneously affect malaria incidence. In contrast, a between-lag interaction is the interaction of one covariate with itself at different lag times, such as the interaction between the rainfall four weeks and five weeks earlier in their effect on malaria risk in the current week. The concept of lagged interaction was first developed by Heaton in 2014, who reported the corresponding statistical methods to examine the relationship between heat exposure and mortality [18]. When the between-lag interaction is considered, the total effect of a climatic variable is not simply an accumulation of lagged effects, as is conventionally supposed. This is because the number of mosquitoes at week \(t\) depends substantially on the rainfall at week \(t - 1\). More specifically, adequate rainfall at week \(t - 1\) provides mosquitoes with abundant breeding sites and promotes their development, which leads to an increased number of mosquitoes at week \(t\), and consequently a greater risk of malaria transmission.
This study aims to investigate the interaction effect between climatic factors at different lag periods on malaria risk, using rainfall as an example. Such an interaction has not been directly reported in previous studies. Specifically, using weekly data for 2004–2009 from 30 counties in southwest China, a varying coefficient distributed lag non-linear model was applied to investigate the association between rainfall and malaria cases. The correlation pattern between rainfall and malaria incidence was allowed to change across three levels of rainfall at the fourth week lag. The results can help researchers to better understand the complex relationship between climatic factors and malaria transmission, and to develop potentially better prediction models.
Methods
Study sites
The southwestern region of China was severely threatened by malaria in the last century [19]. After decades of continuous effort, malaria in these provinces has been brought under control. However, owing to global warming and the complex meteorological conditions in this region (most counties have a sub-tropical climate, and a few southern counties lie in the tropical region) [20], malaria remains a potential threat to the health of its populace.
The southwest region of China encompasses a large area from 21°14′N to 34°31′N and 97°35′E to 110°19′E. It covers primarily the four provinces of Sichuan, Chongqing, Yunnan, and Guizhou, with 483 counties in total (including county-level cities and districts). This region has a population of 189,977,077 (results from the sixth national census in 2010) and occupies an area of 1,137,570 sq km. Malaria data covered all 483 counties, while only 131 counties had daily meteorological records. Among the counties for which both malaria and climatic data were available, the 30 counties with the highest average annual incidence were selected for this study. The relevant principles of sample selection have been described in a previous study [21]. Additional file 1 shows the 483 counties and the selected 30 counties in southwest China.
Data description
The weekly meteorological data from July 2003 to December 2009 were collected from the Chinese Meteorological Data Sharing Service System [22], which records the mean temperature (°C), rainfall (mm) and relative humidity (%). There are 131 meteorological monitoring stations among the 483 counties in the southwest region, and those relevant to the counties with high malaria incidence were used.
Because daily malaria records would contain many zero counts and jeopardize the stability of the parameter estimation, weekly case reports were used in this study. They were collected through the Chinese Information System for Infectious Diseases Control and Prevention (CISIDCP) from 2004 to 2009 for the 30 counties in southwest China [23]. Although both malaria sub-types (Plasmodium vivax and Plasmodium falciparum) could be the cause of the cases reported, most records did not include the type of parasite. The population data for the selected counties from 2004 to 2009 were obtained from the National Bureau of Statistics of China.
Basic distributed lag non-linear model (DLNM)
The distributed lag non-linear model (DLNM) is used to describe dependencies that are both non-linear and delayed [17]. Because malaria cases are count data, the association between the expected number of malaria cases \(E(Y_{it})\) at week \(t\) in county \(i\) and climatic variables in the previous weeks was modelled by Poisson regression,
$$\begin{aligned} \log (E(Y_{it} )) &= \log (d_{it} ) + \beta_{i0}\\ & \quad + \sum\limits_{l = 4}^{15} {f(x_{i(t - l),r} ,\beta_{rl} )} \\ & \quad + \sum\limits_{l = 4}^{15} {f(x_{i(t - l),h} ,\beta_{hl} )} \\ & \quad + \sum\limits_{l = 3}^{10} {f(x_{{i(t - l),T_{m} }} ,\beta_{{T_{m} l}} )} ,\\ \end{aligned}$$
(1)
where \(d_{it}\) denotes the population in county \(i\) at week \(t\), and \(\beta_{i0}\) denotes the intercept effect for county \(i\). \(x_{it,r}\), \(x_{it,h}\) and \(x_{it,T_{m}}\) are the weekly meteorological variables for county \(i\) at week \(t\), representing the rainfall, relative humidity and mean temperature, respectively.
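As a minimal sketch of how the population offset in Model (1) acts, the following Python snippet evaluates the expected weekly count from a linear predictor. The function name `expected_cases` and all numeric inputs are hypothetical and serve only to illustrate the log link with an offset; they are not estimates from this study.

```python
import math

def expected_cases(population, intercept, effects):
    """Expected weekly count under a log-link Poisson model with a population offset."""
    # log E(Y) = log(d) + beta_i0 + sum of lagged effect terms
    log_mu = math.log(population) + intercept + sum(effects)
    return math.exp(log_mu)

# Hypothetical inputs: 200,000 residents, a baseline log-rate of -11,
# and two lagged contributions on the log scale.
mu = expected_cases(200_000, -11.0, [0.10, 0.05])
mu_double = expected_cases(400_000, -11.0, [0.10, 0.05])
```

Because the offset enters with a fixed coefficient of one, doubling the population doubles the expected count, so the remaining terms describe the incidence rate rather than the raw count.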
The lag ranges for the meteorological factors were determined from relevant biological knowledge and empirical results from previous studies [24]. It is logical to assume that cases in a specific week are affected by climatic factors several weeks earlier. Consequently, Model (1) estimates the cumulative contributions across the entire lag range rather than at a single fixed lag time. The period from the fourth to the 15th week was used as the lag range for weekly rainfall and mean relative humidity, while for weekly mean temperature it was from the third to the tenth week [24, 25].
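The lag ranges above determine which past observations enter the model for each week. A minimal sketch, assuming a plain list of weekly values for one county, shows how the lagged exposures could be collected; the helper `lag_matrix` and the toy series are hypothetical.

```python
def lag_matrix(series, min_lag, max_lag):
    """Return {t: [x[t-min_lag], ..., x[t-max_lag]]} for every week t with full history."""
    rows = {}
    for t in range(max_lag, len(series)):
        rows[t] = [series[t - l] for l in range(min_lag, max_lag + 1)]
    return rows

# Hypothetical weekly rainfall whose value equals the week index,
# so the lag structure is easy to read off.
rain = [float(week) for week in range(20)]
rain_lags = lag_matrix(rain, 4, 15)   # rainfall and humidity: lags 4 to 15
temp_lags = lag_matrix(rain, 3, 10)   # mean temperature: lags 3 to 10
```

With lags 4 to 15, each week contributes 12 lagged rainfall values, and the first 15 weeks of the series are needed only as history. This is consistent with the meteorological data starting in July 2003 while the case data start in 2004.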
Two basis functions are included in Model (1) to represent the non-linear and lag effects. Taking rainfall as an example, the first is \(f(x_{i(t-l),r}, \beta_{rl})\), which describes the non-linear effect of rainfall that occurred \(l\) weeks earlier. It can take many functional forms, for instance a polynomial function. The second function constrains the parameters \(\beta_{rl}\) across lags, thereby avoiding the high collinearity caused by the strong correlation between rainfall in consecutive weeks. With this constraining function, the noise of the unconstrained distributed lag model can be reduced with less bias [7]. The second-order natural cubic spline was used for both basis functions in Model (1), because the meteorological variables are unimodal [26] and for the sake of parsimony.
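The way the two basis functions combine can be written out explicitly. The following is a sketch in the usual cross-basis form of a DLNM. The symbols \(b_{j}\) (exposure basis), \(c_{k}\) (lag basis), \(\eta_{jk}\) (reduced coefficients) and the dimensions \(J\) and \(K\) are notation assumed here for illustration and do not appear in the original model statement.

$$\begin{aligned} f(x_{i(t - l),r} ,\beta_{rl} ) &= \sum\limits_{j = 1}^{J} {b_{j} (x_{i(t - l),r} )\,\beta_{rl,j} } , \\ \beta_{rl,j} &= \sum\limits_{k = 1}^{K} {c_{k} (l)\,\eta_{jk} } , \\ \sum\limits_{l = 4}^{15} {f(x_{i(t - l),r} ,\beta_{rl} )} &= \sum\limits_{j = 1}^{J} {\sum\limits_{k = 1}^{K} {\eta_{jk} \sum\limits_{l = 4}^{15} {b_{j} (x_{i(t - l),r} )\,c_{k} (l)} } } . \end{aligned}$$

Under this formulation, only the \(J \times K\) coefficients \(\eta_{jk}\) are estimated, instead of a separate set of coefficients for each of the 12 lags, which is how the lag constraint limits collinearity.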
Correlations between climatic factors and malaria incidence differ from place to place, and in most circumstances observations within one county are more strongly correlated than observations from two different counties. This heterogeneity is caused by unmeasured (or even unmeasurable) county-specific variables. To deal with the potential confounding, \(\beta_{i0}\) is modelled as a multilevel random intercept that follows a normal distribution, \(\beta_{i0} \sim N(\beta_{0}, \delta_{0}^{2})\). \(\beta_{0}\) is the average intercept of all counties, while \(\delta_{0}^{2}\) represents the variation of county-specific intercepts around \(\beta_{0}\).
Varying coefficient distributed lag non-linear model
Model (1) excludes the lagged interaction because it assumes that the rainfall at week \(t\) has the same effect at all levels of rainfall at week \(t - k\). However, the effect of rainfall at week \(t\) may also depend on the level of rainfall at week \(t - k\). A varying coefficient model [27] was therefore used to examine the dependence of the rainfall effect at week \(t\) on the rainfall level at week \(t - k\) and to investigate the between-lag interaction.
The rainfall at the fourth week lag was set as the stratification variable in this study, mainly in view of the lag range, of which it is the shortest lag. If the hypothesis is correct that the lagged interaction exists and that the effect of rainfall at other lag weeks depends on the rainfall level at the fourth week lag, then the rainfall at the fourth week lag, \(x_{i(t-4),r}\), should to some degree influence the lag non-linear pattern of rainfall at other lag weeks.
To investigate the possible influence of the lagged interaction, all \(x_{i(t-4),r}\) were divided into three quantile groups (cut at the 33.3% and 66.6% percentiles). The three groups were denoted as \(R_{i(t-4),r0}\), \(R_{i(t-4),r1}\) and \(R_{i(t-4),r2}\), which represent \(x_{i(t-4),r}\) at the low, medium and high levels of rainfall at the fourth week lag, respectively. Model (1) is then adjusted to embody these changes, and the modifications are shown below:
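The grouping step can be illustrated with a short sketch. It assumes simple empirical cut points taken from the sorted values, which only approximate the 33.3% and 66.6% percentiles and may give unbalanced groups when values are tied. The function names and the rainfall values are hypothetical.

```python
def tertile_groups(values):
    """Assign 0 (low), 1 (medium) or 2 (high) using empirical one-third cut points."""
    ordered = sorted(values)
    n = len(ordered)
    cut1 = ordered[n // 3]         # approximate lower-third boundary
    cut2 = ordered[(2 * n) // 3]   # approximate upper-third boundary
    return [0 if v < cut1 else 1 if v < cut2 else 2 for v in values]

def indicator_columns(groups):
    """Dummy-code the groups with the low level as reference: (R_1, R_2)."""
    return [(int(g == 1), int(g == 2)) for g in groups]

# Hypothetical rainfall (mm) at the fourth week lag for nine county-weeks.
rain_lag4 = [0.0, 2.5, 40.1, 12.0, 7.3, 88.9, 1.2, 19.4, 55.0]
groups = tertile_groups(rain_lag4)
indicators = indicator_columns(groups)
```

Observations in the low group receive the pair (0, 0), which corresponds to using the lowest rainfall level as the reference category.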
$$\begin{aligned} \log (E(Y_{it} )) &= \log (d_{it} ) + \beta_{i0} + \sum\limits_{g = 1}^{2} {\alpha_{g} \times R_{i(t - 4),rg} } \\ &\quad + \sum\limits_{l = 4}^{15} {f(x_{i(t - l),r} ,\beta_{rl} (R_{i(t - 4),rg} ))} \\ &\quad + \sum\limits_{l = 4}^{15} {f(x_{i(t - l),h} ,\beta_{hl} )}\\ &\quad + \sum\limits_{l = 3}^{10} {f(x_{{i(t - l),T_{m} }} ,\beta_{{T_{m} l}} )} ,\\ \end{aligned}$$
(2)
Model (2) differs from Model (1) in two major aspects. The first is that \(\beta_{rl}\) has now become \(\beta_{rl}(R_{i(t-4),rg})\), reflecting the fact that the coefficient \(\beta_{rl}\) now varies over \(R_{i(t-4),rg}\). Consequently, the effect of rainfall at other lag weeks now depends on the rainfall level at the fourth week lag. Besides the lag non-linear pattern, the modified model can therefore also be used to examine the lagged interaction, that is, whether the rainfall effect at other lag weeks changes across levels of rainfall at the fourth week lag. If the lagged interaction indeed exists, the lag non-linear patterns for rainfall at the three levels should differ from each other.
The second difference is that the group indicator \(R_{i(t-4),rg}\) is now included in the intercept. As with an ordinary categorical predictor, the lowest level \(R_{i(t-4),r0}\) was used as the reference group, and \(\alpha_{1}\) and \(\alpha_{2}\) represent the differential effects for \(R_{i(t-4),r1}\) and \(R_{i(t-4),r2}\), respectively. Model (2) assumes that the mean temperature and relative humidity do not interact with the rainfall. The reference values of all climatic factors are set at zero.
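One way to picture a coefficient that varies over the three rainfall groups is a block-structured design row, in which the same basis values are multiplied by a different coefficient set depending on the group. The sketch below makes that assumption explicit. The helper names, basis values and coefficients are hypothetical, and this is not necessarily how the model was parameterised in the original analysis.

```python
def varying_coefficient_row(basis_row, group, n_groups=3):
    """Place the basis values in the block belonging to `group`, zeros elsewhere."""
    p = len(basis_row)
    row = [0.0] * (p * n_groups)
    row[group * p:(group + 1) * p] = basis_row
    return row

def linear_predictor(row, coefs):
    """Inner product of a design row with the coefficient vector."""
    return sum(x * b for x, b in zip(row, coefs))

# Hypothetical basis values for one county-week and hypothetical
# coefficients, ordered as [low block, medium block, high block].
basis = [0.4, 1.5]
coefs = [0.2, 0.1, 0.1, 0.05, -0.1, -0.02]
low = linear_predictor(varying_coefficient_row(basis, 0), coefs)
high = linear_predictor(varying_coefficient_row(basis, 2), coefs)
```

In this toy example the same basis values give a positive contribution in the low group and a negative one in the high group, which is the kind of difference between groups that a lagged interaction would produce.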
The analysis may be sensitive to the choice of lag ranges. Therefore, an alternative lag range for rainfall, from the fourth to the 14th week instead of the fourth to the 15th week, was also investigated; the results were robust and showed no significant change. All analyses were performed with R, an open-source statistical software environment [28]. Specifically, the package lme4 [29] was used to estimate the parameters. The data above have been used in previous studies by the same research group to explore other relationships between meteorological factors and malaria incidence [21, 30].
Discussion
The interaction between meteorological factors is important in malaria transmission, as these factors are closely associated with vector abundance and survival as well as parasite maturation [24]. Exploring the interaction between climatic variables can help to better understand the relationship between meteorological factors and malaria incidence [31]. The interaction between exposure predictors is commonly examined in existing studies, but so far no report has been dedicated to the lagged interaction between climatic factors in the process of malaria transmission, which may also play a crucial role in malaria epidemics. In particular, the lagged interaction effect on malaria incidence examined in this study was the interaction between rainfall at the 4th week lag and that at the 6th, 9th and 12th week lags.
The results indicate that the rainfall at the 4th week lag affects the correlation between malaria incidence and rainfall at the other lag weeks, implying an interaction effect between lagged rainfalls on malaria. When rainfall at the fourth week lag is low, the malaria risk increases with increasing rainfall, suggesting that more rainfall promotes malaria transmission under these conditions. In contrast, when rainfall is high at the 4th week lag, excessive rainfall decreases the risk of malaria, as can be observed in panels c, f and i of Fig. 2. These results can be explained by malaria dynamics [19]: rainfall elevates the environmental humidity and creates many temporary puddles, simultaneously increasing the number of mosquito breeding sites and enhancing mosquito survival. However, excessive rainfall and accumulation of surface water in complicated terrain could destroy mosquito breeding sites, thus reducing the mosquito density. Abundant rainfall may also prevent people from working outdoors, lowering the chance of being bitten by mosquitoes and consequently decreasing malaria incidence. Specifically, when the rainfall level at the 4th week lag is low, greater rainfall at week \(t\) would relieve the effect of insufficient rainfall by offering mosquitoes more breeding habitats and increasing their number, resulting in an increased risk of malaria. In contrast, when the rainfall level at the fourth week lag is high, abundant rainfall at week \(t\) would exacerbate the effect of excessive rainfall, destroying mosquito breeding sites and reducing outdoor activities. The effect of high rainfall at week \(t\) would then be attenuated or even become negative.
It was also observed that the lagged effect of rainfall on malaria incidence was greatest at the ninth lag week, compared to the 6th and 12th weeks. This is biologically plausible, as rainfall that occurred in the current week or too long ago has a negligible effect on malaria incidence.
Greater rainfall brings a higher relative risk but a shorter lag for malaria cases, as can be observed in Fig. 3. Specifically, at the low and medium levels of rainfall at the 4th week lag, rainfall starts to be significantly associated with malaria incidence at the 11th week. At the high rainfall level at the fourth week lag, the distributed lag curve shows a significant correlation from the 4th to the 13th week. Compared with the high level of rainfall at the fourth week lag, rainfall at low levels is associated with delayed malaria risk. The results are consistent with a previous study [32]. This may result from the previously mentioned malaria dynamics: rainfall could provide suitable habitats for mosquitoes to breed, thus shortening their life cycles and accelerating the spread of malaria [33]. Although the relative risk for malaria cases is positively correlated with rainfall, the increase in the relative risk is more drastic when rainfall is low and becomes minimal when rainfall is high. This phenomenon may be explained by a saturation effect, in which the contribution of increasing rainfall to the development of mosquitoes and parasites becomes negligible or even counterproductive.
Rainfall was selected as an example to reveal the lagged interaction on malaria incidence mainly for biological and epidemiological reasons. From an entomological perspective, rainfall affects most stages of the mosquito’s life cycle. For example, plentiful rainfall provides mosquitoes with aquatic breeding sites for their growth and reproduction [34]. From an epidemiological view, the reported relationships between rainfall and malaria vary in the literature [24]. This is especially true in China, where several studies showed that rainfall was closely correlated with malaria incidence [16, 35, 36], while other studies denied the existence of such correlations [37]. Understanding the interaction effect between rainfall at different lag times on malaria incidence may help to better explore the relationship between rainfall and malaria incidence.
\(\alpha_{1}\) and \(\alpha_{2}\) are introduced to describe the main effects of the rainfall levels at the fourth week lag. As demonstrated in Fig. 1, the three groups of different rainfall levels at the fourth week lag do not have identical baseline distributions of climatic factors. Even under the same rainfall condition at the fourth week lag, the average effect of rainfall among the three groups should be distinctively different; \(\alpha_{1}\) and \(\alpha_{2}\) are consequently added as the average deviation, ensuring that the logRR values of all groups would be zero at the reference rainfall, and allowing the comparison of variations for rainfall.
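The role of the reference value can be illustrated with a small sketch, assuming the relative risk is obtained by centring the log-scale effect at the reference rainfall (zero in this study). The function `toy_effect` is a hypothetical curve chosen only for illustration, not a fitted result.

```python
import math

def relative_risk(effect, x, x_ref=0.0):
    """Relative risk at x versus the reference value, from a log-scale effect function."""
    return math.exp(effect(x) - effect(x_ref))

def toy_effect(x):
    # Hypothetical log-scale rainfall effect with a non-zero value at x = 0.
    return 0.3 + 0.02 * x - 0.0001 * x ** 2

rr_ref = relative_risk(toy_effect, 0.0)
rr_100 = relative_risk(toy_effect, 100.0)
```

Whatever level the uncentred curve takes at the reference, the centred logRR is zero there, so curves from different groups share a common anchor and only their shapes are compared.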
The county-specific random intercept model allows a regression model to be fitted to meteorological factors while accounting for the systematic unexplained variation among the 30 counties. As in other epidemiological literature on the relationship between meteorological variables and malaria incidence, the final results may be affected by confounding factors. For instance, individual counties may deploy different preventive measures with different enforcement strength to fight malaria, and behavioural patterns, such as the use of different types of nets, may differ. The variance \(\delta_{0}^{2}\) of the county-specific random intercept \(\beta_{i0}\) represents the variation between counties that is not caused by the climatic predictors. The random intercept model has proven efficient in handling the potential bias [38].
A few limitations should be acknowledged. First, the data quality may have changed over the 6 years; the best data quality was found in 2009. Second, only 30 counties with malaria prevalence were used in this study. However, this should not introduce significant inherent bias, because malaria incidence in these counties varied from low to high values. Specifically, the annualized average incidence ranged from 1.1/100,000 to 348.2/100,000. In particular, it is evident (see Additional file 2) that the 30 counties included many low-incidence counties, such as Eshan with just 11 malaria cases in 6 years. Therefore, the selection method should not undermine the credibility of this study. Third, as in several existing studies [39, 40], the characteristics of P. vivax and P. falciparum were not analysed separately owing to the lack of associated information. As a result, vivax relapses may be mistaken for new infections caused by meteorological factors. The lag non-linear patterns of the two malaria sub-types may also differ slightly from each other. By investigating the potential bias, future studies might provide more details to elucidate the association between climatic factors and malaria incidence in southwest China.
Authors’ contributions
YYW performed the statistical analysis and drafted the manuscript. ZJQ cleaned the data. ZJF provided the original data. NW helped to revise the manuscript. XZ conceived the project concept. XZ, XSL and HJY gave technical assistance. All authors read and approved the final manuscript.