Background
Hand, foot and mouth disease (HFMD) is a common infectious disease that mostly occurs in children younger than 5 years of age. This disease is caused by EV71, CoxA16 and other viruses, and it can lead to symptoms in the hand, mouth or foot, including fever, blisters, and ulcers. It can also cause aseptic meningitis, encephalitis, neurogenic edema and other symptoms in some critically ill patients and may even be life threatening [
1]. Investigating how influencing factors, such as meteorological, geo-environmental and socioeconomic variables, are associated with the incidence of HFMD can therefore identify the critical risk factors and the regional differences in risk, and can support decision-making on prevention and control measures against this disease.
Existing studies have shown that the incidence of HFMD is related to meteorological, geo-environmental and socioeconomic factors. Meteorological factors, such as air temperature and humidity [
2], rainfall [
3], and wind speed [
4], have important effects on the incidence of HFMD. The short-term El Niño effect was reported to be associated with the incidence and spread of HFMD [
5]. Socioeconomic factors, including population density, the number of industrial companies and the ratio of students in a population, were identified as important HFMD risk factors [
6]. The gross domestic product (GDP) was found to make a great contribution to the incidence of HFMD [
7]. Geographical environmental factors, such as the normalized difference vegetation index (NDVI), population density, land cover types and roadway density, were demonstrated to affect the incidence of HFMD [
8,
9].
In terms of modeling methods, geographically weighted regression [
10], boosted regression tree analysis [
11], the generalized additive model [
12], and the Bayesian network [
13], among others, were used to investigate the relationship between HFMD and influencing factors. Some existing studies [
10,
14,
15] used linear regression to model the relationship between influencing factors and incidence. Others showed significant non-linear relationships [
4,
16] between the factors and incidence and the effect [
7] of interactions between different factors on the incidence. In addition, many studies primarily focused on a single dimension of time or space and not on a systematic combination of the two. The studies on temporal factors include investigations of the seasonal changes of meteorological parameters [
4,
15] or identification of the delay effects of influencing factors [
17]. The resulting analyses or the models used in these studies often ignored the influence of differences in the areas or spatial autocorrelation on the disease incidence. Spatial models mostly focused on the investigation of the spatial autocorrelation and clustering of the incidence and ignored the temporal effect. Meanwhile, spatiotemporal studies [
3,
18,
19] were based on spatiotemporal scans to detect the clustering of HFMD incidence. This widely used method can reveal the pattern of HFMD spatiotemporal propagation, but it is difficult to combine with other factors, such as meteorological and geo-environmental parameters.
In this study, we propose a mixed spatiotemporal model that evaluates the impact of meteorological, socioeconomic and geo-environmental factors on the incidence of HFMD and predicts the incidence risk in the target area within a certain period of time. The model uses geo-additive regression to establish non-linear relationships between the influencing factors and disease incidence, and it systematically integrates spatial and temporal autocorrelation and spatiotemporal clustering factors. We illustrate the contributions of meteorological, environmental and land-use patterns, the effect of spatial and temporal autocorrelation, and the disease hotspot output to a robust prediction of HFMD incidence. Cross-validation showed that our approach improved the assessment of disease risk.
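For orientation, a geo-additive model of the kind described above is commonly written in the generic form below. This is only a sketch of the usual structure, not the specification fitted in this study: the link function g, the smooth functions f_j, the temporal term f_time, the spatial term f_spat and the cluster indicator c_it (with coefficient gamma) are notation assumed here for illustration.

```latex
\[
g\bigl(\mu_{it}\bigr) \;=\; \beta_0
  \;+\; \sum_{j=1}^{p} f_j\bigl(x_{j,it}\bigr)
  \;+\; f_{\mathrm{time}}(t)
  \;+\; f_{\mathrm{spat}}(s_i)
  \;+\; \gamma\, c_{it}
\]
% \mu_{it}   : expected HFMD incidence in spatial unit i during week t (assumed notation)
% f_j        : smooth (non-linear) effect of covariate j, e.g. humidity or temperature
% f_time     : smooth temporal effect, e.g. of a weekly index
% f_spat     : structured spatial effect of location s_i (spatial autocorrelation)
% c_{it}     : indicator of membership in a spatiotemporal scan cluster (assumed coding)
```

In this form, the non-linear covariate effects, the temporal effect, the spatial effect and the scan-cluster term each contribute additively on the scale of the link function.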
Discussion
As a gastrointestinal infectious disease, HFMD is caused by viruses of the enterovirus genus, which are highly transmissible. HFMD is mainly transmitted through nasopharyngeal secretions such as saliva or nasal mucus, by direct contact, or by the fecal-oral route. The rate of latent infection is high, and the disease can cause large epidemics in short periods of time [
27]. The transmission paths are complex and affected by multiple factors. We studied the complex associations between the influencing factors and the HFMD incidence rate using a non-linear modeling approach with an embedded spatial effect.
Meteorological factors have significant influences on the HFMD incidence rate [
28]. Our results are consistent with previous studies on the influence of air temperature [
2] and relative humidity [
2,
12]. The exact mechanism underlying the association of meteorological parameters with HFMD incidence is not clear. It is generally assumed that meteorological parameters affect HFMD transmission and thereby its incidence rate. Several factors, such as pathogen infectivity, human behavior patterns, and immune system fluctuations, have been proposed to account for such associations [
29].
Our results reveal informative non-linear associations of the influencing factors with HFMD risk; in the additive models, each non-linear association consists of multiple linear associations corresponding to different value ranges of the covariate (Additional file
1). Previous studies identified a non-linear association between relative humidity and the risk of HFMD. Zhang et al. [
4] found extrema at 45% (minimum) and 85% (maximum); between these extrema the association was linear, with the risk of disease occurrence increasing with increased humidity. This view was also supported by the studies of Chen et al. [
12] and Nguyen et al. [
17]. A possible explanation for this finding is that enterovirus survival is generally assumed to be proportional to relative humidity at low to moderate humidity levels and at temperatures between 20 °C and 33 °C [
30] and favored by higher moderate humidity because the virus can persist longer on inanimate surfaces [
31]. Furthermore, our study found an extremum near 70%, and an increase in relative humidity in the range of 70% to 80% led to decreased disease occurrence, which is similar to the results of Urashima et al. [
32]. The period from September to November is associated with relatively high humidity and is the season with a low incidence of HFMD (Fig.
5); there is a negative correlation between relative humidity and incidence in this season. Liao et al. [
15] suggested that the virus was inhibited after the increase in relative humidity reached a threshold, thereby reducing the risk of disease.
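The turning points discussed above (for example, an extremum in the humidity curve near 70%) can be located on a fitted smooth curve by looking for sign changes in the local slope. The following is a minimal sketch under stated assumptions: the function name `turning_points` is hypothetical, and the sampled curve values are invented to mimic the described shape; they are not the estimates from this study.

```python
def turning_points(xs, ys):
    """Return (x, kind) pairs where the slope of a sampled curve changes sign.

    xs must be strictly increasing; ys are the curve values at xs.
    """
    points = []
    for i in range(1, len(xs) - 1):
        left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])    # slope before xs[i]
        right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])   # slope after xs[i]
        if left > 0 and right < 0:
            points.append((xs[i], "maximum"))
        elif left < 0 and right > 0:
            points.append((xs[i], "minimum"))
    return points


# Hypothetical partial-effect values for relative humidity (%), for illustration only.
humidity = [40, 50, 60, 70, 80, 90]
effect = [-0.30, -0.10, 0.15, 0.25, 0.05, -0.10]

print(turning_points(humidity, effect))
```

Each interval between consecutive turning points corresponds to one of the locally linear (monotone) segments of the non-linear association.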
For the non-linear association between the minimum daily temperature and HFMD incidence, there are two potential reasons: (1) virological evidence shows the temperature-sensitive nature of enteroviruses and other human enteric viruses [
33,
34], and (2) more outdoor activity in moderately warmer weather increases close contact between individuals, thus enhancing HFMD transmission. Furthermore, the univariate model identified 20 °C as a critical point beyond which the minimum temperature and the risk of HFMD showed a locally negative association (slope < 0). The study by Xu et al. [
16] in Beijing and that by Huang et al. [
35] in Guangzhou also found similar patterns. The reason for this inverse association between HFMD and higher air temperature is unclear. We speculate that children are more likely to stay indoors during the season with higher daily minimum temperatures, reducing their exposure in public places and crowds. In addition, our study was conducted at the spatiotemporal level, and the spatial difference in HFMD incidence during the high-temperature season might also be a related factor. Owing to the influence of the other covariates, the multivariate model presented a similar pattern of association with a different threshold (about 16 °C).
Our result on the general association between wind speed and HFMD incidence is similar to the results of previous studies [
15,
36]. In addition, such an association might be locally weakened or even reversed at high wind speeds. Although an increase in wind speed favors the spread of the virus through airborne droplets [
37], the virus may only be able to stay in the air for a short period of time under a high wind speed. Outdoor activities are also curtailed in windy weather, which reduces the chance of exposure to the virus.
The association between the NDVI and HFMD risk has rarely been investigated in previous studies. Cao et al. [
8] and Stanaway [
9] concluded that the two were negatively correlated. Studies found that urban areas have a high risk of disease, which can be explained by the fact that vegetation cover is lower in urban areas than in less economically developed areas covered by mountains and cultivated land. Our study found a non-linear relationship between the two. Unlike our study, the two abovementioned studies were conducted on a spatial scale and thus ignored changes in the NDVI over time. We believe that the NDVI increases in the spring and summer, consistent with the HFMD season, thus reflecting a certain upward trend. This phenomenon is more pronounced in urban areas, where the NDVI is relatively low.
Most of the socioeconomic factors were significantly and positively associated with the HFMD incidence. To the best of our knowledge, only a few studies have combined socioeconomic factors with other factors to assess their effects on the HFMD incidence. Bo et al. [
6] found that the number of industrial enterprises and the proportion of students in the population were associated with the incidence of HFMD. Huang et al. [
7] concluded that GDP had a significant impact on the risk of HFMD incidence. Zeng et al. [
38] suggested that the increase in migratory workers from rural areas to cities was an important risk factor for the occurrence of HFMD. Cao et al. [
8] concluded that urban areas had a higher risk of HFMD than poor areas, which is similar to our result that GDP and HFMD incidence were positively correlated. In urban areas, the higher population density facilitates the spread of the virus. A more complete health system in developed areas enables timely and detailed reporting of disease data to higher-level health authorities, leading to a bias toward a higher reported incidence. In this study, the association between the number of hospital beds per capita and the disease incidence also supports this claim. The results of this study revealed that the proportion of primary school students and the HFMD incidence showed a complex non-linear relationship, indicating that many confounding factors affect their association.
The traffic indicator was not selected in the multivariate model, but the univariate analysis suggested that traffic, as a source of air pollution, might reflect a link between air pollution and an increased risk of HFMD. It is generally believed that an increase in particulate matter in the air makes it easier for the virus to attach to the particles, thus contributing to the spread of the virus [
5]. Air pollution can also reduce human immunity and increase the risk in the exposed population [
6].
Besides the physical factors, the temporal indicator (weekly index) also played an important role. The scan statistics showed a much higher risk for the highest cluster hotspot.
The spatial effect presents the spatial distribution of the HFMD incidence rate across the study region. The posterior spatial effect showed a general increased risk of HFMD in the southwest part of Shandong Province, in contrast with the decreased risk of incidence in the central north and northeast parts of Shandong Province. The HFMD transmission was complicated and closely associated with the population density and communication, which presented strong spatial patterns [
39]. Although HFMD transmission cannot be fully captured by the covariates used, the spatial effect and clusters embedded in the models can capture such a spatial pattern (strong spatial autocorrelation), thus considerably improving the model’s performance. Furthermore, the introduction of spatiotemporal scanning statistics and spatial effects accounts for most of the variability caused by the other factors, thus lowering the contributions of the other factors, such as meteorological and traffic factors, in the model. The results show that strong spatiotemporal and spatial patterns have important implications for HFMD risk assessment.
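Spatial autocorrelation of the kind referred to above is often summarized with Moran's I. The sketch below is an illustration only: the text does not state which autocorrelation statistic, if any, was computed, and the function name `morans_i`, the toy incidence values and the chain-shaped neighbor matrix are assumptions made for this example.

```python
def morans_i(values, weights):
    """Global Moran's I for region values and a spatial weight matrix.

    weights[i][j] > 0 when regions i and j are neighbors, 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                 # deviations from the mean
    w_sum = sum(sum(row) for row in weights)         # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n)) # weighted cross-products
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


# Four hypothetical regions arranged in a row; each is a neighbor of the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], chain))  # smooth gradient: positive autocorrelation
print(morans_i([1, 4, 1, 4], chain))  # alternating values: negative autocorrelation
```

Values above zero indicate that neighboring regions tend to have similar incidence, which is the situation in which an explicit spatial effect is most useful.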
To the best of our knowledge, this study is one of the first to design a geo-additive model to estimate the HFMD incidence rate. We performed a comprehensive exploration of the influence of environmental, meteorological, land-use and socioeconomic factors on the HFMD incidence rate in terms of non-linear and spatial effects. Our approach incorporated spatial effects as an indicator of spatial autocorrelation, together with the spatiotemporal cluster output, within the model. Owing to the strong spatiotemporal patterns in the variation of the HFMD incidence, the multivariate model achieved good estimation accuracy (CV R² of 0.83). Our exploration of the influence of various factors and spatiotemporal patterns has important implications for the assessment of the HFMD incidence, and our model provides a good estimation of the HFMD risk, which is useful for decision-making support for HFMD zonation and warning.
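A cross-validated R² such as the one reported above is computed from predictions made on data held out of model fitting. The following minimal sketch shows the calculation with a simple least-squares line standing in for the geo-additive model; the function names, the fold assignment by index and the choice of five folds are assumptions for illustration, not the procedure used in this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the real model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cv_r2(xs, ys, k=5):
    """R² computed from out-of-fold predictions of a k-fold cross-validation."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]    # held-out indices
        train = [i for i in range(n) if i % k != fold]   # fitting indices
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical, perfectly linear data: every held-out point is predicted exactly.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
print(cv_r2(xs, ys))
```

Because every prediction comes from a model that never saw the corresponding observation, the statistic reflects predictive rather than in-sample fit.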
This study has several limitations. First, the socioeconomic data used in this study do not capture temporal changes. However, socioeconomic factors such as GDP and the number of students do not change significantly within one year, so this limitation has a very limited impact on the results. Second, we chose Thiessen polygons for spatial effect modeling, and Thiessen polygons are affected by the distribution of the sample points. However, if more data subsequently become available, the Thiessen polygons can be updated to produce results with better spatial resolution. Third, we considered many variables, which might lead to over-fitting in the non-linear additive model. However, the final multivariate model selected fewer variables, strong temporal and spatial variability explained more of the variation, and cross-validation demonstrated the predictive efficacy of this method. Fourth, the spatial effect and spatiotemporal clustering in the final model explained a large portion of the variation, and the physical meaning of the other variables was ignored. However, this study had already explored the effects of the individual factors on HFMD. In terms of prediction accuracy, the contribution of environmental and socioeconomic factors alone was limited, and the addition of spatial autocorrelation and spatiotemporal clustering greatly improved the prediction performance. Where the influencing factors are complex and the variability of the disease incidence cannot be captured effectively, adding spatial autocorrelation and spatial clustering terms to the model improves the accuracy of risk identification, which is helpful for zoning and warning of HFMD. Last, our model was trained using data for Shandong Province, China, and was applied only to the assessment of HFMD risk in that province. However, our geo-additive approach, as an improvement of our previous approach [
40], can be easily extended to other regions and to other infectious diseases similar to HFMD that are characterized by strong spatial autocorrelation and temporal patterns.
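The dependence of Thiessen polygons on the sample points, noted in the second limitation, follows from their definition: each location belongs to the polygon of its nearest sample point. A minimal sketch of that assignment is given below; the function name `nearest_site` and the coordinates are hypothetical, and planar Euclidean distance is assumed.

```python
import math


def nearest_site(point, sites):
    """Index of the sample point whose Thiessen polygon contains `point`."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


# Three hypothetical sample points in planar coordinates.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

print(nearest_site((1.0, 1.0), sites))   # closest to the first site
print(nearest_site((9.0, 2.0), sites))   # closest to the second site

# Adding a sample point changes the partition: the same location is reassigned.
denser = sites + [(8.0, 3.0)]
print(nearest_site((9.0, 2.0), denser))
```

Adding sample points therefore shrinks the polygons around them, which is why more data would yield a finer spatial resolution.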