Background
Hand, foot and mouth disease (HFMD) is a common infectious disease caused by various enteroviruses, especially enterovirus 71 (EV-71) and Coxsackie virus A16 (CV-A16) [1]. Recently, enteroviruses other than EV-71 and CV-A16 have accounted for a growing share of both mild and severe cases, and Coxsackie virus A6 (CV-A6) has emerged as another predominant serotype in some regions [2]. HFMD mostly affects children under 5 years of age. It is a growing public health problem and has attracted considerable attention worldwide [3–5]. It is especially widespread in the Asia-Pacific region [1], and its incidence has generally increased in recent decades [6–8]. China is one of the Asian countries with the most serious HFMD epidemics [9], where HFMD is a Class C notifiable infectious disease. Since 2009, the annual incidence of HFMD in China has never fallen below 100 per 100,000, and the disease has caused hundreds of deaths each year. Effective prevention and control of HFMD has become a major public health challenge [1, 10].
Forecasting the incidence of an infectious disease is essential for public health authorities to understand its epidemic characteristics and to anticipate its seasonal changes. Accurate prediction provides a vital basis for optimizing decisions and allocating resources for the prevention and control of infectious diseases. It is therefore of great importance to establish scientifically sound, appropriate and reliable prediction models and to maximize their performance [11, 12]. Several recent studies have forecast the incidence of HFMD using linear time series models. For example, an ARIMA(1,0,1)(0,1,0)12 model was constructed to forecast HFMD incidence in Sichuan, China [13]. In another study, multivariable ARIMA models using search engine query data and climate factors as exogenous variables were developed to predict the HFMD epidemic in Guangdong, China [14]. However, the linearity assumption is often not satisfied by real-world time series, which limits the accuracy of linear forecasting models. Models based on artificial neural networks (ANN) can effectively extract nonlinear relationships from data. They have been widely used to predict infectious diseases because of their robustness, fault tolerance and adaptive learning ability. The back propagation neural network (BP model), one of the most common ANNs, is widely used in fields such as economics and engineering. It has also been applied to forecasting infectious diseases [15, 16]. To date, however, no study has reported using a BP model to predict HFMD epidemics.
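The seasonal notation ARIMA(p,d,q)(P,D,Q)s can be unpacked with the backshift operator B (B Y_t = Y_{t-1}). As a sketch of the general form only, for a monthly incidence series Y_t and with any constant term omitted (the fitted coefficients of [13] are not reproduced here), the ARIMA(1,0,1)(0,1,0)12 model reads:

```latex
(1 - \phi_1 B)(1 - B^{12})\,Y_t = (1 + \theta_1 B)\,\varepsilon_t
\quad\Longleftrightarrow\quad
W_t = \phi_1 W_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1},
\qquad W_t = Y_t - Y_{t-12}
```

Here \phi_1 is the nonseasonal autoregressive coefficient, \theta_1 the nonseasonal moving-average coefficient, \varepsilon_t white noise, and (1 - B^{12}) the single seasonal difference (D = 1, s = 12); there is no regular differencing (d = 0) and no seasonal AR or MA term (P = Q = 0). Textbooks and software packages differ in the sign used for the MA term; the plus sign above is one common convention. Because every term enters linearly, this form also shows why such models cannot represent nonlinear predictor effects.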
The purpose of this study was to develop an optimal BP model to predict the future trend of HFMD in Jiangsu province, China, with particular emphasis on evaluating the contribution of meteorological factors as predictors. The performance of the BP model was also compared with that of the ARIMA model. We expected the findings of this work to be useful for the prevention and control of HFMD.
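To make the back propagation mechanism concrete, the following is a minimal pure-Python sketch of a one-hidden-layer BP network: inputs pass forward through sigmoid hidden neurons to a linear output neuron, and the output error is then propagated backward through the layers to update all weights by gradient descent. The class name `BPNetwork`, the helper `mse`, the activation functions, the learning rate and the weight initialization are illustrative assumptions, not the configuration or training algorithm used in the study; practical work would normally rely on an established neural network library.

```python
import math
import random


class BPNetwork:
    """Illustrative one-hidden-layer back propagation (BP) network.

    Hypothetical sketch: hidden neurons use a sigmoid activation and the single
    output neuron is linear. Training is plain full-batch gradient descent on
    the mean squared error, with gradients obtained by back propagation.
    """

    def __init__(self, n_inputs, n_hidden, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = learning_rate
        # w_hidden[j][i] connects input i to hidden neuron j.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        # w_out[j] connects hidden neuron j to the output neuron.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    @staticmethod
    def _sigmoid(z):
        z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def _forward(self, x):
        hidden = [self._sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        output = sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        return hidden, output

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, X, y, epochs=1000):
        n = len(X)
        for _ in range(epochs):
            # Accumulate gradients of 0.5 * (output - target)^2 over the batch.
            g_wh = [[0.0] * len(row) for row in self.w_hidden]
            g_bh = [0.0] * len(self.b_hidden)
            g_wo = [0.0] * len(self.w_out)
            g_bo = 0.0
            for x, target in zip(X, y):
                hidden, output = self._forward(x)
                delta_out = output - target  # error at the linear output
                g_bo += delta_out
                for j, h in enumerate(hidden):
                    g_wo[j] += delta_out * h
                    # Propagate the error back through sigmoid'(z) = h * (1 - h).
                    delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
                    g_bh[j] += delta_h
                    for i, xi in enumerate(x):
                        g_wh[j][i] += delta_h * xi
            # Gradient descent step using the batch-averaged gradients.
            for j in range(len(self.w_out)):
                self.w_out[j] -= self.lr * g_wo[j] / n
                self.b_hidden[j] -= self.lr * g_bh[j] / n
                for i in range(len(self.w_hidden[j])):
                    self.w_hidden[j][i] -= self.lr * g_wh[j][i] / n
            self.b_out -= self.lr * g_bo / n
        return self


def mse(model, X, y):
    """Mean squared error of model.predict over the rows of X."""
    return sum((model.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)
```

In an HFMD setting, each row of `X` would hold suitably scaled meteorological predictors (and possibly lagged incidence values), and `n_hidden` is the quantity one would tune when searching for the best network structure.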
Discussion
Accurately identifying epidemic trends in advance is critically important for the prevention and control of infectious diseases. Because HFMD is common throughout the world, modeling its epidemics has been an active area of research in recent years, and various prediction methods have been proposed. For example, Yu et al. [24] developed a hybrid model combining ARIMA with a nonlinear autoregressive neural network, and Zhong et al. [25] employed XGBoost, a machine learning method, to forecast HFMD using multiple environmental factors.
In this study, an optimized multivariate BP model incorporating meteorological factors was constructed. This model forecast HFMD incidence in Jiangsu province, China, with satisfactory accuracy: it achieved a mean absolute percentage error (MAPE) below 20% in the prospective forecasting stage and accurately captured the seasonal fluctuation of HFMD over the next 24 months. This predictive performance is considerably better than that reported in many similar studies, and the model may serve as a reliable tool for public health authorities in HFMD prevention and control. Notably, BP models are prone to over-fitting, a critical issue that usually leads to poor generalization [26]. In this work, the accuracy of prospective prediction deteriorated when the hidden layer contained more than 11 neurons, suggesting that too many hidden neurons may cause severe over-fitting. Determining an optimal model structure is therefore important, yet there is still no consensus on how to do so. In this study, the structure was chosen on the basis of MAPE: the BP model with the minimum MAPE on the testing set was selected as the optimal model.
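The selection rule described above can be sketched as follows: compute MAPE, defined as 100/n times the sum of |(actual - predicted) / actual|, for each candidate hidden-layer size and keep the size with the smallest value. The function names `mape` and `select_hidden_size` and the callable `train_and_predict` are hypothetical placeholders; how each candidate network is trained is left to that callable and is not specified here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    MAPE = 100 / n * sum(|(actual_t - predicted_t) / actual_t|).
    It is undefined when any actual value is zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def select_hidden_size(candidates, train_and_predict, test_actual):
    """Pick the hidden-layer size whose test-set MAPE is smallest.

    train_and_predict(n_hidden) is a hypothetical callable that trains a BP
    model with n_hidden hidden neurons and returns its test-set predictions.
    Ties are resolved in favour of the first candidate listed.
    """
    scores = {n: mape(test_actual, train_and_predict(n)) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

Returning the full `scores` dictionary makes it easy to plot MAPE against hidden-layer size and see where accuracy starts to deteriorate. Where the data allow, the same loop can be run against a separate validation split so that the final test-set MAPE is not also used for tuning.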
A substantial number of studies have proposed that infectious diseases are climate-sensitive [27–29]. Climatic factors may influence the survival and spread of pathogens in the environment, as well as host susceptibility and the probability of exposure [30–32]. The effects of meteorological factors such as temperature, rainfall and relative humidity on HFMD epidemics have recently attracted considerable attention [22, 33]. Song et al. [34] developed a seasonal ARIMA model with lagged precipitation as a predictor to forecast HFMD incidence, but the model did not perform satisfactorily. Similarly, the ARIMAX model we developed with lagged temperature as a predictor was not accurate enough for practical application, and adding the climate variable did not improve on the ARIMA model. There may be two reasons for this. First, the ARIMA model is essentially linear, whereas meteorological factors have been shown to be nonlinearly associated with HFMD epidemics, so the ARIMA model cannot adequately capture the relationships between the predictors and HFMD incidence. Second, the data set is not large enough for the model to fully extract the underlying patterns in the data. Consequently, the model could not produce satisfactory predictions. Zhao et al. [35] also constructed an ARIMA model with temperature as a predictor using data from Huainan City, China. The model showed a good fit, but its out-of-sample predictive ability was not evaluated, so its practical value remains uncertain. In this study, a BP neural network was employed to forecast HFMD incidence with monthly average temperature, rainfall and their lagged terms as predictors. This model performed much better than the BP model without climatic variables, suggesting that climate factors can improve prediction accuracy. We also found that both BP models performed much better than the ARIMA models, which may indicate that the BP model is better suited than the ARIMA model to predicting HFMD incidence in the study region.
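As an illustration of how "monthly average temperature, rainfall and their lagged terms" can be turned into network inputs, the sketch below builds one input row per month t from aligned monthly series. The function name `build_lagged_features`, the column order and the optional `incidence_lags` parameter (which appends lagged incidence values as autoregressive inputs) are hypothetical; the exact input layout used in the study is not given in this section.

```python
def build_lagged_features(temperature, rainfall, incidence, incidence_lags=0):
    """Build (X, y) for a lagged-predictor model from aligned monthly series.

    Hypothetical layout: each row holds temperature_t, rainfall_t,
    temperature_{t-1}, rainfall_{t-1}, then incidence_{t-1} ... incidence_{t-k}
    when incidence_lags = k > 0. The target is incidence_t. Months without a
    complete lag history at the start of the series are dropped.
    """
    n = len(incidence)
    if not (len(temperature) == len(rainfall) == n):
        raise ValueError("all series must have the same length")
    if incidence_lags < 0:
        raise ValueError("incidence_lags must be non-negative")
    start = max(1, incidence_lags)  # first month with every required lag
    X, y = [], []
    for t in range(start, n):
        row = [temperature[t], rainfall[t], temperature[t - 1], rainfall[t - 1]]
        # Optional autoregressive inputs: incidence_{t-1}, ..., incidence_{t-k}.
        row += [incidence[t - k] for k in range(1, incidence_lags + 1)]
        X.append(row)
        y.append(incidence[t])
    return X, y
```

When forecasting prospectively, the current-month climate inputs in such a layout must themselves be observed or forecast, whereas the first-order lagged terms are already known at the time of prediction.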
It is worth noting that the multivariate BP model developed in this study accurately estimated HFMD incidence over the next 24 months. It could therefore be used to predict the medium- to long-term epidemic level of HFMD, which is of great importance to public health authorities. As shown in Fig. 3, the BP model without climate variables performed relatively poorly over the whole test set. Interestingly, however, its first few predicted values matched the observed incidences very well, suggesting that this model may have potential for short-term forecasting, which needs to be verified further in practice.
This study has some limitations. First, HFMD epidemics are affected by many factors, including natural and social environmental factors and etiological factors. In this study, only meteorological variables were considered to improve predictive ability; other factors associated with HFMD may also be good predictors and deserve further study. Second, because some mild cases may be treated at home and some cases with atypical symptoms may be misdiagnosed, the reported data may underestimate HFMD incidence, which may affect the precision of the predictions. Third, the optimal BP model was constructed using data from Jiangsu province, China, so generalizing our findings to other regions with different HFMD epidemic characteristics and climatic conditions may not be straightforward. Nevertheless, using a BP model that incorporates climate factors to detect and predict HFMD may help other regions or countries allocate healthcare resources more efficiently. Finally, like many other neural network models, the BP model cannot explain the specific associations between risk factors and disease.
Conclusion
In this study, four models were constructed to forecast HFMD incidence in Jiangsu province, China. The BP models performed much better than the ARIMA models. Introducing mean temperature, rainfall and their first-order lagged terms significantly improved the prediction accuracy of the BP model. In contrast, neither the univariate ARIMA model nor the multivariate ARIMAX model achieved satisfactory prediction accuracy, and the climate factors did not improve the performance of the ARIMA model. Overall, the multivariate BP model combined the autocorrelation of the HFMD incidence series with the climatic variables and their lagged effects. It is an ideal method for predicting HFMD epidemics and has good prospects for practical application.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.