Background
Influenza is a common respiratory infectious disease that causes substantial morbidity and mortality [
1]. Every year, seasonal influenza epidemics are estimated to cause about 3 to 5 million cases of severe illness and up to 650,000 deaths globally [
2], placing a substantial burden on health services. To curb these epidemics, the beginning of major influenza activity in each season must be declared. A timely alert of the onset of a seasonal influenza epidemic could allow health communities to activate appropriate influenza response plans and prepare for a subsequent dramatic increase in incidence and in the utilization of health services [
3]. In temperate regions such as Japan, seasonal influenza epidemics are expected to occur during winter [
4,
5]; however, the exact onset, duration, and severity of these epidemics are not known because of annual differences in the circulating virus strains, population immunity, human mobility, as well as environmental and other factors [
6‐
8]. Therefore, an intuitive and reliable method for estimating epidemic onset is of great interest to public health decision makers because it can help public health agencies respond in a timely manner to the upcoming epidemic peak.
The epidemic onset is technically defined as the time when the incidence exceeds the epidemic threshold [
9]. Hence, the algorithm behind the calculation of the epidemic threshold becomes the key to detecting epidemic onset. Without a consensus for calculating epidemic thresholds, a range of approaches with varying complexity have been proposed [
6,
8,
10]. The simplest but the most subjective option is to empirically specify a fixed threshold for the epidemic by visual inspection of observations [
6,
11‐
14]. A slightly more quantitative manner of determining a fixed epidemic threshold is to use simple statistics, e.g., mean or median [
15‐
19]. One class of widely used methods for obtaining time-varying epidemic thresholds stems from the periodic regression model proposed by Serfling in 1963 [
20]. A variety of Serfling-like regression models have since been developed to detect the onset [
15,
21‐
23] and peak timing [
24] of influenza epidemics, and to characterize the seasonal patterns of influenza [
25‐
27]. The Serfling regression model fits the non-epidemic data from previous years and predicts a baseline curve, above which a certain increase is considered the epidemic threshold. However, these Serfling-type approaches have several drawbacks. Firstly, epidemic and non-epidemic periods are required to be predefined based on subjective criteria [
28], such as manual removal of epidemic peaks, the proportion of influenza-like illness (ILI) patients among all outpatients (ILI proportion), or the proportion of laboratory specimens from ILI patients testing positive for influenza (positive proportion). This requirement is circular: delimiting epidemic and non-epidemic periods precisely amounts to determining the very onset that we would like to estimate. Secondly, the baseline curve is estimated from long-term historical data (usually five or more previous years) [
13]. Finally, the quantities added to the baseline are varied and not standardized [
15,
22].
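For orientation, a commonly used form of the Serfling-type model can be written as below. This is an illustrative sketch rather than the specification used in any one of the cited studies: the symbols α, β, γ, δ, k, and σ̂ are notation introduced here, the 52-week period assumes weekly data, and individual studies differ in the harmonic terms they include.

```latex
\begin{align}
Y_t &= \alpha + \beta t + \gamma \cos\!\left(\frac{2\pi t}{52}\right)
       + \delta \sin\!\left(\frac{2\pi t}{52}\right) + \varepsilon_t, \\
T_t &= \hat{Y}_t + k\,\hat{\sigma}.
\end{align}
```

Here Y_t is the incidence in week t, the coefficients are estimated from non-epidemic weeks only, Ŷ_t is the predicted baseline, σ̂ is the residual standard deviation, and T_t is the resulting time-varying threshold. The multiplier k is the non-standardized quantity referred to above.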
Several studies have attempted to define epidemic thresholds, taking into account properties of the epidemic curve, e.g., the rate of increase in the number of cases. Nobre and Stroup [
29] detected the epidemic onset using the exponential smoothing technique and properties of numerical derivatives of the epidemic curve. This method does not require long-term historical data and can be applied to surveillance series of less than a year; however, its prerequisites are that the chosen polynomial model must fit the data well and that exploratory analysis is carried out to choose the parameters of the exponential smoothing model. The World Health Organization (WHO) Regional Office for Europe and the European Centre for Disease Prevention and Control have implemented the moving epidemic method (MEM) to determine the baseline influenza activity and epidemic thresholds for influenza surveillance in Europe [
8]. The MEM first finds the optimum epidemic duration as the point at which the slope of the maximum accumulated rates percentage curve falls below a predefined criterion δ, and then calculates the epidemic start and end. Although the MEM can be used for analyzing a single influenza season with as few as 33 weeks of observations, the determination of δ is difficult as it is country-specific. Recently, Cheng et al. [30] developed a moving logistic regression method (MLRM) to determine the thresholds of seasonal influenza epidemics across 30 provinces in mainland China. The MLRM approximates the cumulative epidemic curve by a logistic regression model. Following the MEM, the MLRM chooses the optimum epidemic duration as the point at which the change in R² becomes slight (< 0.01). However, the application of the MLRM is limited to symmetric epidemic waves, and it is not appropriate for asymmetric or bimodal epidemic waves.
While the predominant approaches to detecting epidemic onset are based on thresholds, a few non-thresholding methods have been proposed for estimating epidemic onset. To study the spatiotemporal transmission patterns of influenza, Charu et al. [
31] and Geoghegan et al. [
7] determined the onset time of epidemics using the segmented regression model (SRM). They fitted a segmented regression model to the first half of the epidemic curve (i.e., the weekly time series of ILI before the peak), where the breakpoint quantifies an abrupt change in incidence and its timing corresponds to the epidemic onset. The SRM does not rely on any threshold and can be applied to a single influenza season without historical data because it defines epidemic onset entirely on the basis of the properties of the epidemic curve.
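The breakpoint idea can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the estimation procedure used in the cited studies: it tries every candidate breakpoint in the pre-peak series, fits one ordinary least-squares line on each side, and keeps the breakpoint with the smallest total squared error. The function names and the toy series are hypothetical.

```python
def _line_sse(xs, ys):
    """Sum of squared residuals of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def segmented_onset(series):
    """Return the index of the breakpoint in the part of the series up to its peak.

    Assumption: both segments share the breakpoint and each holds at least two points.
    """
    peak = max(range(len(series)), key=series.__getitem__)
    ys = list(series[:peak + 1])
    xs = list(range(len(ys)))
    if len(ys) < 4:
        raise ValueError("need at least four points up to the peak")
    best_k, best_sse = None, float("inf")
    for k in range(1, len(ys) - 1):
        sse = _line_sse(xs[:k + 1], ys[:k + 1]) + _line_sse(xs[k:], ys[k:])
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


# Hypothetical series: flat background, then a linear rise to the peak.
toy = [1, 1, 1, 1, 1, 3, 5, 7, 9, 5]
onset_week = segmented_onset(toy)  # index 4, where the rise begins
```

Because every pre-peak point enters the two fits, the result of such a procedure depends on where the series is taken to start and on which week is identified as the peak.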
Charu et al. [
31] also demonstrated excellent agreement between influenza epidemic onset estimates derived by the SRM and the Serfling regression model in the United States (US). However, the consistency between epidemic onsets estimated by the SRM and other threshold-based methods using other influenza surveillance systems remains unknown. The lack of reliable information on epidemic onset observations limits the execution of such evaluations. Since 2000, the national epidemic threshold for sentinel surveillance of ILI in Japan has been empirically defined as 1.0 ILI case per sentinel per week (C/S/W) [
32,
33]. This epidemic threshold successfully captures a characteristic feature of the epidemic curve: once the threshold is exceeded, the weekly number of ILI cases increases rapidly and consistently until peaking [
34]. Hence, the onsets derived by this empirical threshold method (ETM) for influenza epidemics in Japan can be used as a reference standard for assessing other approaches to estimating epidemic onsets.
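As a minimal sketch of the fixed-threshold rule, the Python function below returns the first week in which the series crosses the threshold from at or below it to above it. Requiring an upward crossing, the function name, and the example values are assumptions made for illustration; only the threshold of 1.0 C/S/W comes from the text.

```python
from typing import Optional, Sequence


def onset_by_fixed_threshold(cases_per_sentinel: Sequence[float],
                             threshold: float = 1.0) -> Optional[int]:
    """Index of the first week that crosses above the threshold, or None.

    Assumption: an onset needs an upward crossing, so a series that never
    falls to or below the threshold yields no estimate.
    """
    for week in range(1, len(cases_per_sentinel)):
        previous, current = cases_per_sentinel[week - 1], cases_per_sentinel[week]
        if previous <= threshold < current:
            return week
    return None


# Hypothetical weekly ILI cases per sentinel (C/S/W).
weekly = [0.1, 0.2, 0.4, 0.9, 1.6, 3.8, 9.5, 21.0, 30.2, 18.4]
onset_week = onset_by_fixed_threshold(weekly)  # 4

# A series whose background level stays above 1.0 gives no onset at all.
high_background = [1.5, 2.0, 3.0, 6.0, 4.0]
no_onset = onset_by_fixed_threshold(high_background)  # None
```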
The thresholds for the onset and end of influenza epidemics are expected to vary across Japanese prefectures [
35]. Yet, no appropriate epidemic threshold exists for each prefecture. We propose a novel statistical method, the maximum curvature method (MCM), to determine prefecture-specific onsets of influenza epidemics in Japan. This method is based on the maximum curvature of the epidemic curve, which makes the best use of the epidemic curve’s unique feature and retains the advantages of non-thresholding methods for estimating epidemic onset. As we focus on the non-thresholding methods, in this study, epidemic onset estimates derived by both the MCM and SRM are evaluated in comparison with the reference epidemic onsets obtained by the ETM with a fixed value of 1.0 C/S/W. Finally, prefecture-specific thresholds for epidemic onset and end are established using the MCM.
Discussion
In this study, three methods (the ETM, SRM, and MCM) were used to estimate epidemic characteristic parameters for each of the 47 prefectures in Japan during each of the six influenza seasons from 2012/2013 to 2017/2018. Among them, the ETM is a thresholding method to detect epidemic onset based on the nationwide epidemic onset threshold of 1.0 C/S/W. The SRM is an existing non-thresholding method for capturing the breakpoint of the epidemic curve as the epidemic onset. The MCM is also a non-thresholding method, which we propose to detect epidemic onset based on the maximum curvature of the epidemic curve. Proper evaluations of methods for detecting epidemic onset are often impaired because of a lack of suitable datasets with reliable information on the occurrence of epidemics [
29]. To address this issue, in the present study, estimates from the ETM were used as reference standards to evaluate the performance of the other two methods.
The incompleteness of ETM estimates suggests that the empirical epidemic threshold is not appropriate for the levels of influenza activity observed in prefectures located at or near the southernmost part of Japan, such as Okinawa and Kagoshima (Table
2). The severe lack of valid ETM estimates in Okinawa resulted from a level of background influenza activity that was higher than the empirical epidemic threshold of 1.0 C/S/W. It has been recognized that background influenza activity is high throughout the year in tropical regions [
51]. Hence, the influenza seasonality is less defined in Okinawa, where the lowest influenza activity usually occurs later than in other, more northern prefectures (Additional file
1: Figure S5). By contrast, the epidemic onset and ending thresholds (1.9 and 2.6 C/S/W) for Okinawa established using the proposed MCM were the largest, and much higher than those of other prefectures and the empirical epidemic threshold of 1.0 C/S/W (Fig.
4), faithfully reflecting the characteristics of influenza epidemics in Okinawa.
The epidemic curves in all prefectures were asymmetrical because when approaching the epidemic end, the second half of the epidemic curve was relatively gentle compared with the first half, as demonstrated in the 2014/2015 season (Additional file
1: Figure S5). This asymmetry of the epidemic curve not only explains why better agreement with the ETM was achieved for epidemic onset than for epidemic end, regardless of the method used, but also suggests that thresholds for epidemic onset and end are likely to be different and should be established individually. The high consistency between the MCM and ETM supports the continuity of using epidemic thresholds derived by the MCM in the Japanese sentinel surveillance system for influenza. Although the prefecture-specific thresholds for epidemic onset and end were established using only the six available influenza seasons, these thresholds can be further refined as more data become available in the future. In addition to the mean statistic used in the present study, other procedures for calculating the thresholds [
8] are worth exploring.
The IQRs of the epidemic ending intensities derived by the MCM during 2012/2013, 2014/2015, and 2016/2017 were wider than those during the other three seasons (Additional file
1: Figure S4). This may be explained by the severity of epidemics. In Japan, the 2012/2013, 2014/2015, and 2016/2017 influenza seasons were characterized by the predominance of the A(H3) subtype, whereas the dominant virus subtypes in the other three seasons were A(H1N1)pdm09 and B/Yamagata. Seasonal influenza epidemics dominated by the A(H3N2) subtype are generally more severe than those dominated by A(H1N1) and B [
52], which may affect the shape of the epidemic curve. Therefore, establishment of epidemic thresholds, particularly the epidemic ending thresholds, could incorporate information on the dominant influenza virus subtype.
The proposed MCM has several properties that make it broadly applicable for estimating epidemic onset in public health surveillance. First, the MCM is intuitive as it defines epidemic onset by capturing the local point with maximum curvature. The MCM is a non-thresholding approach to determining epidemic onset that is based entirely on the shape of the epidemic curve. During implementation of the MCM, an upper threshold h is prespecified to limit the search scope for points. However, the sensitivity analysis suggests that the MCM is robust to h over a wide range (Table 3). Therefore, this threshold need not be as precise as Y0 in the ETM and is easy to set. Moreover, it also provides the flexibility to adjust the search scope for points according to the background levels of influenza activity. These properties, together with the successful application to Okinawa, give the MCM the potential to estimate epidemic characteristic parameters in the subtropics and tropics, where various respiratory pathogens that can cause acute respiratory illness, such as respiratory syncytial virus and parainfluenza virus, circulate year round [
18]. Consequently, the patterns of influenza in subtropical and tropical regions are complex, with a year-round high background rate of acute respiratory illness [
51] and lack of apparent ILI seasonality [
18]. The recent experience of establishing influenza epidemic thresholds in Cambodia using the WHO method [
19] suggests that unlike in temperate regions, the ILI syndromic surveillance data was less useful for setting thresholds [
18]. Therefore, priority should be given to virological surveillance data, such as the positive proportion [30] or the product of the ILI proportion and the positive proportion, when applying the MCM to establish thresholds for influenza epidemics in subtropical and tropical regions.
Second, in contrast to the widely used Serfling-like regression models requiring long series of historical data to estimate model parameters [
13,
20,
22,
26], parameters of the MCM are prespecified. This means the MCM can be applied in areas with limited historical data and in analyzing influenza pandemics that usually last for a single season. Epidemic onsets determined using empirical thresholds [
12], a Serfling-type regression model [
21], and the SRM [
7,
31] have been used to investigate spatial transmission of both influenza pandemics and epidemics. New insights into the spatial transmission of influenza may be gained using the MCM as it defines epidemic onset entirely on the basis of the properties of the epidemic curve.
Third, although the calculation in the MCM is more complex than that in the SRM, the estimates derived using our novel MCM were in much better agreement with those derived using the ETM. The high consistency between epidemic onsets derived by the ETM and MCM implies that curve properties, such as the curvature, may have been taken into consideration during the determination of the national epidemic onset indicator in Japan. A comparison conducted by Charu et al. [
31] showed excellent agreement between estimates of influenza epidemic onset in the US derived by the SRM and Serfling-like regression method, which in essence determines epidemic onset based on thresholds. In contrast, the agreement between the ETM and SRM was poor in Japan. This may be linked to the differences in sentinel surveillance systems for influenza in the US and Japan.
Finally, the MCM is robust not only to the model parameters n and h but also to the partitioning of the influenza seasons and the determination of the epidemic peak. Regarding the estimation of epidemic onset, the MCM calculates the curvature at each point by fitting a least-squares circle using only n points around the current one. While searching for the local point of maximum curvature, the MCM also takes into account the changing direction of the curvature at each point, which ensures that only points in the ascending phase of the epidemic curve are targeted. In contrast, the SRM fits two broken lines, using all points in the first half of the epidemic curve. Therefore, where the boundaries of the influenza season are drawn can affect the SRM's epidemic onset estimate. In the present study, it was appropriate to define the start of each influenza season as week 35 with the exception of Okinawa during 2012/2013, 2014/2015, and 2016/2017 (Additional file 1: Figures S3 and S5). For example, during 2012/2013 in Okinawa, the influenza season should have been defined to start around week 44. The first broken line fitted by the SRM included approximately the last 10 weeks of the previous influenza season, which biased the epidemic onset estimate toward earlier weeks. In this case, the curvatures for these weeks were filtered out by the MCM as their directional angles were not within [0°, 90°] (Fig.
2C and D). Furthermore, taking the direction of curvature into consideration may enable the MCM to overcome the constraint of the MLRM [
30] and to be applicable to multiple epidemic waves of influenza observed in subtropical and tropical regions, such as southern China [
25]. In addition, the SRM is more sensitive to the determination of the epidemic peak timing than the MCM. However, epidemic peaks may suffer from large fluctuations, such as the sharp decrease in ILI activity during the National Day Holiday in the 2009 pandemic in China [
53]. Under such circumstances, the SRM will result in a large bias in the epidemic onset estimates.
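The local circle-fitting step described above can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the implementation used in this study: the circle is fitted with an algebraic (Kåsa-type) least-squares method, curvature is taken as 1/r in raw week and C/S/W units, and the directional-angle rule is replaced by a simpler stand-in (the circle centre lies above the curve and the window is non-decreasing). All function names are hypothetical.

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination; None if singular."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_circle(xs, ys):
    """Algebraic least-squares circle; returns (cx, cy, r), or None if collinear."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    us = [x - mx for x in xs]          # centre the data for numerical stability
    vs = [y - my for y in ys]
    zs = [u * u + v * v for u, v in zip(us, vs)]
    suu = sum(u * u for u in us)
    svv = sum(v * v for v in vs)
    suv = sum(u * v for u, v in zip(us, vs))
    su, sv, sz = sum(us), sum(vs), sum(zs)
    suz = sum(u * z for u, z in zip(us, zs))
    svz = sum(v * z for v, z in zip(vs, zs))
    # Normal equations for minimising sum (u^2 + v^2 + D u + E v + F)^2.
    sol = _solve3([[suu, suv, su], [suv, svv, sv], [su, sv, n]],
                  [-suz, -svz, -sz])
    if sol is None:
        return None
    d, e, f = sol
    cu, cv = -d / 2, -e / 2
    r2 = cu * cu + cv * cv - f
    if r2 <= 0:
        return None
    return cu + mx, cv + my, math.sqrt(r2)


def max_curvature_onset(series, n, h):
    """Week before the peak with the largest upward-bending curvature, or None."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    half = n // 2
    peak = max(range(len(series)), key=series.__getitem__)
    best_week, best_kappa = None, 0.0
    for t in range(half, min(peak, len(series) - half)):
        if series[t] > h:              # upper threshold h limits the search scope
            continue
        window = range(t - half, t + half + 1)
        fit = fit_circle(list(window), [series[i] for i in window])
        if fit is None:                # collinear window: zero curvature
            continue
        _, cy, r = fit
        bends_upward = cy > series[t]  # stand-in for the directional-angle rule
        rising = series[t + half] >= series[t - half]
        if bends_upward and rising and 1.0 / r > best_kappa:
            best_week, best_kappa = t, 1.0 / r
    return best_week


# Hypothetical series: flat background, then a steep linear rise to the peak.
toy = [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 8.0, 12.0, 16.0, 10.0]
onset_week = max_curvature_onset(toy, n=3, h=5.0)  # 4
```

Because each curvature value uses only the n points in its window, weeks far from the onset, such as the tail of the previous season, do not influence the estimate in this sketch.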
There are several limitations to the proposed MCM that deserve consideration. First, the MCM can only be used in retrospective analysis of epidemics because data from later weeks are required for fitting the least-squares circles. Second, the MCM implicitly relies on the smoothness of the epidemic curve. For epidemic curves with small fluctuations, we can address this limitation by increasing the number of points (e.g., n = 7) used for fitting least-squares circles. For irregular epidemic curves with large and frequent fluctuations, techniques such as Savitzky-Golay filtering [
54], among others, may be used to smooth the epidemic curve before applying the MCM. Finally, in comparison with the SRM, the MCM cannot provide confidence intervals for epidemic onset estimates, which limits the ability of the MCM to take uncertainties into account.
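As an illustration of the smoothing step mentioned above, the following Python sketch applies a 5-point quadratic Savitzky-Golay filter using the standard convolution weights (-3, 12, 17, 12, -3)/35. The window length, the polynomial order, and the choice to leave the two points at each end unchanged are assumptions made for this example; a library routine such as `scipy.signal.savgol_filter` offers other window lengths and edge modes.

```python
def savgol5(series):
    """5-point quadratic Savitzky-Golay smoothing.

    Assumption: the first two and last two points are copied unchanged.
    """
    coeffs = (-3.0, 12.0, 17.0, 12.0, -3.0)
    out = [float(y) for y in series]
    for t in range(2, len(series) - 2):
        window = series[t - 2:t + 3]
        out[t] = sum(c * y for c, y in zip(coeffs, window)) / 35.0
    return out


# A quadratic trend passes through unchanged, whereas an isolated spike is damped.
quadratic = [float(t * t) for t in range(7)]
smoothed_quadratic = savgol5(quadratic)
spiky = [1.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0]
smoothed_spiky = savgol5(spiky)  # the spike at index 3 becomes 120/35
```

The filter can produce small overshoots, including negative values, next to sharp spikes, so smoothed incidence would need to be checked before being passed to a curvature search.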