Background
Every year, influenza epidemics impose substantial clinical and economic burdens in the United States of America (US) [1]. Consequently, local, state, and national health authorities require timely and representative quantitative evidence to make informed decisions about the selection and allocation of resources. For multiple decades, the Centers for Disease Control and Prevention (CDC), a governmental agency, has continuously collected data on the number of outpatient visits for influenza-like illness (ILI) from a diverse network of healthcare providers, as well as on the number of influenza-positive laboratory specimens from public health and clinical laboratories across the US [2]. Although influenza surveillance occurs throughout the calendar year, the influenza season is defined as Morbidity and Mortality Weekly Report (MMWR) week 40 through week 20 of the following year, which corresponds roughly to October through May. Because of the time needed to collect, process, and aggregate these data, CDC influenza surveillance reports are traditionally published with a 1–2 week delay. Alternative data sources that are available in near-real time may aid in the design, initiation, or communication of timely strategies to mitigate the impact of influenza.
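The season definition above wraps around the turn of the year, which is easy to get wrong when filtering weekly data. A minimal sketch, assuming the MMWR year and week of each observation are already known (converting calendar dates to MMWR weeks is not shown); the function names and the "YYYY-YY" label format are our own illustrative choices:

```python
from typing import Optional

SEASON_START_WEEK = 40  # first MMWR week of the influenza season
SEASON_END_WEEK = 20    # last MMWR week of the season, in the following MMWR year


def in_influenza_season(mmwr_week: int) -> bool:
    """True if an MMWR week falls within the influenza season (week 40 through week 20)."""
    if not 1 <= mmwr_week <= 53:
        raise ValueError("MMWR weeks are numbered 1 to 52, or 53 in some years")
    # The season wraps the year boundary: weeks 40-52/53 of one MMWR year, then weeks 1-20 of the next.
    return mmwr_week >= SEASON_START_WEEK or mmwr_week <= SEASON_END_WEEK


def season_label(mmwr_year: int, mmwr_week: int) -> Optional[str]:
    """Label such as '2015-16' for an in-season week, or None for an off-season week."""
    if not in_influenza_season(mmwr_week):
        return None
    # Weeks 1-20 belong to the season that started in the previous MMWR year.
    start_year = mmwr_year if mmwr_week >= SEASON_START_WEEK else mmwr_year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"
```

For example, `season_label(2015, 40)` and `season_label(2016, 10)` both return `"2015-16"`, while week 30 of any year returns `None`.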
Over the past decade, Internet-based technologies have been explored as new ways to monitor influenza and provide more immediate estimates of disease activity. These include search queries on Yahoo [3], Google [4–6], and Baidu [7], Twitter posts [8–10], clinicians’ database queries [11], cloud-based Electronic Health Records (EHR) [12], and online participatory cohorts that allow individuals to report symptoms [13–15]. The ability of these novel Internet-based and crowd-sourced approaches to complement, track, and forecast traditional provider-based influenza surveillance has been demonstrated at the national and regional levels in the US [12, 15–20]. However, because influenza activity may differ across states and subpopulations [21], these novel systems need further investigation at finer spatial resolutions [22].
In this paper, we evaluate two novel influenza-tracking systems: athenahealth, a cloud-based EHR system, and Flu Near You (FNY), a crowd-sourced system. Founded in 1997, athenahealth provides cloud-based services and mobile applications for medical groups and health systems. Like traditional healthcare-based surveillance systems, athenahealth collects data on individuals who seek medical care. Because athenahealth’s network is cloud-based, the proportion of patients with ILI symptoms across its national network of providers can be estimated in near-real time, potentially providing estimates of influenza activity sooner than the national surveillance systems (https://insight.athenahealth.com/flu-dashboard-2016). Flu Near You is an online crowd-sourced surveillance system that allows volunteers in the US and Canada to report weekly whether they have experienced ILI symptoms [15]. The majority (65%) of FNY respondents who report ILI do not seek medical attention; therefore, this system captures illness activity in a population not routinely included in the other, healthcare-based systems considered in this paper.
The objectives of this paper are (1) to assess whether these novel EHR and crowd-sourced systems correlate with traditional influenza surveillance systems across multiple spatial resolutions with different sample sizes, and (2) to determine, for a given spatial resolution, the minimum number of visits or reports each novel system needs to produce influenza activity estimates that resemble the historical trends recorded by traditional surveillance systems.
Discussion
Traditional surveillance systems currently used by governmental agencies are robust, well accepted, and provide the best basis for tracking influenza activity. However, their estimates include only individuals who visit a medical care facility, and there is typically a delay from the onset of patient symptoms to the final publication of reports. Alternative data sources have the potential to shorten these reporting delays and complement traditional systems. Although there is still a delay from symptom onset to presentation at a healthcare provider, the cloud-based EHR system allows symptom reports to be aggregated in near-real time. The crowd-sourced system, on the other hand, avoids the delay between symptom onset and a healthcare visit and captures individuals who do not seek medical care. However, although participants can report symptoms on the day of onset, most do not report until they receive the weekly reminder, and the data are typically aggregated once a week.
For both EHR and crowd-sourced ILI, correlations with traditional ILI estimates from governmental agencies increase as the total number of reports increases. However, at similar spatial resolutions, EHR data showed higher correlations than crowd-sourced data with both CDC ILINet and the number of viral-positive specimens. EHR correlations with CDC ILINet are close to one, indicating that healthcare-based surveillance systems with different data capture strategies produce similar ILI incidence curves. Although both the EHR system and the CDC use data from patients seeking medical attention, the mix of visit settings differs slightly between the two systems, with emergency department visits under-represented in the EHR. Crowd-sourced correlations with CDC ILINet, by contrast, never reach one; instead, they converge to approximately 0.8–0.9, as shown by both empirical and theoretical approaches. A similar pattern was observed when comparing methods of provider recruitment in Texas [24]. This difference in where the correlations saturate may result from differences in the quantity being measured (e.g. ILI reports out of all enrolled persons vs. ILI visits out of all patient visits) and in the population under surveillance, as the crowd-sourced population includes individuals who may not seek medical attention. Based on preliminary analyses, we estimate that approximately 65% of FNY participants who reported ILI symptoms did not seek medical attention. The Italian crowd-sourced counterpart, INFLUWEB, has also reported that approximately two thirds of its participants did not seek medical assistance [25]. Furthermore, studies in the US have shown that approximately 40% of individuals with ILI seek healthcare [26]. The crowd-sourced population also differs demographically: females and middle-aged individuals are over-represented [27]. In addition, crowd-sourced estimates can be affected by media attention and by changes in user participation. For example, the large peak observed in January 2013 occurred after FNY was featured on NBC’s Nightly News with Brian Williams. Investigators have applied several methods to adjust for these reporting biases, including dropping first reports and a spike-detector method [15]. We did not adjust for these biases in this paper.
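Why the crowd-sourced correlations level off, rather than approach one, can be illustrated with a simple binomial sampling model. This is our own illustrative assumption, not necessarily the theoretical approach used in the study. Let p_t be the true ILI proportion among crowd-sourced participants in week t, n the number of weekly reports (held fixed for simplicity), \hat{p}_t the observed proportion, and y_t a reference series such as CDC ILINet; variances and covariances are taken across weeks, and the sampling noise is assumed independent of y_t given p_t.

```latex
\begin{align}
\hat{p}_t &= \frac{X_t}{n}, \quad X_t \sim \operatorname{Binomial}(n, p_t), \quad
  \operatorname{Var}(\hat{p}_t \mid p_t) = \frac{p_t(1 - p_t)}{n}, \\
\operatorname{Var}(\hat{p}) &= \operatorname{Var}(p) + \frac{\mathbb{E}[p(1 - p)]}{n}, \qquad
  \operatorname{Cov}(\hat{p}, y) = \operatorname{Cov}(p, y), \\
\operatorname{corr}(\hat{p}, y) &= \operatorname{corr}(p, y)
  \sqrt{\frac{\operatorname{Var}(p)}{\operatorname{Var}(p) + \mathbb{E}[p(1 - p)]/n}}
  \;\xrightarrow[n \to \infty]{}\; \operatorname{corr}(p, y).
\end{align}
```

Only the square-root factor depends on n, so adding reports raises the correlation toward corr(p, y) but not beyond it. Because the crowd-sourced population and the quantity it measures differ from those of ILINet, corr(p, y) can sit below one, which is consistent with the observed plateau of approximately 0.8–0.9.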
In general, both crowd-sourced and EHR ILI rates showed higher correlations with CDC ILINet than with the number of viral-positive specimens at the national and regional resolutions (Additional file 1: Table S3). Notably, with the bootstrap resampling approach, crowd-sourced correlations with CDC laboratory-confirmed influenza specimens reach saturation faster than correlations with CDC ILINet. This pattern is also evident at the regional resolution.
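The bootstrap resampling approach can be sketched as follows: for a chosen sample size n, draw n reports per week with replacement, recompute each week's ILI rate, correlate the resulting series with a reference series, and average over many replicates. This is a minimal sketch assuming individual reports are available per week as 0/1 ILI flags; the function names, the data layout, and the default number of replicates are hypothetical choices for illustration, not the study's exact procedure. The same idea applies to EHR visits.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(weekly_reports, reference, n, n_boot=200, seed=0):
    """Average correlation between resampled weekly ILI rates and a reference series.

    weekly_reports: one non-empty list per week of 0/1 flags (1 = report with ILI);
        a hypothetical layout.
    reference: reference series (e.g. weekly ILINet percentages), one value per week.
    n: number of reports drawn, with replacement, for every week.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_boot):
        # Draw n reports per week and recompute that week's ILI rate.
        rates = [sum(rng.choices(week, k=n)) / n for week in weekly_reports]
        r = pearson(rates, reference)
        if not math.isnan(r):  # skip replicates where every week got the same rate
            correlations.append(r)
    return mean(correlations) if correlations else math.nan
```

Evaluating `bootstrap_correlation` over a grid of values of `n` and plotting the result against `n` gives a saturation curve; the point at which the curve flattens corresponds to the kind of minimum-report threshold this study reports for each system.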
Based on the results of this study, we estimate that ILI rates from EHR and crowd-sourced data track traditional ILI estimates from governmental agencies at spatial resolutions with at least 20,000 weekly EHR visits and 250 weekly crowd-sourced reports. Some spatial resolutions are not well represented in these novel systems. During the 2015–16 influenza season, for example, 47 states were represented in the EHR network, and 26 of these reached the 20,000-visit threshold. Although all 50 states are represented in the crowd-sourced system, 32 states did not reach the 250-report weekly threshold during the 2015–16 season. In addition, the geographic distribution of crowd-sourced reports shows large gaps in coverage, especially in the middle and southern areas of the US, and participants tend to cluster around large urban areas, with especially large user bases in the greater metropolitan areas surrounding Boston, New York City, and San Francisco. Flu Near You has recently made efforts to recruit new users through online media campaigns on Facebook, and other previously successful recruitment strategies, such as encouraging current users to recruit friends and colleagues [28], can easily be employed.
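These thresholds can be applied as a simple screen over states. The sketch below assumes each threshold refers to the mean weekly count over a season (the conclusions phrase the crowd-sourced threshold this way; applying the same rule to EHR visits is our assumption) and that counts are stored in a dictionary keyed by state, a hypothetical layout.

```python
from statistics import mean

# Thresholds suggested by this study's results, interpreted here as mean weekly counts.
EHR_MIN_WEEKLY_VISITS = 20_000
FNY_MIN_WEEKLY_REPORTS = 250


def meets_threshold(weekly_counts, minimum):
    """True if the mean weekly count over the season reaches the minimum."""
    return mean(weekly_counts) >= minimum


def states_meeting_threshold(counts_by_state, minimum):
    """Sorted names of states whose mean weekly count reaches the minimum.

    counts_by_state: hypothetical dict mapping a state name to its weekly counts
    for one influenza season; states with no data are excluded.
    """
    return sorted(
        state
        for state, counts in counts_by_state.items()
        if counts and meets_threshold(counts, minimum)
    )
```

For instance, `states_meeting_threshold(fny_counts, FNY_MIN_WEEKLY_REPORTS)`, where `fny_counts` is a hypothetical per-state dictionary of weekly FNY report counts, lists the states where crowd-sourced ILI estimates would be expected to track traditional surveillance.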
Ideally, we would compare ILI rates from crowd-sourced reports with laboratory-confirmed influenza cases in the general population. Currently, the CDC provides yearly estimates of seasonal influenza burden in the general population using laboratory-confirmed influenza-associated hospitalization rates from its Influenza Hospitalization Surveillance Network (FluSurv-NET). However, it does not publicly release weekly estimates of laboratory-confirmed influenza burden. Although the capture mechanisms differ among the syndromic systems, their general seasonal trends are similar and provide valuable information about changes in influenza activity.
Conclusions
Our findings suggest that both EHR and crowd-sourced ILI estimates correlate with ILI estimates from traditional influenza surveillance systems at various spatial resolutions, provided there is a sufficient number of visits or reports. Spatial resolutions with at least 250 mean weekly crowd-sourced reports display correlations higher than 0.5 with traditional influenza surveillance systems, and spatial resolutions with approximately 20,000 weekly EHR visits consistently show correlations greater than 0.7. As the FNY user base grows and EHR data become available across more of the US, these Internet-based surveillance tools may offer a complementary and timely way to monitor influenza activity, especially in populations who do not access healthcare systems, in areas with limited surveillance data, and in community-based populations.