Background
With increasing medical costs, health care reformers and policy makers have turned to emergency department (ED) utilization as a potential source of cost savings. A relatively small number of patients, often called “frequent” or “high ED users”, have become an increasing focus because they account for a disproportionate share of ED visits and costs. When defined as 4 or more ED visits per year, frequent users accounted for 4.5 to 8 % of all ED patients and contributed 21 to 28 % of all ED visits [1]. Prior interventions targeting frequent users have not achieved uniformly positive outcomes, although some studies demonstrated reduced ED use [2–6]. A clear framework, including a consensus-based definition of frequent users and methods to identify this population accurately and consistently, may improve the effectiveness of care management interventions.
Currently, however, a standardized definition of frequent ED users remains elusive. Studies typically use a single visit threshold to separate frequent users from low ED users, and this threshold ranges from as few as 3 to 12 or more annual visits, often without a clear rationale for the chosen cut-point [7–13]. Further, most prior studies of frequent ED users focused on identifying existing frequent users [7, 9, 11, 14]. This is problematic because most frequent users in a given year do not remain frequent users the following year: an individual with 4 or more visits in a given year has only a 28 to 38 % likelihood of being a frequent user the next year [1]. Fertel et al. also showed that highly frequent use occurs in only a minority of ED patients, and then only for a discrete period [15]. Roland et al. pointed out that the “regression to the mean” phenomenon should not be ignored when evaluating interventions for frequent ED users [16]. Therefore, indiscriminately targeting current frequent ED users for future interventions is inefficient, because their heavy use of ED services may decrease without intervention. Since health care resources are limited, interventions should target patients whose heavy ED use is likely to persist. The ability to predict which patients will sustain frequent ED utilization can address this problem by identifying those most likely to generate future heavy ED use and costs.
We previously reported that 2.8 million patients from 96 EDs in the state of Indiana, United States, generated 7.4 million ED visits from 2008 to 2010, an average of 2.6 visits per patient [17]. We found that patients frequently cross over to other ED institutions, and that about 3.3 % of the patients made more than 10 visits to Indiana EDs from 2008 to 2010 [17]. In this study, we explore whether specific features contained within routinely gathered registration data can meaningfully predict a patient’s future ED utilization. If these features accurately predict future frequent ED users, limited health care resources can be targeted more effectively at this group, maximizing the benefit of intervention. The purpose of this study is therefore to assess the feasibility of using routinely gathered registration data to predict which patients will visit EDs with high frequency.
Methods
This study was approved by the Indiana State Department of Health (ISDH) Data Release Committee and the Indiana University Institutional Review Board (USA).
Datasets
Data for this study were derived from the original Health Level-7 (HL7) version 2 registration transactions for ED encounters from 96 institutions participating in the Indiana Public Health Emergency Surveillance System (PHESS) between January 1, 2008 and December 31, 2010. The data are not publicly available but can be accessed through the Regenstrief Institute Data Core (https://www.regenstrief.org/hsr/research-programs/rcher/data-core/).
The processes for preparing ED encounter data, and the details of each step, were presented in our previous paper [17]. Briefly, registration transactions were processed to ensure that each transaction was unique and contained valid ED encounter data according to PHESS requirements and a set of heuristics drawn from Regenstrief’s long-term, real-world experience operating a health information exchange. Unique ED encounters were established using data elements describing person, place and time. The specific fields were (1) healthcare institution (HL7 MSH-4), (2) ED encounter date (HL7 PV1-44), and (3) medical record number (HL7 PID-3). Transactions missing any of these fields could not be definitively and uniquely identified as an encounter and were excluded from the analysis.
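As an illustration only, the encounter-uniqueness rule described above can be sketched in Python. The sketch assumes each registration transaction has already been parsed into a dictionary; the keys msh4, pv1_44 and pid3 are hypothetical names for HL7 fields MSH-4, PV1-44 and PID-3, and taking the first eight characters (YYYYMMDD) of the PV1-44 timestamp as the encounter date is an assumption. It omits the additional PHESS validity checks and Regenstrief heuristics.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keys for parsed HL7 v2 fields (not the PHESS schema):
# msh4 = MSH-4 healthcare institution, pv1_44 = PV1-44 admit date/time,
# pid3 = PID-3 medical record number.
KEY_FIELDS = ("msh4", "pv1_44", "pid3")

EncounterKey = Tuple[str, str, str]


def encounter_key(txn: Dict[str, str]) -> Optional[EncounterKey]:
    """Return (institution, encounter date, MRN), or None if any field is missing."""
    values = [(txn.get(field) or "").strip() for field in KEY_FIELDS]
    if not all(values):
        return None  # cannot be uniquely identified as an encounter
    institution, admit_ts, mrn = values
    # HL7 v2 timestamps begin with YYYYMMDD; keep only the date part.
    return institution, admit_ts[:8], mrn


def unique_encounters(transactions: List[Dict[str, str]]) -> Dict[EncounterKey, Dict[str, str]]:
    """Collapse transactions sharing a key into one encounter (first one kept)."""
    encounters: Dict[EncounterKey, Dict[str, str]] = {}
    for txn in transactions:
        key = encounter_key(txn)
        if key is not None and key not in encounters:
            encounters[key] = txn
    return encounters


if __name__ == "__main__":
    txns = [
        {"msh4": "HOSP_A", "pv1_44": "200803011230", "pid3": "123"},
        {"msh4": "HOSP_A", "pv1_44": "200803011415", "pid3": "123"},  # same visit
        {"msh4": "HOSP_B", "pv1_44": "200803011230", "pid3": "123"},  # other ED
        {"msh4": "HOSP_A", "pv1_44": "", "pid3": "456"},  # missing date: dropped
    ]
    print(len(unique_encounters(txns)))  # 2
```

Under this rule, repeated transactions for the same institution, date and medical record number (for example, an update message later in the same visit) collapse into one encounter, and a transaction lacking any key field is dropped. A side effect worth noting is that two genuine visits by the same patient to the same ED on the same day would also collapse into one.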
Unique patients were identified using various combinations of patient demographics, including social security number, last and first name, gender, date of birth, telephone number, and ZIP code, as determined by an open-source probabilistic record linkage software package [18]. In this manner, all ED encounters belonging to the same patient were linked, forming a “patient group,” and a unique global patient identifier was assigned to each patient group. In total, we identified 7,447,521 unique ED encounters. Data available for analysis included age, sex, chief complaints, the ZIP code of each patient’s address, and hospital ZIP codes. The global patient identifier was used to link visits across different hospital databases, including all ED visits regardless of disposition.
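The record linkage itself was performed by the open-source package cited above. Assuming only that its output can be expressed as pairs of source records judged to belong to the same person, forming patient groups and assigning a global identifier can be sketched in Python with a disjoint-set (union-find) structure. The "institution:MRN" record labels and the sequential integer identifiers are hypothetical, and this is not the linkage package's own implementation.

```python
from typing import Dict, Hashable, Iterable, Tuple


class DisjointSet:
    """Union-find with path compression, used to group linked records."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_global_ids(records: Iterable[Hashable],
                      matches: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Give every source record the integer id of its linked patient group."""
    ds = DisjointSet()
    for record in records:
        ds.find(record)  # register unmatched records as singleton groups
    for a, b in matches:
        ds.union(a, b)
    root_to_id: Dict[Hashable, int] = {}
    global_ids: Dict[Hashable, int] = {}
    for record in list(ds.parent):
        root = ds.find(record)
        global_ids[record] = root_to_id.setdefault(root, len(root_to_id))
    return global_ids


if __name__ == "__main__":
    # Hypothetical "institution:MRN" record labels and linkage output.
    records = ["A:1", "B:7", "C:3", "A:2"]
    matches = [("A:1", "B:7"), ("B:7", "C:3")]
    print(assign_global_ids(records, matches))
    # {'A:1': 0, 'B:7': 0, 'C:3': 0, 'A:2': 1}
```

Because union-find merges transitively, a record linked to B and B linked to C end up in one patient group even if the first and last records were never compared directly, which is what allows visits to be followed across different hospital databases.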
Predictive model
We developed multivariable logistic regression models. Patients with at least one ED visit in 2008 were used to predict ED visits in 2009 and 2010. Patients who died before January 1, 2009 or had missing values for one or more covariates were excluded (<4.30 % of patients). The final sample size was 1,272,367 patients. All variables were summarized at the patient level for model development.
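The cohort definition can be sketched as follows. The inputs are hypothetical: visits as (global patient identifier, visit date) pairs, a dictionary of death dates (the source of mortality information is not described in this section), and a set of patients flagged as having a missing covariate. The function returns each eligible patient's baseline (2008) and outcome (2009–2010) visit counts.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple

BASELINE_YEAR = 2008
OUTCOME_YEARS = (2009, 2010)


def build_cohort(visits: Iterable[Tuple[str, date]],
                 death_dates: Dict[str, date],
                 missing_covariates: Set[str]) -> Dict[str, Tuple[int, int]]:
    """Return {patient id: (2008 visits, 2009-2010 visits)} for eligible patients.

    Eligible: at least one ED visit in 2008, not known to have died before
    January 1, 2009, and not flagged as having a missing covariate.
    """
    baseline: Dict[str, int] = {}
    outcome: Dict[str, int] = {}
    for patient, visit_date in visits:
        if visit_date.year == BASELINE_YEAR:
            baseline[patient] = baseline.get(patient, 0) + 1
        elif visit_date.year in OUTCOME_YEARS:
            outcome[patient] = outcome.get(patient, 0) + 1

    cutoff = date(BASELINE_YEAR + 1, 1, 1)
    cohort: Dict[str, Tuple[int, int]] = {}
    for patient, n_baseline in baseline.items():
        died = death_dates.get(patient)
        if (died is not None and died < cutoff) or patient in missing_covariates:
            continue
        # Patients with no 2009-2010 visits keep an outcome count of 0.
        cohort[patient] = (n_baseline, outcome.get(patient, 0))
    return cohort


if __name__ == "__main__":
    visits = [("p1", date(2008, 3, 1)), ("p1", date(2009, 5, 2)),
              ("p1", date(2010, 1, 9)), ("p2", date(2009, 2, 2)),
              ("p3", date(2008, 7, 7)), ("p4", date(2008, 1, 1)),
              ("p5", date(2008, 6, 1))]
    print(build_cohort(visits, {"p3": date(2008, 12, 1)}, {"p4"}))
    # {'p1': (1, 2), 'p5': (1, 0)}
```

In the toy example, p2 is excluded for having no 2008 visit, p3 for dying before 2009 and p4 for a missing covariate; p5 stays in the cohort with zero outcome visits, so it would be labeled a low ED user at every cut-point.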
Covariates
All covariates were determined based on the ED utilization data in 2008.
Age: age was determined at the time of the first ED visit and divided into six subgroups: <5, 5–14, 15–24, 25–44, 45–64 and >=65 years.
Sex: male or female.
Visits in 2008: the total number of ED visits made in 2008 for each patient;
Chief complaints: chief complaints were grouped into 11 syndrome categories: respiratory, gastrointestinal (GI), undifferentiated infection (UDI), influenza-like illness (ILI), lymphatic, skin, neurological, pain, dental, alcohol and musculoskeletal syndromes. These categories have been used, with slight modification, by other surveillance programs [19–21]. Chief complaints that could not be grouped into any of these 11 syndromes were assigned to “unclassified”. The categories were then reviewed by two physicians (Grannis S, Finnel JT) and an epidemiologist. For each patient, the proportion of visits with each chief complaint syndrome was determined by dividing the number of the patient’s 2008 ED visits with that syndrome by the patient’s total number of ED visits in 2008. Because one ED visit may be assigned more than one syndrome, these percentages do not necessarily add up to 100 %.
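The per-patient syndrome proportions can be sketched in Python, assuming each 2008 visit has already been mapped to a (possibly empty) set of syndrome labels by some upstream classifier. The label strings are illustrative, and the "unclassified" group is left out because this section does not state how it entered the model.

```python
from typing import Dict, List, Set

# Illustrative labels for the 11 syndrome categories.
SYNDROMES = ["respiratory", "GI", "UDI", "ILI", "lymphatic", "skin",
             "neurological", "pain", "dental", "alcohol", "musculoskeletal"]


def syndrome_proportions(visit_syndromes: List[Set[str]]) -> Dict[str, float]:
    """Share of a patient's 2008 visits that carry each syndrome.

    Each element is the set of syndromes assigned to one visit. A visit can
    carry several syndromes (or none), so the shares need not sum to 1.
    """
    n_visits = len(visit_syndromes)
    if n_visits == 0:
        raise ValueError("patient has no baseline visits")
    return {s: sum(s in visit for visit in visit_syndromes) / n_visits
            for s in SYNDROMES}


if __name__ == "__main__":
    shares = syndrome_proportions([{"respiratory", "ILI"}, {"respiratory"}])
    print(shares["respiratory"], shares["ILI"], sum(shares.values()))  # 1.0 0.5 1.5
```

In the example, both visits are respiratory and one is also ILI, so the shares total 1.5 rather than 1, which is exactly why these proportions do not add up to 100 %.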
ZIP code centroid straight-line distances: the straight-line distance from each patient’s home to the hospital was calculated between the ZIP code centroids of the patient’s home address and the hospital address, using the Perl library Geo::Distance. Distances were then grouped into 3 categories: <=5 miles, >5 to 20 miles and >20 miles. Because one patient may have multiple ED visits at different distances, we determined the proportion of ED visits in each of the three categories by dividing the number of the patient’s 2008 ED visits in that category by the patient’s total number of ED visits in 2008. Because the proportions for the three distance categories always add up to 100 %, including all three would make them perfectly collinear with the model intercept; only two categories (<=5 miles and >20 miles) were therefore included in the analytic model.
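The distance covariate can be sketched in Python. The study used the Perl library Geo::Distance; this sketch instead assumes the haversine great-circle formula with a mean Earth radius of 3958.8 miles, and the (latitude, longitude) centroid inputs and category labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def distance_category(miles: float) -> str:
    """<=5 miles, >5 to 20 miles, or >20 miles."""
    if miles <= 5:
        return "le5"
    if miles <= 20:
        return "5to20"
    return "gt20"


def distance_proportions(pairs: List[Tuple[LatLon, LatLon]]) -> Dict[str, float]:
    """Shares of a patient's 2008 visits in the <=5 and >20 mile categories.

    `pairs` holds (home centroid, hospital centroid) for each visit. The
    middle share is omitted because the three shares always sum to 1.
    """
    if not pairs:
        raise ValueError("patient has no baseline visits")
    cats = [distance_category(haversine_miles(home, ed)) for home, ed in pairs]
    n = len(cats)
    return {"le5": cats.count("le5") / n, "gt20": cats.count("gt20") / n}


if __name__ == "__main__":
    home = (39.77, -86.16)  # hypothetical home ZIP centroid
    visits = [(home, (39.77, -86.16)),  # same centroid: 0 miles
              (home, (39.99, -86.16)),  # about 15 miles north
              (home, (41.00, -86.16))]  # about 85 miles north
    print(distance_proportions(visits))  # each share is 1/3
```

Dropping the middle category keeps the design matrix full rank; its share is still recoverable as one minus the two included shares.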
Study outcome
The outcome was a dichotomous variable (frequent versus low ED user). Frequent ED use was investigated using visit cut-points ranging from 8 to 16 visits over a two-year period (2009 and 2010), and one model was fit for each cut-point. Patients whose number of ED visits was equal to or greater than the cut-point were defined as frequent ED users; all other patients were defined as low ED users.
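The per-cut-point outcome definition and model fitting can be sketched as below. The study fitted its models in SAS 9.3; this pure-Python batch gradient-ascent fit is only a stand-in for maximum-likelihood logistic regression, the learning rate and iteration count are arbitrary assumptions, and the feature matrix X (for example, scaled baseline visits and the covariate proportions) is hypothetical.

```python
import math
from typing import Dict, List, Sequence

CUT_POINTS = range(8, 17)  # 8, 9, ..., 16 visits in 2009-2010


def label(outcome_visits: int, cut_point: int) -> int:
    """1 = frequent ED user (visits >= cut-point), 0 = low ED user."""
    return int(outcome_visits >= cut_point)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Batch gradient ascent on the mean log-likelihood; returns [b0, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))  # observed minus predicted
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def fit_all_cut_points(X: Sequence[Sequence[float]],
                       outcome_visits: Sequence[int]) -> Dict[int, List[float]]:
    """One model per cut-point: same covariates, different outcome labels."""
    return {c: fit_logistic(X, [label(v, c) for v in outcome_visits])
            for c in CUT_POINTS}


if __name__ == "__main__":
    baseline = [1, 2, 3, 4, 5, 10, 12, 15, 20, 25]  # toy 2008 visit counts
    outcome = [0, 1, 2, 3, 4, 9, 10, 14, 18, 22]    # toy 2009-2010 counts
    X = [[b / 10.0] for b in baseline]              # single scaled covariate
    models = fit_all_cut_points(X, outcome)
    print(len(models), models[8][1] > 0)  # 9 True
```

In practice a statistical package would be used; the loop only makes explicit that nine separate models (cut-points 8 through 16) share the same covariates and differ only in how the outcome is labeled.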
Model discrimination was assessed using receiver operating characteristic (ROC) curves. To balance the goal of identifying all frequent ED users against the intervention cost of incorrectly identifying patients as frequent ED users, we fixed sensitivity at 25 %, which limits the false positive rate. We then evaluated the specificity and positive predictive value (PPV) of each model at this fixed sensitivity. We also combined the false positive (FP) patients who nonetheless had 8 or more visits with the true positive (TP) patients to obtain an “adjusted” positive cohort. The “adjusted” PPV was calculated by dividing the size of the adjusted positive cohort by the total number of predicted positives (TP + FP). Statistical analyses were conducted using SAS version 9.3 (SAS Institute Inc.; Cary, North Carolina).
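The evaluation at a fixed 25 % sensitivity, including the adjusted PPV as defined above, can be sketched in Python. It assumes each patient has a model score, a 0/1 label at the chosen cut-point and an outcome visit count. Choosing the threshold as the score of the k-th highest-scoring frequent user (k = ceil(0.25 × number of frequent users)) is one reasonable rule, not necessarily the one the study used. The AUC uses the rank (Mann–Whitney) formulation, which in this naive form is quadratic and only suitable for small illustrations.

```python
import math
from typing import Dict, Sequence


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.25) -> float:
    """Score cut-off at which at least `target` of the frequent users are flagged."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    k = math.ceil(target * len(positives))  # true positives needed
    return positives[k - 1]


def evaluate(scores: Sequence[float], labels: Sequence[int],
             outcome_visits: Sequence[int], threshold: float,
             adjust_at: int = 8) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and adjusted PPV for flags score >= threshold."""
    tp = fp = tn = fn = fp_adjusted = 0
    for score, y, visits in zip(scores, labels, outcome_visits):
        flagged = score >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
            if visits >= adjust_at:  # false positive who still had >= 8 visits
                fp_adjusted += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "adjusted_ppv": (tp + fp_adjusted) / (tp + fp)}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random frequent user outscores a random low user."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores and 2009-2010 visit counts; cut-point 16 defines the label.
    scores = [0.95, 0.90, 0.88, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55,
              0.50, 0.45, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
    visits = [25, 10, 3, 18, 9, 5, 17, 16, 3, 19, 2, 16, 1, 22, 0, 16, 4]
    labels = [int(v >= 16) for v in visits]
    t = threshold_for_sensitivity(scores, labels)
    print(t, evaluate(scores, labels, visits, t))
    # threshold 0.85; sensitivity 0.25, PPV 0.5, adjusted PPV 0.75
```

In the toy data, 8 patients have 16 or more visits, so the threshold is the second-highest frequent-user score. Two low users score above it and are false positives; one of them still had 10 visits. That gives a PPV of 0.5 but an adjusted PPV of 0.75.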
Discussion
The primary goals of this study were to evaluate the feasibility of using routinely available registration data to predict which patients are likely to use ED services frequently in the future, and to develop strategies for improving the accuracy and efficiency of detecting frequent ED users. We demonstrate a strong association between predictor variables present in routine registration data and frequent ED use. The algorithm’s performance characteristics suggest that it is technically feasible to predict such use from routinely collected registration data, and the observed prediction accuracy may support identifying, and intervening upon, frequent ED users. Thus, such models may enable more effective targeting of limited health care resources to the patients who may benefit most from intervention.
Much of the literature on frequent ED utilization has substantial limitations, which our study sought to address. First, some published studies used data from a limited number of EDs, so their broad generalizability is unclear [7–12, 22, 23]. Although several statewide studies in the United States compared frequent and infrequent ED users by age, gender, health insurance group and clinical characteristics, most were descriptive, and few applied prediction models to identify frequent ED users [24–27]. Second, some studies used survey or interview data, whose quality and reliability can be affected by response rates [8–10]. Further, the cost, time and other resources involved in interviews may be prohibitive. Third, some studies focused on specific cohorts, such as asthmatics or the elderly, which limits the ability of policy makers and providers to determine whether unifying factors that could be targeted for intervention exist among a more general population of patients with frequent ED utilization. Lastly and most importantly, researchers have often focused on identifying existing frequent ED users instead of predicting future frequent ED utilization [7, 9, 11, 14]. As shown in our study and in others [1, 15, 16], most patients do not remain frequent ED users over time, and many naturally reduce their ED use without intervention (regression toward the mean). Thus, predicting which patients are likely to sustain frequent ED utilization is necessary for improving the health of this vulnerable patient group.
Developing algorithms that accurately identify patients who are likely to visit EDs frequently in subsequent years is a first step toward developing interventions to mitigate overuse. However, few studies have attempted to identify future frequent ED users [4, 28–31]. In those studies, frequent ED users were defined by a threshold number of ED visits, e.g. 3 to 10 ED visits within the 12 months prior to the study period. In addition, most of the comparative cohort studies used a pre- and post-intervention design in which the population exposed to the intervention served as its own historical control, without accounting for regression toward the mean, which may inflate the apparent effectiveness of interventions.
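Why an uncontrolled pre/post design can mislead is easiest to see in a toy simulation in which nothing changes between years. All settings are assumptions and are not fitted to the Indiana data: each patient has a stable personal visit rate drawn from an exponential distribution with mean 2, annual visit counts are Poisson around that rate, and "frequent" means 4 or more visits in the first year.

```python
import math
import random
from statistics import mean
from typing import Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate(n_patients: int = 20000, mean_rate: float = 2.0,
             cut: int = 4, seed: int = 1) -> Tuple[float, float, float]:
    """Year-1 vs year-2 visits for patients frequent in year 1, with no intervention."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_patients):
        lam = rng.expovariate(1.0 / mean_rate)  # stable personal visit rate
        v1, v2 = poisson(rng, lam), poisson(rng, lam)
        if v1 >= cut:  # selected because year 1 was a "frequent" year
            year1.append(v1)
            year2.append(v2)
    still_frequent = sum(v >= cut for v in year2) / len(year2)
    return mean(year1), mean(year2), still_frequent


if __name__ == "__main__":
    m1, m2, share = simulate()
    print(f"year 1 mean {m1:.2f}, year 2 mean {m2:.2f}, still frequent {share:.0%}")
```

Even though every patient's underlying rate is unchanged, the group selected for high first-year use averages fewer visits in the second year and only part of it remains frequent. An uncontrolled pre/post comparison would attribute that drop to whatever intervention was applied in between.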
In our study, we developed a practical approach to predicting future frequent ED users. The model predicting patients with 8 or more visits in the subsequent two years demonstrated reasonable discriminative power, with an AUC of 0.84. As the threshold defining ‘frequent use’ increased, the corresponding AUC also increased: the model predicting 16 or more visits in the subsequent two years showed good discrimination, with an AUC of 0.92. Strong predictor variables included visits in the baseline year, age, sex, ZIP code centroid straight-line distance between home and hospital, and specific chief complaints, including respiratory, dental and alcohol syndromes. When we compared false positives with true positives, and false negatives with true negatives, the values of the variable “number of visits in the baseline year” were very similar within each pair, indicating that other features contained within routinely gathered registration data contributed additional discriminating power.
If the algorithm incorrectly flags patients as frequent utilizers, the resulting inefficiencies may offset potential savings from subsequently reduced ED utilization. Considering the trade-off between (a) identifying the maximal number of patients who are truly frequent ED users and (b) minimizing the number of patients incorrectly flagged as frequent ED users, we balanced the cost of incorrect identification by setting the prediction model’s sensitivity at 25 %. Although the models had PPVs of around 60 %, a substantial proportion of false positive patients actually had 8 or more ED visits in two years. In the model predicting frequent ED use as 16 or more visits, the adjusted PPV for patients having 8 or more visits was 81.9 %. To our knowledge, this is the first study to employ routine registration data to develop a predictive algorithm for future frequent ED use. The prediction accuracy strongly suggests that it is feasible to use routinely collected registration data to predict future frequent ED utilization.
Our study has the following limitations. First, we lacked comprehensive population-level data for persons who did not use the emergency department; therefore, our analysis is limited to characterizing individuals who present to emergency departments. Second, we did not include data such as patients’ socioeconomic status, because such data are not routinely captured in ED registration data. Third, the applicability of our model to ED registration data from other sites has not been assessed, and the models’ predictive performance may be overestimated. In the future, we seek to validate this approach against other datasets in a geographically distinct region. Finally, we evaluated models only at 25 % sensitivity, as we aimed to balance the cost of ED utilization against the cost of intervention support for incorrectly identified frequent ED users.
Competing interests
The authors have no competing interests to declare.
Authors’ contributions
JW collected the data, conceptualized and designed the study, performed all aspects of data analysis, interpreted the data, wrote and revised the paper critically; HX contributed to study design and data interpretation, reviewed and edited the paper; JF contributed to conception and design, reviewed and edited the paper; SG conceptualized and designed the study, interpreted the data, reviewed and edited the paper critically. All authors read and approved the final manuscript.