Health Economics Review

Open Access 01.12.2023 | Research

Forecasting emergency department arrivals using INGARCH models

Written by: Juan C. Reboredo, Jose Ramon Barba-Queiruga, Javier Ojea-Ferreiro, Francisco Reyes-Santias

Published in: Health Economics Review | Issue 1/2023

Abstract

Background

Forecasting patient arrivals to hospital emergency departments is critical to dealing with surges and to the efficient planning, management and functioning of hospital emergency departments.

Objective

We explore whether past mean values and past observations are useful to forecast daily patient arrivals in an Emergency Department.

Material and methods

We examine whether an integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) model can yield a better conditional distribution fit and forecast of patient arrivals by using past arrival information and taking into account the dynamics of the volatility of arrivals.

Results

We document that INGARCH models improve both in-sample and out-of-sample forecasts, particularly in the lower and upper quantiles of the distribution of arrivals.

Conclusion

Our results suggest that INGARCH modelling is a useful tool for short-term and tactical emergency department planning, e.g., to assign rotas or locate staff for unexpected surges in patient arrivals.

The views expressed in this article are those of the author and do not necessarily reflect those of Bank of Canada.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Background

The hospital emergency department (ED) is the basic unit providing an immediate response to emergency health problems. It is a core component of any health system, providing care for urgent and potentially serious pathological processes that may result in death or that require immediate diagnosis and treatment to avoid pain. ED activity is both intense and very diverse, ranging from immediately life-threatening pathologies (e.g., cardiorespiratory arrest) to serious or potentially serious illnesses requiring diagnosis or treatment in the hospital setting (e.g., polytrauma, acute myocardial infarction). EDs additionally deal with less serious emergencies that may require hospitalization for diagnosis (e.g., retinal detachment, pyelonephritis) and also provide initial treatment and observation without necessarily involving admission. EDs also serve around 40–50% of patients who could feasibly be treated in primary care emergency centres or 24-h emergency care facilities with an intermediate resolution. About 15% of the population (elderly, frail or chronically ill patients) use hospital ED services on a recurring basis as a result of the aggravation of their pathologies [14, 29].

Patient arrivals in EDs are uneven over time. Distribution over the days of the week is not uniform and, although there are variations from one centre to another, some days account for a clearly higher number of visits, e.g., Mondays (see [9, 20]). Likewise, distribution throughout the year is not uniform. Demand for care varies in relation to holiday periods (demographic movements), respiratory virus epidemics, climatic and atmospheric changes and social events [20]. Handling surges, which is the main challenge to ensuring efficient ED management and functioning [7, 16, 21, 26], is closely related to appropriate timing of treatment. ED and hospital resources therefore have to be planned with some built-in flexibility in order to adapt to changing and cyclical demand for services.

In addition to the quantitative aspects of patient arrivals in EDs, there is a great qualitative impact, given that ED diagnostic and therapeutic activities determine the subsequent evolution of admitted patients in terms of illness resolution, including length of stay, complications and patient satisfaction. Patient satisfaction/dissatisfaction with healthcare services in general is strongly conditioned by technical quality and, above all, by perceived ED quality, which determines perceptions of overall hospital performance [11].

To avoid congestion and facilitate appropriate delivery of medical services, efficient management of ED services requires accurate forecasting of patient inflows [3, 5, 8, 25, 31]. Forecasting is challenging, however, as daily and seasonal variations in patient arrivals are characterized by a high degree of variability and overdispersion [20, 23]. Previous empirical research has extensively explored the dynamics of arrivals, mainly relying on Poisson and negative binomial models with different extensions (see, e.g., [3, 28, 34, 35]). However, whether arrivals can be predicted from both past mean values and past observations is still an open question. Information on past mean values and past observations could be useful not only to make accurate mean value forecasts, but also to make predictions at the lower and upper quantiles of the arrival distribution, which are critical for two reasons: (a) efficient healthcare resource allocation when patient arrival numbers are low, and (b) avoidance of the negative impact of patient overflows on healthcare quality.

The objective of this study is to explore whether past mean values and past observations are useful to forecast daily patient arrivals in an ED, using an integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) model. This model was designed to describe integer-valued series characterized by small values and overdispersion, which are not appropriately addressed by autoregressive moving average (ARMA) models. Originally proposed by Grunwald et al. [15] and Heinen [17], an INGARCH process has a Poisson or a negative binomial conditional distribution, with a time-varying intensity parameter given by a linear function of its p-lagged values and its q recent observations. Sharing the spirit of generalized autoregressive conditional heteroscedastic (GARCH) models, an INGARCH model, by efficiently using past patient inflow information and reflecting the dynamics of the volatility of arrivals, can potentially yield a better conditional distribution fit and forecast for patient arrivals.

Our empirical study focuses on daily arrivals, as this frequency is useful for both routine planning (e.g., on rotas) and tactical planning (e.g., decisions on contacting staff). Daily forecasts provide useful support for administrative decision making and give early warning signals to efficiently handle available physical and human resources. Our empirical results for a large hospital show that the INGARCH model improves both in-sample and out-of-sample forecasts. In particular, the INGARCH model yields a better fit both in the mean and the tails of the arrival distribution, and thus reports more accurate forecasts for abrupt upward or downward movements in arrivals.

Our empirical evidence has implications to support ED management and planning decisions, given that our modelling approach provides more accurate predictions than those based on average counts of patient arrivals, in that it reflects all available past information. Our forecasting model is especially useful when patient inflow is intense, as this is when efficient resource allocation is critical to delivering healthcare quality. Given that ED arrival data is structured similarly across different hospitals, our evidence could be considered generalizable to other hospital EDs.

The paper is organized as follows: the Material and methods section describes the models and the data, the Results section presents the empirical evidence, and the Discussion section relates that evidence to previous studies and discusses its implications. Finally, the Conclusions section concludes the paper.

Material and methods

The INGARCH model

To account for past mean values and past observations regarding ED arrivals, we could transform the ED arrival counts, e.g., using a logarithmic transformation, and then use standard estimation procedures. However, this strategy for dealing with count data has several drawbacks regarding inference and negative predicted values [36], as summarized in Table 1.

Table 1

Models for non-count data adapted for count data and their limitations

Model	Advantages	Disadvantages
Normal linear regression: $y=x\beta +\epsilon$, $\epsilon \sim N\left(0,{\sigma }^{2}\right)$	Normal distribution approximates the Poisson distribution if the mean is higher than 20	No possible inference on single outcomes; the model allows for a negative outcome; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Log-linear model: $\mathrm{log}\left(y\right)=x\beta +\epsilon$, $\epsilon \sim N\left(0,{\sigma }^{2}\right)$	The variable y is modelled as a log-normal variable	The zeros in the data have to be deleted to estimate this model, which leads to endogenous sample selection problems; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome; there is a restriction on the conditional variance, i.e., it must be quadratic in the conditional expectation.*
Log-linear model with constant c to deal with zeros: $\mathrm{log}\left(y+c\right)=x\beta +\epsilon$, $\epsilon |x\sim N\left(0,{\sigma }^{2}\right)$	The model can be estimated even if there are zero elements in the dataset	The log(y) is not linear in x, which introduces bias in the estimation of the model; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Non-linear model: $y=\mathrm{exp}\left(x\beta \right)+\epsilon$, $\epsilon \sim N\left(0,{\sigma }^{2}\right)$	There is no problem in dealing with zero values	The model allows for a negative outcome; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Ordered probit and logit. State equation: ${y}^{*}=x\beta +\epsilon$. Observation equation: $y=0\;\text{if}\;{y}^{*}<{\alpha }_{0}$, $y=1\;\text{if}\;{\alpha }_{0}\le {y}^{*}<{\alpha }_{1}$, $y=2\;\text{if}\;{\alpha }_{1}\le {y}^{*}<{\alpha }_{2}$, $\dots$	The integer-valued structure of the data is considered; the prediction can be coherent, i.e., if we wanted to forecast the future median value, it would be an integer-valued outcome	The underlying count process is not reflected; the forecast is limited to values already observed in the data; complexity is excessive when the number of counts is high

*If a variable y follows a log-normal distribution, the following identity holds: $Var\left(y|x\right)=\left({e}^{{\sigma }^{2}}-1\right){\left[E\left(y|x\right)\right]}^{2}$

In this research we use the INGARCH model, which is the integer-valued counterpart to the conventional GARCH model [33], where the IN indicates the integer-valued structure of the data [32]. This model is also referred to as the autoregressive conditional Poisson model [18] or the Poisson autoregressive model [13].

A count variable ${Y}_{t}$ follows an INGARCH(p, q) model if its conditional Poisson distribution has a conditional mean ${\lambda }_{t}$ as given by the following recursion:

$${\lambda }_{t}=\omega + \sum\nolimits_{i=1}^{p}{\alpha }_{i}{Y}_{t-i}+\sum\nolimits_{j=1}^{q}{\beta }_{j}{\lambda }_{t-j}$$

(1)

where $\omega >0$ and ${\alpha }_{1}, \dots , {\alpha }_{p},{\beta }_{1}, \dots ,{\beta }_{q}\ge 0$ and $\sum_{i=1}^{p}{\alpha }_{i}+\sum_{j=1}^{q}{\beta }_{j}<1$ for stationarity reasons [10]. Thus, the conditional Poisson distribution evolves over time with a mean parameter that depends on its previous values and on the past values of the studied variable. This distribution is, therefore, conditionally equidispersed but unconditionally overdispersed.

For the particular case of an INGARCH(1,1) model (see [10, 19]), we have $E\left({Y}_{t}|{Y}_{t-1}\right)={\lambda }_{t}=Var\left({Y}_{t}|{Y}_{t-1}\right)$. Applying the law of iterated expectations, it follows that $E\left({Y}_{t}\right)=E\left(E\left({Y}_{t}|{Y}_{t-1}\right)\right)=E\left({\lambda }_{t}\right)=\frac{\omega }{1-\alpha -\beta }$. Finally, using the law of total variance, it follows that $Var\left({Y}_{t}\right)=E\left(Var\left({Y}_{t}|{Y}_{t-1}\right)\right)+Var\left(E\left({Y}_{t}|{Y}_{t-1}\right)\right)=E\left({\lambda }_{t}\right)+Var\left({\lambda }_{t}\right)>E\left({\lambda }_{t}\right)$, where $Var\left({\lambda }_{t}\right)=\frac{{\alpha }^{2}}{1-{\left(\alpha +\beta \right)}^{2}}E\left({\lambda }_{t}\right)$, so that $Var\left({Y}_{t}\right)=\frac{1-{\left(\alpha +\beta \right)}^{2}+{\alpha }^{2}}{1-{\left(\alpha +\beta \right)}^{2}}E\left({\lambda }_{t}\right)$.
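As an illustrative check of the unconditional mean and of the overdispersion property, the following minimal sketch simulates a Poisson INGARCH(1,1) path and compares sample moments with $\frac{\omega }{1-\alpha -\beta }$. The parameter values, the seed and the helper names (`poisson_draw`, `simulate_ingarch`) are assumptions made for illustration only; they are not estimates from this study.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small intensities used in this illustration.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_ingarch(omega, alpha, beta, n, seed=0, burn_in=500):
    """Simulate n observations from a Poisson INGARCH(1,1) process."""
    rng = random.Random(seed)
    lam = omega / (1.0 - alpha - beta)  # start at the unconditional mean
    y = poisson_draw(lam, rng)
    path = []
    for t in range(n + burn_in):
        lam = omega + alpha * y + beta * lam  # Eq. (1) with p = q = 1
        y = poisson_draw(lam, rng)
        if t >= burn_in:
            path.append(y)
    return path


# Hypothetical parameter values chosen only for illustration.
omega, alpha, beta = 2.0, 0.3, 0.5
path = simulate_ingarch(omega, alpha, beta, n=20000)
sample_mean = sum(path) / len(path)
sample_var = sum((v - sample_mean) ** 2 for v in path) / (len(path) - 1)
theoretical_mean = omega / (1.0 - alpha - beta)
print(sample_mean, theoretical_mean, sample_var)
```

With a long enough path, the sample mean should settle near the theoretical value and the sample variance should exceed the sample mean, which is the unconditional overdispersion described above.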

The INGARCH model enables a long memory process to be modelled parsimoniously, where the conditional mean depends on the whole history of the process. For the particular case of the INGARCH(1,1), we have [12]:

$${\lambda }_{t}=\alpha \sum\nolimits_{k=1}^{t}{\beta }^{k-1}{Y}_{t-k}+{\beta }^{t}{\lambda }_{0}+\omega \frac{1-{\beta }^{t}}{1-\beta },$$

(2)

where ${\lambda }_{0}$ could be estimated as an additional parameter [10].
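The equivalence between the one-step recursion and the unrolled form in Eq. (2) can be checked numerically. The sketch below is only an illustration: the observation history, the parameter values and the function names are hypothetical.

```python
def lambda_recursive(y, omega, alpha, beta, lam0):
    """Iterate lambda_t = omega + alpha * Y_{t-1} + beta * lambda_{t-1}."""
    lam = lam0
    for y_prev in y:  # y = [Y_0, ..., Y_{t-1}]
        lam = omega + alpha * y_prev + beta * lam
    return lam


def lambda_closed_form(y, omega, alpha, beta, lam0):
    """Evaluate Eq. (2) directly from the full history of observations."""
    t = len(y)
    # k = 1..t pairs with Y_{t-k}, so the most recent observation gets weight 1.
    weighted = sum(beta ** (k - 1) * y[t - k] for k in range(1, t + 1))
    return alpha * weighted + beta ** t * lam0 + omega * (1 - beta ** t) / (1 - beta)


# Hypothetical history and parameters, for illustration only.
y_hist = [12, 9, 15, 11, 10, 14]
args = (y_hist, 2.0, 0.3, 0.5, 10.0)
rec_value = lambda_recursive(*args)
closed_value = lambda_closed_form(*args)
print(rec_value, closed_value)
```

Both functions return the same intensity up to floating-point error, which makes explicit that older observations enter with geometrically decaying weights $\alpha {\beta }^{k-1}$.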

Alternative specifications to the Poisson INGARCH model are the negative binomial INGARCH model, where the recursion in Eq. (1) refers to $\mathrm{log}\left({\lambda }_{t}\right),$ the non-linear Poisson autoregression and a model that includes the covariate information in Eq. (1) [1]. The main advantage of assuming a negative binomial distribution instead of a Poisson distribution lies in its greater flexibility, as the variance may be larger than the mean. Indeed, in the Poisson model we have ${\lambda }_{t}={\mu }_{t}={\sigma }_{t}^{2}$, while for the negative binomial model we have ${\sigma }_{t}^{2}=\frac{{\mu }_{t}}{\pi }=\frac{\nu \left(1-\pi \right)}{{\pi }^{2}}$ and, depending on the model specification, the dynamics are set in $\pi$ [38] or in $\nu$ [37]. Interestingly, the Poisson could be seen as a particular case of the negative binomial when $\pi \to 1$ and $\nu \to \infty$.

For the case of the negative binomial distribution, $NB(\nu , \pi )$ with $\nu >0$ and $0<\pi <1$, the time-varying parameter $\pi$ is modelled via the equation ${\pi }_{t}=\frac{1}{1+\frac{{\lambda }_{t}}{\nu }}$ [38], and the dynamics of the parameter $\nu$ through the equation ${\nu }_{t}=\frac{{\lambda }_{t}\pi }{1-\pi }$ [37].¹ The probability mass function of $Y$, where $Y\sim NB(\nu , \pi )$, is

$$P\left(Y=y\right)={\left(1-\pi \right)}^{y}{\pi }^{\nu }\left(\begin{array}{c}y+\nu -1\\ y\end{array}\right).$$

(3)
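To make the two parametrizations concrete, the following sketch evaluates the negative binomial probability mass function with mean $\frac{\nu \left(1-\pi \right)}{\pi }$, writing the binomial coefficient through gamma functions so that non-integer $\nu$ is allowed, and checks numerically that both ${\pi }_{t}$ and ${\nu }_{t}$ reproduce a given conditional mean ${\lambda }_{t}$. The function names and the numerical inputs are assumptions for illustration.

```python
import math


def nb_log_pmf(y, nu, pi):
    """Log pmf of NB(nu, pi) for counts y = 0, 1, ... (mean nu*(1-pi)/pi)."""
    log_coeff = math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
    return log_coeff + nu * math.log(pi) + y * math.log(1.0 - pi)


def zhu_pi(lam, nu):
    """Time-varying pi_t for a fixed nu (dynamics set in pi)."""
    return 1.0 / (1.0 + lam / nu)


def xu_nu(lam, pi):
    """Time-varying nu_t for a fixed pi (dynamics set in nu)."""
    return lam * pi / (1.0 - pi)


def nb_mean(nu, pi, upper=5000):
    """Numerical mean of NB(nu, pi), summing y * P(Y = y) up to `upper`."""
    return sum(y * math.exp(nb_log_pmf(y, nu, pi)) for y in range(upper))


lam = 400.0  # hypothetical conditional mean
mean_pi_dynamics = nb_mean(235.0, zhu_pi(lam, 235.0))  # nu fixed, pi_t varies
mean_nu_dynamics = nb_mean(xu_nu(lam, 0.37), 0.37)     # pi fixed, nu_t varies
print(mean_pi_dynamics, mean_nu_dynamics)
```

In both cases the implied mean coincides with ${\lambda }_{t}$, so the two specifications differ only in how the conditional variance $\frac{{\lambda }_{t}}{\pi }$ moves over time.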

The parameters of those models are estimated by maximum likelihood, where the objective function is given by $\sum_{t=1}^{T}\mathrm{log}\left(P\left(Y=y|{\theta }_{t}\right)\right)$, with ${\theta }_{t}={\lambda }_{t}$ for the Poisson distribution (see Eq. (1)), ${\theta }_{t}=\left({\pi }_{t},\nu \right)$ for the negative binomial model by Zhu [38], and ${\theta }_{t}=\left(\pi ,{\nu }_{t}\right)$ for the negative binomial by Xu [37].
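As a sketch of the objective function in the Poisson case, the code below accumulates the conditional log-likelihood of an INGARCH(1,1) model along the recursion. It conditions on the first observation and on an assumed starting intensity `lam0`; the function name and the toy inputs are hypothetical, and in practice the function would be passed to a numerical optimizer, which is not shown here.

```python
import math


def poisson_ingarch_loglik(y, omega, alpha, beta, lam0):
    """Conditional log-likelihood of a Poisson INGARCH(1,1) for y[1:], given y[0].

    lam0 is the assumed intensity attached to the first observation y[0].
    """
    lam, loglik = lam0, 0.0
    for t in range(1, len(y)):
        lam = omega + alpha * y[t - 1] + beta * lam  # conditional mean at time t
        loglik += y[t] * math.log(lam) - lam - math.lgamma(y[t] + 1)
    return loglik


# Toy series and hypothetical parameter values, for illustration only.
toy_counts = [12, 9, 15, 11, 10, 14]
toy_loglik = poisson_ingarch_loglik(toy_counts, 2.0, 0.3, 0.5, 10.0)
print(toy_loglik)
```

The negative binomial versions would replace the Poisson log-probability inside the loop with the corresponding negative binomial term, keeping the same recursion for ${\lambda }_{t}$.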

To evaluate forecast accuracy, we use the mean squared error (MSE) and the mean absolute error (MAE), which compare the forecasted mean and median, respectively, with the observed number of arrivals. The MSE is computed as:

$$MSE=\frac{1}{T-k}\sum\nolimits_{t=k}^{T}{\left({y}_{t}-{\overline{y} }_{t|t-1}\right)}^{2},$$

(4)

where ${y}_{t}$ denotes patient arrivals at time t, and ${\overline{y} }_{t|t-1}$ is the mean number of patient arrivals at time t forecasted with the data available at time t-1. The MAE is computed as:

$$MAE=\frac{1}{T-k}\sum\nolimits_{t=k}^{T}\left|{y}_{t}-{\widetilde{y}}_{t|t-1}\right|,$$

(5)

where ${\widetilde{y}}_{t|t-1}$ is the median of the patient arrival distribution at time t, built using the data obtained at t-1.
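The two accuracy measures can be sketched as follows, averaging over the forecasts that are evaluated. The function names and the numbers are illustrative assumptions, not data from this study.

```python
def mse(actual, mean_forecast):
    """Mean squared error between observed counts and forecasted means."""
    errors = [(a - f) ** 2 for a, f in zip(actual, mean_forecast)]
    return sum(errors) / len(errors)


def mae(actual, median_forecast):
    """Mean absolute error between observed counts and forecasted medians."""
    errors = [abs(a - f) for a, f in zip(actual, median_forecast)]
    return sum(errors) / len(errors)


# Hypothetical arrivals and one-day-ahead forecasts.
arrivals = [410, 395, 430]
forecast_means = [400.0, 400.0, 420.0]
forecast_medians = [400, 398, 425]
print(mse(arrivals, forecast_means), mae(arrivals, forecast_medians))
```

The MSE is paired with the mean forecast and the MAE with the median forecast because these are the point forecasts that minimize the respective loss functions.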

In addition, to evaluate how well the entire forecasted distribution fits the observed patient arrival data, we compute the probability integral transformation (PIT) [6, 22]. Relative frequencies are obtained as the ratio between the forecasted PIT of two consecutive deciles and the probability of a perfect data fit; the closer the bars are to one, the better the fit of the forecasted values. Consecutive deciles are given by $\widetilde{F}\left(\frac{j}{10}\right)-\widetilde{F}\left(\frac{j-1}{10}\right)$ for j = 1,…,10, where:

$$\widetilde{F}\left(u\right)=\left\{\begin{array}{ll}0, & u\le F\left(k-1|{I}_{t-1}\right)\\ \frac{u-F\left(k-1|{I}_{t-1}\right)}{F\left(k|{I}_{t-1}\right)-F\left(k-1|{I}_{t-1}\right)}, & F\left(k-1|{I}_{t-1}\right)\le u\le F\left(k|{I}_{t-1}\right)\\ 1, & u\ge F\left(k|{I}_{t-1}\right)\end{array}\right.$$

(6)

where $k>0$ and $F(\cdot )$ is the predictive distribution.
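A minimal sketch of the PIT histogram is given below. It assumes that $k$ in Eq. (6) denotes the observed count, that $\widetilde{F}\left(u\right)$ is averaged over the evaluated days, and, purely for brevity, that the predictive distribution is Poisson; the function names and inputs are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson(lam) variable; returns 0 for k < 0."""
    if k < 0:
        return 0.0
    terms = (math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return sum(terms)


def pit_value(u, k, lam):
    """Eq. (6) for one observed count k with predictive distribution Poisson(lam)."""
    lower, upper = poisson_cdf(k - 1, lam), poisson_cdf(k, lam)
    if u <= lower:
        return 0.0
    if u >= upper:
        return 1.0
    return (u - lower) / (upper - lower)


def pit_histogram(observed, intensities, bins=10):
    """Relative bar heights: mean PIT increment per bin divided by 1/bins."""
    n = len(observed)

    def mean_pit(u):
        return sum(pit_value(u, k, lam) for k, lam in zip(observed, intensities)) / n

    edges = [mean_pit(j / bins) for j in range(bins + 1)]
    return [(edges[j] - edges[j - 1]) * bins for j in range(1, bins + 1)]


# Hypothetical observations and one-day-ahead intensities.
heights = pit_histogram([8, 12, 10, 7, 15], [10.0] * 5)
print(heights)
```

By construction the bar heights average to one, so departures from one in individual bars indicate which part of the predictive distribution is poorly calibrated.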

We also include a threshold within which the null hypothesis that the data come from a uniform (0,1) distribution is not rejected. Intervals are created similarly to the threshold of a backtesting exercise. We assume that each observation has a 1/10 probability of falling in each bar, so the number of observations in each bar of the PIT histogram follows a binomial distribution Bin(T, 0.1), where T indicates the number of out-of-sample observations. This technique allows us to check whether the data structure is fitted correctly in short sample series. To our knowledge, there is no previous study of count models that uses a statistical criterion for small samples to evaluate the PIT histogram.
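One possible way to build such a band is sketched below: the lower and upper quantiles of a Bin(T, 0.1) count are divided by the expected count T/10 so that they are on the same scale as the relative bar heights. The two-sided construction, the 99% level, the value T = 731 (the number of calendar days in 2019–2020) and the function names are assumptions for illustration and may differ from the exact procedure used for the figures.

```python
import math


def binom_pmf(i, n, p):
    """Binomial probability computed in log space to avoid overflow."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return math.exp(log_coeff + i * math.log(p) + (n - i) * math.log(1.0 - p))


def binom_quantile(q, n, p):
    """Smallest x such that P(X <= x) >= q for X ~ Bin(n, p)."""
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += binom_pmf(x, n, p)
        if cumulative >= q:
            return x
    return n


def pit_band(n_obs, bins=10, level=0.99):
    """Two-sided band for relative bar heights under Bin(n_obs, 1/bins)."""
    p = 1.0 / bins
    tail = (1.0 - level) / 2.0
    expected = n_obs * p
    lower = binom_quantile(tail, n_obs, p) / expected
    upper = binom_quantile(1.0 - tail, n_obs, p) / expected
    return lower, upper


band_low, band_high = pit_band(731)  # hypothetical out-of-sample length
print(band_low, band_high)
```

Bars of the PIT histogram that fall outside this band would then be taken as evidence against uniformity at the chosen level.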

Data

The data correspond to daily arrivals in the ED of a large 1100-bed university clinical hospital in Santiago de Compostela (Spain) from January 2015 to December 2020, with a catchment population of about 450,000 people in that period. ED human resources include 36 doctors, 57 nurses and 42 clinical assistants, while physical resources include 21 reclining chairs, a critical room with four monitored stations for vital emergencies and a monitor room with six monitored stations for serious emergencies or patients requiring monitored observation. The ED applies Manchester triage, which classifies and colour codes patients into five levels according to urgency. Of the patients who attend the ED, 22.04% are admitted to hospital wards, 77.1% are discharged home, 0.48% are transferred to another hospital, 0.21% die in the ED and 0.17% request voluntary hospital discharge.

Figure 1A shows that the inflow follows a seasonal pattern and fell to a minimum during the early COVID crisis, as reflected in the long left tail of the histogram in Fig. 1B and in the negative skewness reported in Table 2. Before the COVID pandemic, the mean number of daily arrivals was around 400, but structural change since then has reduced that number to around 300; the mean number of monthly and annual arrivals is 12,207 and 146,483, respectively. Table 2 shows that the variance is much larger than the mean, so a model that takes this overdispersion into account is required, e.g., a negative binomial model. The fact that the number of arrivals is far from zero also indicates that zero-inflated models can be ruled out.

Table 2

Four first moments of the number of ED arrivals for the period 2015–2020

	2015–2020	2015–2019	2019–2020
mean	400.96	419.01	364.81
variance	4836.52	2017.46	8530.98
skewness	-1.17	-0.05	-0.54
kurtosis	4.90	3.11	2.49
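The degree of overdispersion implied by Table 2 can be read off from the variance-to-mean ratio, which would equal one under a Poisson distribution. The short sketch below only restates the figures reported in the table; the variable names are illustrative.

```python
# (mean, variance) pairs copied from Table 2.
table2 = {
    "2015-2020": (400.96, 4836.52),
    "2015-2019": (419.01, 2017.46),
    "2019-2020": (364.81, 8530.98),
}

# A ratio above one signals overdispersion relative to the Poisson benchmark.
dispersion = {period: var / mean for period, (mean, var) in table2.items()}
for period, ratio in dispersion.items():
    print(f"{period}: variance/mean = {ratio:.2f}")
```

All three ratios are well above one, and the ratio is largest in the sub-period that includes the COVID outbreak.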

Figure 2 depicts trends that need to be considered when modelling the number of arrivals. Figure 2A indicates that the Monday arrival rate is considerably greater than that of the remaining weekdays, whereas the weekend rate is much lower than the workday rate. Figure 2B depicts a higher number of arrivals in the first (spring) and fourth (winter) quarters compared to the second (summer) and third (autumn) quarters of the year.

Results

Empirical evidence

Taking into account the features of daily ED arrivals, we use the INGARCH model as it allows changes in the count distribution to be captured by considering changes in the mean, as shown in Fig. 1A. Specifically, we consider an INGARCH(1,1) model with a negative binomial distribution to capture the overdispersion in the data, i.e., the variance is higher than the mean, as reported in Table 2. Furthermore, we use deterministic covariates to identify the increase in arrivals on Mondays and in the first and fourth quarter, and the decrease in arrivals at weekends and after the COVID outbreak. To avoid overfitting problems, we keep model parameterization to a minimum. Equation (7) presents the evolution of the conditional mean of a negative binomial model, i.e., the Zhu [38] and Xu [37] model specifications, or the conditional mean of a Poisson model.² Hence $\mathrm{X}=[{I}_{Monday}, {I}_{Weekend}, {I}_{Winter}, {I}_{COVID}]$, and thus:

$${\lambda }_{t}=\omega + \alpha {Y}_{t-1}+\beta {\lambda }_{t-1}+\gamma {X}_{t|t-1}.$$

(7)

Given that $E\left({\lambda }_{t}\right)=\frac{\omega }{1-\alpha -\beta }$, in order to have comparable estimates for the exogenous variables in Eq. (7) across different model specifications, we set the parameter $\omega$ to be equal to $(1-\alpha -\beta )$ multiplied by the mean number of arrivals, computed by discarding the dates that are considered within the exogenous variables, i.e., the mean number of arrivals on days that are not Monday, the weekend, or winter (Q4), or after the COVID outbreak (after 13 March 2020, when the Spanish government declared a state of alarm).
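The recursion in Eq. (7) together with this choice of $\omega$ can be sketched as follows. The values of $\alpha$ and $\beta$ are taken from the first row of Table 3, whereas the baseline mean, the covariate coefficients and the function names are hypothetical and serve only to illustrate the mechanics.

```python
def targeted_omega(alpha, beta, baseline_mean):
    """omega = (1 - alpha - beta) * mean arrivals on days without any dummy."""
    return (1.0 - alpha - beta) * baseline_mean


def conditional_mean(y_prev, lam_prev, x_t, omega, alpha, beta, gamma):
    """Eq. (7): x_t holds the dummies [Monday, Weekend, Winter, COVID]."""
    covariate_effect = sum(g * x for g, x in zip(gamma, x_t))
    return omega + alpha * y_prev + beta * lam_prev + covariate_effect


alpha, beta = 0.27, 0.68            # first row of Table 3
baseline_mean = 400.0               # hypothetical baseline mean
gamma = [30.0, -20.0, 5.0, -40.0]   # hypothetical coefficients, illustration only
omega = targeted_omega(alpha, beta, baseline_mean)

ordinary_day = conditional_mean(400, 400.0, [0, 0, 0, 0], omega, alpha, beta, gamma)
monday = conditional_mean(400, 400.0, [1, 0, 0, 0], omega, alpha, beta, gamma)
print(ordinary_day, monday)
```

When yesterday's count and intensity both sit at the baseline mean and no dummy is active, the conditional mean stays at the baseline, which is what makes the covariate coefficients comparable across specifications.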

Table 3 presents the parameters of the Zhu [38], Xu et al. [37] and Poisson models for $NB(\nu , \pi )$, where parameter $\theta$ in Table 3 refers to parameter $\nu$ for Zhu [38] and to parameter $\pi$ for Xu et al. [37]. Empirical estimates show that, consistent with the above-mentioned descriptive features of the data, the Monday effect is positive and significant, while the weekend effect and the winter effect are both negative and significant. Finally, the COVID pandemic had a negative impact on mean ED arrivals, consistent with the fall in hospital activity except for COVID pathologies. Estimates of the AR and MA parameters show that those effects are positive and statistically significant, indicating that both past mean values and past observations are useful in depicting the conditional distribution of patient arrivals and, thus, in forecasting those arrivals. This evidence holds for both the Zhu [38] and the Xu et al. [37] model specifications. Finally, we obtain the Pearson residuals for all the different model specifications and compute the autocorrelations and the cumulative periodogram, confirming that those residuals are white noise, as shown in Fig. 3.

Table 3

Parameter estimates and standard deviations (in parentheses) for the model in Eq. (7)

	$\mathrm{\alpha }$	$\beta$	$\theta$	${\gamma }_{Winter}$	${\gamma }_{Monday}$	${\gamma }_{Weekend}$	${\gamma }_{COVID}$	LogLik	AIC	BIC
Zhu [38]	0.27 ***	0.68 ***	235.14 ***	-0.00 ***	0.18 ***	-0.07 ***	-0.03 ***	-10,726	21,467	21,507
	(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.01)			
Xu et al. [37]	0.26 ***	0.67 ***	0.37 ***	0.37 **	0.17 ***	-0.07 ***	-0.03 ***	-10,723	21,461	21,500
	(0.01)	(0.01)	(0.01)	(0.00)	(0.01)	(0.00)	(0.00)			
Poisson	0.18 ***	0.77 ***		-0.00 ***	0.17 ***	-0.07 ***	-0.02 ***	-11,537	23,086	23,120
	(0.01)	(0.01)		(0.00)	(0.00)	(0.00)	(0.00)			

***, **, and * indicate that the parameter is significant at 1%, 5% and 10%, respectively. The parameters are estimated using the full sample. Parameter $\omega$ in Eq. (7) is obtained by weighting, by $(1-\alpha -\beta )$, the mean of the number of arrivals on days not affected by the dummies
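The information criteria in Table 3 can be related to the log-likelihoods through the usual definitions, AIC = 2k − 2 log L and BIC = k log(n) − 2 log L. The sketch below is a rough consistency check under stated assumptions: k counts the estimated parameters shown in the table (six for the Poisson model, seven for the negative binomial models), n = 2192 is the number of calendar days in 2015–2020, and the reported log-likelihoods are rounded, so small discrepancies are to be expected.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return n_params * math.log(n_obs) - 2 * loglik


n_days = 2192  # assumed sample size: calendar days from 2015 to 2020
poisson_aic = aic(-11537, 6)
poisson_bic = bic(-11537, 6, n_days)
zhu_aic = aic(-10726, 7)
print(poisson_aic, round(poisson_bic), zhu_aic)
```

Because both criteria penalize the extra dispersion parameter of the negative binomial models, their lower AIC and BIC values indicate that the gain in fit more than offsets the added complexity.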

Figure 4 depicts the PIT for the Zhu [38], Xu et al. [37] and Poisson models, showing that all the bars from Xu et al. [37] are within the red lines (indicating the null hypothesis of being uniformly distributed at 99%), but not those for the Zhu [38] model. Interestingly, the Poisson PIT is U-shaped, indicating that the restriction imposed by this model, i.e., the conditional mean equals the conditional variance, results in a failure to forecast lower and upper quantiles of the distribution of arrivals.

Finally, to mitigate concerns about overfitting, we run an out-of-sample evaluation for a rolling window of all the data prior to the day we want to forecast. Results for the comparison of each model in terms of the one-day forecast of the number of arrivals are reported in Table 4, which shows the in-sample (2015–2018) and out-of-sample (2019–2020) evidence for those metrics. Empirical estimates show that the Xu et al. [37] model yields the lowest MSE and MAE values for the out-of-sample period and the lowest in-sample MAE, with an in-sample MSE virtually identical to that of the Poisson model. In addition, Table 5 shows that, according to Pearson’s residual autocorrelation, the Zhu [38] model also yields a good fit for the out-of-sample period.

Table 4

In-sample and out-of-sample MSE and MAE for ED arrivals

	MSE		MAE
	In-sample (2015–2018)	Out-of-sample (2019–2020)	In-sample (2015–2018)	Out-of-sample (2019–2020)
Zhu [38]	1251.28	1020.70	27.74	25.16
Xu et al. [37]	1102.80	979.06	25.91	24.52
Poisson	1102.00	981.08	25.94	24.58

The one-day-ahead forecast is estimated using the information available up to the previous day

Table 5

Pearson’s autocorrelation from the raw data and the residuals from the models in the out-of-sample periods (2019–2020)

	Raw data	Zhu	Xu	Poisson
correlation	0.85	-0.03	-0.07	0.03
p-value	0.0000	0.4863	0.0597	0.2103
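A residual check of this kind can be sketched as follows: Pearson residuals standardize each forecast error by the conditional standard deviation, and their first-order autocorrelation should be close to zero if the dynamics are well captured. Reading the correlation in Table 5 as a lag-one autocorrelation is an assumption, and the function names and inputs are hypothetical.

```python
import math


def pearson_residuals(y, mean, variance):
    """(y_t - conditional mean) / conditional standard deviation."""
    return [(obs - m) / math.sqrt(v) for obs, m, v in zip(y, mean, variance)]


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag one."""
    n = len(x)
    centre = sum(x) / n
    numerator = sum((x[t] - centre) * (x[t - 1] - centre) for t in range(1, n))
    denominator = sum((value - centre) ** 2 for value in x)
    return numerator / denominator


# Hypothetical counts, conditional means and conditional variances.
counts = [410, 395, 430, 388, 402]
cond_mean = [400.0, 405.0, 415.0, 400.0, 398.0]
cond_var = [m / 0.37 for m in cond_mean]  # negative binomial variance mu / pi
residuals = pearson_residuals(counts, cond_mean, cond_var)
print(lag1_autocorrelation(residuals))
```

For the Poisson model the conditional variance equals the conditional mean, so the same functions apply with `cond_var = cond_mean`.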

Figure 5 shows the one-day-ahead out-of-sample forecast of the number of arrivals in the out-of-sample period 2019–2020 using the Xu [37] model, given that this is the best forecaster. The solid line indicates the median and the dots reflect the observations (i.e., arrivals), while the different shades of blue reflect confidence intervals at different levels. Graphical evidence confirms the goodness of the model forecasting capacity.
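The median and the interval bands of such a forecast can be obtained from the quantiles of the one-day-ahead predictive distribution. The sketch below does this for a negative binomial distribution with dynamics in $\nu$; the conditional mean, the value of $\pi$, the interval level and the function names are illustrative assumptions rather than the values behind Fig. 5.

```python
import math


def nb_pmf(y, nu, pi):
    """Negative binomial probability with mean nu * (1 - pi) / pi."""
    log_coeff = math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
    return math.exp(log_coeff + nu * math.log(pi) + y * math.log(1.0 - pi))


def nb_quantile(q, nu, pi, max_count=100000):
    """Smallest count whose cumulative probability reaches q."""
    cumulative = 0.0
    for y in range(max_count):
        cumulative += nb_pmf(y, nu, pi)
        if cumulative >= q:
            return y
    raise ValueError("quantile not reached; increase max_count")


lam_next = 400.0                    # hypothetical one-day-ahead conditional mean
pi = 0.37                           # hypothetical fixed pi
nu_next = lam_next * pi / (1.0 - pi)

median = nb_quantile(0.5, nu_next, pi)
lower = nb_quantile(0.05, nu_next, pi)
upper = nb_quantile(0.95, nu_next, pi)
print(lower, median, upper)
```

Repeating the calculation for several levels gives nested bands around the median, and every reported quantile is an integer, so the forecast remains coherent with the count nature of the data.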

Discussion

Our evidence has clear implications in terms of cost minimization, as better predictions at the tails of the ED arrival distribution contribute to reduced costs through better workforce management for different circumstances. Likewise, a better analysis of ED patient arrivals allows timely care without delays, leading to improved survival rates, reduced average hospital stay and reduced readmissions of patients admitted to the ED, all of which economically translates into cost reductions.

Our evidence is related to previous studies as follows. To forecast waiting times in an emergency department, Benevento et al. [4] evaluate several machine learning techniques, including Lasso, Random Forest, Support Vector Regression, Artificial Neural Network and Ensemble methods. They define as additional predictors new variables based on the queues, which capture the situation of hospital emergencies, and show that Random Forest is a reasonable compromise solution. Our study adds to this analysis by exploring how the INGARCH model is able to capture the dynamics of arrivals to a hospital emergency department.

Similarly, Loureiro et al. [24] evaluate the application of the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology to a demonstration case of queue waiting time prediction, with the objective of studying a machine learning (ML) method for estimating queue waiting time. The computational experiments were based on two main validation procedures: a standard cross-validation and a sliding window scheme. Overall, competitive and quality results were obtained using an AutomatedML (AutoML) algorithm fed with newly engineered features. In fact, the AutoML model proposed by the authors produces a small error (5 to 7 min), while requiring a reasonable computational effort. With less computational effort, the model presented in the current paper allows a data fit whose result does not differ greatly from the one presented by Loureiro et al. [24].

von Wagner et al. [30] show how to accurately and automatically characterize patient flow in an emergency department using a combination of data from a real-time locating system (RTLS) and other traditional hospital information systems, such as electronic medical records (EMR) and laboratory information systems. The hospital can use the information to identify bottlenecks and to develop strategies to optimize patient flows. Those authors used different performance indicators, such as total length of stay, to assess Emergency Department time tasks, which is consistent with our study. One of their main conclusions is that there is a large difference between length of stay using only electronic medical record data and that calculated by combining data from electronic medical records and real-time location systems; a limitation we also found in our study.

Overall, our results suggest that INGARCH modelling is a useful support for short-term ED planning to assign rotas or locate staff for unexpected surges in patient arrivals. Improved forecasting of ED arrivals is a first step to implementing useful real-time management algorithms that offer solutions to complex ED management, in terms of both resource use and health implications for patients. Furthermore, better forecasting of ED arrivals is useful to predict hospital admissions and the impact of ED arrivals on bed utilization and length of stay [27]. However, a task we leave for future research is how ED arrivals and their forecast through INGARCH models could ultimately shape hospital admissions and bed utilization.

Conclusions

Hospital EDs experience fluctuating and sometimes unexpected demand pressures, which complicate the efficient deployment of resources and potentially affect the quality of healthcare provision. Therefore, modelling and forecasting ED arrivals is critical to deal with inflows to EDs. The usefulness of INGARCH models to predict daily ED arrivals is that they can take into account past mean values and past observations in reflecting the mean parameter of the conditional negative binomial distribution, and can also characterize temporal dynamics in the volatility of patient arrivals.

Our in-sample and out-of-sample empirical results for patient arrivals at a large Spanish university hospital confirm that the INGARCH model yields better results than the Poisson model, particularly for the lower and upper quantiles of the forecasted distribution of arrivals. The fact that an INGARCH model yields a better fit for the extreme quantiles is particularly useful for management decision-making regarding resource allocation, both when a surge in arrivals may negatively affect healthcare and when a drop in arrivals may leave resources idle. Likewise, the variability of patient arrivals is well informed by INGARCH model estimations.

Acknowledgements

We would like to thank the editor and three anonymous referees for constructive comments that improved the quality of this article. Juan C. Reboredo acknowledges financial support from Agencia Estatal de Investigación (Ministerio de Ciencia, Innovación y Universidades) under research project with reference PID2021-124336OB-I00 co-funded by the European Regional Development Fund (ERDF/FEDER).

Not applicable

Permission to reproduce material from other sources

Not applicable.

Declarations

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Although the original negative binomial parametrization in those studies does not reflect the conditional mean, we keep the dynamics of the conditional mean in order to link these extensions to the Poisson case (in line with [33]).
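To make the distinction concrete, the following sketch contrasts the two parametrizations. It assumes first-order dynamics and uses illustrative notation (\(\omega, \alpha, \beta\) for the recursion, \(r\) for the dispersion parameter, \(\mathcal{F}_{t-1}\) for past information); the first form is the one commonly attributed to Zhu [38], and the variance function shown for the second is the standard one for this choice of \(p_t\), not necessarily the one used in the article.

```latex
% Parametrization in which \lambda_t drives the odds, not the mean
Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{NB}(r, p_t), \qquad
\frac{1 - p_t}{p_t} = \lambda_t = \omega + \alpha Y_{t-1} + \beta \lambda_{t-1}, \qquad
E[Y_t \mid \mathcal{F}_{t-1}] = r \lambda_t .

% Mean parametrization: \lambda_t is the conditional mean, as in the Poisson case
p_t = \frac{r}{r + \lambda_t}, \qquad
E[Y_t \mid \mathcal{F}_{t-1}] = \lambda_t = \omega + \alpha Y_{t-1} + \beta \lambda_{t-1}, \qquad
\mathrm{Var}[Y_t \mid \mathcal{F}_{t-1}] = \lambda_t + \frac{\lambda_t^{2}}{r} .
```

In the second form the Poisson model is recovered as \(r \to \infty\), which is what allows a direct comparison of the conditional mean dynamics across distributions.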

We also considered a log-linear specification as in Agosto and Giudici [2], given that it allows for negative dependence. However, the log-linear specification yielded poorer results than the specification presented here, a result that can be explained by the positive correlations in our data (see Table 5). This evidence is available from the authors upon request.
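For reference, a common first-order form of the two specifications is sketched below. The notation is illustrative and the exact specification in [2] may include further terms (e.g., covariates), so this should be read as an assumption rather than a restatement of that model.

```latex
% Linear specification: positivity of \lambda_t requires non-negative coefficients
\lambda_t = \omega + \alpha Y_{t-1} + \beta \lambda_{t-1}, \qquad
\omega > 0, \; \alpha \ge 0, \; \beta \ge 0 .

% Log-linear specification: \lambda_t > 0 for any real coefficients,
% so \alpha < 0 (negative dependence) is admissible
\log \lambda_t = \omega + \alpha \log\left(Y_{t-1} + 1\right) + \beta \log \lambda_{t-1} .
```

The \(+1\) inside the logarithm keeps the recursion defined when a zero count is observed. When the data show only positive serial correlation, the extra flexibility of the log-linear form is not needed.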

1. Agosto A, Cavaliere G, Kristensen D, Rahbek A. Modeling corporate defaults: Poisson autoregressions with exogenous covariates (PARX). J Empir Financ. 2016;38:640–63.

2. Agosto A, Giudici P. A Poisson autoregressive model to understand COVID-19 contagion dynamics. Risks. 2020;8(3):1–8.

3. Asheim A, Bjørnsen LPB, Næss-Pleym LE, Uleberg O, Dale J, Nilsen SM. Real-time forecasting of emergency department arrivals using prehospital data. BMC Emerg Med. 2019;19(42):1–6.

4. Benevento E, Aloini D, Squicciarini N. Towards a real-time prediction of waiting times in emergency departments: A comparative analysis of machine learning techniques. Int J Forecast. 2023;39(1):192–208.

5. Choudhury A, Urena E. Addressing overcrowding and emergency department management: a time series analysis. Br J Healthc Manag. 2020;26(1):34–43.

6. Czado C, Gneiting T, Held L. Predictive model assessment for count data. Biometrics. 2009;65(4):1254–61.

7. De Santis A, Giovannelli T, Lucidi S, Messedaglia M, Roma M. Determining the optimal piecewise constant approximation for the nonhomogeneous Poisson process rate of Emergency Department patient arrivals. Flex Serv Manuf J. 2022;34:979–1012.

8. Duarte D, Walshaw C, Ramesh NA. Comparison of time-series predictions for healthcare emergency department indicators and the impact of COVID-19. Appl Sci. 2021;11:3561.

9. Duvald I, Moellekaer A, Boysen MA, Vest-Hansen B. Linking the severity of illness and the weekend effect: a cohort study examining emergency department visits. Scand J Trauma Resusc Emerg Med. 2018;26(1):72.

10. Ferland R, Latour A, Oraichi D. Integer-valued GARCH process. J Time Ser Anal. 2006;27(6):923–42.

11. Ferreira DC, Vieira I, Pedro MI, Caldas P, Varela M. Patient satisfaction with healthcare services and the techniques used for its assessment: a systematic literature review and a bibliometric analysis. Healthcare. 2023;11:639.

12. Fokianos K. Some recent progress in count time series. Statistics. 2011;45(1):49–58.

13. Fokianos K, Rahbek A, Tjøstheim D. Poisson autoregression. J Am Stat Assoc. 2009;104(488):1430–9.

14. Fry M, Fitzpatrick L, Considine J, Shaban RZ, Curtis K. Emergency department utilisation among older people with acute and/or chronic conditions: a multi-centre retrospective study. Int Emerg Nurs. 2018;37:39–43.

15. Grunwald GK, Hyndman RJ, Tedesco L, Tweedie RL. Non-Gaussian conditional linear AR(1) models. Aust N Z J Stat. 2000;42:479–95.

16. Harper A, Mustafee N. A hybrid modelling approach using forecasting and real-time simulation to prevent emergency department overcrowding. In Proceedings of the Winter Simulation Conference (WSC '19). IEEE Press, 1208–1219. 2020.

17. Heinen A. Modelling time series count data: an autoregressive conditional Poisson model. CORE Discussion Paper 2003/62, Université Catholique de Louvain. 2003.

18. Heinen A. Modelling time series count data: an autoregressive conditional Poisson model. SSRN Electron J. 2011a;1–38. https://doi.org/10.2139/ssrn.1117187.

19. Heinen A. Modelling time series count data: an autoregressive conditional Poisson model. SSRN Electron J. 2011. https://doi.org/10.2139/ssrn.1117187.

20. Hitzek J, Fischer-Rosinský A, Möckel M, Kuhlmann SL, Slagman A. Influence of weekday and seasonal trends on urgency and in-hospital mortality of emergency department patients. Front Public Health. 2022;10:711235.

21. Hoot NR, Aronsky D. Systematic review of emergency department crowding: causes, effects, and solutions. Ann Emerg Med. 2008;52(2):126–36.

22. Jung RC, Tremayne AR. Useful models for time series of counts or simply wrong ones? AStA Advances in Statistical Analysis. 2011;95(1):59–91.

23. Kim S, Whitt W. Are call center and hospital arrivals well modeled by nonhomogeneous Poisson processes? Manuf Service Oper Manag. 2014;16(3):464–80.

24. Loureiro C, Pereira PJ, Cortez P, Guimarães P, Moreira C, Pinho A. Predicting Multiple Domain Queue Waiting Time via Machine Learning. International Conference on Computational Science and Its Applications, ICCSA 2023: Computational Science and Its Applications. 2023;404–421.

25. McCarthy ML, Zeger SL, Ding R, Aronsky D, Hoot NR, Kelen GD. The challenge of predicting demand for emergency department services. Acad Emergency Med. 2008;15(4):337–46.

26. Morley C, Unwin M, Peterson GM, Stankovich J, Kinsman L. Emergency department crowding: a systematic review of causes, consequences and solutions. PLoS ONE. 2018;13(8):e0203316.

27. Reyes-Santias F, Reboredo JC, de Assis EM, Rivera-Castro MA. Does length of hospital stay reflect power-law behavior? A q-Weibull density approach. Physica A. 2021;568:125618.

28. Santi L, Golinelli D, Tampieri A, Farina G, Greco M, Rosa S, et al. Non-COVID-19 patients in times of pandemic: Emergency department visits, hospitalizations and cause-specific mortality in Northern Italy. PLoS ONE. 2021;16(3):e0248995. https://doi.org/10.1371/journal.pone.0248995.

29. Van den Heede K, Van de Voorde C. Interventions to reduce emergency department utilisation: a review of reviews. Health Policy. 2016;120(12):1337–49.

30. von Wagner M, Queck A, Beekers P, Tolhuizen L, Synnatschke A, Boesing J, Chatterjea S. Towards accurate and automatic emergency department workflow characterization using a real-time locating system. Int J Healthc Manag. 2023;1–12. https://doi.org/10.1080/20479700.2023.2172829.

31. Wargon M, Guidet B, Hoang TD, Hejblum GA. Systematic review of models for forecasting the number of emergency department visits. Emerg Med J. 2009;26(6):395–9.

32. Weiss CH. Modelling time series of counts with overdispersion. Stat Methods Appl. 2009;18(4):507–19.

33. Weiss CH. An introduction to discrete-valued time series. In An Introduction to Discrete-Valued Time Series. 2017. https://doi.org/10.1002/9781119097013.

34. Whitt W, Zhang X. A data-driven model of an emergency department. Operations Research for Health Care. 2017;12(1):1–15.

35. Whitt W, Zhang X. Forecasting arrivals and occupancy levels in an emergency department. Operations Research for Health Care. 2019;21:1–18.

36. Winkelmann R. Econometric analysis of count data. In Econometric Analysis of Count Data. 2008. https://doi.org/10.1007/978-3-540-78389-3.

37. Xu HY, Xie M, Goh TN, Fu X. A model for integer-valued time series with conditional overdispersion. Comput Stat Data Anal. 2012;56(12):4229–42.

38. Zhu F. A negative binomial integer-valued GARCH model. J Time Ser Anal. 2011;32(1):54–67.

Title: Forecasting emergency department arrivals using INGARCH models
Authors: Juan C. Reboredo, Jose Ramon Barba-Queiruga, Javier Ojea-Ferreiro, Francisco Reyes-Santias
Publication date: 1 December 2023
Publisher: Springer Berlin Heidelberg
Published in: Health Economics Review, Issue 1/2023
Electronic ISSN: 2191-1991
DOI: https://doi.org/10.1186/s13561-023-00456-5

Model	Advantages	Disadvantages
Normal linear regression: \(y = x\beta + \epsilon\), \(\epsilon \sim N(0, \sigma^{2})\)	The normal distribution approximates the Poisson distribution if the mean is higher than 20	No inference is possible on single outcomes; the model allows for a negative outcome; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Log-linear model: \(\log(y) = x\beta + \epsilon\), \(\epsilon \sim N(0, \sigma^{2})\)	The variable y is modelled as a log-normal variable	The zeros in the data have to be deleted to estimate this model, which leads to endogenous sample selection problems; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome; there is a restriction on the conditional variance, i.e., it must be quadratic in the conditional expectation.*
Log-linear model with constant c to deal with zeros: \(\log(y + c) = x\beta + \epsilon\), \(\epsilon \mid x \sim N(0, \sigma^{2})\)	The model can be estimated even if there are zero elements in the dataset	The log(y) is not linear in x, which introduces bias in the estimation of the model; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Non-linear model: \(y = \exp(x\beta) + \epsilon\), \(\epsilon \sim N(0, \sigma^{2})\)	There is no problem in dealing with zero values	The model allows for a negative outcome; the prediction is not coherent, i.e., the forecast is not an integer-valued outcome
Ordered probit and logit. State equation: \(y^{*} = x\beta + \epsilon\). Observation equation: \(y = 0\;\text{if}\;y^{*} < \alpha_{0}\), \(y = 1\;\text{if}\;\alpha_{0} \le y^{*} < \alpha_{1}\), \(y = 2\;\text{if}\;\alpha_{1} \le y^{*} < \alpha_{2}\), \(\vdots\)	The integer-valued structure of the data is considered; the prediction can be coherent, i.e., if we wanted to forecast the future median value, it would be an integer-valued outcome	The underlying count process is not reflected; the forecast is limited to values already observed in the data; complexity is excessive when the number of counts is high
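The incoherence noted for the normal model in the table can be seen in a short numerical sketch. It assumes, for illustration only, a normal approximation whose variance equals its mean (mimicking a Poisson count) and compares its 5% forecast quantile with the exact Poisson quantile at a small and a moderately large mean. The function names and the chosen means are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_quantile(lam, q):
    """Smallest integer k with P(Y <= k) >= q for Y ~ Poisson(lam)."""
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < q and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def normal_quantile(mean, q):
    """Quantile of a normal approximation with variance equal to the mean."""
    return NormalDist(mu=mean, sigma=math.sqrt(mean)).inv_cdf(q)


for mean in (2.0, 25.0):
    nq = normal_quantile(mean, 0.05)
    pq = poisson_quantile(mean, 0.05)
    print(f"mean={mean:5.1f}  normal 5% quantile={nq:7.3f}  Poisson 5% quantile={pq}")
```

At a mean of 2 the normal quantile falls below zero, which is impossible for a count, and at either mean it is not an integer, whereas the Poisson quantile is always a non-negative integer.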
