1 Introduction

The Dutch hospital system, like those of many other European countries, is highly regulated and consists only of private not-for-profit hospitals. For instance, the Dutch government allocates budgets to each hospital based on production contracts but does not take into account the many other forms of regulation, such as labor contracts and capacity constraints, that affect cost. These factors are beyond the control of hospital managers and may not be adequately remunerated. Further, hospitals cannot alter costs by cream skimming since they have to treat all patients presented. Methodological approaches that require assumptions of cost minimization or profit maximization may not be appropriate since hospital managers may not adhere to strict economic criteria. Health policy makers should therefore be aware of these environmental factors so that hospital budgets are allocated fairly. Moreover, by teasing out environmental factors, the inefficiency that is under management’s control can be determined; this inefficiency should be identified but not reimbursed. We stress that we focus only on the policy implications of budget allocation, not on hospital regulation in general, i.e., the use of inputs.

In this paper, we apply Data Envelopment Analysis (DEA) to derive cost efficiency scores and follow up by employing various models that can be used in a second stage of the analysis. Specifically, we apply bootstrapping techniques to identify the effect (if any) of environmental factors on the cost efficiency scores. We apply this approach to a set of Dutch hospital data since we wish to ascertain the equity in the regulated budget allocations used in The Netherlands. Inequity would not be an issue if cost efficiency was not affected by exogenous factors beyond the hospital manager’s control since the government can impose a lower budget allocation which would give management an incentive toward more cost efficient production. However, if social obligations or regulations in associated input markets impact the hospital’s cost efficiency negatively, but are exogenous to management control, then resource allocations neglecting these factors would be punitive.

Assessing the impact of environmental factors on efficiency scores derived using data envelopment analysis (DEA) has garnered attention in the literature. Whereas some have used OLS, Tobit analysis has been the most popular analytical method, wherein the output based efficiency score or the reciprocal of the input based efficiency score is regressed on a variety of variables thought to affect efficiency (see e.g. [4, 6]). Simar and Wilson [11] challenge this approach by demonstrating that bias may arise, since the efficiency score is a point estimate without a probability distribution around it as required by the Tobit methodology or any other parametric regression technique. In other words, Simar and Wilson [11] indicate that the dependent variable is unobserved and must be replaced by an estimate. Using this point estimate in a second stage analysis may cause the well-known errors in variables problem, which in turn causes biased and inconsistent estimates of the parameters of the environmental variables. As an alternative to simply using the efficiency measure as a discrete point, Simar and Wilson [11] advocate the use of bootstrapping techniques in order to obtain unbiased and consistent estimates.

Even though there is an argument for bootstrapping in the second stage analysis, the question is whether any difference in results (i.e. robustness) leads to better estimates. We address this question by comparing the estimated parameters from the bootstrapping with those from several other estimation techniques, including truncated regression with and without the bootstrap correction and Tobit regression, again with and without the bootstrap correction. Using the results from the various specifications, we demonstrate how the non-bootstrapping methods may lead to incorrect conclusions regarding the factors affecting the productive performance of hospitals.

The outline of this paper is as follows. In Section 2 we define the DEA model and the bootstrapping procedure. In Section 3 the available data are described. In Section 4 we present the empirical results, and we conclude the paper in Section 5.

2 Model

In the first step of our analysis, we conduct standard cost efficiency DEA as described by Färe et al. [3]. Since we have information on input prices for our sample of Dutch hospitals, we can use the cost-efficiency model rather than the technical efficiency DEA model that does not require input prices. In this standard cost efficiency DEA-model the cost efficiency of hospital ‘A’ equals the ratio of minimum cost to actual cost. In other words, we gauge the minimum expenditure required to produce service levels given resource prices. The actual cost efficiency measure (CE) is derived from the radial distance between the observed hospital’s resources-services correspondence and the “best practice” frontier. This best practice frontier is constructed by the linear combination of hospitals producing the same levels of services as hospital A but with a lower level of costs. The flexibility of the DEA approach has many benefits: multiple inputs producing multiple outputs can be used to characterize the hospital production frontier, and there is no need to assume strict cost minimization or profit maximization; being characterized as cost efficient is, however, a necessary condition for cost minimization. One oft-mentioned drawback is the lack of an error term, which might account for deviations from the best practice frontier. In order to address this issue, we perform our analysis in two steps. First, we derive cost efficiency using DEA based on the constant returns to scale (CRS) assumption.

The mathematical formulation for DEA-CRS is:

$$ \begin{gathered} CE = \min_{z,x} \frac{w^A x}{w^A x^A} \quad \text{subject to} \\ \sum_j z_j y^j \ge y^A \\ \sum_j z_j x^j \le x \\ z_j \ge 0 \quad \left( \forall j \right) \end{gathered} $$
(1)
CE: cost efficiency

\( w^A \): vector of resource prices of hospital A

\( x^A \): vector of resources of hospital A

\( y^A \): vector of services of hospital A

\( x \): best practice vector of resources

\( z \): vector of weights
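The program in (1) can be illustrated numerically. The following is a minimal sketch and not the implementation used for the results in this paper: it follows the verbal definition of CE as the ratio of minimum cost to actual cost, and it writes the resource constraint against the choice vector x, as in the usual cost minimization program. The function name `cost_efficiency`, the array layout and the three-hospital toy data are illustrative assumptions, and the sketch relies on NumPy and SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def cost_efficiency(X, Y, W, a):
    """CRS cost efficiency of hospital `a`: minimum cost / actual cost.

    X: (n, m) resources, Y: (n, s) services, W: (n, m) resource prices.
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector: best practice resources x (m entries), then weights z (n entries).
    cost = np.concatenate([W[a], np.zeros(n)])
    # Services: sum_j z_j y^j >= y^A, written as -Y'z <= -y^A.
    services = np.hstack([np.zeros((s, m)), -Y.T])
    # Resources: sum_j z_j x^j <= x, written as X'z - x <= 0.
    resources = np.hstack([-np.eye(m), X.T])
    result = linprog(
        cost,
        A_ub=np.vstack([services, resources]),
        b_ub=np.concatenate([-Y[a], np.zeros(m)]),
        bounds=[(0, None)] * (m + n),
    )
    return result.fun / float(W[a] @ X[a])


# Toy data (hypothetical): three hospitals, two resources, one service.
X = np.array([[2.0, 4.0], [4.0, 2.0], [4.0, 4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
W = np.ones((3, 2))  # identical unit prices for every hospital
scores = [cost_efficiency(X, Y, W, a) for a in range(3)]
```

In this toy case the third hospital scores 0.75, because the same service level can be produced at a cost of 6 rather than 8, while the first two hospitals lie on the frontier.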

Since we are interested both in the possible impact of environmental factors on cost efficiency and in determining which second stage approach may contribute information to decision making, we next describe our second stage approach.

After solving for the cost efficiency scores for the hospitals in our data set, we specify an equation in which the reciprocal of the cost efficiency scores is regressed on a set of explanatory variables. (We use the reciprocal of the CE scores because we wish to truncate the distribution of scores at one.) We follow the methodology indicated as algorithm 1 in Simar and Wilson [11]. The explanatory variables in the equation account for the environmental factors facing each hospital:

$$ \delta = \beta_0 + \sum\limits_k {\beta_k Q_k } + \varepsilon \ge 1 $$
(2)
\( \delta \): reciprocal of cost efficiency

\( Q_k \): k-th environmental feature

\( \beta_k \): parameters to be estimated

\( \varepsilon \): error term
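Because δ is bounded below by one, (2) is a left-truncated regression. As an illustration only, the sketch below evaluates the log-likelihood of such a model under the assumption of normally distributed errors. The function name `truncated_loglik`, its argument layout and the one-hospital example are hypothetical and are not taken from the estimation code behind this paper.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_loglik(delta, Q, beta, sigma, lower=1.0):
    """Log-likelihood of delta_i = beta_0 + sum_k beta_k Q_ik + eps_i, delta_i >= lower.

    delta: reciprocal CE scores; Q: one row of environmental variables per
    hospital; beta: [beta_0, beta_1, ...]; sigma: std. dev. of eps (assumed normal).
    """
    total = 0.0
    for d, row in zip(delta, Q):
        mu = beta[0] + sum(b * q for b, q in zip(beta[1:], row))
        density = STD_NORMAL.pdf((d - mu) / sigma) / sigma
        # Probability mass that survives the truncation at `lower`.
        kept = 1.0 - STD_NORMAL.cdf((lower - mu) / sigma)
        total += log(density) - log(kept)
    return total


# Hypothetical check: one hospital on the frontier (delta = 1), mu = 1, sigma = 1.
value = truncated_loglik([1.0], [[0.0]], [1.0, 0.5], 1.0)
```

Maximizing this function over β and σ gives the truncated regression estimates; dropping the `kept` term would give the ordinary normal likelihood, which ignores the truncation at one.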

We apply the constant returns to scale (CRS) approach, since we are interested in the long-run effects of the correction which also corresponds to the minimum point on the U-shaped long run average cost curve.

Because the cost-efficiency scores measured by the DEA approach are measured non-parametrically, there is no error term associated with the measure which when used as a dependent variable in the second stage analysis could lead to biased and inconsistent estimators. To address this potential econometric issue, we begin by specifying the equation given in (3).

$$ \hat{\delta } = E\left( {\hat{\delta }} \right) + u $$
(3)

with \( E\left( u \right) = 0 \). The bias of the estimator \( \hat{\delta } \) is defined by:

$$ bias\left( {\hat{\delta }} \right) \equiv E\left( {\hat{\delta }} \right) - \delta $$
(4)

Substituting and rearranging terms yields:

$$ \hat{\delta } - bias\left( {\hat{\delta }} \right) - u = \beta_0 + \sum\limits_k {\beta_k Q_k } + \varepsilon \ge 1 $$
(5)
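For completeness, the substitution behind (5) can be written out step by step. This is only a restatement of (2), (3) and (4) and introduces no new assumptions:

$$ \begin{aligned} \hat{\delta } &= E\left( {\hat{\delta }} \right) + u = \delta + bias\left( {\hat{\delta }} \right) + u && \text{by (3) and (4)} \\ \delta &= \hat{\delta } - bias\left( {\hat{\delta }} \right) - u \\ \hat{\delta } - bias\left( {\hat{\delta }} \right) - u &= \beta_0 + \sum\limits_k {\beta_k Q_k } + \varepsilon \ge 1 && \text{by (2)} \end{aligned} $$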

Even though the u’s have a zero mean, the term \( bias\left( {\hat{\delta }} \right) \) does not; it is always strictly negative in finite samples. Although the u’s are unknown and cannot be estimated, the bias term can be estimated by bootstrap methods (for a detailed discussion of the bootstrapping approach, see Efron and Tibshirani [1] and Simar and Wilson [10]). Since \( bias\left( {\hat{\delta }} \right) \) is expected to be correlated with the environmental variables, a bootstrap procedure may be more appropriate than a simple multiple regression approach, since a benefit of bootstrapping is that it leads to consistent estimates of \( \beta_k \). For these reasons, we hypothesize that a truncated regression may be more appropriate than, and should be preferred to, the Tobit model. Given this rationale for also using a truncated regression model, we next describe the bootstrapping procedure used in this paper.

  1. Compute the DEA-scores using (1).

  2. Use OLS or maximum likelihood to obtain estimates \( \hat{\beta }_k \) and \( \hat{\sigma }_{\varepsilon } \) for the regression of the reciprocal efficiency scores on the environmental variables using (2).

  3. Repeat the next three steps L times to obtain a set of bootstrap estimates A:

$$ {\rm A} = \left\{ {\left( {\hat{\beta }^{*}, \hat{\sigma }_{\varepsilon }^{*} } \right)_b } \right\}_{b = 1}^L . $$
(6)

    3.1 For each hospital i = 1,..., n draw \( \varepsilon_i \) from the \( N\left( {0,\hat{\sigma }_{\varepsilon }^2 } \right) \) distribution with left truncation at \( c = 1 - \hat{\beta }_0 - \sum\limits_k {\hat{\beta }_k } Q_k \). To obtain such a draw, let v be a draw from the uniform distribution on (0, 1), compute the constant \( c' = c/\hat{\sigma }_{\varepsilon } \) and define \( v' = \Phi \left( {c'} \right) + \left[ {1 - \Phi \left( {c'} \right)} \right]v \). The left-truncated normal deviate then equals \( \varepsilon_i = \hat{\sigma }_{\varepsilon } \Phi^{ - 1} \left( {v'} \right) \).

    3.2 Again for each i = 1,..., n compute

$$ \delta_i^{*} = \hat{\beta }_0 + \sum\limits_k {\hat{\beta }_k } Q_k + \varepsilon_i . $$
(7)

    3.3 Use the regression method of step 2 to estimate the regression of \( \delta_i^{*} \) on the \( Q_k \)’s, yielding estimates \( \left( {\hat{\beta }^{*}, \hat{\sigma }_{\varepsilon }^{*} } \right) \).

  4. Use the bootstrap values as described by Simar and Wilson [11] and the original parameter estimates to construct estimated confidence intervals for each element of \( \beta \) as follows. If the distribution of \( \left( {\hat{\beta }_j - \beta_j } \right) \) were known, the confidence interval would follow from finding values \( a_{\alpha } \) and \( b_{\alpha } \) such that:

$$ \Pr [ - b_{\alpha } \le \left( {\hat{\beta }_j - \beta_j } \right) \le - a_{\alpha } ] = 1 - \alpha $$
(8)

for small values of α > 0 [11]. However, the distribution is unknown, and we therefore use the j-th element of each bootstrap value instead to find values \( a_{\alpha }^{*} \) and \( b_{\alpha }^{*} \) such that:

$$ \Pr [ - b_{\alpha }^{*} \le \left( {\hat{\beta }_j^{*} - \hat{\beta }_j } \right) \le - a_{\alpha }^{*} ] \approx 1 - \alpha $$
(9)

Finding \( a_{\alpha }^{*} \) and \( b_{\alpha }^{*} \) involves sorting the values \( \left( {\hat{\beta }_j^{*} - \hat{\beta }_j } \right) \) in increasing order and then deleting \( \left( {\frac{\alpha }{2} \times 100} \right) \) percent of the elements at either end of the sorted list. We then set \( - b_{\alpha }^{*} \) and \( - a_{\alpha }^{*} \) equal to the lower and upper endpoints, respectively, of the truncated, sorted array. The estimated \( \left( {1 - \alpha } \right) \times 100 \) percent confidence interval is then given by:

$$ \left[ {\hat{\beta }_j + a_{\alpha }^{*}, \;\hat{\beta }_j + b_{\alpha }^{*} } \right] $$
(10)
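Steps 3.1, 3.2 and 4 of the procedure can be sketched in code. This is a minimal illustration under stated assumptions, not the code used for the results reported here: the function names, the seed and the toy inputs are hypothetical, the deviations in step 4 are centered on the original estimate \( \hat{\beta }_j \) because \( \beta_j \) itself is unobserved, and the re-estimation in step 3.3 is omitted.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_left_truncated(c, sigma, rng):
    """Step 3.1: draw from N(0, sigma^2) truncated on the left at c."""
    c_std = c / sigma
    v = rng.random()  # uniform draw on [0, 1)
    v_prime = STD_NORMAL.cdf(c_std) + (1.0 - STD_NORMAL.cdf(c_std)) * v
    # Guard: inv_cdf needs 0 < p < 1; this clamp is crude in the far tails.
    v_prime = min(max(v_prime, 1e-12), 1.0 - 1e-12)
    return sigma * STD_NORMAL.inv_cdf(v_prime)


def simulate_delta_star(fitted, sigma, rng):
    """Step 3.2: fitted value plus a truncated error, so that delta* >= 1."""
    return [f + draw_left_truncated(1.0 - f, sigma, rng) for f in fitted]


def bootstrap_interval(beta_hat, beta_star, alpha=0.05):
    """Step 4: interval [beta_hat + a*, beta_hat + b*] from sorted deviations."""
    deviations = sorted(b - beta_hat for b in beta_star)
    cut = int(len(deviations) * alpha / 2)  # elements dropped at each end
    trimmed = deviations[cut:len(deviations) - cut]
    # -b* is the lower endpoint and -a* the upper endpoint of the trimmed list.
    a_star, b_star = -trimmed[-1], -trimmed[0]
    return beta_hat + a_star, beta_hat + b_star


rng = random.Random(0)
delta_star = simulate_delta_star([1.05, 1.20, 1.60], sigma=0.15, rng=rng)
low, high = bootstrap_interval(2.0, [1.7, 1.9, 2.0, 2.2, 2.5], alpha=0.4)
```

Note the reflection in the last step: the toy bootstrap values are skewed upward around the estimate of 2.0, so the resulting interval of roughly [1.8, 2.1] extends further below the estimate than above it.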

It should be noted that this application is based on the first algorithm described in Simar and Wilson [11]. In their paper they also bootstrap the DEA-scores by recalculating them after correcting for environmental factors. In that approach (algorithm 2), they also construct confidence intervals for the DEA-scores. The bootstrapping of DEA scores (algorithm 2) has been applied empirically to a sample of Ukrainian hospitals [9]. Since our primary goal is to obtain improved estimates of the effects of environmental factors on cost efficiency, we opt to apply algorithm 1. Since policy implications are involved, it behooves us to demonstrate the robustness of the findings, because inconsistent results may lead to the wrong policy decision. Further, we expand on this earlier approach by illustrating the differences between several second stage methodologies.

3 Data

3.1 General

In this study, we use hospital data for the year 2000. These data were obtained from the Ministry of Health, Welfare and Sport; they were collected by the Institute for Health Care Management and are derived from numerous surveys, such as financial, patient and personnel surveys. For the purposes of this study, hospitals with missing or unreliable data and academic/top clinical hospitals were excluded from the dataset. Academic (7) and top clinical hospitals (13) have a very different cost structure due to their teaching and research activities, such that comparing them to general hospitals is precarious. Therefore, due to a lack of comparability, we omit the academic and top clinical hospitals (20), which leaves a sample of 76 general hospitals. In most cases, hospitals with missing or unreliable data had recently been involved in a merger process (7 hospitals). After eliminating these observations, 69 (out of 76) observations remained.

3.2 Production

Since the main objective of hospitals is patient care, we define the services of hospitals as the number of first time visits (i.e. the number of patients treated by physicians without an admission) and the number of discharges. Discharges have been separated by medical specialty in order to capture case-mix differences. The dataset distinguishes over 30 specialties, so for computational ease, we aggregated these medical specialties into four categories on the basis of average length of stay (LOS) and whether or not patients had surgery. We selected 4 days as the cutoff because there are two peaks in the distribution of LOS: one at 1–2 days (specialties with a majority of rather simple surgery) and one at 8–10 days (more complex surgery).

3.3 Resources

Resources include staff and administrative personnel, nursing personnel, paramedical personnel (such as lab technicians), other personnel (including maintenance, security and cleaning), and material supplies. Material supplies include such items as medical supplies, food and heating. Personnel and material supplies are treated as variable resources since the hospital can change these in the short run.

There are data on the costs and the quantity for each personnel category. For each region, wages are defined as the average salary cost (including benefits) per full time equivalent and are treated as the market prices for labor. We partition by region, since regional wage differentials exist in The Netherlands. Qualitative differences reflecting varying levels of experience and skill of the personnel among hospitals are included in the amount of labor resources.

Since there is no natural unit of measurement for material supplies, we use a price of material supplies defined as a weighted index based on components of the consumer index calculated for the Netherlands by Statistics Netherlands, with the weights derived from cost shares.

Descriptive statistics of the variables are given in Table 1.

Table 1 Descriptive statistics, Dutch General Hospitals, 2000

3.4 Environmental characteristics

One of the environmental factors we consider is the role of part time personnel. A simple, linear conversion to full time equivalency (FTE) status may not be sufficient to account for the productivity of part time labor, which can have varying effects on efficiency. One assumption is that part time labor may increase capital input per FTE and may also increase the overhead per FTE (for instance, human resources management (HRM) staff). In either case, it is possible that cost efficiency declines. On the other hand, part time personnel may increase cost efficiency by the “loss” of the less productive hours. Since there is a shortage of nursing personnel, nurses have leverage when it comes to negotiating wages and other labor conditions, which again may affect cost efficiency. The part time factor is measured by the ratio of the number of FTEs to the number of personnel.

The same holds for the second environmental factor, seniority of personnel. Seniority of personnel is defined as labor cost per FTE controlled for regional differences in prices. Differences in this ratio reflect differences in salary scales. Wages increase with experience, according to wage scales. It is questionable whether these wage differentials are compensated by productivity gains of more skilled labor or simply arise because the work force is older. Earlier research on Dutch general hospitals shows that the presence of a relatively large share of skilled labor causes allocative inefficiencies [2]. Because of regional labor market shortages coupled with the strong legislative protection against dismissals in The Netherlands, the number of senior personnel is not under the control of management.

The third environmental factor applies to the input of capital services. Capital input is strongly regulated in terms of the number of beds, operating theatres, X-ray rooms and delivery rooms per hospital. A misallocation among these capital resources by the government may also adversely affect cost efficiency. Therefore we use the ratio of practice rooms (the sum of operating rooms, X-ray examination rooms, delivery rooms et cetera) to beds as a proxy for the allocation of the various capital inputs. We use this ratio to gauge the possibility of bottlenecks in the hospital, which may lead to a decrease in the amount of patient care produced. For example, if the practice rooms are operating at full capacity, the treatment of patients may have to be temporarily suspended until capacity is once again available. Since the hospital industry is generally characterized by an overcapacity of beds and an undercapacity of practice rooms and physicians, it is to be expected that an increase in practice rooms may lead to higher efficiency.

Since physicians are mostly self employed in the Netherlands, they are not included as a direct resource of hospitals. Physician practices may nevertheless substantially influence the production process in which the hospital operates. Inefficiency may arise since physicians may substitute hospital resources for their own time, thereby increasing cost-inefficiency [8]. The issue raised by Pauly [8] is particularly relevant for the Dutch hospital system since the entry of physicians is regulated by the government and self-employed physicians have long life-time contracts with a hospital. Because of this relationship among physicians, the governmental regulators, and the hospitals, we consider the number of physicians (in full time equivalents, FTEs) per admission as an environmental factor (physicians’ intensity). An increase in the number of physicians per admission is expected to result in more procedures and therefore increased cost and lower cost efficiency.
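The four environmental factors are simple ratios, and the sketch below shows how they could be computed for one hospital. It is illustrative only: the field names and the example figures are hypothetical, and dividing by a regional wage index is merely one assumed way of controlling seniority for regional price differences, since the exact adjustment is not spelled out in the text.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    fte: float                  # personnel in full time equivalents
    headcount: int              # number of persons employed
    labor_cost: float           # total labor cost in euros
    regional_wage_index: float  # assumed deflator, 1.0 = national average
    practice_rooms: int         # operating, X-ray, delivery rooms et cetera
    beds: int
    physician_fte: float
    admissions: int


def environmental_factors(h):
    """Return the four second stage regressors for one hospital."""
    return {
        "part_time_factor": h.fte / h.headcount,
        "seniority": h.labor_cost / h.fte / h.regional_wage_index,
        "capital_ratio": h.practice_rooms / h.beds,
        "physician_intensity": h.physician_fte / h.admissions,
    }


# Hypothetical hospital, not taken from the data set.
example = Hospital(
    fte=800.0, headcount=1000, labor_cost=32_000_000.0,
    regional_wage_index=1.0, practice_rooms=30, beds=300,
    physician_fte=60.0, admissions=15_000,
)
factors = environmental_factors(example)
```

A part time factor of 1 would mean that every employee works full time; values below 1 indicate a larger share of part time personnel.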

4 Empirical results

Recall that, as a first step, we measure the cost efficiency score via DEA (under CRS). We find that, on average, the cost efficiency measure for general hospitals is 87%, ranging between a low efficiency score of 62% and the cost efficient score of 100%. Such outcomes are common: Ozcan [7] summarizes the efficiency scores of a number of hospital studies, most of which report scores near 90%, depending on the DEA-variant chosen, the distinct services and resources, and the sample. This measure provides an overview of the general cost efficiency in the Dutch hospital sample. The variability of performance, however, may be due to environmental factors beyond the manager’s control. Hence, in accordance with the theoretical section, we apply several estimation techniques:

  • OLS regression without bootstrapping;

  • truncated regression without bootstrapping;

  • truncated regression with bootstrapping;

  • Tobit regression without bootstrapping;

  • Tobit regression with bootstrapping.

Since the efficiency measure ranges between 0 and 1, we invert the dependent variable, so that a negative coefficient on an independent variable indicates a positive effect on efficiency. Tables 2, 3 and 4 present the estimates of the various regression analyses.

Table 2 OLS-estimates and diagnostics
Table 3 Truncated estimates and bootstrap truncated-estimates (n = 2000)
Table 4 Tobit estimates and bootstrap Tobit-estimates (n = 2000)

In order to evaluate whether the model specification makes any sense we conduct several statistical diagnostic checks on the data and the OLS-outcomes. We check for:

  1. multicollinearity of the independent variables, by calculating the variance inflation factor for each independent variable;

  2. the explanatory power of the model, by presenting \( R^2 \);

  3. the hypothesis that all coefficients equal zero, by means of an F-test;

  4. normality of the residuals, by means of the Jarque-Bera test;

  5. influential observations, by inspecting the diagonal of the hat matrix and identifying observations that exceed 3 times the ratio of the number of estimated parameters to the number of observations. If we identify influential observations, we additionally conduct an augmented regression analysis including dummy variables for these influential observations.
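Three of these checks reduce to short formulas, sketched below. The helper names and the toy residuals are hypothetical. The parameter count of five (a constant plus four environmental factors) is an assumption about the specification; the sample size of 69 is the number of hospitals reported in Section 3.1.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def variance_inflation_factor(r_squared):
    """VIF from the R^2 of regressing one regressor on all the others."""
    return 1.0 / (1.0 - r_squared)


def leverage_cutoff(n_parameters, n_observations, multiple=3.0):
    """Hat matrix diagonal above which an observation is flagged."""
    return multiple * n_parameters / n_observations


CHI2_2DF_5PCT = 5.991  # 5% critical value of the chi-square with 2 d.o.f.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])  # toy residuals
normality_not_rejected = jb < CHI2_2DF_5PCT
cutoff_3 = leverage_cutoff(5, 69)                # about 0.217
cutoff_2 = leverage_cutoff(5, 69, multiple=2.0)  # about 0.145
```

Under these assumptions, lowering the multiple from 3 to 2 reduces the leverage cutoff from about 0.217 to about 0.145, so more observations are flagged as potentially influential.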

Table 2 shows that the variance inflation factors are all very small, implying that the model hardly suffers from multicollinearity problems. The share of variance explained by the independent variables equals 0.43. The hypothesis that all coefficients equal zero is overwhelmingly rejected by the F-test. According to the Jarque-Bera test, the hypothesis that the residuals follow a normal distribution cannot be rejected. The diagonal of the hat matrix reveals only one influential observation. However, excluding this observation from the data hardly affects the estimates. If we use a threshold of two times the ratio of the number of estimated parameters to the number of observations, seven observations are identified as potentially influential. However, further inspection shows that none of these observations can be regarded as really influential. Using an augmented regression analysis with seven dummies for these observations, we find that none of the corresponding coefficients is significant at the 5%-level.

From our modeling, we find that the estimated parameters of the single (non-bootstrapped) version and the bootstrapped version are statistically significantly different from each other. Further, depending on the version of the model, different environmental factors have a statistically significant impact on the cost efficiency measure. For example, the results in Tables 2, 3 and 4 show that the number of part time personnel significantly contributes to lower cost efficiency in the single estimation models. The composition of capital has a positive effect on cost efficiency; a higher ratio of practice rooms to beds implies a higher occupancy rate of beds and a shorter average stay of patients, both of which could affect cost efficiency in a positive way.

For all specifications, physicians’ intensity was statistically significant and positive; the interpretation of this finding is that as intensity increased, cost efficiency decreased. This variable can also be considered the most robust since it appears statistically significant in all of the different regression models. This makes intuitive sense since the physicians act as their patients’ agent and demand services from the hospital for the patients’ medical care needs.

To reiterate, in all the above cases, the single estimate parameters (constants excluded) and their corresponding standard deviations were, in absolute value, greater than those obtained using the bootstrap approach.

Based on these estimates we are able to recalculate the cost efficiency scores. These corrected cost efficiency scores are summarized in Table 5.

Table 5 Descriptive statistics: results mean efficiency scores

For ease of comparison with the original DEA efficiency scores, we re-ran the specifications using the interval scores of cost efficiency (between 0.00 and 1.00). Except in the case of the non-bootstrapped Tobit, each of the estimated efficiency scores is smaller than the simple CE measure of 0.871. This finding suggests that environmental factors may lead to lower levels of cost efficiency, which may not be recognized in budgetary allocations. Even though the magnitude may not seem overwhelming (0.841–0.871), the environmental factors included in our analysis could lead to an underpayment of € 3,170,000 (average cost × difference in CE measures). This estimated underpayment should be treated with caution; nevertheless, in three of the four approaches we used, the estimated score is smaller than the CE measure that does not include environmental factors.
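The arithmetic behind the underpayment figure can be checked with a back-of-the-envelope calculation. The sketch below only rearranges the numbers quoted above; the implied average cost is not reported in the text and, because the CE scores are rounded to three decimals, it should be read as a rough approximation.

```python
ce_unadjusted = 0.871     # simple DEA cost efficiency
ce_adjusted = 0.841       # lowest environment-adjusted score quoted above
underpayment = 3_170_000  # euros, as reported

# underpayment = average cost * (difference in CE measures)
ce_gap = ce_unadjusted - ce_adjusted
implied_average_cost = underpayment / ce_gap  # roughly 106 million euros
```

A gap of three percentage points in the CE measure therefore corresponds to an average hospital budget on the order of 100 million euros, which is why a seemingly small difference in scores translates into millions of euros.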

5 Conclusions

In this paper we describe the cost efficiency of Dutch hospitals using Data Envelopment Analysis (DEA), a linear programming procedure for determining a frontier or best practice of resource usage and service delivery. We use the DEA measure of cost efficiency at the hospital level. Our focus then turns to explaining variations in cost inefficiency that are outside managerial control due to a hospital’s operating environment.

The most popular way to conduct such an analysis has been Tobit analysis, wherein the efficiency score is regressed on a variety of variables thought to affect efficiency. However, the DEA-scores used in this second stage are derived relative to a best practice frontier which does not have an associated error term. Without this error term, bias may arise from measurement error in the dependent variable, leading to biased and inconsistent estimates. Simar and Wilson [11] suggest using bootstrapping in order to obtain consistent estimates of the effects of factors that are beyond managerial control.

Following this suggestion, we proceed to an analysis of a sample of Dutch hospitals using several steps. In the first stage, DEA results indicate that, on average, cost efficiency for general hospitals is 87%, ranging between 62% and 100%. The second stage shows that the cost efficiency scores are affected by the operating environment, mostly by the physicians’ intensity variable, which contributes significantly to cost inefficiency. There are several theoretical explanations for why physician intensity may increase cost inefficiency. First, there is the suggestion, alluded to above, made by Pauly [8] that physicians seek to substitute hospital inputs for their own time. If more physicians practice in a given hospital, and they all behave similarly, then hospital inputs would necessarily be increased. Alternatively, there is the view that physicians are the de facto demanders of services for their patients. If physicians seek the best care for their patients, they may all demand more services than necessary. A third theoretical consideration is that physicians may try to maximize their income by increasing the number of procedures [5]. In the Dutch context physicians are partly reimbursed by the number of procedures. We could make a sounder argument if we had quality outcome variables. This would at least allow us to control for the portion of cost efficiency that relates to higher quality, including not only hospital inputs, but physician intensity as well.

Applying DEA to a sample of Dutch hospitals provides valuable information regarding hospital performance, specifically on how deviations from the best practice frontier may lead to a non-Pareto-optimal state due to cost inefficiency, i.e., a waste of resources. However, questions arose on how best to measure whether there are systematic effects that may preclude the efficient operation of a hospital. We found that neglecting the impact of physicians on hospital performance may lead to lower reimbursements. If hospitals are reimbursed at lower levels not because of inefficient production but because of environmental factors, this may lead to a reduction of services due to budgetary constraints.

This is another reason why assessing environmental conditions may ‘explain’ areas of operation over which managers have little control. The other most robust variable is the seniority of personnel, since in none of the specifications, either single or bootstrapped, were the estimates statistically significant. Part-time personnel contributes significantly to cost efficiency, but only in four of the five specifications. Due to the importance of robustness in the policy area, the role of part time staff should be analyzed further, especially if the costs to the hospital are significant.

Since all the estimated effects are lower under the bootstrap approach, the consequences for policy recommendations may vary. For example, the simple Tobit estimate would predict a greater impact of increasing capital on cost efficiency than the estimates using the bootstrap approach. Therefore, relying on the simple model may result in over-capitalization and a social cost from the resulting inefficiency.

If policy makers are interested in allocating the budget for hospitals, then the corrected bootstrap should be employed since differences in the derived CE measure via the DEA and the more robust measure estimated by either the truncated or Tobit equation, on average, were 3%.

We also note that these findings demonstrate how, even after controlling for environmental factors, managerial inefficiency varies among hospitals. Although this 3% amount seems inconsequential, when translated into financial implications for hospitals this possible error may result in over- or underpaying hospitals by millions of Euros. As more specific patient level data become available in Dutch hospitals (as well as hospitals worldwide), more detail can be added to these models, particularly the impact of quality of care and of regulations regarding quality and costs, among other health care related policies.