Setup
The argument that income level, whether personal income or GDP per capita (GDPc), determines the health conditions of individuals and populations is prominent in the health economics literature. However, the heterogeneity in health status between individuals or nations, even at the same income levels, calls for a more detailed account of the relationship between health conditions and the specific expenditures targeted at promoting health and providing care. The distinction between public and private health expenditures (HEPUB and HEPRIV) is important here: the former is likely a policy variable determined by the level of GDPc and the political agenda of state and local public authorities, whereas the latter reflects the individual-level resources devoted to health care. Thus, both are endogenous variables in the long run. However, past findings suggest that the exogenous direct and delayed effects of HEPUB on life expectancy and on infant mortality are positive and significant. Taking HEPRIV as an exogenous variable is less warranted, as it is a form of derived demand (i.e. sickness and illness force people with short-run income constraints to spend their money on HE).
Next, we propose dynamic panel data models to determine the levels of life expectancy at birth (LE) and infant mortality rate (IM), depending on \( HE_{PRIV} \) and \( HE_{PUB} \). We prefer the logarithmic forms of the variables in the following dynamic panel fixed effects (FE) model.
$$ \begin{aligned} lnLE_{it} & = \alpha_{0} + \alpha_{i} + \alpha_{1} lnLE_{i,t - 1} + \alpha_{2} lnHE_{PRIV,it} \\ &\quad +\,\alpha_{3} lnHE_{PUB,it} + \alpha_{4} lnPCR_{it} + \alpha_{5} lnRDE_{it} + \varepsilon_{i,1t} \\ lnIM_{it} & = \beta_{0} + \beta_{i} + \beta_{1} lnIM_{i,t - 1} + \beta_{2} lnHE_{PRIV,it}\\&\quad +\, \beta_{3} lnHE_{PUB,it} + \beta_{4} lnPCR_{it} + \beta_{5} lnFS_{it} + \varepsilon_{i,2t} \\ \end{aligned} $$
(1)
In the first model, life expectancy is determined by private and public expenditures and by the exogenous variables primary education rate (PCR) and R&D expenditures per capita (RDE). In the model for infant mortality, we replace RDE with food supply (FS). We stress that it is HE, not the income level of the country as such, that provides the direct means and resources to achieve good health and care among the population. Thus, in the above models, the level of education, the level of technology, and FS per capita refer generally (among many other similar variables) to the country’s development level that sustains life expectancy and lowers infant mortality.
One-period lagged health variables, \( LE_{-1} \) and \( IM_{-1} \), in the models reflect the dynamics of health status (i.e. past health status determines the current one). Note, however, that both equations can be recursively solved for current and past values of the other variables in the models and for the starting values of life expectancy and infant mortality (i.e. \( LE_{i,0} \) and \( IM_{i,0} \)). The effects of these and other variables on the current values of \( LE_{it} \) and \( IM_{it} \) are determined by the sizes of the adjustment parameters \( (\alpha_{1} \;{\text{and}}\;\beta_{1} ) \). If they are close to but below one, the past variable values can still have large effects on current-level health status (see Eq. 3 below).
In general terms, the model captures more directly the income-driven health part of the bi-directional income–health relationship (Weil 2009, chapter 6). Income per capita and other indicators of the living standard determine the health status of a country’s population. For example, if primary schooling is missing and the FS per capita is low, the income level of the country is typically low and the average health status is also low. Evidently, the so-called growth process has not started, or it has halted because factors that are important for sustaining income generation are missing. Although the needs for health care and medication are most urgent, the resources for them are scarce, even missing, or used elsewhere.
Our main argument is that, at least for poor countries, the resources devoted to public health provision (\( HE_{PUB} \)) are more important for the population’s health status than private expenditures. The reason is the large income inequality prevailing in most poor countries, which supports high incomes and HEPRIV only for a small fraction of the population. The large share of the population that is poor can obtain health benefits only from public health care, which is non-exclusive.
Country clusters and groups
In order to analyse effectively the public and private expenditure effects on life expectancy and infant mortality, we used the following country grouping strategy. We need different country clusters and groups to identify how public and private expenditures determine life expectancy and infant mortality, which are quite heterogeneous across the sample countries and sample years. The K-means cluster method identified two clusters of countries based on the average country-specific growth rates of life expectancy in the sample period 1995–2014 (i.e. \( \tfrac{1}{T - 1}\sum\nolimits_{t = 2}^{T} {\Delta lnLE_{it} } \)). Table 1 reports the cluster mean values and the number of countries in each cluster. We observe that in cluster 1 the mean growth rate of life expectancy was about 3.4 times larger than in cluster 2 (0.00872 versus 0.00257). Typically, cluster 1 includes some of the poorest countries that have experienced significant health benefits from care systems established in recent years.
Table 1 Clusters in average growth rate of life expectancy
Statistic | Cluster 1 | Cluster 2 |
Cluster mean | 0.00872 | 0.00257 |
Number of countries | 42 | 153 |
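The clustering step can be illustrated with a minimal sketch, assuming a plain one-dimensional K-means (Lloyd's algorithm) applied to country-average log growth rates. The function names `average_log_growth` and `kmeans_1d`, the initialisation at evenly spaced order statistics, and the toy series are illustrative assumptions; they are not the software or the data behind Table 1.

```python
import math

def average_log_growth(series):
    """Mean of first differences of the log series: (1/(T-1)) * sum_{t=2}^{T} dln(x_t)."""
    logs = [math.log(x) for x in series]
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    return sum(diffs) / len(diffs)

def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm in one dimension; returns (centres, labels)."""
    ordered = sorted(values)
    # Assumption: start the centres at evenly spaced order statistics.
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Recompute each centre as the mean of its members.
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels

# Hypothetical life expectancy paths (years) for five made-up countries.
toy_le = {
    "A": [50.0, 51.0, 52.1, 53.0],
    "B": [48.0, 49.2, 50.1, 51.3],
    "C": [75.0, 75.2, 75.4, 75.5],
    "D": [78.0, 78.1, 78.3, 78.4],
    "E": [72.0, 72.2, 72.3, 72.6],
}
growth = {country: average_log_growth(path) for country, path in toy_le.items()}
centres, labels = kmeans_1d(list(growth.values()), k=2)
clusters = dict(zip(growth, labels))
```

In this toy example the two fast-growing, low-level countries (A and B) end up in one cluster and the three slow-growing, high-level countries in the other, which mirrors the pattern described for the life expectancy clusters.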
Next, the K-means method was also applied to the growth rates of infant mortality (see Table 2). Due to the heterogeneous growth rates of infant mortality, the method proposed three clusters for the average growth rates of infant mortality (\( \tfrac{1}{T - 1}\sum\nolimits_{t = 2}^{T} {\Delta lnIM_{it} } \)). Here cluster 3 comprises countries that belong to both LE growth clusters, that is, countries whose development process started before the sample period and whose rapid progress in health status can also be seen in fast-declining infant mortality rates (e.g. China, Turkey, Brazil). Cluster 1 contains some of the poorest countries but also some developed countries that have already reached a low level of infant mortality that is no longer declining. Thus, most countries in this cluster belong to life expectancy growth cluster 2. In infant mortality cluster 2, a typical country is a rich country (i.e. a European country) with relatively low growth in life expectancy, but the cluster also includes some non-rich countries with a rapidly rising life expectancy (e.g. India).
Table 2 Clusters in average growth rate of infant mortality
Statistic | Cluster 1 | Cluster 2 | Cluster 3 |
Cluster mean | − 0.0125 | − 0.0349 | − 0.0615 |
Number of countries | 56 | 100 | 39 |
Finally, we divided the countries into two groups based on their average level of GNI per capita (GNIc) during the sample period. If a country’s average GNIc was below 2440 US$ during the sample years, it belonged to group 1 (77 countries, 39.5% of countries); countries with a level higher than 2440 US$ formed group 2 (118 countries, 60.5%). Note that in the sample, the mean income is 10085 US$ and the median is 3298 US$. Thus, 2440 US$ is close to 75% of the global median income in the years 1995–2014. This means that group 1 contains the globally poorest countries.
Summary statistics
Tables 3, 4 and 5 provide detailed summary statistics for the different clusters and income groups. In Table 3, the clusters based on average life expectancy growth show that the difference in life expectancy between low- and high-growth countries is 11 years. Thus, during the sample period (1995–2014), a high level of life expectancy goes together with less growth in life expectancy than a lower level does. The level of private and public HE per capita is 7–10 times larger in cluster 2 than in cluster 1.
Table 3 Summary statistics for life expectancy growth clusters
Statistic | ΔlnLE | LE | HEPRIV | HEPUB |
CLUSTER 1 |
Mean | 0.0081 | 60.638 | 39.718 | 60.906 |
SE of mean | 0.0012 | 0.286 | 2.174 | 5.076 |
Standard deviation | 0.035 | 8.287 | 63.016 | 147.119 |
CV | 4.402 | 0.137 | 1.586 | 2.415 |
Median | 0.0076 | 60.101 | 17.275 | 12.387 |
Sample size | 798 | 840 | 840 | 840 |
CLUSTER 2 |
Mean | 0.0027 | 71.523 | 271.344 | 636.829 |
SE of mean | 0.0002 | 0.141 | 8.285 | 23.621 |
Standard deviation | 0.0145 | 7.815 | 458.309 | 1119.407 |
CV | 5.297 | 0.109 | 1.689 | 1.757 |
Median | 0.0027 | 73.601 | 98.058 | 173.977 |
Sample size | 2907 | 3060 | 3060 | 3060 |
Table 4 Summary statistics for infant mortality clusters
Statistic | ΔlnIM | IM | HEPRIV | HEPUB |
CLUSTER 1 |
Mean | − 0.0146 | 40.583 | 234.764 | 420.675 |
SE of mean | 0.0006 | 0.954 | 18.083 | 26.263 |
Standard deviation | 0.0247 | 31.924 | 605.189 | 882.296 |
CV | − 1.482 | 0.786 | 2.577 | 2.048 |
Median | − 0.0161 | 31.551 | 47.969 | 118.399 |
Sample size | 1064 | 1120 | 1120 | 1120 |
CLUSTER 2 |
Mean | − 0.0341 | 33.754 | 219.487 | 592.067 |
SE of mean | 0.0006 | 0.618 | 7.410 | 25.985 |
Standard deviation | 0.0247 | 30.491 | 331.392 | 1162.122 |
CV | − 0.721 | 0.903 | 1.510 | 1.963 |
Median | − 0.0331 | 22.610 | 57.637 | 83.022 |
Sample size | 1900 | 2000 | 2000 | 2000 |
CLUSTER 3 |
Mean | − 0.0592 | 26.900 | 207.348 | 427.396 |
SE of mean | 0.0011 | 1.108 | 9.205 | 22.888 |
Standard deviation | 0.0287 | 28.449 | 257.096 | 799.483 |
CV | − 0.485 | 1.057 | 1.239 | 1.823 |
Median | − 0.0579 | 15.301 | 116.807 | 160.625 |
Sample size | 741 | 780 | 780 | 780 |
Table 5 Summary statistics for GNI per capita level groups
Statistic | ΔlnLE | LE | ΔlnIM | IM | HEPRIV | HEPUB |
GROUP 1 |
Mean | 0.0052 | 61.846 | − 0.0333 | 58.871 | 27.097 | 29.181 |
SE of mean | 0.0008 | 0.197 | 0.0007 | 0.749 | 0.836 | 1.376 |
Standard deviation | 0.0311 | 7.736 | 0.0277 | 29.412 | 32.827 | 53.021 |
CV | 5.979 | 0.125 | − 0.031 | 0.499 | 1.211 | 1.851 |
Median | 0.0045 | 62.010 | − 0.0315 | 56.851 | 15.741 | 14.050 |
Sample size | 1463 | 1540 | 1463 | 1540 | 1540 | 1540 |
GROUP 2 |
Mean | 0.0030 | 73.964 | − 0.0339 | 18.340 | 348.282 | 828.355 |
SE of mean | 0.0002 | 0.129 | 0.0006 | 0.392 | 10.237 | 24.966 |
Standard deviation | 0.0095 | 6.299 | 0.0300 | 19.033 | 497.315 | 1212.87 |
CV | 3.162 | 0.085 | − 0.887 | 1.037 | 1.428 | 1.464 |
Median | 0.0027 | 75.001 | − 0.0309 | 12.701 | 169.356 | 309.718 |
Sample size | 2242 | 2360 | 2242 | 2360 | 2360 | 2360 |
The distributions of expenditures are skewed towards low values, mirroring the distribution of GNIc across the world's countries. Note also the large standard deviation (and CV) values, which show large heterogeneity, especially in the ΔlnLE and expenditure observations.
Similar remarks are valid for the infant mortality growth clusters, but now we also observe (see Table 4) that the differences in the levels of HE across the clusters are not as large as in the above life expectancy clusters. In particular, the level of HEPRIV does not vary much across the ΔlnIM clusters and the levels of IM. Thus, infant mortality seems to be independent of HEPRIV. However, the level of IM clearly determines the speed of its decline (i.e. the lower the level of infant mortality rate is, the larger the decrease is). Note also that the IM distributions are skewed towards low values.
Table 5 provides summary statistics for the GNI per capita level groups. The most interesting result is that the rate of decrease of infant mortality (ΔlnIM) is almost the same in both GNIc groups, although there is a huge difference between the levels of infant mortality (IM). The difference in life expectancy (LE) is 12 years, but in poor countries the growth rate of life expectancy is almost two times greater than in non-poor countries. However, a huge difference remains between the country groups in the levels of HE.
Generally, these findings mean that, with respect to our dynamic panel data models, we do not expect much success in infant mortality modelling, as the variable seems to be insensitive to the level of private expenditure. However, the large within-group heterogeneity in the clusters and income groups, which is masked by the above location statistics, calls for a country-level fixed effect (FE) modelling approach that can provide some valuable results across the cluster countries.
Dynamic panel data models
Consider the following dynamic fixed (FE) or random (RE) effect model (for more details, see Pesaran 2015, chapters 26–27):
$$ y_{it} = \alpha_{i} + \lambda y_{i,t - 1} + \beta^{{\prime }} x_{it} + \mu_{it} ,\quad i = 1,2,{ \ldots },N\;{\text{and}}\;t = 1,2,{ \ldots },T. $$
(2)
Typically, the regressors \( \varvec{x}_{it} \) are assumed to be strictly exogenous (i.e. \( E[\mu_{it} |\varvec{x}_{is} ] = 0 \) for all i, t and s). However, the assumption of strict exogeneity is not valid by construction for the lagged dependent variable \( y_{i,t - 1} \), since even if we assume that \( E[\mu_{it} \alpha_{i} ] = E[\mu_{it} y_{i0} ] = 0, \) the FE/RE demeaning term \( E[\mu_{it} \bar{y}_{i, - 1} ] \ne 0 \) will not vanish for short panels. In the process without regressors \( \varvec{x}_{it} , \) this will cause bias in the FE or RE estimators of \( \lambda \), with its size depending on the true value of \( |\lambda | < 1 \) and the length of the panels (Nickell 1981; Pesaran 2015, p. 679):
$$ \mathop {plim}\limits_{N \to \infty } (\hat{\lambda }_{FE/RE} - \lambda ) = - \frac{(1 + \lambda )}{T} + O(T^{ - 2} ). $$
The bias is of order 1/T and vanishes as \( T \to \infty . \) For example, when λ is close to 1 (the non-stationary case) and T = 20, the bias is close to −0.1. Note that if regressors \( \varvec{x}_{it} \) are included in the model, the size of the bias for λ and β depends on the correlation between \( y_{i,t - 1} \) and \( \varvec{x}_{it} \). If the regressors \( \varvec{x}_{it} \) are only weakly exogenous (i.e. allowing for feedbacks from \( \mu_{i,t - 1} \)) or if they are endogenous variables, the FE/RE bias for β is still present, even if no lagged dependent variable is included in the model.
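The size of this bias can be illustrated with a minimal sketch, assuming a pure AR(1) panel without regressors, standard normal errors and unit effects, and a burn-in period so that the process starts near its stationary distribution. The function names `nickell_bias` and `simulate_fe_lambda` and all parameter values are illustrative assumptions. The first function evaluates only the leading term of the bias formula; the second computes the within (FE) estimate on simulated data.

```python
import random

def nickell_bias(lam, T):
    """Leading term of the large-N bias of the FE estimator: -(1 + lam) / T."""
    return -(1.0 + lam) / T

def simulate_fe_lambda(lam, N, T, burn_in=50, seed=1):
    """Within (FE) estimate of lam in y_it = alpha_i + lam*y_{i,t-1} + mu_it."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - lam)              # start at the unit-specific mean
        path = []
        for t in range(burn_in + T + 1):
            y = alpha + lam * y + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                path.append(y)               # keeps T + 1 observations
        lagged, current = path[:-1], path[1:]
        m_lag = sum(lagged) / T
        m_cur = sum(current) / T
        # Demeaning within the unit removes alpha_i but creates the short-panel bias.
        num += sum((a - m_lag) * (b - m_cur) for a, b in zip(lagged, current))
        den += sum((a - m_lag) ** 2 for a in lagged)
    return num / den

approx_bias = nickell_bias(1.0, 20)          # the example in the text: -0.1
fe_estimate = simulate_fe_lambda(lam=0.5, N=1000, T=20)
```

With a true value of 0.5 and T = 20, the simulated FE estimate should fall below 0.5 by roughly the amount the leading term suggests, although the exact figure depends on the seed and on the higher-order terms that the approximation omits.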
The generic problems of the above dynamic panel model can be seen when we solve for \( y_{it} \) recursively from the initial values \( y_{i0} \):
$$ y_{it} = \lambda^{t} y_{i0} + \sum\nolimits_{j = 0}^{t - 1} {\lambda^{j} \beta^{{\prime }} x_{i,t - j} } + \frac{{1 - \lambda^{t} }}{1 - \lambda }\alpha_{i} + \sum\nolimits_{j = 0}^{t - 1} {\lambda^{j} \mu_{i,t - j} } ,\quad t = 1,2,{ \ldots },T $$
(3)
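The recursive solution can be checked numerically with a minimal sketch, assuming a single scalar regressor and hypothetical parameter values. The functions `simulate_recursively` and `closed_form` are illustrative names. In the closed form, the regressor and error sums run over j = 0, …, t − 1, because \( x_{i0} \) and \( \mu_{i0} \) do not enter once \( y_{i0} \) is given.

```python
def simulate_recursively(y0, alpha, lam, beta, x, mu):
    """Iterate y_t = alpha + lam*y_{t-1} + beta*x_t + mu_t for t = 1..T."""
    y = [y0]
    for t in range(1, len(x)):
        y.append(alpha + lam * y[t - 1] + beta * x[t] + mu[t])
    return y

def closed_form(t, y0, alpha, lam, beta, x, mu):
    """Backward-substituted solution for y_t in terms of y_0, alpha, and past x and mu."""
    regressors = sum(lam ** j * beta * x[t - j] for j in range(t))
    effects = (1.0 - lam ** t) / (1.0 - lam) * alpha
    errors = sum(lam ** j * mu[t - j] for j in range(t))
    return lam ** t * y0 + regressors + effects + errors

# Hypothetical values; index 0 of x and mu is a placeholder that never enters.
x = [0.0, 1.0, 0.5, -0.3, 0.8, 1.2]
mu = [0.0, 0.1, -0.2, 0.05, 0.0, -0.1]
y0, alpha, lam, beta = 2.0, 0.3, 0.9, 0.4

path = simulate_recursively(y0, alpha, lam, beta, x, mu)
max_gap = max(
    abs(path[t] - closed_form(t, y0, alpha, lam, beta, x, mu))
    for t in range(1, len(x))
)
```

The two computations agree up to floating-point rounding, and with `lam = 0.9` the weight on `y0` after five periods is still about 0.59, which illustrates how slowly the initial value fades in a persistent process.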
When \( \lambda \) is close to one, the initial values \( y_{i0} \) and the unit-specific effects \( \alpha_{i} \) have large and permanent effects on the \( y_{it} \) observations, which determines the properties of dynamic panel data model estimators. As the process for \( y_{i,t - 1} \) has a similar representation, we obtain, abstracting from the terms for regressors and errors:
$$ y_{it} - y_{i,t - 1} = (\lambda^{t} - \lambda^{t - 1} )y_{i0} + \lambda^{t - 1} \alpha_{i} . $$
This shows that, when \( \lambda \) is close to one, the initial effects, but not necessarily the unit-specific \( \alpha_{i} \) effects, have a small role in determining the one-period differenced values of \( y_{it} \). Consequently, the following difference model has also been popular as a way to eliminate the unit-specific effects on the \( \lambda {\text{ and }}\varvec{\beta} \) estimates:
$$ \Delta y_{it} = \lambda \Delta y_{i,t - 1} +\varvec{\beta}^{{\prime }} \Delta \varvec{x}_{it} + \Delta \mu_{it} . $$
(4)
However, this will not solve the (OLS) estimation problems for the model parameters, since
$$ E[\Delta y_{i,t - 1} \Delta \mu_{it} ] = E[\Delta \mu_{i,t - 1} \Delta \mu_{it} ] \ne 0. $$
Because
$$ E[\Delta \mu_{i,t - s} \Delta \mu_{it} ] = \left\{ {\begin{array}{*{20}l} {2\sigma_{\mu}^{2} ,} \hfill & {{\text{for}}\;s = 0} \hfill \\ { - \sigma_{\mu}^{2} ,} \hfill & {{\text{for}}\;s = 1} \hfill \\ {0,} \hfill & {{\text{for}}\;s > 1} \hfill \\ \end{array} } \right. $$
we need at least two-period lagged values \( y_{i,t - j} {\text{ and }}\Delta y_{i,t - j} \) \( (j \ge 2) \) that do not correlate with \( \Delta \mu_{it} \) (but correlate with \( \Delta y_{i,t - 1} \)). We can use them as instruments for \( \Delta y_{i,t - 1} \) as long as \( \lambda < 1, \) but as \( \lambda \to 1 \), we face the weak instrument problem for \( y_{i,t - 2} \) because \( E[y_{i,t - 2} \Delta y_{i,t - 1} ] \) depends on the size of λ (for more details, see Pesaran 2015, p. 682).
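The weak instrument problem can be made concrete with a minimal sketch, assuming a stationary AR(1) process without unit effects or regressors. Under that assumption the correlation between the instrument \( y_{t-2} \) and the instrumented variable \( \Delta y_{t-1} \) equals \( -\sqrt{(1 - \lambda)/2} \), which shrinks to zero as λ approaches one. The function names and the simulation settings below are illustrative assumptions.

```python
import math
import random

def instrument_correlation(lam):
    """Corr(y_{t-2}, dy_{t-1}) for a stationary AR(1) with |lam| < 1."""
    return -math.sqrt((1.0 - lam) / 2.0)

def simulated_correlation(lam, n=50_000, burn_in=500, seed=7):
    """Sample correlation between y_{t-2} and y_{t-1} - y_{t-2} along one long AR(1) path."""
    rng = random.Random(seed)
    y = 0.0
    path = []
    for t in range(burn_in + n):
        y = lam * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(y)
    level = path[:-1]                                   # y_s
    diff = [b - a for a, b in zip(path, path[1:])]      # y_{s+1} - y_s
    m_l = sum(level) / len(level)
    m_d = sum(diff) / len(diff)
    cov = sum((a - m_l) * (d - m_d) for a, d in zip(level, diff))
    var_l = sum((a - m_l) ** 2 for a in level)
    var_d = sum((d - m_d) ** 2 for d in diff)
    return cov / math.sqrt(var_l * var_d)

# Theoretical and simulated instrument relevance for increasingly persistent processes.
relevance = {
    lam: (instrument_correlation(lam), simulated_correlation(lam))
    for lam in (0.5, 0.9, 0.99)
}
```

At λ = 0.5 the theoretical correlation is −0.5, whereas at λ = 0.99 it is only about −0.07, so the lagged level carries little information about the differenced regressor in highly persistent series.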
The short panel bias of FE/RE and the efficiency problem of the IV approach for the first difference model started the search for IV/GMM-type estimators, leading to consistent and more efficient estimators such as the GMM estimators of Arellano and Bond (1991), Ahn and Schmidt (1995), and Blundell and Bond (1998). These surprisingly popular methods are extremely complex estimators, which are unbiased and efficient only when there is no residual serial correlation, the dynamic lag order of the model is correctly specified, the regressors are strictly exogenous, the explanatory variables are uncorrelated with the unit-specific effects \( \alpha_{i} \), the errors are homoscedastic, the sample length is small (i.e. \( T/N \to 0 \) convergence), the autocorrelation in the endogenous series is low, and the problem of weak or too many instruments is not present (see e.g. Dang et al. 2015; Gouriéroux et al. 2010; Hahn et al. 2007; Kiviet et al. 2017).
Empirical drawbacks of the IV/GMM agenda have led to a large group of alternative estimators that have tried in several different ways to correct for the 1/T time series bias. Chudik and Pesaran (2015) divide this literature into the following broad categories: (i) analytical corrections based on an asymptotic bias formula (Bruno 2005; Bun 2003; Bun and Carree 2005, 2006; Bun and Kiviet 2003; Hahn and Kuersteiner 2002; Hahn and Moon 2006; Kiviet 1995, 1999), (ii) bootstrap and simulation-based bias corrections (Everaert and Ponzi 2007; Phillips and Sul 2003, 2007), and (iii) other methods, including jackknife bias corrections (Dhaene and Jochmans 2012) and the recursive mean adjustment correction procedures (So and Shin 1999). In addition, some methods have been proposed based on long differences (Hahn et al. 2007; Han and Phillips 2013; Han et al. 2014), on forward filtering (Keane and Runkle 1992; Keane and Neal 2016; Pesaran 2015, chapter 27.2), and on the transformed likelihood method (Hayakawa and Pesaran 2015; Pesaran 2015, chapter 27.6).
In the following, we adopt methods that are designed to be robust against the near unit-root case and that avoid the strict exogeneity assumption. This means that we use the long difference IV method, LDIV, proposed by Hahn et al. (2007), as well as the Keane–Runkle (1992) estimator, which allows for predetermined variables as instruments.
The LDIV technique uses long differencing, i.e. \( \Delta_{k} y_{t} = y_{t} - y_{t - k} \) with \( k = 2,3,4, \ldots \), instead of first differencing, together with iterated two-stage least squares (2SLS), to estimate persistent dynamic models with a short time dimension. The LDIV estimator uses lagged levels of the regressors (including \( y_{i,t - k - 1} \)) and the residuals as instruments. The setup for the model is (Hahn et al. 2007, pp. 586–587; Huang and Ritter 2009, p. 269):
$$ \Delta_{k} y_{it} = \lambda \Delta_{k} y_{i,t - 1} +\varvec{\beta}^{{\prime }} \Delta_{k} \varvec{x}_{it} + \Delta_{k} \mu_{it} $$
(5)
where we can use \( y_{i,t - k - 1} ,\varvec{x}_{i,t - k} \) (if strictly exogenous or predetermined) as instrumental variables. After obtaining 2SLS estimates for model 5, we calculate the residuals
$$ y_{i,t - 1} - \hat{\lambda }y_{i,t - 2} - \hat{\varvec{\beta }}^{{\prime }} \varvec{x}_{i,t - 1} , \ldots ,\;{\text{and}}\;y_{i,t - k} - \hat{\lambda }y_{i,t - k - 1} - \hat{\varvec{\beta }}^{{\prime }} \varvec{x}_{i,t - k} . $$
Next, we use these as additional instrumental variables together with \( y_{i,t - k - 1} ,\varvec{x}_{i,t - k} \) to estimate (5) once again. This is the first iteration. The 2SLS estimation is then iterated further, each time using the residuals from the latest estimates. Typically, fewer than five iterations are sufficient for convergence.
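The first step of the procedure can be sketched as follows, under strong simplifying assumptions: a pure AR(1) panel without regressors, a single instrument \( y_{i,t-k-1} \) (so that 2SLS reduces to a just-identified IV ratio), and no residual-based iterations. The function names `simulate_panel` and `long_difference_iv` and all parameter values are hypothetical; this is not the full iterated LDIV estimator of Hahn et al. (2007).

```python
import random

def simulate_panel(lam, N, T, burn_in=50, seed=3):
    """Panel of y_it = alpha_i + lam*y_{i,t-1} + mu_it; returns N lists of length T + 1."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 0.5)
        y = alpha / (1.0 - lam)              # start at the unit-specific mean
        path = []
        for t in range(burn_in + T + 1):
            y = alpha + lam * y + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                path.append(y)
        panel.append(path)
    return panel

def long_difference_iv(panel, k):
    """Just-identified IV estimate of lam from D_k y_t = lam * D_k y_{t-1} + D_k mu_t,
    using the level y_{t-k-1} as the single instrument (first step only)."""
    num = den = 0.0
    for y in panel:
        for t in range(k + 1, len(y)):
            z = y[t - k - 1]                  # dated before both errors in D_k mu_t
            dy = y[t] - y[t - k]              # D_k y_t
            dy_lag = y[t - 1] - y[t - k - 1]  # D_k y_{t-1}
            num += z * dy
            den += z * dy_lag
    return num / den

panel = simulate_panel(lam=0.8, N=3000, T=10)
lam_hat = long_difference_iv(panel, k=4)
```

Long differencing removes \( \alpha_{i} \) just as first differencing does, but the covariance between the instrument and \( \Delta_{k} y_{i,t-1} \) is proportional to \( \lambda^{k} - 1 \) rather than \( \lambda - 1 \), so the instrument stays informative when λ is close to one. In the full procedure, the residuals from this step would be added to the instrument set and the estimation repeated.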
The Keane and Runkle (1992) estimator (KRPRE) uses the idea of forward filtering or decomposition from the time-series literature to improve the efficiency of the estimates when the error contains some form of serial correlation. Under the Cholesky transformation, the orthogonality conditions implied by predetermination are maintained (Keane and Neal 2016). In practice, a key feature of the approach is to use only one or two lags of the predetermined variables as instruments rather than all available lags back to the first period, as in complex GMM estimation. Keane and Runkle assume that \( \varvec{x}_{it} \) are predetermined, in the sense that \( E[\varvec{x}_{is} \mu_{it} ] = 0 \) for t ≥ s. This is a natural approach in this context, where public and private HE in current and previous periods drive life expectancy but not necessarily vice versa. However, life expectancy targets or its unobserved determinants in coming periods t + i (i = 1, 2, …) will affect public and private HE in the future (i.e. \( E[\varvec{x}_{ir} \mu_{it} ] \ne 0 \) for r > t). Note that in the first difference model \( \varvec{x}_{it} \) is correlated with \( \mu_{i,t - 1} \) because \( \varvec{x}_{it} \) is predetermined but not strictly exogenous. However, \( y_{i,t - 2} \;{\text{and}}\;\varvec{x}_{i,t - 1} \) are now valid instruments.
In the Keane–Runkle method, model 2 has a general covariance specification for \( v_{it} = \alpha_{i} + \mu_{it} \). That is, \( E[\varvec{vv}'] = \varvec{I}_{N} \otimes {\varvec{\Sigma}} \), where \( \varvec{v} \) is a stacked \( NT \times 1 \) vector of \( \varvec{v}_{i \cdot } = (v_{i1} ,v_{i2} ,{ \ldots },v_{iT} )^{{\prime }} \) and \( {\varvec{\Sigma}} = E[\varvec{v}_{i \cdot } \varvec{v}_{i \cdot }^{{\prime }} ] \). To implement the KRPRE estimator, we need an estimate of \( {\varvec{\Sigma}} \). It is obtained from a consistent preliminary 2SLS/IV estimation of model 2 using the instruments \( \varvec{Z} \), which yields the 2SLS/IV residuals \( \hat{\varvec{v}}_{i \cdot } \) and \( \hat{\varvec{\Sigma}} = \tfrac{1}{N}\sum\nolimits_{i = 1}^{N} {\hat{\varvec{v}}_{i \cdot } \hat{\varvec{v}}_{i \cdot }^{{\prime }} } . \) Note that a similar two-step procedure can also be applied to difference model 4.
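The covariance estimate used in the second step can be sketched as the average outer product of the unit-level residual vectors. The function name `estimate_sigma` and the residual values are hypothetical, and the sketch covers only this one step, not the preliminary 2SLS/IV estimation or the subsequent forward filtering.

```python
def estimate_sigma(residuals):
    """Average outer product of unit residual vectors: (1/N) * sum_i v_i v_i'."""
    n_units = len(residuals)
    T = len(residuals[0])
    sigma = [[0.0] * T for _ in range(T)]
    for v in residuals:
        for s in range(T):
            for t in range(T):
                sigma[s][t] += v[s] * v[t] / n_units
    return sigma

# Hypothetical first-stage residuals for N = 3 units and T = 2 periods.
v_hat = [[1.0, 0.5], [-1.0, -0.5], [0.0, 1.0]]
sigma_hat = estimate_sigma(v_hat)
```

The result is a symmetric T × T matrix whose off-diagonal entries capture the serial correlation that the unit effect \( \alpha_{i} \) induces in \( v_{it} \); the Cholesky-based transformation is then built from this matrix.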