BAYESIAN GEO-ADDITIVE REGRESSION MODELS
Spatial analyses of FGM/C are often confined to region-specific dummy variables to capture the spatial dimension. Here, we go a step further by exploring regional patterns of FGM/C and possibly nonlinear effects of other factors within a simultaneous, coherent regression framework using a geo-additive semi-parametric mixed model. Because the predictor contains usual linear terms, nonlinear effects of metrical covariates and geographic effects in additive form, such models are also called geo-additive models. Kammann and Wand [27] proposed this type of model within an empirical Bayesian approach. Here, we apply a fully Bayesian approach as suggested in Fahrmeir and Lang [19], which is based on Markov priors and uses MCMC techniques for inference and model checking.
Classical linear regression models of the form
$$ y_i = w_i'\gamma + \varepsilon_i, \qquad \varepsilon_i \sim N\left(0, \sigma^2\right), $$
(1)
for observations (y_i, w_i), i = 1, …, n, on a response variable y and a vector w of covariates assume that the mean E(y_i | w_i) can be modeled through a linear predictor w_i'γ. In our application to FGM/C and in many other regression situations, we face the following problems. First, for the continuous covariates in the data set, the assumption of a strictly linear effect on the response y may not be appropriate. In our study, such a covariate is the respondent’s age. Generally, it will be difficult to specify, prior to any data analysis, a parametric functional form (which has to be linear in the parameters) that captures the possibly nonlinear effect of such covariates.
Second, in addition to the usual covariates, geographical small-area information is given in the form of a location variable s, indicating the region, department or community where individuals or units in the sample live or come from. In our study, this geographical information is given by the regions in Senegal. Attempts to include such small-area information using region- or department-specific dummy variables would in our case entail more than 30 dummy variables for the departments and 9 dummies for the regions, and this approach would not assess spatial interdependence. Nor can the latter problem be resolved through conventional multilevel modeling using uncorrelated random effects [15]. It is reasonable to assume that areas close to each other are more similar than areas far apart, so that spatially correlated random effects are required.
To overcome these difficulties, we replace the strictly linear predictor, η_i = x'β + w_i'γ + ε_i, with a logit link function with dynamic and spatial effects, Pr(y_i = 1 | η_i) = e^{η_i} / (1 + e^{η_i}), and a geo-additive semi-parametric predictor μ_i = h(η_i):
$$ \eta_i = f_1\left(x_{i1}\right) + \dots + f_p\left(x_{ip}\right) + f_{spat}\left(s_i\right) + w_i'\gamma + \varepsilon_i $$
(2)
where h(·) is the known response function, here the inverse of the logit link, f_1, …, f_p are nonlinear smooth effects of the metrical covariates (here, the respondent’s age), and f_spat(s_i) is the effect of the spatial covariate s_i ∈ {1, ..., S} labelling the region in Senegal. Covariates in w_i' are the usual categorical variables such as gender and urban-rural residence. Regression models with predictors as in (2) are sometimes referred to as geo-additive models. The observation model (2) may be extended by including an interaction f(x)w between a continuous covariate x and a binary component of w, say, leading to so-called varying coefficient models, or by adding a nonlinear interaction f_{1,2}(x_1, x_2) of two continuous covariates.
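As a rough illustration of how the predictor in (2) maps to a probability through the logit response function, the following minimal Python sketch evaluates the deterministic part of η_i (the error term ε_i is left out) for one hypothetical respondent. The function names, the quadratic age effect, the region effects and the coefficient values are all assumptions made for illustration; none of them are estimates from this study.

```python
import math


def inverse_logit(eta):
    """Response function h(eta) = exp(eta) / (1 + exp(eta)), computed stably."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def geoadditive_predictor(x, s, w, smooth_effects, spatial_effect, gamma):
    """eta_i = f_1(x_i1) + ... + f_p(x_ip) + f_spat(s_i) + w_i' gamma."""
    nonlinear = sum(f(xj) for f, xj in zip(smooth_effects, x))
    linear = sum(wk * gk for wk, gk in zip(w, gamma))
    return nonlinear + spatial_effect[s] + linear


# Hypothetical ingredients (assumptions, not fitted values).
def f_age(age):
    return -0.002 * (age - 30.0) ** 2   # assumed smooth effect of age


f_spat = {1: 0.8, 2: -0.3}              # assumed effects for two regions
gamma = [-0.5, 0.4, -0.6]               # intercept, urban, educated

# One respondent: age 25, region 1, urban, not educated.
eta = geoadditive_predictor(x=[25.0], s=1, w=[1.0, 1.0, 0.0],
                            smooth_effects=[f_age], spatial_effect=f_spat,
                            gamma=gamma)
prob = inverse_logit(eta)
print(f"eta = {eta:.3f}, Pr(y = 1 | eta) = {prob:.3f}")
```

In an actual fit the functions f_j and f_spat are not fixed in advance; they are estimated from the data under the priors described in the remainder of this section.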
In a further step, we may split up the spatial effect f_spat into a spatially correlated (structured) and an uncorrelated (unstructured) effect:
$$ f_{spat}\left(s_i\right) = f_{str}\left(s_i\right) + f_{unstr}\left(s_i\right) $$
The rationale is that a spatial effect is usually a surrogate for many unobserved influences, some of which may obey a strong spatial structure while others may be present only locally. By estimating a structured and an unstructured effect, we aim to separate the two kinds of factors.
As a side effect, we are able to assess to some extent the amount of spatial dependency in the data by observing which of the two effects is larger. If the unstructured effect exceeds the structured effect, the spatial dependency is smaller, and vice versa. It should be noted that all functions are centred about zero for identification purposes; thus the fixed-effects parameters automatically include an intercept term γ_0.
In a Bayesian approach, the unknown functions f_j and parameters γ, as well as the variance parameter σ², are considered random variables and have to be supplemented with appropriate prior assumptions. In the absence of any prior knowledge, we assume independent diffuse priors γ_j ∝ const, j = 1, ..., r, for the parameters of fixed effects. Another common choice is highly dispersed Gaussian priors.
Several alternatives are available as smoothness priors for the unknown functions f_j(x_j); see Fahrmeir and Lang [19] and Fahrmeir, Kneib and Lang (2004) [28]. We use Bayesian P(enalized)-splines, introduced by Eilers and Marx [29] in a frequentist setting. It is assumed that an unknown smooth function f_j(x_j) can be approximated by a polynomial spline of low degree. The usual choice is cubic splines, which are twice continuously differentiable piecewise cubic polynomials defined for a grid of k equally spaced knots on the relevant interval [a, b] of the x-axis. Such a spline can be written as a linear combination of B-spline basis functions B_m(x), i.e.
$$ f(x) = \sum\limits_{m=1}^{l} \beta_m B_m(x) $$
(3)
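The basis expansion in (3) can be sketched in a few lines of Python using the Cox-de Boor recursion on an equally spaced knot grid. This is only an illustrative sketch: the age range 15 to 49, the choice of 20 inner intervals, the coefficient values and all function names are assumptions, and the grid is extended by three knots on each side so that the cubic basis covers the whole interval [a, b].

```python
def equally_spaced_knots(a, b, n_intervals, degree):
    """Knot grid on [a, b] with `degree` extra knots beyond each boundary."""
    h = (b - a) / n_intervals
    return [a + (j - degree) * h for j in range(n_intervals + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function B_m(x) via Cox-de Boor."""
    # Degree 0: indicator functions of the half-open knot intervals.
    b = [1.0 if knots[m] <= x < knots[m + 1] else 0.0
         for m in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for m in range(len(knots) - d - 1):
            left = (x - knots[m]) / (knots[m + d] - knots[m]) * b[m]
            right = ((knots[m + d + 1] - x)
                     / (knots[m + d + 1] - knots[m + 1]) * b[m + 1])
            nxt.append(left + right)
        b = nxt
    return b


# Hypothetical setting: cubic splines for age on [15, 49].
AGE_MIN, AGE_MAX = 15.0, 49.0
knots = equally_spaced_knots(AGE_MIN, AGE_MAX, n_intervals=20, degree=3)
basis = bspline_basis(30.0, knots, degree=3)

# Hypothetical coefficients beta_m; f(30) is the weighted sum in (3).
beta = [0.1 * m for m in range(len(basis))]
f_30 = sum(bm * Bm for bm, Bm in zip(beta, basis))

print(len(basis), sum(basis), sum(1 for v in basis if v > 0), f_30)
```

With this construction the number of basis functions l equals the number of inner intervals plus the spline degree, and at any point inside [a, b) the basis values are non-negative and sum to one.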
These basis functions have finite support on four neighbouring intervals of the grid and are zero elsewhere. A comparatively small number of knots (usually between 10 and 40) is chosen to ensure enough flexibility, in combination with a roughness penalty based on second-order differences of adjacent B-spline coefficients to guarantee sufficient smoothness of the fitted curves. In our Bayesian approach this corresponds to second-order random walks
$$ \beta_m = 2\beta_{m-1} - \beta_{m-2} + u_m $$
(4)
with Gaussian errors u_m ~ N(0, τ²). The variance parameter τ² controls the amount of smoothness and is also estimated from the data. More details on Bayesian P-splines can be found in Lang and Brezger [30]. Note that random walks are the special case of B-splines of degree zero.
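To make the role of τ² in (4) concrete, the sketch below simulates coefficient sequences from a second-order random walk and evaluates the corresponding roughness penalty, i.e. the sum of squared second differences scaled by 1/(2τ²). The function names, the starting values and the two variance settings are assumptions chosen for illustration only.

```python
import random


def second_differences(beta):
    """u_m = beta_m - 2 * beta_{m-1} + beta_{m-2} for m = 3, ..., l."""
    return [beta[m] - 2.0 * beta[m - 1] + beta[m - 2]
            for m in range(2, len(beta))]


def rw2_penalty(beta, tau2):
    """Negative log of the RW2 prior density, up to terms not involving beta."""
    return sum(u * u for u in second_differences(beta)) / (2.0 * tau2)


def simulate_rw2(n, tau2, rng, start=(0.0, 0.0)):
    """Draw beta_1, ..., beta_n from (4), given two fixed starting values."""
    beta = list(start)
    for _ in range(n - 2):
        u = rng.gauss(0.0, tau2 ** 0.5)
        beta.append(2.0 * beta[-1] - beta[-2] + u)
    return beta


# Same random seed, two hypothetical variances: a small tau2 gives a
# sequence that stays close to a straight line, a large tau2 a wiggly one.
rough = simulate_rw2(20, tau2=1.0, rng=random.Random(1))
smooth = simulate_rw2(20, tau2=0.01, rng=random.Random(1))

print(rw2_penalty(rough, 1.0), rw2_penalty(smooth, 1.0))
print(rw2_penalty([1.0, 2.0, 3.0, 4.0], 0.5))  # a linear sequence is unpenalized
```

Because the penalty vanishes for exactly linear coefficient sequences, the prior shrinks the fitted curve towards a straight line as τ² goes to zero.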
We now turn our attention to the spatial effects f_str and f_unstr. For the spatially correlated effect f_str(s), s = 1, …, S, we choose Markov random field priors common in spatial statistics [31]. These priors reflect spatial neighbourhood relationships. For geographical data one usually assumes that two sites or regions s and r are neighbours if they share a common boundary. Then a spatial extension of random walk models leads to the conditional, spatially autoregressive specification
$$ f_{str}(s) \mid f_{str}(r), r \ne s \sim N\left( \sum\limits_{r \in \partial_s} f_{str}(r) / N_s, \; \tau^2 / N_s \right) $$
(5)
where N_s is the number of adjacent regions, and r ∈ ∂_s denotes that region r is a neighbour of region s. Thus the (conditional) mean of f_str(s) is an average of the function evaluations f_str(r) of neighbouring regions. Again the variance τ² controls the degree of smoothness.
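The conditional specification in (5) can be illustrated with a small Python sketch that computes the conditional mean and variance of f_str(s) from a neighbourhood map and builds the corresponding penalty matrix. The four-region adjacency structure, the effect values and τ² below are hypothetical and do not correspond to the actual map of Senegal.

```python
def mrf_conditional(s, f_str, neighbours, tau2):
    """Conditional mean and variance of f_str(s) given its neighbours, as in (5)."""
    adjacent = neighbours[s]
    n_s = len(adjacent)
    mean = sum(f_str[r] for r in adjacent) / n_s
    return mean, tau2 / n_s


def mrf_penalty_matrix(neighbours):
    """K[s][s] = N_s, K[s][r] = -1 if r is a neighbour of s, 0 otherwise."""
    regions = sorted(neighbours)
    index = {s: i for i, s in enumerate(regions)}
    K = [[0] * len(regions) for _ in regions]
    for s in regions:
        K[index[s]][index[s]] = len(neighbours[s])
        for r in neighbours[s]:
            K[index[s]][index[r]] = -1
    return K


# Hypothetical map: regions sharing a boundary are listed as neighbours.
neighbours = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
f_str = {1: 0.4, 2: 0.1, 3: -0.2, 4: 0.7}   # assumed current effect values

mean2, var2 = mrf_conditional(2, f_str, neighbours, tau2=0.6)
mean4, var4 = mrf_conditional(4, f_str, neighbours, tau2=0.6)
K = mrf_penalty_matrix(neighbours)

print(mean2, var2)   # region 2 has three neighbours
print(mean4, var4)   # region 4 has one neighbour, hence a larger variance
print(K)
```

Each row of the penalty matrix has non-zero entries only for the region itself and its neighbours, which is the sparseness that keeps the full conditionals cheap to evaluate.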
For a spatially uncorrelated (unstructured) effect f_unstr, a common assumption is that the parameters f_unstr(s) are i.i.d. zero-mean Gaussian, such that
$$ f_{unstr}(s) \mid \tau_{unstr}^2 \sim N\left(0, \tau_{unstr}^2\right) $$
(6)
The variance or smoothness parameters τ_j², j = 1, …, p, str, unstr, are also considered unknown and are estimated simultaneously with the corresponding unknown functions f_j. Therefore, hyper-priors are assigned to them in a second stage of the hierarchy in the form of highly dispersed inverse gamma distributions, τ_j² ~ IG(a_j, b_j), with known hyper-parameters a_j and b_j. Standard choices for the hyper-parameters are a = 1 and b = 0.005, or a = b = 0.001. Jeffreys’ noninformative prior is closer to the latter choice, and since practical experience shows that regression parameters can depend on the choice of hyper-parameters, we have investigated the sensitivity to this choice in our application.
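One way to see how the hyper-parameters enter is through the conjugate update of a smoothing variance. For a second-order random walk prior on l coefficients combined with an IG(a, b) hyper-prior, the standard full conditional of τ² given the coefficients is IG(a + (l - 2)/2, b + ½ Σ u_m²), where u_m are the second differences. The Python sketch below draws from this distribution under both hyper-parameter choices mentioned above. It is an illustration under stated assumptions: the coefficient vector is made up, the function names are hypothetical, and the sketch is not meant to reproduce the internals of any particular software.

```python
import math
import random


def sum_sq_second_diff(beta):
    """Sum of squared second differences of the coefficient vector."""
    return sum((beta[m] - 2.0 * beta[m - 1] + beta[m - 2]) ** 2
               for m in range(2, len(beta)))


def sample_tau2(beta, a, b, rng):
    """One draw from IG(a + (l - 2)/2, b + 0.5 * sum of squared 2nd differences)."""
    shape = a + (len(beta) - 2) / 2.0
    rate = b + sum_sq_second_diff(beta) / 2.0
    # If X ~ Gamma(shape, scale = 1/rate), then 1/X ~ IG(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


# Hypothetical current state of the spline coefficients.
beta = [math.sin(m / 3.0) for m in range(22)]

# Compare the two standard hyper-parameter choices.
for a, b in [(1.0, 0.005), (0.001, 0.001)]:
    rng = random.Random(0)
    draws = [sample_tau2(beta, a, b, rng) for _ in range(5000)]
    print(a, b, sum(draws) / len(draws))
```

Comparing the draws under different (a, b) pairs is a simple form of the sensitivity check described in the text; the influence of the hyper-prior is largest when the number of coefficients is small or the sum of squared differences is close to zero.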
Since some regions in Senegal do not have many neighbours, we investigated the sensitivity of the results to the choice of the Markov random field (MRF) prior by comparing it with other priors supported by BayesX, such as Gaussian random field (GRF) priors, but the resulting maps from the two priors did not differ much. Therefore, we retained the MRF prior for the spatial effects. For model choice, we routinely used the Deviance Information Criterion (DIC) developed in Spiegelhalter et al. [32] as a measure of fit and model complexity. Before commenting on the substantive results, it is important to point out that the model reported here had the best fit according to the DIC.
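For a binary response, the DIC can be computed from MCMC output as the posterior mean deviance plus the effective number of parameters p_D, where p_D is the posterior mean deviance minus the deviance evaluated at the posterior mean. The minimal Python sketch below does this for a Bernoulli model with logit link, working directly with sampled predictor values. The data and the two posterior samples are invented for illustration, and the function names are hypothetical.

```python
import math


def bernoulli_deviance(y, eta):
    """D = -2 * sum_i [y_i * eta_i - log(1 + exp(eta_i))], computed stably."""
    loglik = 0.0
    for yi, ei in zip(y, eta):
        log1pexp = max(ei, 0.0) + math.log1p(math.exp(-abs(ei)))
        loglik += yi * ei - log1pexp
    return -2.0 * loglik


def dic(y, eta_samples):
    """Return (DIC, p_D) from posterior samples of the predictor vector."""
    n_samples = len(eta_samples)
    mean_dev = sum(bernoulli_deviance(y, eta) for eta in eta_samples) / n_samples
    eta_bar = [sum(col) / n_samples for col in zip(*eta_samples)]
    p_d = mean_dev - bernoulli_deviance(y, eta_bar)
    return mean_dev + p_d, p_d


# Hypothetical data: three observations, two posterior samples of eta.
y = [1, 0, 1]
eta_samples = [[0.5, -0.2, 1.0],
               [0.7, -0.4, 0.8]]
dic_value, p_d = dic(y, eta_samples)
print(dic_value, p_d)
```

Because the predictor is linear in the model parameters, evaluating the deviance at the posterior mean of η is equivalent to evaluating it at the posterior mean of the parameters. Among competing models, smaller DIC values indicate a better trade-off between fit and complexity.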
The model assumes that the nonlinear effects f_1(·) and f_2(·) and the spatial effect f_str are the same across the whole country. This was confirmed by prior separate analyses of the nonlinear effects in other countries, which were found to be remarkably similar. The analysis was carried out using BayesX version 0.9, a software package for Bayesian inference based on Markov chain Monte Carlo techniques.
The methods used here are able to identify more subtle socioeconomic and spatial influences on FGM/C than linear models with regional dummy variables. As such, they are useful for diagnostic purposes, to identify the need for additional variables that can account for this spatial structure. Moreover, even if the causes of the spatial structure are not fully explained, one can use this spatial information for planning purposes and for campaigns to eliminate the practice of FGM/C. Such use is gaining importance in policy circles that attempt to focus the allocation of public resources on the populations at highest risk.
Multivariate Bayesian geo-additive regression models were used to evaluate the significance of the POR determined for the fixed effects and spatial effects on the prevalence of FGM/C in Senegal. Each factor was first examined separately in unadjusted models using conventional logistic regression. Next, fully adjusted multivariate Bayesian geo-additive regression analyses were performed to look again for statistically significant associations with these variables, this time further controlling for any influence from individual (age), ethnicity, education and religious factors. A P-value of < 0.05 was considered indicative of a statistically significant difference.
Advantages of Bayesian geo-additive regression models over conventional logistic regression models: It is worth mentioning some advantages of our approach over existing ones using, say, logistic models with constant fixed effects of covariates and fixed (or random) district effects, or standard two-level multilevel modelling with unstructured spatial effects. First, the Bayesian model with a spatial component is able to account for spatial autocorrelation of FGM/C prevalence in Senegal by exploring, within a single coherent regression framework (a geo-additive semi-parametric mixed model), the spatial patterns in the prevalence of FGM/C, possible nonlinear effects of continuous covariates (i.e. age) and the usual fixed-effects covariates (e.g. education, income).
Secondly, conventional models assume that the random components at the contextual level (district or region) are mutually independent, although this assumption is not actually required by these approaches, so correlated random residuals could also be specified (see [33]). Borgoni and Billari [34] pointed out that the independence assumption has an inherent problem of inconsistency: if the location of the event matters, it makes sense to assume that areas close to each other are more similar than areas that are far apart. Also, the SDHS data are based on a random sample of regions. The structured component introduced here therefore allows us to ‘borrow strength’ from neighbours in order to cope with the sample variation of the region effect and to obtain estimates for areas that may have inadequate sample sizes or be un-sampled. We tried several models in order to highlight the differences that can be found by adopting this approach in a spatial context and the possible bias involved in violating the independence assumption between aggregated spatial areas. Some of these models have a spatial component and a random component that reflect spatial heterogeneity globally and relative homogeneity among neighbouring regions, while others do not. Failure to take into account the posterior uncertainty in the spatial location (district or region) would overestimate the precision of the prediction of FGM/C prevalence in un-sampled locations. The Bayesian paradigm allows the incorporation of prior knowledge to complement the likelihood of the observed data given the model parameters. For our purpose here, we choose the Markov random field (MRF) prior. The reason behind this choice is that the dependence introduced via the MRF prior ensures that information is shared among neighbouring regions (see, for example, [35]). A major advantage of this is that for sparsely populated regions with neighbours having higher populations, the variability of estimates is reduced, as implied by eq. (5) for the structured spatial effects: the larger the number of neighbours, the smaller the variance. We also performed a sensitivity analysis of the various prior assumptions for the spatial effects, and the model with spatial effects (with MRF prior) outperformed the other models.
Furthermore, Markov chain Monte Carlo (MCMC) was chosen as the implementation framework for the Bayesian model because of its flexibility for complex Bayesian hierarchical models. MCMC allows sample-based inference for complicated and conditionally specified models in which it is easier to study the posterior surface than the likelihood surface. MCMC techniques also allow inference from the joint posterior and from marginal posterior distributions of any subset of the parameters of interest, and easy assessment of the variability of all posterior estimates. It is also easier to obtain additional knowledge of the shape of the likelihood or posterior surface, such as the mode, skewness and kurtosis.
Finally, the conditional independence structure of the MRF has a computational advantage in that the full conditional distribution of each f_str(s) (eq. 5) is easily computed with greatly reduced computational cost due to sparseness. From the foregoing we argue that the Bayesian model implemented in an MCMC framework is a powerful and natural choice when interest lies in simultaneously accounting for spatial dependence, nonlinear effects and fixed effects.