Study area
Burundi is located in East-central Africa, between latitudes 2°20′ and 4°27′ S and longitudes 28°50′ and 30°53′ E; altitude ranges from 775 metres (Lake Tanganyika) to 2,670 metres (Congo-Nile Crest). Burundi has, in general, a tropical highland climate with a significant daily temperature variation in many areas [12]. Temperature also varies significantly from one region to another, mainly because of differences in altitude. The central plateau is cool, with temperatures averaging 20°C. The area near Lake Tanganyika is warmer, averaging 23°C; the highest mountain areas are cooler, averaging 16°C. Rain is irregular and falls most heavily in the northwest [12]. The dry season varies in length, and there are sometimes long periods of drought. Most parts of Burundi receive between 130 cm and 160 cm of rainfall per year [12].
Bordered to the north by Rwanda, to the south-east by Tanzania and to the west by the Democratic Republic of Congo, Burundi covers an area of 27,834 km² (of which 2,634 km² are occupied by Lake Tanganyika) and has a population estimated at about 8 million. Settlement remains essentially rural, with 91.6% of the population living in rural areas. The urban population is 8.4%, with an annual growth rate of 5.7%. The population is young: 46.1% are under 15 years of age, while people aged 60 and above represent only 5.4%. With an average density of 266 inhabitants per km², a population growth rate of 3.44% and a total fertility rate of 6 children per woman, Burundi is one of Africa's most densely populated countries [13]. Burundi is divided into 17 provinces.
The epidemiological profile can be summarized as follows. The health system suffers from a shortage of qualified personnel, with 1 doctor per 34,750 inhabitants and 1 nurse per 3,500 inhabitants [13]. Some 17.4% of patients do not have access to health care, while 81.5% of patients are forced to go into debt or sell property to pay health costs. There is a large disparity between the capital, Bujumbura, and the rest of the country: 80% of doctors and more than 50% of nurses work in Bujumbura. Responsible for more than 50% of hospital deaths in children under five years of age and more than 40% of all consultations in health centres, malaria is the main public health problem and the main cause of mortality and morbidity in Burundi [13].
Data description
The goal of our study is to understand the dependence of malaria cases on factors such as climatic variables and spatial (correlated and uncorrelated) effects in Burundi. Monthly data on malaria morbidity in Burundi over 12 years (1996 to 2007) were collected from EPISTAT (Epidemiology and Statistics in Burundi) [14], a department of the Burundi Ministry of Health in charge of collecting and storing epidemiological data for the whole country. The nearest neighbour method was used to fill in the missing data (~5%). The estimated population of each province over the study period was obtained from the Institute of Statistics and Economic Studies in Burundi (ISTEEBU) [15]. Malaria incidence in a given province was computed by dividing the number of malaria cases by the total population of the province, assuming that the whole population is susceptible. Monthly data on cumulative precipitation and monthly averages of daily maximum temperature, minimum temperature, maximum humidity and minimum humidity for 1996-2007 were obtained from the Geographic Institute of Burundi (IGEBU) [16]. The recording of these variables remained uniform from 1996 through 2007, with the same calibration and the same precision. The missing data (2%-3%) were filled in by the same method as for the malaria data (nearest neighbour and cross-validation). Data for three provinces (Bubanza, Bujumbura Rural and Cibitoke) were not available for the study period; they were estimated using ordinary kriging [17]. The data are on different scales and in different units (malaria incidence and humidity are unit free, rainfall is measured in centimetres (cm) and temperature in degrees Celsius (°C)). They were therefore standardized to avoid scale effects in the modelling.
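As a minimal sketch of the two computations described above, the snippet below derives incidence as cases divided by population and then standardizes a series. It assumes that "standardized" means centring to mean zero and scaling to unit sample standard deviation (z-scores), which the text does not specify; the function names and the numbers are hypothetical and are not study data.

```python
import numpy as np

def incidence(cases, population):
    """Malaria incidence: number of cases divided by the province population."""
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    return cases / population

def standardize(x):
    """Centre a series to mean 0 and scale it to unit sample standard deviation.

    Assumption: z-score standardization; the text only says 'standardized'.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical monthly values for one province (illustration only).
cases = [1200, 1500, 900, 2100]
population = [300_000, 300_500, 301_000, 301_500]
rain_cm = [12.0, 15.5, 3.2, 20.1]

z_incidence = standardize(incidence(cases, population))
z_rain = standardize(rain_cm)
```

After this step every covariate and the response are on a common, unit-free scale, so the size of an estimated effect no longer depends on whether rainfall was recorded in centimetres or millimetres.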
In a previous study [18], assuming that climatic covariates have a nonlinear effect on malaria incidence, and based on the Akaike information criterion (AIC) using the algorithm described in [23], the following generalized additive mixed model (GAMM) [24] was proposed to assess the dependence of malaria cases on climatic variables.
Here $\eta_{it}$ is the predictor of malaria incidence, which is assumed to have a gamma distribution; $R_{nit}$ is the rainfall, $H_{xit}$ the maximum humidity, $T_{xit}$ the maximum temperature and $T_{nit}$ the minimum temperature of province $i$ in month $t$. $T_{xp}$, $T_{np}$ and $H_{xp}$ are the same variables for the previous month. $f_1, \dots, f_4$ are unknown nonlinear smooth functions of the covariates. The $\alpha_i$ ($i = 1, \dots, 3$) are the regression coefficients of the linear effects, $\alpha_0$ is the intercept (accounting for unmeasured covariates) and $\varepsilon_{it}$ is the error.
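Read together, these definitions suggest a predictor of the following form. This is a sketch only: the pairing of the four smooth functions with the four current-month covariates, and of the three linear coefficients with the three lagged covariates, follows the order in which the symbols are listed above and is an assumption, not a statement of the model fitted in [18].

```latex
% Assumed form; the assignment of f_1..f_4 and alpha_1..alpha_3 to covariates is illustrative.
\eta_{it} = \alpha_0 + f_1(R_{nit}) + f_2(H_{xit}) + f_3(T_{xit}) + f_4(T_{nit})
          + \alpha_1 T_{xp} + \alpha_2 T_{np} + \alpha_3 H_{xp} + \varepsilon_{it}
```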
The aim of that study was to identify the climatic factors most strongly associated with monthly malaria incidence in Burundi; hence no spatial effect was included in the model. The results showed that malaria incidence in a given month is positively associated with the minimum temperature in the previous month. In this study, the GAMM in (1) is replaced by a geo-additive model that incorporates spatial effects, as follows [25-32].
Here, as above, $f_1, \dots, f_4$ are nonlinear smooth functions of the metrical continuous covariates and $f_{spat}$ is the effect of the spatial covariate $p_i$ ($i = 1, \dots, 17$) representing province $i$. The spatial effect $f_{spat}$ is then split into correlated (structured) and uncorrelated (unstructured or random) effects as follows [30, 31]:
$$f_{spat}(p_i) = f_{str}(p_i) + f_{unstr}(p_i). \tag{3}$$
The rationale is that a spatial effect is usually a combination of many unobserved influences, some of them obeying a strong spatial structure and others being present only locally [26-31, 33]. Eq. (2) is then written as
This geo-additive model assumes that the nonlinear effects $f_1, \dots, f_4$ are the same for all provinces.
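One way to write the resulting geo-additive predictor is sketched below. The spatial part follows directly from the split into structured and unstructured effects; the climatic part assumes four smooth terms for the current-month covariates and three linear terms for the lagged covariates, with a pairing of functions and covariates that is an assumption made for illustration.

```latex
% Assumed form; any residual error term is left out of this sketch.
\begin{aligned}
\eta_{it} &= \alpha_0 + f_1(R_{nit}) + f_2(H_{xit}) + f_3(T_{xit}) + f_4(T_{nit})
             + f_{spat}(p_i) + \alpha_1 T_{xp} + \alpha_2 T_{np} + \alpha_3 H_{xp}, \\
f_{spat}(p_i) &= f_{str}(p_i) + f_{unstr}(p_i).
\end{aligned}
```

Because the smooth terms carry no province index, a single curve per covariate is shared by all 17 provinces, and differences between provinces enter only through the two spatial terms.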
Prior assumptions and inference
For Bayesian inference, the unknown functions $f_1, \dots, f_4$ in predictor (4) and the vector of linear effects parameters $\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3)$ are considered as random variables and are supplemented by prior assumptions. In the absence of any prior knowledge, diffuse priors are the appropriate choice for the fixed effects parameters, i.e. $p(\alpha_i) \propto \text{const}$ [32, 34, 35]. Another common choice is highly dispersed Gaussian priors [31].
For the continuous (smooth) functions $f_1, \dots, f_4$, a second order random walk prior is used for each function $f$, defined as follows. Consider a metrical covariate $x$ with equally spaced observations $x_i$, $i = 1, \dots, m$, $m \le n$ ($n$ is the number of observations). Suppose that $x_{(1)} < \dots < x_{(t)} < \dots < x_{(m)}$ is the ordered sequence of distinct values of the covariate and define $f(t) = f(x_{(t)})$. The second order random walk is then defined by
$$f(t) = 2f(t-1) - f(t-2) + u(t),$$
with Gaussian errors $u(t) \sim N(0, \tau^2)$ and diffuse priors $f(1) \propto \text{const}$ and $f(2) \propto \text{const}$ for the initial values. A second order random walk penalizes deviations from the linear trend $2f(t-1) - f(t-2)$ [33, 36, 37]. For the spatially correlated effect $f_{str}$, a Markov random field prior is chosen [32, 38]. This prior encodes spatial neighbourhood relationships. For geographical data, a common assumption is that two sites or regions $r_1$ and $r_2$ are neighbours if they share a common boundary [25-32]. Thus, a spatial extension of the random walk model leads to the following conditional spatially autoregressive specification [25-32]
$$f_{str}(p) \mid f_{str}(p'),\, p' \neq p \;\sim\; N\!\left(\frac{1}{N_s}\sum_{p' \in \partial_p} f_{str}(p'),\; \frac{\tau_{str}^2}{N_s}\right).$$
Here $N_s$ is the number of provinces adjacent to province $p$ and $p' \in \partial_p$ denotes that province $p'$ is a neighbour of province $p$. This prior is called a Markov random field (MRF) [31, 32, 38]. We define provinces as neighbours if they share a boundary, and assume that the effect of a province $p$ is conditionally Gaussian, with expectation equal to the mean of the effects of the neighbouring provinces and variance inversely proportional to its number of neighbours $N_s$ [26, 31]. The conditional mean of $f_{str}(p)$ is thus an unweighted average of the function evaluations of the neighbouring provinces. For the spatially uncorrelated (unstructured) effect, the $f_{unstr}(p)$ are assumed to be i.i.d. Gaussian, a common assumption [26-31]:
$$f_{unstr}(p) \sim N(0, \tau_{unstr}^2).$$
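Both smoothness priors above can be expressed through a penalty (precision) matrix K, so that the prior density of an effect vector f is proportional to exp(-f'Kf / (2τ²)). The sketch below builds K for a second order random walk and for a Markov random field on a small map. It is an illustration in Python with NumPy, not BayesX code; the function names and the four-region map are hypothetical.

```python
import numpy as np

def rw2_penalty(m):
    """Second order random walk penalty K = D'D for m ordered covariate values.

    Each row of D is (1, -2, 1), so D @ f holds f(t) - 2 f(t-1) + f(t-2).
    """
    D = np.diff(np.eye(m), n=2, axis=0)   # shape (m - 2, m)
    return D.T @ D

def mrf_penalty(neighbours):
    """Markov random field penalty from a neighbour list.

    K[p, p] is the number of neighbours of region p; K[p, q] = -1 if p and q
    share a boundary, and 0 otherwise.
    """
    n = len(neighbours)
    K = np.zeros((n, n))
    for p, nbrs in neighbours.items():
        K[p, p] = len(nbrs)
        for q in nbrs:
            K[p, q] = -1.0
    return K

# Hypothetical map: four regions in a row, 0-1-2-3 (not the map of Burundi).
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

K_rw2 = rw2_penalty(6)
K_mrf = mrf_penalty(neighbours)

linear_trend = np.arange(6, dtype=float)   # perfectly linear function
flat_surface = np.ones(4)                  # same effect in every region

penalty_linear = linear_trend @ K_rw2 @ linear_trend   # unpenalized by RW2
penalty_flat = flat_surface @ K_mrf @ flat_surface     # unpenalized by the MRF
```

Row p of the MRF matrix reproduces the conditional prior described in the text: the conditional mean is the unweighted average of the neighbouring effects, and the conditional variance is τ² divided by the number of neighbours K[p, p].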
The variance parameters $\tau_j^2$ control the trade-off between flexibility and smoothness [36, 37]. They are also considered unknown and are estimated simultaneously with the corresponding unknown functions $f_j$. Weakly informative inverse Gamma hyperpriors $IG(a_j, b_j)$ are assigned to the $\tau_j^2$. The corresponding probability density function is given by [39]
$$p(\tau_j^2) \propto (\tau_j^2)^{-a_j - 1} \exp\!\left(-\frac{b_j}{\tau_j^2}\right).$$
Using proper priors for $\tau_j^2$ ($a_j > 0$ and $b_j > 0$) ensures propriety of the joint posterior [39].
Bayesian inference is based on the posterior of the model and is carried out using MCMC simulation techniques. For the predictor (4), let $\gamma$ denote the vector of all unknown parameters in the model. Then, under conditional independence assumptions, the posterior of the model is given by [26-31].
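A generic form of this posterior, as it is usually written for structured additive regression models under conditional independence, is sketched below. L(·) denotes the likelihood; the exact factorization, including any scale parameter of the gamma likelihood, is assumed here rather than taken from the text.

```latex
% Generic sketch: likelihood times the priors of each block and its variance parameter.
p(\gamma \mid y) \;\propto\; L(y \mid f_1, \dots, f_4, f_{str}, f_{unstr}, \alpha)
  \prod_{j=1}^{4} p(f_j \mid \tau_j^2)\, p(\tau_j^2)
  \; p(f_{str} \mid \tau_{str}^2)\, p(\tau_{str}^2)
  \; p(f_{unstr} \mid \tau_{unstr}^2)\, p(\tau_{unstr}^2)
  \; p(\alpha)
```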
The full conditionals for the parameter vectors $f_j$, $j = 1, \dots, 4$, as well as the full conditionals for $f_{str}$ and $f_{unstr}$, are multivariate Gaussian. MCMC simulation draws successively from these full conditionals [26-31]. The model is implemented in BayesX, a public domain software package for Bayesian inference in structured additive regression models [40]. Only the main effects are modelled. The effects of two-factor interactions are assumed to be smaller and are omitted, mainly because we wish to preserve the simplicity and ease of interpretation of the effects, which are often lost when interactions are included [24]. The effects of the continuous covariates are modelled by cubic P-splines [41, 42] with 20 equidistant knots and a second order random walk penalty [36, 43]. Positive hyperparameters $a = 0.0001$ and $b = 0.0005$ were chosen for $\tau^2$ to ensure the propriety of the posterior [39]. The MCMC sampler was run for 12,000 iterations with a burn-in phase of 2,000 iterations. The Markov chain was thinned to reduce autocorrelation by storing only every 10th sampled parameter. A single block updating scheme was adopted, with iteratively weighted least squares (IWLS) proposals [35, 37]. The sensitivity of the results to changes in the hyperparameters $a$ and $b$ was checked: the model was re-estimated with different choices of $a$ and $b$ for each effect in the model, namely ($a = 1$, $b = 0.005$); ($a = 0.001$, $b = 0.001$); ($a = 0.001$, $b = 0.005$); ($a = 0.0001$, $b = 0.0001$); ($a = 0.001$, $b = 0.0005$), to assess the dependence of the results on minor changes in the model assumptions. The results showed no significant change.
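The sampler settings reported above (12,000 iterations, a burn-in of 2,000 and storage of every 10th draw) determine how many samples remain for posterior summaries. The sketch below applies the same bookkeeping to a stand-in chain in Python with NumPy; the chain is simulated white noise and purely hypothetical, and only the three settings come from the text.

```python
import numpy as np

iterations, burn_in, thin = 12_000, 2_000, 10   # settings reported in the text

# Hypothetical chain: one value per iteration standing in for a sampled parameter.
rng = np.random.default_rng(1)
chain = rng.normal(size=iterations)

kept = chain[burn_in::thin]      # discard the burn-in, then keep every 10th draw
n_stored = kept.size             # (12000 - 2000) / 10 = 1000 stored samples

# Posterior summaries are computed from the stored samples only.
posterior_mean = kept.mean()
ci_low, ci_high = np.quantile(kept, [0.025, 0.975])   # 95% credible interval
```

Thinning trades sample size for lower autocorrelation between stored draws, so the 1,000 retained values behave more like independent samples than 1,000 consecutive iterations would.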