Background
Malaria transmission is highly heterogeneous in endemic areas, with a small fraction of the population suffering a disproportionately large fraction of infections and clinical disease [1]. Recognizing that a sub-group of individuals suffers more malaria attacks than one would expect is crucial to targeting malaria control efforts for maximum impact [2], and is also necessary for correct statistical inference [3,4]. However, the proportion of individuals in cohort studies who experience no malaria episodes is often larger than would be expected on the basis of a Poisson or negative binomial distribution. These individuals may form a distinct sub-group, a possibility that is less frequently considered. Zero-inflated versions of Poisson and negative binomial regression models can be used to address such situations [5], and have been used to analyse data on HIV prevention [6], sexual health [7] and cholera [8]. Use of zero-inflated methods in the study of malaria has focused mainly on spatial applications [9-11] or time series analysis [12,13], but these approaches have not been used widely to analyse prospective data from cohort studies.
Zero-inflated regression models are two-part models, comprising binary and count components [5], which explicitly model the two separate processes that may give rise to a child experiencing a count of zero malaria episodes. In the case of malaria, a child who is exposed to bites from infectious mosquitoes may not experience malaria during a particular study because, by chance, the child happens not to become infected or does not become unwell during the time observed. These ‘sampling’ zeros are estimated by the count component of the model. Alternatively, a child may not experience malaria because the child is never exposed to infection and so cannot become unwell. These ‘certain’ zeros, estimated by the binary component of the model, are responsible for the excess of zero counts observed. Zero-inflated models allow these two distinct processes to be disentangled, and the fraction of the population not at risk to be estimated.
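The distinction between ‘certain’ and ‘sampling’ zeros can be illustrated with a minimal sketch. It assumes the simpler zero-inflated Poisson form rather than the negative binomial form, and the names `pi` (probability of never being exposed) and `lam` (mean episode count among exposed children) are illustrative, as are the numerical values, which are not estimates from either cohort.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    pi  : probability of a 'certain' zero (child never exposed)
    lam : Poisson mean episode count among exposed children
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the binary and the count component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zero_breakdown(lam, pi):
    """Split P(Y = 0) into its 'certain' and 'sampling' parts."""
    certain = pi
    sampling = (1.0 - pi) * math.exp(-lam)
    return certain, sampling


# Hypothetical parameter values, chosen only for illustration.
certain, sampling = zero_breakdown(lam=2.0, pi=0.2)
print(f"certain zeros:  {certain:.3f}")
print(f"sampling zeros: {sampling:.3f}")
print(f"all zeros:      {zip_pmf(0, 2.0, 0.2):.3f}")
```

The two parts sum to the total probability of a zero count, which is why a plain Poisson model fitted to such data would under-predict the number of malaria-free children.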
Understanding whether part of the population would remain malaria-free regardless of protective measures may be particularly important for studies of preventive interventions, such as a vaccine, when absence of an episode may be considered a success [14]. Failure to account for an unexposed fraction can lead to biased estimates of intervention effects. For interventions that may partially protect some individuals and completely protect others, differentiating partial and complete protection may be of particular interest [15-17]. This is possible within the zero-inflated model framework by including covariates in the count or binary components of the model, respectively. Understanding what factors are associated with remaining malaria-free, particularly in areas of apparently high transmission, may be important in understanding where malaria control efforts should, and should not, be focused.
To explore these issues, data from two cohorts of Ghanaian children followed from early in infancy until two years of age were re-analysed.
Methods
Data
This study used data from a cluster-randomized trial of intermittent preventive treatment in infants (IPTi) undertaken in 2,485 infants followed until two years of age in Navrongo, Ghana (described in detail in [18] and Additional file 1). Malaria transmission in Navrongo is intense and highly seasonal [19]. Data from a birth cohort in Kintampo, Ghana [20], an area of year-round high transmission [21], were also used, restricting the study cohort to children followed up beyond 18 months of age (n = 733). In both studies, clinical malaria was defined as a history of fever within 48 hours (or a recorded temperature ≥37.5°C) plus parasitologically confirmed malaria infection. For this analysis, only passively detected clinical episodes were included. To avoid counting the same episode twice, malaria attacks occurring within seven days of a previous episode were discounted. To avoid making any additional assumption about the duration of post-treatment prophylaxis from the anti-malarials used for treatment, person-time at risk was not adjusted after treatment for a malaria episode.
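The seven-day rule for counting episodes can be sketched as follows. This is an illustration under stated assumptions, not the code used in the analysis: the function name `count_episodes` is hypothetical, attacks are represented as days since enrolment, the gap is measured from the most recent counted episode, and a gap of exactly seven days is treated as falling within the window.

```python
def count_episodes(attack_days, min_gap_days=7):
    """Count distinct malaria episodes from the days on which attacks occurred.

    Assumption: an attack is discounted when it falls within
    `min_gap_days` (inclusive) of the most recent *counted* episode.
    """
    episodes = 0
    last_counted = None
    for day in sorted(attack_days):
        if last_counted is None or day - last_counted > min_gap_days:
            episodes += 1
            last_counted = day
    return episodes


# Hypothetical child with attacks on days 10, 14, 40 and 100 of follow-up:
# the attack on day 14 is treated as part of the episode that began on day 10.
print(count_episodes([10, 14, 40, 100]))
```

Because person-time at risk was not adjusted after treatment, the denominator for each child in this sketch would simply be the total number of days of follow-up.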
Statistical methods
All analyses were undertaken in Stata 12 (StataCorp, TX, USA). The count of malaria episodes per child was described first. The Kaplan-Meier method was used to estimate the proportion of children free of malaria; levelling-off of the survival curve was used as a graphical means to assess whether follow-up was sufficient to establish that children who remained malaria-free were unexposed. Several formal tests of sufficiency of follow-up have been proposed, e.g., Maller and Zhou [22] and Shen [23]. The Maller and Zhou test was used to assess formal evidence of an unexposed fraction in the cohorts (Additional file 2).
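The graphical check described above relies on the Kaplan-Meier estimate of the proportion of children still free of malaria. A minimal sketch of the product-limit calculation is given below; the function name `kaplan_meier` and the follow-up times are hypothetical and are not data from either cohort.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : days to first malaria episode, or to end of follow-up
    events : 1 if a first episode was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            removed += 1
            i += 1
        if events_at_t > 0:
            survival *= 1.0 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical cohort: five children with a first episode, three censored at day 730.
times = [30, 60, 60, 90, 200, 730, 730, 730]
events = [1, 1, 1, 1, 1, 0, 0, 0]
for t, s in kaplan_meier(times, events):
    print(f"day {t:3d}: proportion malaria-free = {s:.3f}")
```

In this toy example the curve stops falling after day 200, and a long plateau of this kind is what would be read as graphical evidence of an unexposed fraction.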
Four types of model were then fitted to the data: Poisson, negative binomial (NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB). For each model, a set of covariates was included on the basis of having a plausible association with malaria incidence. For the Navrongo trial, these were sex, intervention group (placebo or IPTi), zone of residence (urban, reference; rocky highland rural; lowland rural; irrigated rural, as defined in [19]), and season of birth (late wet season (Sep-Nov, reference); early dry season (Dec-Feb); late dry season (Mar-May); early wet season (Jun-Aug)). For the Kintampo study, the covariates included were as defined in [20]: sex, socio-economic group based on quintiles of asset scores (least poor as reference), rural (vs urban) residence, distance of residence from a health centre (≥5 km vs <5 km), thatched roof (vs non-thatched), sibling antibody titre (used as a proxy measure of exposure; based on tertiles, with low as reference) and bed net use (based on tertiles; low use as reference). Red blood cell polymorphisms were measured in a sub-group of children studied in Kintampo (Additional file 3).
In each model, person-days at risk were included to account for varying exposure. Robust standard errors were used to account for the cluster-randomized design of the Navrongo trial data. The effect of assuming an inverse Gaussian distribution instead of a Gamma distribution for the heterogeneity was also explored (Additional file 4).
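One common way of including person-time at risk in a count model is as an offset in the log-linear predictor. The expression below is a sketch under that assumption; the source does not state the exact parameterization, and the symbols $Y_i$ (episode count for child $i$), $t_i$ (person-days at risk), $\mathbf{x}_i$ (covariates) and $\boldsymbol{\beta}$ (log rate ratios) are introduced here for illustration only.

```latex
\log \mathbb{E}[Y_i] = \log t_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[Y_i]}{t_i} = \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
```

Under this form, exponentiated coefficients are interpreted as incidence rate ratios per unit of person-time. In a zero-inflated model the same expression would apply to the mean of the count component only.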
Model fitting
The fitted probability distribution from each model was compared visually to the observed distribution of malaria episodes in each cohort. For the Poisson model, the deviance and Pearson goodness-of-fit tests were used to assess the null hypothesis that the data followed a Poisson distribution. For the NB model, a likelihood ratio test (LRT) of the null hypothesis that the overdispersion parameter α = 0 was used to assess formally the evidence against a Poisson distribution; for the Navrongo data, this was not possible due to the use of robust standard errors, so the point estimate of α and its confidence interval were inspected instead.
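The role of the overdispersion parameter can be made concrete with a short sketch. It assumes the mean-dispersion (NB2) parameterization, in which the variance is μ + αμ²; this is an assumption about the parameterization rather than a detail stated in the text, and the function name `nb2_variance` and the values used are hypothetical.

```python
def nb2_variance(mu, alpha):
    """Variance of a negative binomial count with mean mu under the
    mean-dispersion (NB2) form; alpha = 0 recovers the Poisson variance."""
    return mu + alpha * mu ** 2


# Hypothetical mean of two episodes per child.
for alpha in (0.0, 0.5, 1.0):
    print(f"alpha = {alpha:.1f}: variance = {nb2_variance(2.0, alpha):.1f}")
```

When α = 0 the variance equals the mean, as in the Poisson model, which is why testing or inspecting α addresses the question of overdispersion.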
ZIP and ZINB models were then fitted, including the same set of covariates in the count component of the model as for the Poisson and NB models. The logit component of the zero-inflated models estimates the odds of not experiencing any malaria episodes, i.e., remaining malaria-free. For simplicity, only covariates that could plausibly influence whether a child never experienced malaria by two years of age were included in the logit component (for Navrongo, intervention group and zone of residence; for Kintampo, socio-economic group, rural residence, thatched roof, sibling antibody titre category and bed net use).
The Akaike information criterion (AIC) was used to compare all models. For the Kintampo data, the Vuong test was also used to assess evidence for the superiority of the zero-inflated model over its non-zero-inflated equivalent (i.e., ZIP vs Poisson, ZINB vs NB), and a likelihood ratio test was used to compare the ZINB and ZIP models [5]. Having identified the most suitable model to analyse the data, the importance of the different risk factors for malaria in the two datasets was then evaluated.
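The AIC comparison can be sketched as follows, using the standard definition AIC = 2k − 2 log L, where k is the number of estimated parameters and L the maximized likelihood. The log-likelihoods and parameter counts below are hypothetical values chosen for illustration and are not results from either cohort.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off
    between fit and complexity."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
fits = {
    "Poisson": (-1500.0, 8),
    "NB": (-1400.0, 9),
    "ZIP": (-1420.0, 12),
    "ZINB": (-1370.0, 13),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} AIC = {score:8.1f}  difference from best = {score - scores[best]:6.1f}")
```

The differences from the best-fitting model, rather than the absolute AIC values, are what carry the evidence when ranking the four candidate models.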
Discussion
Including a zero-inflation component improved the fit of negative binomial models and allowed more meaningful interpretation of the association of malaria with different risk factors. ZINB models have not been used widely in malaria cohort studies, despite the fact that their formulation allows for two well-accepted aspects of malaria epidemiology: overdispersion (a greater degree of variability between individuals than would be expected on the basis of a given statistical model) and zero-inflation (a larger number of children remaining free of malaria than would be expected if all children are genuinely at risk). However, given that these models can be fitted easily in standard statistical packages, this approach could be used more widely to disentangle the different ways that risk factors influence a child’s chances of developing malaria.
In both of the study cohorts, residence in a rural area was a clear risk factor for higher malaria incidence rates, consistent with other studies [25,26]. Urban residents had substantially higher odds of never experiencing malaria. The relatively large fraction of children who did not experience malaria in both cohorts suggests that a considerable proportion of children, predominantly urban residents, are at no malaria risk, despite the fact that these studies took place in areas of Ghana with very high malaria transmission [19,21]. This adds to a growing body of evidence that malaria can be focal in areas of high transmission [26] in addition to areas of lower endemicity [2,27]. In Kintampo, higher socio-economic status (SES) was associated with lower incidence rates, and there was evidence of decreasing odds of remaining malaria-free with lower SES when this was fitted as a single linear term. Given the well-known links between urban/rural residence and relative wealth, it is likely that these two factors are inter-related.
IPTi reduces the incidence rate of malaria. There was no evidence from these analyses that some children were completely protected; although the confidence interval was wide, this is consistent with the rationale of IPTi as periodic chemoprevention that allows infection (and development of immunity) between courses [28]. Identification and separation of the influence of factors that provide partial and complete protection is of major interest for the analysis of the results of malaria vaccine trials, since this could help understand the mechanism by which a particular vaccine provides protection [4].
The lower incidence of malaria among children born late in the dry season in Navrongo could be due to protection from maternal immunity and foetal haemoglobin, which lasts until around six months of age [29,30], a period similar in length to the rainy season. This effect would balance over the course of childhood, but not during the course of a cohort followed up to a fixed age. This idea is supported by the finding that month and season of birth were not associated with malaria incidence in Kintampo, where malaria transmission is perennial (data not shown).
An analogous analytical approach to zero-inflated models is the use of cure or mixture survival analysis models [22], which also assume that a proportion of the population is not susceptible to the outcome of interest. Halloran et al. developed frailty mixing models, limited to survival analysis of first episodes [16]. This was extended recently by Xu et al. to multiple episodes [31]. These approaches have the potential advantage over zero-inflated models that they can allow for event dependence and variation in the hazard with time. In the study of Xu et al., which also used the Navrongo data, IPTi was found to provide complete protection to some children, as well as the partial protection seen in this study. It may be that the information provided by timing of events gives greater power to identify factors enabling complete protection.
The AIC was used as a guide to comparing models. The large differences in AIC between the ZINB model and the next best fitting model (the negative binomial) provide very strong grounds for preferring the ZINB model [24]. The advantages of the zero-inflated model were retained when heterogeneity between individuals was modelled as an inverse Gaussian rather than a gamma distribution, suggesting that the excess zeros cannot be accounted for by simply assuming a different distribution of heterogeneity between individuals (Additional file 4).
Conclusion
Zero-inflated models can help understand the mechanism by which different risk factors influence malaria, either by preventing or allowing exposure, influencing the level of exposure, or both. The protective effect of urban residence on malaria incidence was partly due to lower incidence rates in children who were exposed, and partly because living in an urban area prevents some children from being exposed at all. This finding is an elaboration of what would have been found using only a negative binomial regression model, i.e., that urban residence decreases malaria incidence. Other studies investigating malaria incidence, or other diseases with similar biology, could employ these models to better understand how risk factors affect clinical outcomes. Given the known features of malaria epidemiology, the use of zero-inflated models should be considered more widely than it is at present.
These findings are consistent with existing knowledge and emphasize the importance of targeted malaria control. Delivery strategies that reach only easily accessed urban populations will have less impact than if targeted successfully at rural areas. Furthermore, these results show that protecting some urban residents may have no impact at all on the overall malaria burden, because some urban residents are essentially at no risk even if not protected. These results therefore have implications for malaria burden estimates, and underline the importance of delivery strategies that reach the most disadvantaged, and achieve high coverage in rural areas.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MEC conceived and designed the study, analysed the data, and wrote the first draft of the manuscript. KPA supervised field activities for the Kintampo cohort study and helped analyse the trial data. SOA and DC supervised field activities in the Navrongo study and assisted with interpretation of the trial data. BMG contributed to study design and writing of the draft manuscript. PJM contributed to study design, analysis of the data and writing of the draft manuscript. All authors contributed to interpretation of the analyses and revised the draft manuscript.