Background
Recently, there has been increasing interest in assessing the prospects of currently applied and proposed nation-wide interventions for achieving the global elimination or control of the major preventable helminthic diseases, ranging from soil-transmitted helminthiases to schistosomiasis, onchocerciasis, and lymphatic filariasis (LF) [
1‐
5]. Partly, this is in response to the urgent policy demands for more accurate scientific information for determining if the roadmap set by the World Health Organization (WHO), based on sustaining and expanding drug access programs, will accomplish the elimination or control of these neglected tropical diseases (NTDs) by the target year of 2020 [
6]. In part, this interest also reflects the recent advances made in the areas of data science and computational epidemiology that increasingly enable the parameterization and execution of complex mechanistic models for simulating the outcomes of interventions reliably over large spatial domains [
7‐
11].
Previous modeling studies aiming to evaluate the prospects of meeting the goals of NTD programs at large regional or country scales have mainly employed two basic approaches. First, generalized parasite transmission models relying on parameter values obtained from limited datasets have been used to simulate intervention outcomes for a range of input values [
1,
2,
12]. In essence, this approach assumes that employing models parameterized at point-support spatial scales, i.e., using parameters and model structures originally defined from data collected at a few spatial sites, invariantly across a region is valid for mimicking regional-scale parasite population dynamics [
13‐
15]. More recently, approaches that use global grid-based parameter search methods for calibrating transmission models against either mean national-level infection values or a range of subgrid values within countries have been applied for undertaking these modeling investigations [
1‐
5]. While these studies have provided important strategic insights into the impacts of intervention options on likely timelines to parasite elimination, an implicit assumption behind these methods is stability and stationarity in the fine-scale pattern-process relationships used to develop the transmission models [
13,
14,
16‐
18]. If spatial nonstationarity or heterogeneity occurs in these pattern-process relationships, then significant aggregation errors can occur, severely biasing the model predictions produced and used at broader or coarser spatial levels [
16‐
18]. Such predictions will significantly underestimate the full range of heterogeneity in infection dynamics and consequently the outcomes of interventions across a spatial domain [
19].
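The aggregation error described above can be illustrated with a deliberately simple sketch. It assumes, purely for illustration, that each annual round removes a fixed fraction of mf prevalence (`reduction_per_round`); this is not the transmission model used in this work, and the site prevalences and function names are hypothetical. Under these assumptions, an estimate computed from the regional mean prevalence conceals the spread of site-level results.

```python
import math


def rounds_to_threshold(baseline_prev, threshold=0.01, reduction_per_round=0.35):
    """Annual rounds needed for prevalence to fall to or below the threshold.

    Toy assumption: every round multiplies prevalence by (1 - reduction_per_round).
    """
    if baseline_prev <= threshold:
        return 0
    return math.ceil(
        math.log(threshold / baseline_prev) / math.log(1.0 - reduction_per_round)
    )


# Hypothetical baseline mf prevalences for five sites in one region.
site_prevalences = [0.02, 0.05, 0.10, 0.20, 0.40]

# Coarse-scale approach: apply the model once, at the regional mean prevalence.
mean_prev = sum(site_prevalences) / len(site_prevalences)
rounds_at_mean = rounds_to_threshold(mean_prev)

# Fine-scale approach: apply the model at every site, then summarize.
rounds_per_site = [rounds_to_threshold(p) for p in site_prevalences]

print("rounds at regional mean:", rounds_at_mean)
print("rounds per site:", rounds_per_site)
print("slowest site:", max(rounds_per_site))
```

With these made-up inputs, the mean-based estimate is 7 rounds, whereas the site-level results range from 2 to 9 rounds, so the single aggregate figure would understate the time needed at the highest-prevalence site.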
We have previously shown how a key variable connected with LF elimination, viz., infection breakpoints or thresholds, is highly sensitive to local transmission conditions, and how this heterogeneity in the values of this variable can play a significant role in generating between-site variability in the timelines to parasite elimination as a result of applying interventions across a spatially heterogeneous domain [
20‐
22]. This outcome indicates the crucial need to address spatial heterogeneities in LF transmission dynamics if better predictions of the outcomes of the interventions proposed to eliminate this parasitic disease are to be delivered. It also implies that modeling frameworks that do not address such heterogeneity cannot offer the spatially explicit predictions required by policy makers who need to understand where programs are working and where they are unlikely to meet their goals, so that targeted remedial actions can be applied at these aberrant sites. Supporting better model-based decision analysis for LF control at large regional or global scales will therefore require the development and use of prediction platforms that can address the full range of, as well as the uncertainty in, the expected system response across the entire spatial domain of interest [
13].
Learning parasite transmission models that take a fuller account of heterogeneous dynamics across a spatial domain is a difficult task, but the increasing availability of geolocated demographic, intervention, and disease data [
23‐
26] together with growing advances made in computational science approaches to knowledge discovery, particularly in the areas of (1) high performance grid-based computing and programming [
8,
11], (2) data discovery, integration, and assembly [
11,
19,
27‐
31], and (3) data-driven approaches for inferring models from measurements [
32‐
38], mean that effectively simulating disease dynamics and responses to interventions across heterogeneous, spatially structured environments at large scales is now becoming increasingly feasible. Bayesian data-driven modeling frameworks have received considerable attention in this regard, given their ability not only to facilitate the induction of a dynamical system from data but also to use multiple data sources to constrain the parameters of a model so that it captures the local transmission features of a spatial setting [
21,
22,
33,
39‐
41].
In this paper, we describe the development of a spatially hierarchical data-driven computational platform to serve as a tool for supporting the simulation of the heterogeneous transmission dynamics and control of the major vector-borne macroparasitic disease, lymphatic filariasis, across a major endemic spatial domain, focusing here on the sub-Saharan African continent. We begin by describing how such a platform can allow estimation of the local population dynamics of this disease across this spatially complex disease-endemic continent by facilitating the learning of locality-specific transmission models from georeferenced data. We then use the discovered local models in conjunction with LF intervention data assembled for each relevant endemic country to highlight how such a system can be used to investigate the emergent policy questions germane to the elimination of this highly debilitating disease from this important endemic continent, viz., (1) which countries are on course to meet the LF elimination target year of 2020, (2) which are unlikely to meet this goal, and (3) which remedial strategies are best suited to enhance disruption of parasite transmission most effectively in the latter case. We also contrast the findings with those resulting from recently conducted national-level intervention modeling work [
1‐
5,
12] by focusing on two themes: (1) the importance of constraining model parameters to reflect the complexity of subgrid transmission heterogeneities with multiple localized input data and (2) the need for addressing such heterogeneous transmission dynamics for minimizing aggregation error when making coarse-scale predictions.
Discussion
Using parasite transmission models for producing regional-scale intervention predictions presents a number of difficulties, which chiefly arise from the heterogeneity that underlies infection patterns across a spatial domain [
17,
18,
55‐
58]. At the heart of these challenges is the problem of how best to scale transmission processes up from the local setting to predict phenomena at coarser hierarchical scales of space and time, particularly when inference on aggregate properties of entities of interest is based on models developed using components and processes estimated at fine spatial scales [
55‐
57,
59,
60]. This is compounded further by the fact that spatial variability in the biophysical and social contexts of transmission will alter the association between infection pattern and process across different endemic settings [
20‐
22,
61,
62], and extrapolation across scales often involves transmutation, where this relationship may change qualitatively as scales are crossed [
18,
63].
Here, we have sought to address the task of predicting the impact of LF interventions at the country level by employing an approach that focuses on the discovery and use of local models to take specific account of within-country spatial effects in parasite transmission and control dynamics. The approach incorporates aspects of a number of strategies suggested previously for dealing with spatial scale issues, viz., ensuring that model structure is unaltered, grain is not changed, uniqueness of location effects is preserved, and there is no aggregation of data [
13,
18,
55]. This approach differs from recent attempts to model the regional impacts of interventions to either eliminate or control NTDs [
1,
2,
4,
5], which largely constitute variations of the calibration approach to upscaling model predictions for generating aggregate results [
18]. Here, a transmission model is calibrated against coarse-grained data, and models are identified that match the chosen data using various objective functions. While such methods can approximate the effects of spatial heterogeneity in system dynamics by selection of models that match various expected in-country infection ranges [
4,
5], the resulting parameter estimates are valid only within the often arbitrarily chosen data ranges, and the reliability of the calibration outside these ranges is unknown. Such methods also do not account for the actual distribution of infection across a real landscape, knowledge of which is required to weight the contributions of local models appropriately when forming more reliable aggregate predictions. Such aggregation exercises further assume that fine-scale spatial variation in parameter values within the aggregate does not matter and, finally, presuppose the existence of an aggregate landscape-wide infection property that can be derived from the finer-scale system information [
55]. The sum of these effects is that significant aggregation error will be introduced into any attempt to represent what are in reality n-dimensional complex systems using fewer than n state variables [
16]. Such errors will underestimate the impact of spatial heterogeneities in transmission processes across a domain of interest and thereby significantly bias estimates of mean timelines to parasite extinction in a spatial region [
14,
17,
18].
By contrast, our technical solution to this upscaling problem is to design a hierarchical landscape-wide computational platform that facilitates learning ensembles of local LF transmission models from spatially observed/derived data within countries and uses their predictions for a representative sample of sites to support inference making at various aggregate scales (e.g., district, country, and continental levels (Figs.
5,
6, and
7; Tables
4 and
5)). In other words, the method relies on a reverse engineering paradigm in which locality-specific mechanistic models are identified and used for inference making via data-driven discovery methods [
38]. This focus on data in the approach for local model discovery is important; it meant, on the one hand, the establishment of a systematic process (Fig.
2) for conducting the search, analysis, and integration of the required data, and, on the other, given gaps in these data, also consideration of how to best estimate the needed localized data using various methods of interpolation or prediction (see Methods). These vagaries in the type of input data, whether contributed by the limited availability of measured data at the scale of modeling (MDA coverages) or through errors in the data estimated for sample sites (by mapping (e.g., mf prevalence, VC coverage), model-based predictions (e.g., ABR values), or derivations (mf age prevalences)), mean that errors in model calibration and therefore in the precision of our predictions are inevitable [
58]. While this cautions against the uncritical use of the present modeling results, it is also important to realize that this limitation in the data available for spatially structured modeling is partly procedural and therefore fixable. For example, although much of the data needed for modeling LF control, particularly with regard to mf prevalence and coverages of MDA and VC, are held by the LF endemic countries undergoing MDA programs, provision of these data to modelers is currently hampered by the lack of negotiated data transfer protocols. Until this is resolved, policy makers must recognize that modelers will, by necessity, have to rely on simulating scenarios where data are lacking, just as we have done in this study in comparing the outcomes of implementing MDA sequentially from the highest to the lowest prevalence IUs with those of offering MDA to IUs at random. Our results are therefore necessarily approximate, with these data errors contributing a portion of the uncertainty in our model outputs.
Nevertheless, one immediate value from using a data-driven modeling approach is highlighted by the staggered nature of the actual annual MDAs applied thus far in LF endemic countries (Table
1), indicating that substantive numbers of IUs within each country started MDA implementation between 2009 and 2013 with some countries beginning LF MDA nationally only at this time. This staggered start immediately points to the high probability that many African countries will not achieve LF elimination by 2020 using the current annual MDA and VC interventions. This conclusion is substantiated and further clarified by the simulations carried out in this study (summarized in Tables
4 and
5 and Figs.
6 and
7). The most significant of these results of urgent policy relevance is our finding that partly as a result of this staggered delivery of MDA, at best only 3/36 endemic countries will be able to meet the goal of LF elimination by 2020 if site-specific breakpoints are used as targets for signifying transmission interruption (with the majority of countries (21/36) able to achieve parasite elimination only between 2031 and 2035 (Table
5)), while if the WHO threshold of 1% mf prevalence is used and the sequential roll-out of annual MDAs is applicable, then this will increase to 24/36 countries able to achieve this target. This finding indicates that aggregate timelines to LF elimination from the application of annual MDA in a country will be a complex outcome of the spatial distribution of in-country baseline infection prevalences, implemented MDA and VC coverages, duration and nature of MDA roll-outs, and the infection breakpoint values used for determining transmission interruption, with the choice of which breakpoint values to use, whether the WHO-set 1% mf prevalence threshold or the much lower site-specific breakpoint values estimated in this study (see Fig.
4 and Table
3), playing the most critical role.
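The dependence of a country-level timeline on roll-out pattern and threshold choice can be sketched as follows. This is a minimal illustration with invented numbers, not the simulation procedure used in the study: it assumes that a country reaches elimination in the year its slowest IU completes the annual rounds it needs, and the start years, round counts, and function names are all hypothetical.

```python
def last_round_year(start_year, rounds_needed):
    """Calendar year of the final annual round, counting the start year as round 1."""
    return start_year + rounds_needed - 1


def country_elimination_year(start_years, rounds_needed):
    """Assumed rule: the country is done when its slowest IU finishes its rounds."""
    return max(last_round_year(s, r) for s, r in zip(start_years, rounds_needed))


# Hypothetical values for three IUs in one country (not taken from the study).
staggered_starts = [2005, 2009, 2013]
rounds_who_threshold = [6, 8, 10]       # rounds to reach 1% mf prevalence
rounds_site_breakpoint = [10, 14, 18]   # rounds to reach lower site-specific breakpoints

year_who = country_elimination_year(staggered_starts, rounds_who_threshold)
year_breakpoint = country_elimination_year(staggered_starts, rounds_site_breakpoint)
year_who_simultaneous = country_elimination_year([2005] * 3, rounds_who_threshold)

for label, year in [
    ("1% threshold, staggered start", year_who),
    ("site-specific breakpoints, staggered start", year_breakpoint),
    ("1% threshold, simultaneous start", year_who_simultaneous),
]:
    print(f"{label}: {year} (by 2020: {year <= 2020})")
```

In this toy case the same country meets a 2020 target only if all IUs start together and the 1% threshold is used; either a staggered start or a lower breakpoint pushes the aggregate date beyond 2020.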
Our modeling of the impact of proposed or potential remedial measures applied from 2016 (using site-specific breakpoint values but following a sequential roll-out of interventions) to accelerate progress toward LF elimination in Africa has provided several new insights into the relative effectiveness of these interventions. The first important result is that simply increasing VC coverage to 80% under existing MDA coverages will not accelerate the achievement of LF elimination at the country level (Table
5). This is unsurprising, given that insecticide bed net coverages used in the baseline simulations across the majority of IUs within the present countries were already at values as high as 60% on average; as we highlighted before [
12,
64,
65], increasing VC coverages by moderate amounts when MDA coverages are already at moderately high levels will not lead to significant impacts on timelines to elimination due to the inherently greater impact of chemotherapy versus VC in reducing LF infection. By contrast, but for the same reason, switching to MDA based either on biannual drug delivery or annual IDA regimens significantly accelerated the achievement of parasite elimination in all countries. Thus, while implementing biannual MDA from 2016 will allow a majority of countries to achieve LF elimination under current drug and VC coverages by 2025 (see Table
5 and Fig.
6), increasing VC coverage to 80% along with this regimen will allow virtually all of them to meet the goal of elimination by that year. The best results were, however, obtained with the IDA-based regimens evaluated, all of which brought about parasite elimination in all countries by 2023. Although contingent on the pattern of MDA roll-outs, these results clearly support the growing number of suggestions that countries switch to these more intensive drug regimens, where feasible, to improve their prospects of achieving LF elimination as rapidly as possible [
66].
The estimates of durations required by the annual MDA intervention in this study are considerably longer than those anticipated by the Global Programme to Eliminate Lymphatic Filariasis (GPELF), which envisages all endemic countries to be under full geographic coverage by 2016 and post-MDA surveillance beginning in all countries by 2020 [
67]. They are also significantly longer than estimates developed by a recent modeling study which projected that, in the worst-case scenario, LF elimination globally will be achieved by 2028 [
4]. These discrepancies in the results between studies highlight the importance of carefully considering the methodologies and threshold targets used by various workers in deriving intervention duration estimates. Thus, while the GPELF estimates are simply based on assuming that 5 years of annual MDA will be sufficient to break transmission in all areas, the latter results were based on predictions of a deterministic model [
12,
51,
68,
69] calibrated to a limited set of expected baseline prevalences within countries and which assumed an 85% MDA coverage and a target threshold of 1% mf for all areas. This use of uniform values for various intervention parameters, a weaker constraining of models to match only overall human infection data, plus the limited consideration of spatial heterogeneity in site-specific transmission dynamics clearly underlie the finding that 6–15 rounds of annual MDA would be sufficient to eliminate LF transmission in that work, compared to the significantly longer time periods we estimate here for the same 1% mf threshold (10–24 annual rounds between countries, depending on within-country heterogeneity in baseline mf prevalence and actual MDA/VC coverages and roll-out patterns achieved (Table
4)). This finding highlights how ignoring a fuller consideration of heterogeneous transmission dynamics across a spatial domain (as a result of not constraining a model by locally varying infection data) can lead to biased and overly optimistic aggregate predictions of the prospects for eliminating a spatially variable parasitic disease.
However, as with any modeling study, ours also has limitations that need to be considered when interpreting the results presented here. First, although our computational platform is designed to aid simulations of the effects of interventions based on the discovery and use of local LF models calibrated to site-specific data, spatial correlation between sites was captured only approximately via a geostatistical model describing spatial variations in mf prevalence within an individual country. Thus, we assume that model structure is spatially invariant but that parameter values vary by location as an explicit function of the spatial structure governing the distribution of this infection state variable across a region [
33]. While the creation and use of ABR maps would have strengthened the incorporation of spatial structure in an important driving input variable too, this option was precluded by the lack of sufficient data on this variable for all study sites (see Methods), although note that model-estimated ABR data were used for calculating mf breakpoint values in each of our study sites. For the same reason, we have also not considered the impact of human migration patterns or mosquito dispersal patterns in our simulations, but we note that the spatial correlation in the mf prevalence data is likely to subsume some of these effects indirectly [
70].
Our data-fit approach also depends on data availability as well as quality. Although here we have addressed errors in the mf/ABR data via model calibration to 95th percentile ranges in these data, it is clear that without a full observational model it is difficult to assess whether any lack of fit of our models is due to poor mechanistic understanding of the effects of spatial heterogeneity in transmission processes, or to problems in the calibration data (poor overall survey data quality, missing data). It is known that under this circumstance, model calibration efforts may need to be flexible and might need to examine the use of semiquantitative and qualitative pattern matching methods [
71], rather than be based solely on quantitative data-fitting approaches [
58].
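The idea of calibrating a local model to a data range, rather than to a single point estimate, can be sketched with a simple rejection filter. This is a stand-in chosen for brevity and is not necessarily the calibration procedure used in this work: the saturating function `toy_model_prevalence`, the uniform prior on ABR, and the observed range are all assumptions made for illustration.

```python
import random


def toy_model_prevalence(abr, half_saturation=5000.0, max_prev=0.4):
    """Hypothetical saturating map from annual biting rate (ABR) to mf prevalence."""
    return max_prev * abr / (abr + half_saturation)


def calibrate(observed_low, observed_high, n_samples=10000, seed=1):
    """Keep sampled ABR values whose predicted prevalence lies in the observed range."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    accepted = []
    for _ in range(n_samples):
        abr = rng.uniform(100.0, 20000.0)  # assumed prior range for ABR
        if observed_low <= toy_model_prevalence(abr) <= observed_high:
            accepted.append(abr)
    return accepted


# Assumed site-level range for observed mf prevalence: 10% to 20%.
accepted = calibrate(0.10, 0.20)

print("accepted parameter sets:", len(accepted))
print("accepted ABR range:", round(min(accepted)), "to", round(max(accepted)))
```

A filter of this kind cannot by itself distinguish a structurally poor model from poor data: if few or no samples are accepted, either explanation remains possible, which is the difficulty noted above.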
Executing large continental-scale model discovery and simulation programs presents a further challenge associated with the handling and processing of large datasets. While we have created a plausible data management and scientific workflow system to tackle the issues of discovery, assembly, and data transformations/interpolations required to provide the input data for identifying the locally applicable LF models, we note that there is a need to automate our current approaches to speed up these data delivery and processing activities. We are currently working with computer scientists to develop a server-side infection data processing system based on using data warehouse principles and methods [
27,
30,
31] to address this issue. A related requirement for running data-intensive models across a large heterogeneous spatial domain is to exploit advances in software and hardware to speed up the computational discovery and simulation process. This means not only optimizing our current MATLAB code to run on multicore batch-compute systems and clusters, but also examining more flexible and faster implementations in C, C++, or even Java [
11]. Speeding up database and simulation scalability using hardware acceleration employing graphics processing units (GPUs) or similar accelerated parallel computing platforms [
72] presents another current option we are investigating to overcome the high performance and memory overheads connected with our data-driven modeling approach. We expect that the effective resolution of these challenges will allow us to accomplish the next stage of the work reported here, viz., the provision of intervention simulations for decision making at the small spatial scale of the village or community.