Background
Pathogen genomics is revolutionizing public health by providing a rich data source for informing real-time, actionable recommendations for public health programmes [1]. Each pathogen genome is a unique record of its previous transmission history that can be used to study the origin and spread of infectious diseases in real time. Genetic surveillance of pathogen populations provides an opportunity to characterize pathogen transmission structure and provide data-informed recommendations to public health programmes to decrease transmission. In the past two decades alone, breakthroughs in genomic technologies and analytical techniques [2–4] have expanded pathogen genetic surveillance to a wide variety of viral and bacterial pathogens. Recent examples include the SARS-CoV-2 pandemic [5, 6], the 2013–2016 West African Ebola outbreaks [7, 8], and the Middle East Respiratory Syndrome (MERS) outbreaks in the Middle East [9].
Despite this success, extending genetic surveillance to more complex pathogens, such as the eukaryotic Plasmodium falciparum parasite that is the causative agent for the deadliest form of malaria, has been challenging. Unlike viral or bacterial pathogens, P. falciparum has a complex, 23-megabase genome with over 5000 genes whose genomic architecture is heavily influenced by meiotic recombination [10]. Plasmodium falciparum must undergo sexual reproduction within a mosquito vector to complete its life cycle prior to being transmitted to a new human host. The sexual nature of the P. falciparum parasite complicates many of the phylogenetic and phylodynamic techniques used in viral and bacterial genetic surveillance studies [2–4].
Malaria genetic surveillance has instead relied on identifying genetic epidemiology metrics that summarize the changes in parasite genetics observed from the empirical sampling of parasite genomes from malaria endemic regions. These genetic epidemiology metrics include the frequency of multiple strain (polygenomic) infections [11, 12], the number of strains per infection (complexity of infection, COI), the genetic relatedness of parasite strains [13, 14], and the frequency of clonal parasites in the population [15, 16].
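As a rough illustration of how two of these metrics can be computed from genotyped samples, the sketch below derives the polygenomic fraction and the fraction of non-unique (clonal) monogenomic samples from SNP barcodes. The barcode encoding ("N" for a mixed call), the rule that two or more mixed calls mark a polygenomic sample, and the function names are all assumptions made for this example and are not taken from the study. The toy barcodes are shortened and invented.

```python
from collections import Counter

MIXED = "N"  # assumed symbol for a mixed (heterozygous) SNP call


def is_polygenomic(barcode, min_mixed_sites=2):
    """Flag a sample as polygenomic when it carries enough mixed calls.

    The threshold of 2 mixed sites is an illustrative assumption.
    """
    return barcode.count(MIXED) >= min_mixed_sites


def summarize(barcodes):
    """Return the polygenomic fraction and the non-unique monogenomic fraction."""
    if not barcodes:
        raise ValueError("at least one barcode is required")
    mono = [b for b in barcodes if not is_polygenomic(b)]
    n_poly = len(barcodes) - len(mono)
    # A monogenomic barcode is "non-unique" if it appears more than once.
    counts = Counter(mono)
    n_non_unique = sum(n for n in counts.values() if n > 1)
    return {
        "polygenomic_fraction": n_poly / len(barcodes),
        "non_unique_monogenomic_fraction": n_non_unique / len(mono) if mono else 0.0,
    }


if __name__ == "__main__":
    # Invented 6-SNP barcodes, for illustration only.
    toy = ["ACGTAC", "ACGTAC", "ACNTNC", "TCGTAC", "ANGTAC"]
    print(summarize(toy))
```

Estimating COI and genetic relatedness requires dedicated statistical models and is deliberately left out of this sketch.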
Many of these genetic epidemiology metrics were identified by comparing sites with different levels of transmission intensity, which is measured by prevalence (frequency of infections), incidence (rate of new infections), and the entomological inoculation rate (EIR, number of infectious mosquito bites per individual per unit time). As such, malaria genetic epidemiology metrics tend to be associated with transmission intensity. Regions with high transmission intensity are expected to have high frequencies of polygenomic infections and high COIs because individuals are more likely to be superinfected with multiple infectious bites [17] and there is greater opportunity for parasite outcrossing. Conversely, regions with low transmission intensity are expected to have high frequencies of clonal or genetically related parasites [18, 19] due to increased levels of inbreeding associated with declining transmission and smaller effective parasite population sizes.
However, recent genomic analyses of polygenomic infections show that a large fraction of polygenomic infections are not the result of superinfection, but instead result from the cotransmission of multiple parasite strains from a single infectious bite [13, 20–23]. Cotransmitted polygenomic infections do not represent multiple infectious bites, and their presence suggests that superinfection-based predictions of transmission intensity from metrics such as the frequency of polygenomic infections and COI may be inaccurate. Despite this, lower rates of cotransmission were previously observed in Kedougou, a high transmission site of Senegal, than in Thies or Richard Toll, which are both sites with very low transmission [13]. These results suggest that the frequency of cotransmitted polygenomic infections could also be used to infer transmission intensity.
Mechanistically, cotransmission and superinfection are important drivers of parasite genetics, but how cotransmission and superinfection define the relationship between genetic metrics and epidemiological measures of transmission intensity is unknown, and it is unclear whether these relationships are consistent across the range of transmission from low to high intensity. It is also unclear to what extent other epidemiological factors, such as transmission heterogeneity (e.g., focal transmission) and importation, affect these genetic epidemiology metrics. A major goal for this study was to characterize the relationship between parasite genetics and transmission intensity using metrics that measure the impact of both superinfection and cotransmission, and to determine whether predictions of transmission intensity can be improved if both factors are considered.
In this study, the relationship between parasite genetics and malaria incidence as reported by the National Malaria Control Programme (NMCP) was examined. Malaria transmission in Senegal is highly heterogeneous and dependent on geographic location, ranging from < 1‰ to > 1000‰ annual incidence. This geographic disparity was ideal for evaluating the relationship between parasite genetics and incidence across a range of transmission intensities in a limited geographic area where reported incidence and genetic epidemiology metrics could be measured consistently across study sites and years. A series of mathematical models were used to quantify the relationship between parasite genetics and incidence and identify transmission regimes (regions within the incidence parameter space) where the relationships between parasite genetics and incidence differ. Identifying these transmission regimes is important because they can arise from fundamental changes in transmission structure that affect how parasite genetics can be used to study transmission and develop data-informed public health recommendations.
Discussion
Parasite genetics has the potential to enable public health officials to evaluate changes in transmission in settings where the corresponding epidemiological data are either missing or difficult to collect. However, the utility of parasite genetic surveillance will depend on how informative genetics is for studying malaria transmission and whether the inclusion of genetics can enhance the confidence of estimates based on standard epidemiological measures of transmission. The major goals of this study were to (1) characterize the relationship between incidence and five malaria genetic epidemiology metrics that collectively assess the impact of superinfection, cotransmission, and clonal transmission, and determine whether these relationships were constant across transmission strata, and (2) test the predictive power of these five metrics for inferring transmission intensity, which in this study was measured as the NMCP-reported incidence for the catchment health facility.
Senegal was an ideal setting for this analysis due to its extensive range of transmission intensities in a localized geographic region. By utilizing data collected across 16 health facilities located throughout the country, this study found that the relationship between parasite genetics and annual incidence changed in very low transmission settings with an annual incidence < 10‰. Based on these results, parasite genetics could be used to evaluate changes in incidence when the annual incidence is > 10‰, and to assess potential sources of importation and other forms of transmission heterogeneity when the annual incidence falls below 10‰.
When annual incidence is > 10‰, the relationships between parasite genetics and reported incidence were consistent with previously established superinfection-based hypotheses that predict higher rates of multiple infections as transmission intensity increases [37–39]. Under these conditions, parasite genetics can be used to accurately infer incidence, and increasing transmission intensity is associated with an increase in polygenomic fraction and COI and a decrease in the frequency of clonal parasites in the population. These results suggest that NMCPs can utilize these correlations to quantify and compare the incidences of regions where transmission is high enough to be explained by superinfection, which in Senegal occurs when annual incidence is > 10‰. However, accurately inferring NMCP-reported incidence in these moderate-to-high transmission areas required the incorporation of all five of the metrics used in this study, including those designed to measure cotransmission (R_H and cotransmission fraction). Thus, while superinfection is likely the dominant driver of parasite genetics when incidence is > 10‰, the impact of cotransmission should not be discounted.
This study suggests that genomic epidemiological inference can be made with as few as 24 SNPs and a relatively small number of samples, which could be advantageous for assessing changing levels of transmission in high transmission settings with limited resources. These 24 SNPs can be genotyped from the genomic material extracted from discarded RDTs, which greatly reduces the technical and logistical complexities involved with collecting appropriate genetic material from clinical populations [25]. Collectively, this sampling strategy allowed us to develop a regression model to characterize and predict the NMCP-reported incidences when incidence is > 10‰. In this study, the average number of samples per site year was 98.34, ranging from 35 to 243 samples. These results suggest that a relatively small number of samples is needed to infer NMCP-reported incidence, but additional work is needed to assess the effect of sample size and sampling bias on genomic epidemiology inference more thoroughly.
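To give a rough sense of why sample size matters here, the sketch below computes the binomial standard error of an estimated proportion, such as the polygenomic fraction, at the smallest and largest per-site-year sample sizes reported above. It assumes simple random sampling and uses a normal approximation. The underlying proportion of 0.3 is a hypothetical value chosen only for illustration, and this is not an analysis performed in the study.

```python
import math


def binomial_se(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must lie in [0, 1] and n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def ci_half_width(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval (normal approximation)."""
    return z * binomial_se(p, n)


if __name__ == "__main__":
    hypothetical_fraction = 0.3  # assumed value, not a study result
    for n in (35, 243):  # minimum and maximum samples per site year in the text
        print(n, round(ci_half_width(hypothetical_fraction, n), 3))
```

Because the standard error scales with one over the square root of the sample size, the interval at 35 samples is about 2.6 times wider than at 243 samples, whatever the underlying proportion. Sampling bias, which the normal approximation does not capture, would need to be assessed separately.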
However, the relationship between parasite genetics and incidence was not consistent across all transmission strata. When annual incidence falls below 10‰, many of the relationships between parasite genetics and incidence observed in higher transmission regions were reversed; increasing transmission intensity was associated with a decrease in polygenomic fraction and COI and an increase in the frequency of clonal parasites in the population. These results are difficult to explain under superinfection-based hypotheses, especially as the study sites with the lowest incidence, such as Richard Toll, had polygenomic fractions that were consistent with those seen in study sites with annual incidence > 400‰.
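The reversal described above can be checked by fitting the trend separately on each side of the 10‰ breakpoint. The sketch below is a simplified stand-in for the GLMs used in this study: it fits an ordinary least-squares slope of a single genetic metric against log10 annual incidence in each regime. The function names, the choice to place an incidence of exactly 10‰ in the upper regime, and the toy site-year values are assumptions made for illustration and are not study data.

```python
import math

THRESHOLD = 10.0  # annual incidence per 1000, the breakpoint discussed in the text


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("incidence values must not all be identical")
    return sxy / sxx


def slopes_by_regime(site_years, threshold=THRESHOLD):
    """Fit metric ~ log10(incidence) separately below and at/above the threshold.

    site_years: list of (annual_incidence_per_1000, metric_value) pairs,
    with every incidence strictly positive.
    """
    below = [(math.log10(i), m) for i, m in site_years if i < threshold]
    above = [(math.log10(i), m) for i, m in site_years if i >= threshold]
    if len(below) < 2 or len(above) < 2:
        raise ValueError("each regime needs at least two site-years")
    return {
        "below": ols_slope(*zip(*below)),
        "above": ols_slope(*zip(*above)),
    }


if __name__ == "__main__":
    # Invented (incidence, polygenomic fraction) pairs, for illustration only.
    toy = [(0.5, 0.40), (2, 0.30), (8, 0.20), (50, 0.25), (200, 0.40), (800, 0.55)]
    print(slopes_by_regime(toy))
```

Slopes of opposite sign in the two regimes would correspond to the reversed relationship described above. A single pooled fit across both regimes would average the two trends and obscure it.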
One possibility for the unusual trends in parasite genetics in very low transmission settings is that accurate quantification of the NMCP-reported incidence values is more difficult as transmission declines because infected individuals are infrequent and difficult to identify. Superinfection in very low transmission settings may also be more difficult to detect as parasite populations become more clonal and genetically related [40]. While the problems associated with sample ascertainment or measurement error in low transmission settings cannot be discounted, it is difficult to attribute these observations to sample ascertainment bias alone given the tight, but reversed, correlation observed between polygenomic fraction, the fraction of non-unique monogenomic clones, and COI in this transmission regime.
Instead, these changes could be driven by fundamental changes in transmission structure that affect the parasite genetics of very low transmission settings [41]. In Senegal, the reversed relationships between parasite genetics and transmission intensity could reflect the disproportionate impact of importation as local transmission declines. The 2013 Senegal census estimated that 14.6% of the population were internal lifetime migrants, meaning that their current area of residence differs from their birthplace [42]. The most popular destinations of internal lifetime migrants are in the low and very low transmission regions, such as Dakar, Diourbel, and Thies. Richard Toll also experiences seasonal influxes of migrant workers due to the presence of the Senegalese Sugar Company, and identical parasite clones were previously detected between Dakar and Richard Toll [42]. Anecdotal evidence obtained from the health facilities in the low transmission regions of this study suggests that patients with recent travel history were more likely to be tested and diagnosed with malaria. Regions with moderate to high levels of transmission, such as Kaolack and Kedougou, reported a net loss in population in the 2013 census. Thus, while it is possible to infer incidence from parasite genetics in the very low transmission settings of Senegal (Fig. 3B, GLM_below10), this is possibly due to the importation of parasites from the moderate- to high-transmission regions to the lower transmission regions of Senegal. Parasite genetics in very low transmission settings should be combined with data regarding travel history or other indicators of human movement [41, 43] to evaluate the potential impact of importation or focal transmission.
Overall, these results suggest that there are two distinct regimes where parasite genetics could be used to inform public health decision-making. When transmission is sufficiently high such that superinfection dominates, changes in parasite genetics can be used to infer incidence and quantify the transmission intensity in different regions. Parasite genetics could be especially valuable for evaluating the efficacy of public health interventions in reducing transmission in moderate to high transmission settings. However, when transmission falls below a certain threshold, these results suggest that parasite genetics should instead be used to evaluate the impact of importation or other heterogeneous transmission processes, whose effects are masked by local mixing and transmission in high transmission settings but whose contributions are proportionally greater in low transmission settings. The exact incidence threshold for distinguishing between these two regimes is uncertain, but likely lies between an annual incidence of 10 and 100‰. Until additional study sites with annual incidence between 10 and 100‰ can be examined, the current WHO guidelines for defining very low transmission sites (annual incidence < 100‰) could be used to determine when parasite genetics can be used to infer transmission intensity and when it can be used to study subtler effects associated with importation and other sources of transmission heterogeneity.
The careful examination of parasite genetics in very low transmission sites could help NMCPs address long-standing issues regarding the role of importation and focal transmission in low transmission settings. Very low transmission sites should be identified prior to using parasite genetics, as inappropriately applying the trends in parasite genetics from higher transmission settings risks over-estimating incidence and ignores possible sources of infections in low transmission areas. This phenomenon is most clearly seen in the model predictions from the GLM trained on all the sites (Fig. 3A) and the out-of-sample extrapolations made with the GLM trained on only sites with annual incidence > 10‰ (GLM_above10, Additional file 2: Fig. S9). In practice, very low transmission sites can be distinguished from higher transmission sites using standard epidemiological metrics of incidence. The advantage of parasite genetics in very low transmission settings is that it could potentially allow NMCPs to identify source-sink populations or focal transmission sites that require targeted intervention for elimination. Additionally, parasite genetics in very low transmission sites could potentially be used to help countries confirm the absence of locally sustained transmission when applying for WHO certification of elimination.