Background
Malaria caused an estimated 429,000 deaths worldwide in 2015, with the overwhelming majority of deaths occurring in sub-Saharan Africa [1]. In regions of holoendemic malaria transmission, individuals are routinely exposed to malaria parasites and develop naturally acquired partial immunity to clinical malaria despite harbouring malaria parasites [2–10]. Individuals with asymptomatic or chronic malaria have been identified as important reservoirs for malaria transmission and represent a major challenge for malaria control and elimination strategies [11–15].
Early molecular studies revealed that genetically diverse Plasmodium falciparum strains circulate in malaria endemic regions and that this genetic heterogeneity contributes to the ability of P. falciparum to evade the host immune response and develop resistance to anti-malarial drugs [16–22]. It has been suggested that multiclonal malaria infections can influence clinical outcomes in a manner that depends on transmission intensity [23], and may negatively impact an individual's response to anti-malarial drug treatment [24]. Further, multiclonal P. falciparum infections increase the likelihood of inter-strain genetic recombination during the sexual stage in the anopheline vector, generating genetically diverse P. falciparum strains and facilitating parasite evolution [25–29]. Multiclonal P. falciparum infections can arise either from multiple mosquito bites, each delivering a different strain of P. falciparum, or from a single mosquito bite carrying multiple P. falciparum strains [4, 30, 31]. The number of distinct P. falciparum strains present within a single individual is defined as the complexity of infection (COI) [32]. The relationship between COI and malaria transmission intensity is complex. On one hand, recent studies have shown a positive correlation between malaria transmission intensity and P. falciparum COI, with malaria holoendemic regions typically exhibiting higher P. falciparum COIs than areas with seasonal or low malaria endemicity [6, 33–39]. Thus, COI has been proposed as a measure of changes in malaria transmission intensity after the implementation of malaria control programmes [33, 35, 40–42]. On the other hand, other studies have found no correlation between malaria transmission intensity and P. falciparum COI [43–45]. Additional studies of the relationship between malaria transmission intensity and P. falciparum COI are, therefore, needed to better understand how malaria parasite genetic diversity relates to transmission dynamics and to assess the potential utility of COI as a measure of change in malaria prevalence.
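As a minimal illustration of the COI definition above, the sketch below counts the distinct haplotypes in one sample whose within-sample read frequency reaches a detection cut-off. The function name, the toy haplotype labels and the 2.5% default (borrowed from the sequencing cut-off mentioned in the limitations) are illustrative assumptions, not a description of the analysis pipeline used in this study.

```python
from collections import Counter

def complexity_of_infection(reads, min_freq=0.025):
    """Count distinct haplotypes whose within-sample read frequency >= min_freq.

    `reads` is a list of haplotype labels, one per sequencing read
    (hypothetical input format); the 2.5% default is an assumed cut-off.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0
    return sum(1 for n in counts.values() if n / total >= min_freq)

# Hypothetical sample: 1,000 reads split across three haplotypes
sample_reads = ["H1"] * 600 + ["H2"] * 380 + ["H3"] * 20
print(complexity_of_infection(sample_reads))  # → 2 (H3 at 2% falls below 2.5%)
```

Lowering the cut-off raises the COI for the same reads, which is why reported COIs depend on the detection threshold as well as on the marker.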
Several genetic tools and strategies have been employed to detect multiclonal P. falciparum infections, including targeting size polymorphisms of the merozoite surface proteins (MSP1, MSP2) and the glutamate-rich protein (GLURP) [5, 8, 46–49]. Some PCR-based methods rely on DNA sequence length polymorphisms, which can be visualized via gel or capillary electrophoresis, with the COI defined as the number of distinct bands present. However, these methods lack the resolution to distinguish P. falciparum strains that differ by only a few nucleotides in length or only by single nucleotide polymorphisms (SNPs). These methods also have poor sensitivity for detecting less abundant strains [50–53], and differing methods can result in high variability in the number of strains detected between laboratories [54]. Novel approaches based on DNA deep sequencing technologies provide increased capability to detect minor variant P. falciparum strains and to discriminate SNPs and small indels. These deep sequencing technologies provide a more accurate determination of the COI within an individual or population, thereby improving subsequent population genetic analyses [4, 6, 34, 50, 55, 56].
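The resolution problem described above can be shown with a toy example using hypothetical short amplicon sequences: typing strains by amplicon length merges strains that differ only by SNPs, whereas typing by full sequence keeps them apart. This is an illustrative sketch, not a model of any specific MSP1, MSP2 or GLURP assay.

```python
def coi_by_length(amplicons):
    """Length-based typing: amplicons of equal length look like one strain."""
    return len({len(seq) for seq in amplicons})

def coi_by_sequence(amplicons):
    """Sequence-based typing: any nucleotide difference separates strains."""
    return len(set(amplicons))

# Hypothetical amplicons: three 8-nt variants differing by SNPs, one 10-nt variant
strains = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGTAC"]
print(coi_by_length(strains), coi_by_sequence(strains))  # → 2 4
```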
In the Democratic Republic of Congo (DRC), malaria is a leading cause of morbidity and mortality, with over 95% of malaria infections due to P. falciparum [57]. The DRC Ministry of Health estimates that 97% of the population in the DRC lives in areas where malaria transmission occurs 8–12 months of the year [57]. The 2007 DRC Demographic and Health Survey (DHS) and subsequent studies reported that over one-third (33.5%) of adults (15–59 years) were positive for malaria by real-time PCR (qPCR) [58, 59]. Several studies have explored the complex spatial epidemiology and population genetics of malaria in the DRC [59–67]. For instance, a recent spatial and genetic analysis revealed that P. falciparum parasite populations are dispersed across seven geographical areas, likely due to movement of human populations between provinces in the DRC and the surrounding region [61]. Additionally, Taylor et al. reported spatial and genetic clustering of P. falciparum sulfadoxine resistance between western and eastern DRC [65]. Further studies to examine P. falciparum haplotype diversity are, therefore, warranted to inform malaria control strategies and to monitor changes in malaria parasite population structure in response to malaria control efforts in the DRC.
In this study, a PCR amplicon-based deep sequencing approach was used to target the extensive allelic diversity of the P. falciparum apical membrane antigen 1 (pfama1) gene in order to (1) examine the relationship between P. falciparum COI and P. falciparum prevalence as previously determined by real-time PCR [59], (2) investigate the P. falciparum population genetic structure at both the individual and population level in the DRC, and (3) explore AMA1 amino acid frequencies and potential selection pressures in geographically distinct malaria parasite populations in the DRC and Mali. The authors hypothesized that P. falciparum COI would be positively correlated with P. falciparum prevalence in a region, and that similar pfama1 haplotypes would be identified at the individual and population level in the DRC and Mali. To investigate pfama1 diversity at both the individual and population level, individual samples (each representing a malaria infection in a single person) and pooled samples (representing population cluster samples) were targeted in this study. Pooling samples is a cost-effective approach to amplicon-based deep sequencing because it reduces the number of PCR reactions and library preparations, and this pooled approach has been used in several malaria population genetic studies [68–71]. This dual sample type (individual and population cluster) approach allows COI to be examined using the individual samples, while the individual and pooled population cluster samples together power spatial population genetic analyses.
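One consequence of pooling can be sketched numerically under two simplifying assumptions: each individual contributes an equal share of parasite DNA, and PCR amplifies all haplotypes without bias. The pooled frequency of a haplotype is then the average of its within-individual frequencies, so a variant carried by a single person is diluted by the pool size. The function and haplotype names, and the 2.5% cut-off used for detection, are hypothetical choices for illustration.

```python
def pooled_frequencies(individual_freqs):
    """Mean within-individual haplotype frequency across the pool.

    Assumes equal DNA input per individual and unbiased amplification.
    """
    n = len(individual_freqs)
    pooled = {}
    for freqs in individual_freqs:
        for hap, f in freqs.items():
            pooled[hap] = pooled.get(hap, 0.0) + f / n
    return pooled

def detected(freqs, min_freq=0.025):
    """Haplotypes at or above the (hypothetical) detection cut-off."""
    return {hap for hap, f in freqs.items() if f >= min_freq}

# Ten hypothetical individuals: nine carry only H1, one carries H1 (90%) and H2 (10%)
pool = [{"H1": 1.0}] * 9 + [{"H1": 0.9, "H2": 0.1}]
pooled = pooled_frequencies(pool)
print(detected(pool[-1]))  # H2 is seen when that individual is sequenced alone
print(detected(pooled))    # H2 (about 1% of pooled reads) falls below 2.5%
```

In this toy case a haplotype at 10% in one of ten pooled individuals drops to 1% of pooled reads and would be filtered out, which is one way pooled samples can miss rare variants that individual samples retain.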
Overall, a total of 77 unique pfama1 haplotypes were identified across DRC provinces. The vast majority of individual malaria infections were polyclonal (COI > 1), and no significant correlation was found between COI and malaria prevalence at the site/regional level. Population genetic analyses revealed extensive genetic diversity of P. falciparum parasites based on the pfama1 gene and similar amino acid frequencies between malaria parasite populations in the DRC and Mali. This manuscript highlights the utility of combining individual and pooled amplicon-based deep sequencing methods for population genetic analyses layered onto the infrastructure and sample collection process of a routine Demographic and Health Survey. It also describes the spatial and genetic diversity of pfama1 haplotypes circulating in the DRC and Mali to improve the understanding of malaria transmission dynamics, which could inform future malaria control and elimination efforts in the region.
Discussion
In this study, amplicon-based deep sequencing was used to investigate the diversity of the pfama1 gene in asymptomatic malaria infections at both the individual and population cluster level from across the DRC and in Mali. Overall, a total of 77 unique pfama1 haplotypes were identified, and the majority of individual infections in the DRC were polyclonal (64.5%). Population genetic analyses revealed that pfama1 haplotypes are not isolated by distance or by province within the DRC. These results align with a previous study in the DRC, which found a lack of spatial restriction of malaria parasite populations. This diversity, however, may not be due to the extensive movement of P. falciparum parasites with their human hosts between provinces and neighbouring countries [61]. Rather, more likely explanations for the extensive pfama1 haplotype diversity identified in the DRC in this study include human host immune selection that maintains the antigenic diversity of pfama1 (balancing selection) and spatially restrictive protein–protein interactions [92, 93].
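Isolation by distance is commonly assessed with a Mantel test, which correlates a pairwise genetic distance matrix with a pairwise geographic distance matrix and obtains a p-value by permuting the rows and columns of one matrix together. The text above does not state which test underlies the analysis, so the following is a generic, hypothetical sketch with toy matrices, not the study's method.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(matrix, order):
    """Pairwise entries (i < j) of `matrix` after relabelling sites by `order`."""
    k = len(order)
    return [matrix[order[i]][order[j]] for i in range(k) for j in range(i + 1, k)]

def mantel_test(genetic, geographic, permutations=999, seed=1):
    """Return (r, one-sided p) for a positive genetic-geographic association."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of the genetic matrix together
        if pearson(upper_triangle(genetic, perm), geo) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical sites on a line, 0-3 units apart, with toy genetic distances
geographic = [[abs(i - j) for j in range(4)] for i in range(4)]
genetic = [[0.0, 0.2, 0.5, 0.6],
           [0.2, 0.0, 0.3, 0.5],
           [0.5, 0.3, 0.0, 0.2],
           [0.6, 0.5, 0.2, 0.0]]
r, p = mantel_test(genetic, geographic)
```

With only four sites there are just 24 possible site orderings, so no small p-value is attainable; the toy matrices only demonstrate the mechanics.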
To more fully explore pfama1 diversity between geographically divergent malaria endemic regions, haplotype frequencies were compared at the amino acid level in parasite populations from the DRC and Mali. Highly similar amino acid frequencies were observed between parasite populations in the DRC and Mali (Fig. 6), suggesting that analogous selective pressures, more so than parasite movement, could be maintaining pfama1 haplotype diversity in the two regions across the continent. A previous study of the diversity of the circumsporozoite protein (CS), another hypervariable surface antigen, also showed shared amino acid frequencies between two geographically separated malaria parasite populations [93]. Highly diverse regions under balancing selection, such as AMA1, while excellent markers for COI, may therefore be poorly suited to discriminate geographically distinct malaria parasite populations or to serve as a marker of malaria parasite diversity.
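A per-position comparison of amino acid frequencies between two populations can be sketched as follows, assuming aligned, equal-length protein haplotypes and weighting each haplotype equally. The dissimilarity score used here (half the L1 distance between frequency profiles, averaged over positions) is one simple choice made for illustration; it is not necessarily the comparison underlying Fig. 6, and the haplotype strings are invented.

```python
from collections import Counter

def aa_frequencies(haplotypes):
    """Per-position amino acid frequencies for aligned, equal-length protein haplotypes."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    n = len(haplotypes)
    return [{aa: c / n for aa, c in Counter(h[i] for h in haplotypes).items()}
            for i in range(length)]

def mean_profile_distance(pop_a, pop_b):
    """Mean over positions of half the L1 distance between frequency profiles.

    0 means identical frequencies at every position; 1 means no shared residues.
    """
    fa, fb = aa_frequencies(pop_a), aa_frequencies(pop_b)
    if len(fa) != len(fb):
        raise ValueError("populations must be aligned to the same length")
    total = 0.0
    for pa, pb in zip(fa, fb):
        residues = set(pa) | set(pb)
        total += 0.5 * sum(abs(pa.get(r, 0.0) - pb.get(r, 0.0)) for r in residues)
    return total / len(fa)

# Hypothetical 3-residue haplotypes at polymorphic positions
drc = ["KDE", "KNE", "KDE", "RDE"]
mali = ["KDE", "RDE", "KNE", "KDE"]
print(mean_profile_distance(drc, mali))  # → 0.0
```

The two toy populations share no haplotype ordering but have identical per-position frequencies, which is the pattern that similar amino acid frequency profiles can reflect even when individual haplotypes differ.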
In contrast to several recent studies [6, 33–36], this study found a weak positive correlation between COI and malaria prevalence that was not statistically significant (Fig. 4). While additional samples could have increased the statistical power of this study, other studies have also reported no correlation between COI and P. falciparum prevalence [43–45]. Potential explanations for these discrepancies include differing methodologies for detecting P. falciparum strains and regional differences in malaria transmission intensity. Previous studies that reported significant correlations between COI and malaria prevalence typically compared low and high malaria transmission areas [34–37, 39]. This study was conducted in the DRC, which experiences high malaria transmission year-round. Therefore, the lack of a significant association between COI and P. falciparum prevalence in this study, compared with other studies, could be due to the high, stable malaria transmission across the DRC. Additional studies with larger sample sizes and additional markers are needed to further explore the potential relationship between COI and malaria prevalence, and how population diversity indices could be used to monitor changes in malaria transmission intensity in the DRC and other malaria endemic regions. However, given the wide variance observed in the correlation between COI and prevalence, COI may not be a reliable surrogate for differentiating malaria transmission levels within the DRC.
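A rank-based check of the COI–prevalence association can be sketched with Spearman's correlation and a permutation p-value. The text does not specify the statistic or test behind Fig. 4, so this is an assumed, generic example; in practice a library routine such as `scipy.stats.spearmanr` would normally be used. The site-level values in the example are hypothetical.

```python
import math
import random

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, permutations=9999, seed=0):
    """Two-sided p-value: share of shuffles with |rho| >= the observed |rho|."""
    observed = abs(spearman_rho(x, y))
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (permutations + 1)

# Hypothetical sites: mean COI and qPCR prevalence per site
mean_coi = [2.1, 1.8, 2.6, 2.0, 2.4, 2.2]
prevalence = [0.30, 0.45, 0.35, 0.50, 0.28, 0.41]
rho = spearman_rho(mean_coi, prevalence)
p = permutation_p_value(mean_coi, prevalence)
```

A rank-based statistic is used here because site-level COI and prevalence need not be linearly related; with few sites, the permutation p-value makes the limited power explicit.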
Deep sequencing technologies have enhanced the ability to detect low-frequency, minor variant P. falciparum haplotypes and to characterize malaria COI from a variety of sample types, including dried blood spots [4, 6, 34, 50, 55, 56]. Amplicon-based deep sequencing was used in this study to detect polyclonal P. falciparum infections for several reasons, including its cost-effectiveness compared with whole genome sequencing and the ability to use barcoding to pool several dozen samples, thereby increasing sample size. The SeekDeep bioinformatics pipeline is designed for analysis of haplotype frequencies from amplicon-based deep sequencing data and has been used successfully in several studies of malaria population genetics globally [76, 78, 79].
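The barcoding idea that lets many samples share one sequencing run can be sketched as exact-match demultiplexing on an inline barcode at the start of each read. This is a deliberately simplified, hypothetical example (single inline barcode, no mismatches tolerated, toy sequences); it does not reproduce how SeekDeep or any particular sequencing platform handles barcodes.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by exact match of an inline barcode prefix.

    `barcodes` maps barcode sequence -> sample ID (hypothetical layout);
    all barcodes must share one length. The barcode is trimmed from assigned reads.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("barcodes must share one length")
    k = lengths.pop()
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        sample = barcodes.get(read[:k])
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[k:])
    return by_sample, unassigned

barcodes = {"ACGT": "S1", "TGCA": "S2"}
reads = ["ACGTAAAA", "TGCACCCC", "GGGGTTTT", "ACGTGGGG"]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample)   # → {'S1': ['AAAA', 'GGGG'], 'S2': ['CCCC']}
print(unassigned)  # → ['GGGGTTTT']
```

Real pipelines typically tolerate sequencing errors in barcodes and may use dual indexes; exact matching keeps the sketch short.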
Pfama1 was chosen for amplicon-based deep sequencing for several reasons. First, pfama1 is a highly polymorphic gene containing numerous SNPs, likely maintained via balancing selection due to immune pressure in the human host [80, 94–96]. Previous studies in malaria endemic regions have identified over 60 polymorphic sites within pfama1 [96–99], and sequencing of human samples from a malaria endemic region in Mali identified over 200 unique pfama1 haplotypes [80]. Second, the P. falciparum AMA1 antigen is a highly studied malaria vaccine candidate. Vaccine studies have demonstrated that AMA1-based vaccine protection against clinical malaria is extremely strain-specific; therefore, a clear understanding of AMA1 diversity is critical to developing an effective malaria vaccine based on this polymorphic antigen [100–105]. The results of this study provide further evidence of the extensive heterogeneity of pfama1 haplotypes in the DRC and surrounding malaria endemic regions.
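Two diversity measures used in this Discussion can be sketched for a set of aligned haplotype sequences: Nei's haplotype diversity, Hd = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences, and the number of segregating (polymorphic) sites, S. The toy sequences are hypothetical, and whether each infection or each haplotype call counts as one sampled sequence is an analysis choice this sketch does not settle.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_sq = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1 - sum_sq)

def segregating_sites(haplotypes):
    """Number of aligned positions carrying more than one nucleotide (S)."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)

seqs = ["AAAA", "AAAT", "AATT", "AAAA"]  # hypothetical aligned haplotypes
print(round(haplotype_diversity(seqs), 3), segregating_sites(seqs))  # → 0.833 2
```

Because S counts a site as polymorphic as soon as any one sequence differs, it is sensitive to rare variants, whereas Hd is dominated by common haplotypes.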
This study has several important limitations that may have restricted the ability to detect minor variants and calculate COI in the malaria parasite population circulating in the DRC. These limitations include possible pfama1 sequence polymorphisms in primer binding sites, degradation of malaria parasite nucleic acid stored on dried blood spots, and pfama1 haplotype frequencies below the limit of detection of the PCR assay or below the 2.5% cut-off used in the sequencing analysis. In addition, this study focused on a subset of asymptomatic malaria samples collected as part of the 2007 DHS in the DRC. The inclusion of more malaria-positive samples, including symptomatic as well as asymptomatic infections, would provide a more comprehensive description of the P. falciparum population genetic structure in the DRC. Another potential limitation is that this study targeted a region of the highly polymorphic pfama1 gene as a surrogate for the entire P. falciparum genome. As such, the true genetic heterogeneity of P. falciparum parasites circulating in the DRC is likely underestimated. Further, because the number of polymorphic sites (S) was unexpectedly higher in the individual samples than in the pooled population cluster samples, pooled sampling likely missed some variants occurring at low frequency within one or a few individuals in the population (Table 2). It is therefore critical to consider whether samples are pooled prior to amplicon deep sequencing when designing studies to detect low-frequency variants, and, when comparing individuals and pools, to choose statistics minimally influenced by rare variants or haplotypes, particularly in low malaria prevalence areas. Nevertheless, targeted deep sequencing greatly improves COI estimates over traditional methods [50], particularly for pfama1 given its high haplotype diversity (0.95). To account for the chance of distinct strains sharing the same AMA1 haplotype, a permutation-based model incorporating undercall probability was used to simulate corrected COIs (Fig. 4). Given the high heterozygosity of pfama1 and the observed COIs, the corrected COIs differed minimally from the observed (uncorrected) COIs (Fig. 4). This would not be the case if the average COIs in this study were higher, as COIs > 5 were estimated to be undercalled for the majority of observed measures. As deep sequencing technologies become more cost-effective and less labour-intensive, future studies of P. falciparum strain diversity in malaria endemic regions could use whole genome deep sequencing.
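The undercall problem can be illustrated with a simple Monte Carlo sketch, assuming that the `true_coi` strains in an infection are drawn independently, with replacement, from population haplotype frequencies, and that the observed COI is the number of distinct haplotypes among them. This is a simplified analogue written for illustration, not the permutation-based model used in the study. The population of 20 equally common haplotypes in the example is hypothetical and was chosen because its haplotype diversity, 1 − 20 × (1/20)², equals 0.95.

```python
import random

def simulate_observed_coi(true_coi, hap_freqs, trials=10_000, seed=0):
    """Return (mean observed COI, probability observed < true) by simulation.

    Each trial draws `true_coi` strains independently, with replacement,
    from `hap_freqs` (haplotype -> population frequency); strains that share
    a haplotype are indistinguishable, so they collapse into one call.
    """
    rng = random.Random(seed)
    haps = list(hap_freqs)
    weights = [hap_freqs[h] for h in haps]
    total_observed = 0
    undercalls = 0
    for _ in range(trials):
        draw = rng.choices(haps, weights=weights, k=true_coi)
        observed = len(set(draw))
        total_observed += observed
        undercalls += observed < true_coi
    return total_observed / trials, undercalls / trials

freqs = {f"H{i}": 1 / 20 for i in range(20)}  # hypothetical: 20 equally common haplotypes
for true_coi in (2, 6):
    mean_obs, p_under = simulate_observed_coi(true_coi, freqs)
    print(true_coi, round(mean_obs, 2), round(p_under, 2))
```

Under these toy assumptions the exact undercall probability is 1/20 = 5% at a true COI of 2 but about 56% at a true COI of 6 (1 − 20·19·18·17·16·15/20⁶), consistent with the point above that higher COIs are much more prone to undercalling even for a marker as diverse as pfama1.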
Authors’ contributions
Designed the study and experiments: RM, SM, JJ, AS, JB. Performed the experiments: RM, OK. Analysed and interpreted data: RM, NH, SM, JJ, AS, JB. Contributed reagents/materials/analysis tools: NH, KM, AT, SM, ST, JJ, AS, JB. All authors read and approved the final manuscript.