Background
Pregnancy-associated malaria (PAM) is a major public health concern. In areas of stable malaria transmission in sub-Saharan Africa, approximately one in four pregnant women have evidence of malaria infection at the time of delivery [1, 2]. PAM is detrimental to both mother and child, placing the mother at increased risk of severe anaemia whilst increasing the chance of adverse birth outcomes, including stillbirth, preterm birth and low birth weight (LBW) [1–3].
The adverse effects of PAM are mediated by the sequestration of infected erythrocytes in the placental microvasculature through binding of VAR2CSA (a large and genetically diverse parasite protein expressed during pregnancy) to human chondroitin sulfate A (CSA) [4, 5]. Naturally occurring anti-VAR2CSA antibodies provide partial protection against future episodes of PAM, such that primigravid women are most susceptible and the risk of severe infection and LBW decreases in subsequent pregnancies [4, 6, 7]. Vaccines against VAR2CSA are currently undergoing initial trials [8–10].
Extensive effort has gone into characterizing the particular sub-region of the VAR2CSA protein responsible for binding CSA and inducing a protective immune response. These efforts have led to the identification of ID1-DBL2X, a 1.6 kb segment encoding the minimal binding epitope [11]. ID1-DBL2X has been shown to raise antibodies that abrogate the adhesion of infected erythrocytes to CSA with the same efficacy and specificity as the full-length extracellular protein, while maintaining high cross-reactivity to multiple parasite lines [12]. Both leading PAM vaccine candidates, PlacMalVac and PRIMALVAC, use recombinant proteins that target overlapping constructs of this region [9, 10].
The ability of any such vaccine to have a sustained impact on malaria will depend on the extent of antigenic variation in the vaccinated population, which in turn depends on the level of sequence polymorphism in the var2csa gene. Studies into global diversity at the var2csa locus have identified extremely high sequence polymorphism, with evidence that diversity is being maintained by balancing selection [13–16]. Furthermore, this high level of diversity occurs against the backdrop of a major dimorphic split in the N-terminal segment of the VAR2CSA protein, leading to multiple sequence clusters, each of which has been found to associate with a different level of parasitaemia [16] and a different risk of poor birth outcomes [17]. Sequence polymorphism at the var2csa locus is, therefore, both functionally relevant in vaccine design and clinically relevant in understanding the basic epidemiology of PAM.
The Democratic Republic of Congo (DRC) bears one of the highest malaria burdens in sub-Saharan Africa, with over 1 million Plasmodium falciparum-affected pregnancies each year [18]. Transmission is stable throughout the country, and prevalence is estimated at 34.1% on average by PCR [19]. The majority of studies into the genetics of P. falciparum in DRC have focussed on issues of drug resistance (see Additional file 1). Studies into dhps mutations, which confer resistance to sulfadoxine, have found distinct geographic clustering of the most highly resistant haplotypes in the east of the country [20, 21]. In contrast, countrywide studies into neutral genetic variation [22] and variation in the pfama1 gene [23] have found little signal of population structure or isolation by distance, even over large geographic scales. To date, no study has explored the geographic and genetic structure of var2csa in DRC, and so it is unknown what challenges may lie ahead in terms of vaccine design.
This study focused on quantifying genetic variation at the var2csa locus in samples obtained from the 2013–14 Demographic and Health Survey (DHS), a large, cross-sectional study separated into spatial clusters spanning the DRC. The central aims of this study were: (1) to explore var2csa diversity in these samples in the context of global diversity at this locus, (2) to quantify the level of spatial structure in the DRC, and (3) to determine what epidemiological factors (if any) are predictors of observed levels of diversity at this locus. These questions will be important for any future interventions aimed at reducing PAM in the DRC.
Discussion
In recent years, a great deal of effort has gone into studying the VAR2CSA protein, both in terms of its immune profile and its impact on clinical outcomes. However, while good data exist on levels of var2csa sequence polymorphism in several countries spanning four continents [15], to date no study has explored variation at this locus in samples obtained from DRC. More generally, understanding of the basic epidemiology and population genetics of this large and diverse country has been limited historically by a lack of good quality data. This study aimed to address this issue by quantifying var2csa variation in DRC and relating it back to basic epidemiological questions.
The results of this study demonstrate that var2csa is highly diverse within DRC, with 583 sequence variants identified among 812 children. High Tajima’s D and an excess of synonymous mutations suggest that this high diversity is being maintained by long-term balancing selection, as would be expected of a gene involved in antigenic variation, and as found in previous VAR2CSA studies [13–16]. However, there was no clear signal of phylogenetic structure to this diversity. Low bootstrap values in the neighbour-joining tree indicate that different loci provide contradictory information about the position of sequences in the phylogeny, suggesting that recombination is acting to intertwine the phylogenetic branches and break down patterns across loci (see Additional file 2). In the spatial analysis, there was high heterozygosity within and between clusters but no signal of increasing genetic distance with geographic distance or other barriers to gene flow. Together these results indicate that the combined effects of rapid diversification through balancing selection, gene flow between spatial clusters, and recombination within the gene are acting to break down any clear signal of geographic population structure in this particular sub-region of var2csa. This contrasts with the result of Doritchamou et al. [16], who found a clear signal of population structure in the N-terminal sub-region just a few hundred bp upstream of the target region used here, in samples taken from Benin. However, when the Doritchamou et al. [16] sequences are truncated to the same target region used here, there is no clear signal of population structure (results not shown). Therefore, it appears that the extent of var2csa phylogenetic signal varies greatly even within narrow sub-regions of this highly diverse gene. It is not clear from this study what impact this diversity has on acquired immunity; however, the excess of synonymous mutations suggests some role in diversification of the VAR2CSA antigen.
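For readers less familiar with Tajima's D, the statistic compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. Positive values reflect an excess of intermediate-frequency variants, the pattern expected under balancing selection. The following is a minimal Python sketch of the standard formula, assuming gap-free aligned sequences supplied as plain strings. The function name `tajimas_d` and the toy inputs are hypothetical illustrations and do not represent the analysis pipeline used in this study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length, gap-free aligned sequences.

    Illustrative sketch only; returns NaN when no site is polymorphic.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return float("nan")

    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    # Constants from Tajima's derivation, functions of sample size only
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, standardized
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: two haplotypes at equal frequency (intermediate-frequency variants)
balanced = ["AAAA"] * 4 + ["TTTT"] * 4
# Toy data: every variant is a singleton (excess of rare variants)
singletons = ["AAAAAAA"] + ["A" * i + "T" + "A" * (6 - i) for i in range(7)]
```

In this toy setting, `tajimas_d(balanced)` is positive and `tajimas_d(singletons)` is negative, mirroring the contrast between balancing selection and an excess of rare variants.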
Results of general linear modelling demonstrate that var2csa diversity is directly related to epidemiological factors. The best-fitting model found that the allelic richness of a cluster tends to increase with prevalence, eventually plateauing at high prevalence. Crucially, this analysis takes sampling effort into account; that is, prevalence is an important predictor of allelic richness even after accounting for the absolute number of infected children in the analysis. This result is in line with a wider body of evidence showing that genetic diversity tends to be higher in areas of intense transmission, perhaps due to increased opportunity for recombination [45–47]. In terms of vaccine design, this may indicate that a vaccine would be more likely to succeed in areas of low transmission where populations tend to be more clonal. The finding that common alleles in DRC also tend to be represented in other African countries (see Additional file 3) is also encouraging for vaccine development, as it indicates that some variants are common over large geographic scales, despite the general trend of strong local diversification.
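To illustrate why sampling effort matters when comparing allelic richness between clusters, the sketch below uses rarefaction, a different and commonly used technique from the model-based adjustment described above. It computes the expected number of distinct alleles in a random subsample of fixed size, so that clusters with different numbers of observations can be compared on an equal footing. The function name `rarefied_richness` and the example allele labels are hypothetical, and this is not the method applied in this study.

```python
from collections import Counter
from math import comb


def rarefied_richness(alleles, k):
    """Expected number of distinct alleles in a random subsample of size k.

    `alleles` is a list of allele labels, one per observation, and the
    subsample is drawn without replacement (classical rarefaction).
    """
    counts = Counter(alleles)
    n_total = sum(counts.values())
    if not 0 < k <= n_total:
        raise ValueError("k must be between 1 and the number of observations")

    total_ways = comb(n_total, k)
    # An allele seen n_i times is absent from the subsample with probability
    # C(n_total - n_i, k) / C(n_total, k); sum the presence probabilities.
    return sum(1 - comb(n_total - n_i, k) / total_ways for n_i in counts.values())


# Hypothetical cluster with four observations of three distinct alleles
cluster = ["a", "a", "b", "c"]
```

Calling `rarefied_richness(cluster, 2)` gives the richness expected if only two observations had been made in that cluster, which can then be compared with other clusters rarefied to the same size.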
One strength of this study is the use of samples derived from the DHS, which is both cross-sectional and nationally representative. This ensures that asymptomatic and low-density (i.e. sub-microscopic) infections are captured in the analysis, which may include a different subset of strains to those found in clinically ill individuals [48]. A caveat is that samples were extracted from children under 5 years of age, and not from infected pregnant women, and so the pool of strains sampled here may not perfectly reflect those implicated in clinical disease. The use of pooled samples also makes it possible to explore a wide geographic area, and to answer questions about spatial connectivity (Fig. 1). However, the use of pooled samples also limits the analysis to observations at the cluster level, meaning that factors such as the complexity of infection of each individual within a cluster cannot be reliably determined.
The results of this study highlight the extreme evolutionary pressures acting at the var2csa locus to promote antigenic variability in the DRC. This variability creates a substantial hurdle in the development of a VAR2CSA-based vaccine, which is exacerbated by the highly connected population structure found within the DRC. A broad multivalent VAR2CSA vaccine candidate could thus benefit from targeting stable regions and common variants to address the substantial genetic diversity [49].
Authors’ contributions
SRM conceptualized the study. KM and AKT provided field support in obtaining samples. SMD, NJH, AW and JCP carried out lab work and bioinformatics. RV and OJW performed phylogenetic and statistical analyses. JAB, JJJ, ACG and SRM provided supervision and coordination. All authors contributed to the final manuscript. All authors read and approved the final manuscript.