Background
The massive use of drugs for treating Plasmodium falciparum malaria has selected for mutations that confer resistance in endemic areas worldwide, rendering traditional anti-malarial drugs ineffective in vast regions of the globe [1, 2]. Artemisinin combination therapy (ACT) is now being used in many endemic areas; however, there are concerns that mutations conferring resistance against ACT could also emerge [3, 4]. Understanding such complex evolutionary processes, especially in the context of combination therapies, is a matter of great interest. Valuable information about such dynamics can be obtained by retrospectively investigating the rise of resistance against sulphadoxine-pyrimethamine (SP), a combination drug therapy that has been widely used and for which the molecular basis of resistance is well known.
SP acts as an inhibitor of the P. falciparum folic acid pathway, and point mutations in two genes, dihydrofolate reductase (DHFR) and dihydropteroate synthetase (DHPS), confer resistance to SP [5]. Point mutations at dhfr codons 50, 51, 59, 108 and 164 act synergistically to increase resistance to pyrimethamine. Of note, S108N confers a low level of resistance, the double mutants N51I/S108N and C59R/S108N confer moderate levels of resistance, the triple mutant N51I/C59R/S108N confers a higher level, and the quadruple mutant parasite (N51I/C59R/S108N/I164L) is considered to be resistant to the effects of pyrimethamine [6, 7]. Similarly, mutations at dhps codons 436, 437, 540, 581 and 613 act synergistically to increase the level of resistance to sulphadoxine. In brief, the mutations S436A and A437G alone confer a low level of resistance, and in combination with K540E and/or A581G and/or A613S/T they give the parasite an increased level of resistance to sulphadoxine [1, 8].
The evolution of drug resistance is further complicated by the fact that resistant alleles may have multiple origins intertwined with migration patterns among P. falciparum populations; such complex dynamics are still poorly understood. There is compelling evidence indicating a common origin for highly resistant pyrimethamine alleles across Southeast Asia and at a few sites in Africa [9-14]; however, additional, novel low-frequency lineages for the triple mutant (51I/59R/108N) dhfr allele have been documented in Cameroon and also in western Kenya [14, 15]. Similarly, recent studies from sites across Africa and Asia show multiple independent origins of mutations at dhps [16-18]. However, the patterns for dhps highlight different evolutionary processes than those for dhfr. Thus, SP-induced selection on resistance-associated mutations may differ for the two genes and across different endemic regions. Hence, reliable estimates of selective parameters for various dhfr and dhps mutations are highly desirable.
A few studies have addressed the genetic consequences of SP drug selection, yet the temporal dynamics of mutations have rarely been investigated at both loci. Indeed, patterns consistent with selective sweeps of highly resistant dhfr alleles have been reported in multiple populations [9, 19, 20], but there are only a few studies on dhps [14, 17, 18, 20]. Despite the limited evidence, dhps shows a clear pattern of reduced diversity in multiple populations, indicating an increase in mutant alleles conferring resistance to sulphadoxine. Notably, the patterns of the selective sweeps in dhps and dhfr appear to be different, providing evidence that the strength of selection is not the same on both loci [14, 20]. However, all these studies are based on cross-sectional data, and measures of the strength of drug selection are limited. Attempts to infer the strength of selection have been made for dhfr [9, 21], but such estimates focused only on the proportion of clinical failures, an indirect line of evidence that does not consider the actual frequency of resistant mutations and may lead to inaccurate predictions. A direct comparison of the selective strengths on dhfr and dhps during the early stages of the onset of clinical resistance is still missing. Indeed, estimates of drug selection have not been obtained from molecular data. Moreover, the patterns of selective sweeps studied so far merely indicate drug selection; the importance of linking estimates of selection parameters to the pattern of the sweep has been neglected.
Here, a population-based characterization and analysis of genetic signatures around dhfr and dhps was conducted on samples collected in western Kenya from 1992 to 1999. At the time these samples were collected, SP had been exerting selective pressure on P. falciparum populations since the 1980s. SP was introduced in Kenya as a second-line treatment for uncomplicated malaria in 1983 and as a first-line treatment in 1999 [22, 23]. However, clinical SP resistance was noted as early as 1982 [23]. Thus, this study captures some of the early events in the dynamics of drug-resistant mutations in the local P. falciparum population. Even before SP was chosen as a first-line treatment, all alleles in the population had dhfr mutations associated with pyrimethamine resistance. In contrast, sulphadoxine-sensitive alleles at dhps were still present while resistant double-mutant alleles were increasing in frequency. The longitudinal data allowed inferences of the selective strengths on various mutations at dhfr and dhps based on a theoretical model tailored to P. falciparum. Overall, these investigations highlight the differences in selective pressures on these two loci when the drugs were part of a combination drug therapy.
Methods
Study subjects
Two hundred thirty-six blood samples collected during the Asembo Bay Cohort Project from 1992 to 1999 [24] were analysed. This study was approved by the Institutional Review Board of the CDC and the Kenya National Ethics Review Committee. The participants provided written informed consent. In short, this was a longitudinal study conducted between 1992 and 1999 in western Kenya, a holo-endemic area of intense transmission estimated at approximately 300 infective bites per person per year [25]. Blood samples were taken from mother-infant pairs and other siblings less than five years old once per month until the children turned five years old. Malaria parasitaemia was treated with SP.
DNA isolation and genotyping methods
DNA was isolated from whole blood using the QIAamp® DNA Mini Kit (Qiagen, Valencia, CA, USA). All samples were genotyped for P. falciparum mutations at dhfr codons 50, 51, 59, 108, and 164 and dhps codons 436, 437, 540, 581, and 613 by pyrosequencing as previously described [20, 26].
Microsatellite characterization
Microsatellite characterization was conducted on all samples. Samples were assayed for 18 microsatellite loci that span 138 kb on chromosome 4 around dhfr [9-11], 18 loci that span 138 kb on chromosome 8 around dhps [19], five loci on chromosome 2 that span 101 kb, and four loci on chromosome 3 that span 94 kb [20]. The microsatellites used around dhfr are at -89, -58, -30, -17, -10, -7.5, -5.3, -4.5, -4.4, -3.8, -1.2, -0.3, 0.2, 0.52, 1.48, 4.05, 5.87, 30.3, and 50 kb, where negative numbers refer to positions 5' to the gene and positive numbers refer to positions 3' to the gene. The microsatellites used around dhps are at -72.7, -34.5, -18.7, -11, -7.4, -2.8, -1.5, -0.132, 0.034, 0.5, 1.4, 6.4, 9, 16.3, 22.8, 36, 49.5, and 66.1 kb. The loci around dhps have been previously published [19, 20]; however, it was recently brought to the authors' attention that the orientation of the microsatellite loci along chromosome 8 around dhps was incorrect in [20]: loci that have been reported previously as 5' to dhps are actually 3' and vice versa. To avoid any confusion, the corrections along with previously published positions and primers are in Additional file 1: Table S1. The correct positions of dhps loci have been used throughout this manuscript.
The microsatellites used on chromosome 2 are at 302, 313, 319, 380, and 403 kb. The microsatellites used on chromosome 3 are at 335, 363, 383, and 429 kb. The PCR primers for 403 kb on chromosome 2 are 5'-AAATATAAATCTTCTTCTTCTTTTTT-3' (forward) and 5'-TAGAGAAATAAATATATCCAT-3' (reverse), and for 363 kb on chromosome 3 are 5'-CAAAAATGAAAAATGAAAAGG-3' (forward) and 5'-TAAAGGGTGCGCATATCAAT-3' (reverse). All remaining microsatellite PCR primers are detailed in [20]. Single-reaction PCR and thermal cycling conditions are detailed in [9], and nested PCR reactions and thermal cycling conditions are detailed in [10]. PCR products were separated on an Applied Biosystems 3100 capillary sequencer and scored using GeneMapper® software v3.7 (Applied Biosystems, Foster City, CA, USA).
Genetic variation per locus and allele
The genetic variation for each microsatellite locus was measured by calculating the expected heterozygosity (H_e) and the number of alleles per locus (L). H_e was calculated for each locus as H_e = [n/(n-1)][1 - Σ p_i^2], where n is the number of isolates sampled, p_i is the frequency of the i-th allele (i = 1,...,L), and the sum runs over all L alleles. The sampling variance for H_e was calculated as in [19, 21]. H_e was calculated using all alleles that occurred in the respective group, including those in isolates that carried more than one microsatellite allele.
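The per-locus calculation can be sketched as follows. This is a minimal illustration, assuming the usual sample-size-corrected estimator H_e = [n/(n-1)][1 - Σ p_i^2]; the function names and the input format (a plain list of allele calls for one locus) are hypothetical and not taken from the study.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Sample-size-corrected expected heterozygosity for one locus.

    `alleles` is a list of allele calls for a single microsatellite locus
    (hypothetical input format). Assumes H_e = n/(n-1) * (1 - sum(p_i^2)).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele calls")
    counts = Counter(alleles)
    sum_p_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p_sq)


def number_of_alleles(alleles):
    """Number of distinct alleles observed at the locus (L in the text)."""
    return len(set(alleles))
```

For example, a locus at which every isolate carries the same allele gives H_e = 0, whereas two equally frequent alleles among four calls give H_e = 2/3.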
H_e was also calculated for microsatellite loci associated with specific dhfr and dhps mutant alleles. For dhfr alleles, only samples with single 'clone' infections of the respective mutant allele were used. This guarantees that the microsatellite variation is linked to the respective allele. The pattern of variation present before the occurrence of a beneficial mutation should be reflected by H_e among wildtype alleles; however, since sensitive wildtype alleles were only present at marginal frequencies, an estimate of H_e could not be calculated. As a proxy to estimate the initial variation, H_e was calculated among non-triple mutant dhfr alleles. For this estimation, all mixed infections that did not contain the 51I/59R/108N triple mutant were included (e.g. an isolate with mixed codon 51I/S108N was included, but an isolate with mixed codon 51I/59R/S108N was excluded). At microsatellite loci around dhps, H_e was calculated separately among isolates that contained single infections with the 437G/540E mutant allele, and isolates that contained single infections with the sensitive (wildtype) alleles.
Haplotype characterization
Approximately 70% of the samples used in this study were 'multiple infections', i.e. multiple parasite lineages or genomes were present in an infection. Based on dhfr and dhps genotyping alone, 63.0% and 72.2% were multiple infections, respectively. The neutral microsatellite markers on chromosomes 2 and 3 collectively showed that 70.0% of the samples contained multiple infections, and the microsatellites around dhfr and dhps showed 73.9% and 64.5% multiple infections, respectively. A goal of this study was to present a population-based perspective and analysis of the data; thus, data from multiple infections were retained for appropriate analyses. Multiple infections are not appropriate for all of the analyses, and it is stated when data from multiple infections were excluded.
Microsatellite haplotypes are defined as a collection of sites close to the genes dhfr and dhps that had low variation and were more likely to be in linkage disequilibrium. Thus, 11 microsatellite loci spanning 11.5 kb around dhfr and nine loci spanning 20 kb around dhps were used to characterize haplotypes relative to dhfr and dhps alleles. Haplotypes were classified as different if they contained > 1 different allele across loci. Only samples without mixed infections detected by pyrosequencing were used for haplotype characterization.
Haplotype analysis
eBURST groups haplotypes based on a simple evolutionary model, which assumes that one lineage or founding haplotype reaches high frequency in the population and then starts to differentiate, producing closely related haplotypes; this is depicted as a cluster [27]. Data from the 11 microsatellite loci spanning 11.5 kb around dhfr and nine loci spanning 20 kb around dhps (as for haplotype characterization) were used to depict genetic relationships in eBURST. Only samples in which multiple infections were not detected by pyrosequencing of dhfr or dhps were used for the eBURST analysis. Since eBURST does not allow for missing data, samples with incomplete haplotypes were removed; therefore, fewer samples were used for the eBURST analysis than for haplotype characterization. If multiple alleles were detected at a single microsatellite locus in a sample, the predominant allele was used, i.e. the one with the highest peak in the electropherogram.
Genetic differentiation between alleles was measured using Wright's F-statistics [28]. The statistic F_ST measures genetic differentiation between populations but, here, F_ST was used as a statistic to compare groups of alleles. For dhfr the microsatellite loci from -10 kb to 1.47 kb and for dhps the loci from -2.5 kb to 17.5 kb were used for the F_ST analysis. F_ST calculations were computed using Arlequin ver 3.01 [29]. The Excel Microsatellite Toolkit was used to format data for Arlequin [30].
Linkage disequilibrium (LD) between loci along the chromosomes and also between dhfr and dhps point mutations was assessed by using an exact test of LD [31]. Samples with multiple alleles at any locus were removed from the analysis; this was done for dhfr, dhps, and the neutral markers independently. Similarly, samples where multiple infections were detected at any site were removed from the LD analysis testing pairs of point mutations in dhfr and dhps; this was done independently for dhfr and dhps for a given sample. Only loci or sites that showed polymorphism among the samples used were included in the analysis. Associations were tested between pairs of loci or sites by using 10,000 Monte Carlo steps in Arlequin version 3.01 [29]. To correct for multiple testing, the Bonferroni-Holm correction was used.
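The Bonferroni-Holm step-down procedure can be sketched as below. This is a generic illustration of the standard procedure, not the code used in the study; the function name, the default significance level of 0.05 and the input format are assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction (generic sketch).

    Returns a list of booleans, in the input order, that is True where the
    null hypothesis is rejected at family-wise level `alpha` (assumed 0.05).
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With four pairwise tests and p-values 0.01, 0.04, 0.03 and 0.005, only the tests with p = 0.005 and p = 0.01 remain significant after correction.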
Measuring the strength of selection
The strength of selection on dhfr and dhps was estimated from the changes in frequency over time of the various mutant alleles at each gene. The strength of selection, s, of allele A compared with allele B was defined such that 1 + s is the average relative reproductive advantage of A over B [32]. Hence, if p_t and p_{t+T} are the relative frequencies of A at times t and t + T, then log(1 + s) = (1/T) [log(p_{t+T}/(1 - p_{t+T})) - log(p_t/(1 - p_t))] [32].
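As a small worked illustration of this relation, the sketch below computes s from two frequency measurements. The function names are hypothetical, and T is assumed to be expressed in parasite generations.

```python
import math


def logit(p):
    """Log-odds log(p / (1 - p)) of an allele frequency 0 < p < 1."""
    return math.log(p / (1.0 - p))


def selection_two_points(p_start, p_end, generations):
    """Per-generation selection strength s from two frequencies.

    Solves log(1 + s) = (logit(p_end) - logit(p_start)) / T for s,
    with T = `generations` (assumed unit: parasite generations).
    """
    return math.exp((logit(p_end) - logit(p_start)) / generations) - 1.0
```

For instance, an allele whose odds double within a single generation (frequency rising from 1/2 to 2/3) has s = 1, and an allele whose frequency does not change has s = 0.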
The frequency of the advantageous allele A at time t, p_t, was measured at equally spaced time points t_k = k*180 days (k = 0, 1, 2, ...) within the six years covered by the samples. The frequency at t_k was calculated from all samples that were taken between time t_k and t_k + 360 days. Hence, the intervals [t_k, t_k + 360] overlapped (sliding window). The strength of selection was obtained by performing a linear regression of log(p_t/(1 - p_t)) on time, including as regressors only those time points t_k for which at least three triple and three non-triple mutants occurred. The strength of selection per generation is derived from the slope of the linear regression divided by the number of malaria generations per year, N_gen, which was assumed to be N_gen = 17.3 (i.e., one transmission cycle every three weeks, corresponding to infections throughout the whole year). More precisely, if s is the strength of selection, and α and β are the constant and linear regression coefficients, respectively, then log(p_t/(1 - p_t)) = t N_gen log(1 + s) + log(p_0/(1 - p_0)) = α + β t, with t measured in years. Hence, s = exp(β/N_gen) - 1.
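The sliding-window regression can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the input format (a list of `(day, carries_advantageous_allele)` tuples), all function names, and the conversion of days to years using 365 days are hypothetical choices. At least two qualifying time points are needed to fit a slope.

```python
import math

N_GEN = 17.3  # parasite generations per year, as assumed in the text


def window_log_odds(samples, step=180, width=360, min_count=3):
    """Log-odds of the advantageous allele in overlapping windows.

    `samples` is a list of (day, is_advantageous) tuples (hypothetical
    format). Windows [t_k, t_k + width] start every `step` days; a window
    is kept only if both allele classes occur at least `min_count` times.
    Returns a list of (t_k in years, log(p / (1 - p))).
    """
    last_day = max(day for day, _ in samples)
    points = []
    t_k = 0
    while t_k <= last_day:
        in_window = [adv for day, adv in samples if t_k <= day <= t_k + width]
        n_adv = sum(in_window)
        n_other = len(in_window) - n_adv
        if n_adv >= min_count and n_other >= min_count:
            # p / (1 - p) equals the ratio of the two allele counts.
            points.append((t_k / 365.0, math.log(n_adv / n_other)))
        t_k += step
    return points


def regression_slope(points):
    """Ordinary least-squares slope (beta) of log-odds against time."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


def selection_from_slope(beta, n_gen=N_GEN):
    """Convert a per-year slope into s = exp(beta / N_gen) - 1."""
    return math.exp(beta / n_gen) - 1.0
```

Given sample data, `selection_from_slope(regression_slope(window_log_odds(samples)))` would yield the per-generation estimate of s.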
Two double mutant dhfr alleles were present in the Kenyan population, and both confer a level of pyrimethamine resistance. It is not clear a priori whether selection for both double mutant alleles is equally strong; therefore, the strength of selection for the 51I/108N allele was compared with that for the 59R/108N allele. The strength of selection of the triple mutant allele (51I/59R/108N) was measured over the 51I/108N and the 59R/108N double mutants, separately. For these measurements, only samples with single infections at these alleles (as detected by pyrosequencing) were included.
The purpose of estimating these three strengths of selection at dhfr is as follows. If s_1, s_2, and s_3 denote the strength of selection of 51I/59R/108N over 51I/108N, 51I/108N over 59R/108N, and 51I/59R/108N over 59R/108N, respectively, then the standard haploid selection model yields 1 + s_3 = (1 + s_2)(1 + s_1). The three estimates can therefore be checked against one another for consistency.
Dhps single mutants were at relatively low frequency and only the 437G/540E double mutant was found in single infections. Thus, for dhps, the strength of selection of double mutant alleles (jointly) was measured over all other alleles. To derive the frequencies, all samples with 437G/540E single infections and all samples that did not contain the mixed codon A437G/K540E were included. More precisely, only those samples for which it was unclear whether they contained the 437G/540E mutant were excluded.
The reduction of H_e flanking dhfr and dhps was utilized to evaluate whether the estimates for the strengths of selection were meaningful. For this purpose, H_e was compared with the analytical prediction H_e^pred, given by H_e^pred = H_0 (1 - p_0^(2r(1-F)/s)). Here, H_0 is the initial expected heterozygosity, r denotes the recombination rate, F the inbreeding adjustment (F = 1 corresponds to complete inbreeding, and F = 0 to random mating), and p_0 is the initial frequency of the 51I/59R/108N or 51I/108N allele, or of the 437G/540E allele. As in [9-14], r = 5.88 × 10^-4 Morgans/kb and p_0 = 10^-4 were used. Also, F = 0.4, which corresponds to 60-70% mixed clone infections, was used. For H_e^pred among dhps 437G/540E alleles, H_e among wildtype alleles was used as an estimate for H_0, since it should not be affected by the sweep. For H_e^pred among 51I/59R/108N alleles, H_e among double, single mutant and wildtype alleles was used as an estimate for H_0. For H_e^pred among 51I/108N double mutants, H_e among 59R/108N double, single mutant and wildtype alleles was used as an estimate for H_0.
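The prediction can be evaluated numerically as in the sketch below. It is illustrative only: the function and parameter names are hypothetical, and it assumes that r in the formula H_e^pred = H_0 (1 - p_0^(2r(1-F)/s)) is obtained by multiplying the per-kb recombination rate by the distance (in kb) between the microsatellite locus and the selected site, which the text does not state explicitly.

```python
def predicted_heterozygosity(h0, s, distance_kb,
                             r_per_kb=5.88e-4, p0=1e-4, f=0.4):
    """Predicted H_e at a flanking locus after a selective sweep.

    h0          : initial expected heterozygosity (H_0)
    s           : strength of selection per generation
    distance_kb : distance to the selected site in kb (assumption: r is
                  the per-kb rate times this distance)
    r_per_kb, p0, f : defaults are the parameter values quoted in the text
    """
    r = r_per_kb * abs(distance_kb)
    exponent = 2.0 * r * (1.0 - f) / s
    return h0 * (1.0 - p0 ** exponent)
```

Under these assumptions the predicted heterozygosity is zero at the selected site itself and approaches H_0 with increasing distance, and stronger selection widens the region of reduced variation.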
Conclusions
The three signatures of a selective sweep in a population (an altered distribution of polymorphic sites along the chromosome, an altered allele frequency spectrum, and an increase in the amount of linkage disequilibrium) are all seen in dhfr and dhps allele populations in western Kenya. The independent origin, genetic differentiation, and maintenance of alleles suggest that rapid, dynamic events in the clinical and ecological settings have given rise to the patterns of resistant mutations seen today. Although SP is a combination drug therapy, the strength of selection on the two loci is different and the drug by itself does not appear to select for "multidrug"-resistant parasites in areas with a high recombination rate. The estimates of the selective strengths on the various mutant alleles allow for a more complete understanding of the evolutionary dynamics associated with drug resistance. Thus, the local demographic history (effective population size and recombination rate) needs to be taken into account when investigating the rise of multi-resistant genotypes in Plasmodium populations.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AMM, KAS, AAE, and VU designed the study and drafted the manuscript. AMM, SMG, and ZZ carried out the molecular genetics studies. KAS carried out the theoretical and statistical analyses. SK, FK, YPS, LS, and AAL participated in the design and coordination of sample collection. All authors read and approved the final manuscript.