Background
Plasmodium vivax is globally the most widely distributed
Plasmodium species that infects humans, being common in tropical and sub-tropical areas outside of Africa [
1,
2]. Several factors have highlighted the clinical importance of malaria caused by
P. vivax, such as the spread of parasite resistance to available drugs [
3]. In addition, the concept of vivax malaria as a benign disease has evolved with the description of severe cases and even deaths [
4‐
6]. Finally, dormant forms of the parasite in the liver, i.e., hypnozoites, act as a reservoir for the disease and have hindered the control of malaria caused by
P. vivax [
7]. These factors have all increased interest in vivax malaria, primarily in the new Malaria Eradication Research Agenda (malERA) [
8].
Plasmodium vivax infections are often characterized by the presence of two or more genetically distinct parasites in the same individual [
9‐
11]. These infections are very common in malaria-endemic areas worldwide [
12‐
15] and can arise from a single mosquito bite carrying a mixture of parasites or from inoculation by different mosquitoes carrying single clones. Additionally, relapses of
P. vivax infection due to the reactivation of hypnozoites can contribute to increased clonal diversity. As a result, the association between the multiplicity of
P. vivax infection and malaria endemicity is weak, with areas of low endemicity sometimes featuring high rates of multiple infections [
14‐
17]. The number of parasite clones in a patient varies greatly, and some infections contain up to nine clones [
12]. Characterizing the multiplicity of infection has broad implications ranging from population genetic studies of the parasite to malaria treatment and control. First, evolutionary and population genetic studies rely on accurate parasite genotype/haplotype inference, which is non-trivial when more than one clone is present and clones differ at examined loci [
18,
19]. Second, characterizing the within-host diversity is essential to address several issues, such as differentiation between new infection and recrudescence, in order to better estimate the true risk of treatment failure and explore the dynamics of clones influenced by host immunity during anti-malarial treatment or challenge with vaccine [
12,
20,
21]. Third, malaria patients infected by multiple parasite strains have been shown to be at a higher risk of treatment failure [
22]. Thus, a broad understanding of the genetic diversity of parasite populations can contribute to the definition of control measures, including an appropriate anti-malarial treatment.
The publication of the complete genome sequence of
P. vivax has led to the discovery of many molecular markers, such as microsatellites, tandem repeats and single nucleotide polymorphisms (SNPs) [
23]. These markers have proven useful for population genetic studies and for the characterization of the multiplicity of
P. vivax infections. However, many studies have shown that the characterization of multi-clonal infections depends on both the accuracy of the genotyping method, and the type and number of the molecular markers analysed [
24,
25]. Thus, the use of different approaches may significantly affect the ability to detect multi-clonal
P. vivax infections and may hinder comparability among studies [
26,
27]. Furthermore, the method used may influence the estimation of the relative abundance of clones in multiple infections.
This study evaluated and compared the ability of different molecular markers—two microsatellites, one tandem repeat and three antigen-coding genes—to estimate the number and the relative abundance of alleles present in multi-clonal
P. vivax infections. In order to simulate multiple-clone infections with well-defined proportions of different parasite genotypes, cloned PCR products or patient-derived genomic DNA were artificially mixed. In addition, the performance of these markers was also evaluated by genotyping
P. vivax isolates from patients in the Brazilian Amazon. The PCR-capillary electrophoresis-based method (PCR-CE), which offers several advantages, such as high resolution (1 bp), reproducibility in determining fragment size, and cost-effectiveness for the analysis of a large number of field samples, was used to genotype all markers [
28]. Although this method of quantification is subject to some limitations, many studies have shown that the peak heights correspond to the actual relative proportions of clones in an infection when data are properly normalized [
24,
29,
30]. The ability to identify less abundant clones depends on the criteria applied to differentiate minor peaks from artifacts, which in turn allows the multiplicity of infection to be properly defined. Two criteria are commonly used to score multiple alleles per locus: cut-off values for minor peak detection of (1) one-fourth or (2) one-third the height of the predominant peak [
9,
17,
31‐
34]. Although the one-third criterion is the most widely used, few studies have evaluated the sensitivity of these criteria for the detection of multi-clonal infections [
24]. Here, the results showed that the less stringent one-fourth criterion is needed to increase the detection of multiple-clone infections. In addition, a minimum panel of four markers was defined to characterize the multiplicity of a
P. vivax infection.
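The two cut-off criteria described above can be illustrated with a minimal sketch. It assumes that the peaks at one locus are available as a mapping from fragment size (bp) to peak height; the function name `score_alleles` and the peak values are hypothetical and are not taken from the study.

```python
from fractions import Fraction


def score_alleles(peak_heights, cutoff=Fraction(1, 4)):
    """Return fragment sizes whose peak height is at least `cutoff`
    times the height of the predominant (tallest) peak.

    peak_heights: dict mapping fragment size (bp) -> peak height.
    """
    if not peak_heights:
        return []
    tallest = max(peak_heights.values())
    return sorted(
        size for size, height in peak_heights.items()
        if height >= cutoff * tallest
    )


# Hypothetical peaks for a single locus (values invented for illustration).
peaks = {212: 4000, 218: 1200, 230: 300}

one_third = score_alleles(peaks, Fraction(1, 3))   # stricter criterion
one_fourth = score_alleles(peaks, Fraction(1, 4))  # less stringent criterion
print(one_third)   # [212]
print(one_fourth)  # [212, 218]
```

In this invented example the 218 bp peak reaches 30 % of the predominant peak, so the locus is scored as a single allele under the one-third criterion and as two alleles under the one-fourth criterion.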
This is the first study to show that, depending on the type of marker used for P. vivax analysis, a considerable amplification bias is observed. This bias may have serious implications for the characterization of the complexity of an infection. Moreover, these findings were facilitated by the use of parasite DNA samples with well-defined proportions of each genotype in artificial mixtures as well as the use of molecular markers with different features, such as neutral and non-neutral markers.
Discussion
Numerous studies have revealed substantial clonal diversity of
Plasmodium within its vertebrate hosts. The characterization of this diversity can influence treatment outcome and elucidate within-host dynamics that may be shaped by several factors, such as host immunity, density-dependent control mechanisms and drug treatment [
12,
21,
37]. Furthermore, estimates of within-host variability are relevant to correctly infer evolutionary and population genetics parameters, e.g., selection and recombination [
18,
19]. To infer the multiplicity of infection of
P. vivax, a panel of suitable molecular markers was defined herein, which included microsatellites, a tandem repeat and antigen-coding genes.
The analysis of artificial DNA mixtures with well-defined proportions of cloned products showed that the commonly used
msp1 antigen-coding marker and the tandem repeat
MN21 allowed for the estimation of the expected ratio of both alleles in the majority of preparations when using normalized data. Conversely, the microsatellite markers were sensitive to the decreased relative abundance of alleles but, like
msp3α, did not accurately estimate the relative clonal proportions in artificial mixtures. For example, preferential amplification of the shortest allele (with the fewest repeats) for
PvMS6 was consistently observed in the tested dilutions. Consistent with this, the preferential amplification of alleles of differing length has previously been reported for other microsatellite and antigen-coding loci [
38‐
40]. As indicated by Walsh et al. [
32], the extent of preferential amplification is related to the size difference between the allelic PCR products, which was significantly greater for
PvMS6 (102 bp). Although not assessed in this study, several other conditions may lead to preferential PCR amplification, such as significant differences in the GC content between alleles, stochastic fluctuation in the presence of low amounts of target DNA molecules [
39], and reduced amplification efficiency due to sequence polymorphism in the primer-binding site [
41]. Finally, the capillary-based instrument itself may introduce errors in measured relative density of the PCR product for each allele. However, instrument-based errors are unlikely in this study because the results presented here indicate that the method is reproducible. Specifically, replicate experiments yielded similar results, as also reported in other studies [
29].
Although some markers, such as the antigen-coding marker
msp1 and the tandem repeat, provide an indication of the actual relative proportions of clones in artificial infections, the quantification yielded by the method described herein is subject to limitations. First, traditional end-point PCR may lead to bias in the template-to-product ratios of target sequences amplified during PCR, particularly for nested PCR, due to increasing numbers of PCR cycles [
42,
43]. Thus, the amplification bias observed in the 1:1 mixtures and other dilutions of microsatellites and
msp3α may be the result of reaction saturation (plateau phase). At later cycles, the efficiency of PCR eventually declines for a number of reasons, including the exhaustion of reagents and the enzyme, the accumulation of inhibitors and the rehybridization of PCR products, which may interfere with primer binding or extension [
43‐
45]. As a consequence, templates that reach inhibitory concentrations essentially stop amplifying while others continue to efficiently undergo amplification. In addition, the results shown here and by others indicate that accurate quantification requires normalization using a baseline mixture of known proportion to calibrate samples to be analysed [
24,
29,
30]. Such normalization restricts the use of this method to experimental infection models. To circumvent these limitations, quantitative PCR (qPCR) or next-generation sequencing (NGS) may be applied to more reliably estimate the relative abundance of clones in an infection. Whereas qPCR is restricted to known single-species infections, like the method described here [
40], the applications of NGS are much broader. As recently published for
P. falciparum and
P. vivax, NGS robustly represents clonal multiplicity and is very promising for drug-resistance or population genetics studies [
46‐
49]. Nevertheless, genomic-level studies remain infeasible in many settings.
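The normalization against a baseline mixture of known proportion mentioned above can be sketched as follows. This is only one plausible form of such a calibration, assumed here for illustration: the bias of a marker is taken as the observed-to-expected peak-height ratio of two alleles in the baseline mixture, and sample ratios are divided by that factor. The function names and all peak heights are hypothetical and do not reproduce the procedure or data of this study.

```python
def bias_factor(baseline_heights, known_ratio=1.0):
    """Observed / expected peak-height ratio (allele A : allele B)
    in a baseline mixture of known proportion (assumed calibration)."""
    height_a, height_b = baseline_heights
    return (height_a / height_b) / known_ratio


def normalized_proportions(sample_heights, factor):
    """Correct a sample's A:B peak-height ratio by the bias factor
    and convert the corrected ratio into relative proportions."""
    height_a, height_b = sample_heights
    corrected_ratio = (height_a / height_b) / factor
    prop_a = corrected_ratio / (1.0 + corrected_ratio)
    return prop_a, 1.0 - prop_a


# Hypothetical values: in a 1:1 baseline mixture, allele A yields
# twice the signal of allele B (preferential amplification).
factor = bias_factor((6000, 3000), known_ratio=1.0)

# Hypothetical sample with raw peak heights of 3000 (A) and 4500 (B).
prop_a, prop_b = normalized_proportions((3000, 4500), factor)
print(round(prop_a, 2), round(prop_b, 2))
```

Because the correction relies on a mixture whose true proportions are known, it illustrates why this kind of quantification is limited to experimental settings.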
Many criteria have been used to differentiate rare alleles from artifacts, such as stutter (peaks that result from DNA strand slippage during PCR at intervals corresponding to nucleotide repeat sizes) or non-specific peaks [
15,
16,
31]. Because the sensitivity and specificity of these criteria can vary, comparing studies that applied different criteria is problematic. The present study sought to compare the rates at which multi-clonal infections were detected using the two criteria that are frequently applied to score multiple alleles per locus, i.e., minor peaks ≥33 % or ≥25 % of the height of the predominant peak. Overall, the one-fourth criterion allowed the detection of rare alleles in more unbalanced mixtures and the detection of significantly more alleles, especially for microsatellite markers. The results clearly show that when the one-third criterion is applied for microsatellites, multi-clonal infections may be underestimated, even when the clones are present in similar proportions. This finding was not surprising because several technical difficulties have been described related to the scoring of microsatellite alleles, such as the preferential amplification of alleles with fewer repeats and the higher probability of failing to detect the less abundant alleles [
24,
30].
This study examined six loci of differing molecular features that have been widely used in studies of population genetics and the molecular epidemiology of vivax malaria [
9,
12,
15,
35]. These loci included selectively neutral and non-neutral markers on different chromosomes and markers differing in mutation rate (microsatellites, a tandem repeat and antigen-encoding genes). Thus, most of the variability of
P. vivax populations should have been captured, and the within-host diversity should have been effectively characterized in different epidemiological settings. In these settings, several factors may modulate the genetic variability of a locus and/or of complete genomes, such as the malaria transmission rates, selective constraints imposed by the host’s immunity and anti-malarial drug use, and the historical-demographic processes of the parasite population. When the available data on the genetic diversity of
P. vivax from different geographic regions were compiled, higher estimates of diversity were found for antigen-encoding loci and less variable estimates for microsatellite and tandem repeat loci. When samples from the Brazilian Amazon were genotyped for all six markers,
msp1B10, microsatellite markers and the tandem repeat
MN21 yielded similar results. Interestingly, a significant difference was observed between blocks 2 and 10 of
msp1. Whereas
msp1B10 was able to detect almost 40 % of the 36 multi-clonal infections identified,
msp1B2 allowed the detection of only 6 % of these infections. Moreover,
msp3α also showed poor performance in the detection of multiple-clone infections from field samples. Because these loci encode antigens exposed to the immune system, these results suggest that the patterns observed could reflect the different regimes of immune selection that these antigens are exposed to in this population. Notably, 50 % of the multiple-clone infections were detected by only one marker, indicating that including additional loci may further increase the probability of detecting these infections.
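The per-marker comparison above can be made concrete with a short sketch. It assumes, as a working convention, that a sample is scored as multi-clonal when at least one locus shows more than one allele, and it computes the fraction of multi-clonal samples detected by each marker as well as the number detected by a single marker only. The sample identifiers, allele sizes and function names are hypothetical; only the marker names come from the text.

```python
def multi_allelic_markers(genotype):
    """Markers at which a sample shows more than one allele."""
    return [m for m, alleles in genotype.items() if len(alleles) > 1]


def detection_summary(samples):
    """Summarize detection of multi-clonal infections per marker.

    samples: dict mapping sample id -> {marker: list of allele sizes}.
    Returns (per-marker detection fraction among multi-clonal samples,
             number of multi-clonal samples,
             number detected by exactly one marker).
    """
    hits = {sid: multi_allelic_markers(g) for sid, g in samples.items()}
    multi = {sid: m for sid, m in hits.items() if m}  # assumed definition
    if not multi:
        return {}, 0, 0
    markers = sorted({m for g in samples.values() for m in g})
    fractions = {
        marker: sum(marker in found for found in multi.values()) / len(multi)
        for marker in markers
    }
    single_marker = sum(len(found) == 1 for found in multi.values())
    return fractions, len(multi), single_marker


# Hypothetical genotypes (allele sizes in bp invented for illustration).
samples = {
    "S1": {"msp1B10": [300, 320], "msp1B2": [410], "PvMS6": [212]},
    "S2": {"msp1B10": [300], "msp1B2": [410], "PvMS6": [212, 230]},
    "S3": {"msp1B10": [310], "msp1B2": [410], "PvMS6": [212]},
    "S4": {"msp1B10": [300, 330], "msp1B2": [400], "PvMS6": [212, 218]},
}

fractions, n_multi, n_single = detection_summary(samples)
print(n_multi, n_single)  # 3 2
print(fractions)
```

In this invented data set, three of the four samples are multi-clonal and two of them are detected by a single marker, which mirrors the kind of marker-dependent detection discussed above.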
Determining whether a sample is mono-clonal or contains multiple clones of the parasite requires the careful selection of markers because most randomly selected markers may not faithfully depict the true complexity of an infection. Based on the performances of the six markers used here to characterize parasite diversity in both artificial mixtures of clones and in field samples, the combination of PvMS6, PvMS7, MN21 and msp1B10 was selected for use in molecular epidemiology studies. These markers comprise a panel containing highly informative loci that exhibit genetic variability in all examined P. vivax populations as well as loci that are less sensitive to PCR amplification biases and were able to better characterize the multiplicity of infection. In this study, PCR-CE was proposed for allele scoring, which has the advantage of allowing the accurate and fast genotyping of a large number of field samples at relatively low cost. Furthermore, a less stringent criterion for rare allele identification significantly increases the sensitivity of this method. Thus, a minimum of one-fourth of the height of the predominant peak should be used as the cut-off value for minor peak detection. Although the results described apply to a small subset of alleles and field samples likely contain several unknown alleles, they reveal information about amplification bias, which can be used to identify markers and conditions under which such biases are minimized. Field samples from the Brazilian Amazon region, an area of low and unstable malaria transmission, were used to define the proposed panel. Nevertheless, further characterization is required, especially in regions of high endemicity, where this panel would also provide information about multi-clonal infections due to the high genetic variability of parasite populations in such areas.