Background
Malaria is a major health threat in many parts of the globe and causes high mortality and morbidity, with 212 million cases of malaria occurring globally in 2015, with 429,000 deaths [
1].
Plasmodium knowlesi, a parasite of long- and pig-tailed macaques is now considered as the fifth
Plasmodium species infecting humans and is an emerging threat in most Southeast Asian countries [
2‐
6]. Human infections due to
P. knowlesi are increasingly reported from a number of the Southeast Asian countries, including Malaysia [
4,
7,
8], Singapore [
9], Myanmar [
10], Vietnam [
11], Indonesia [
12], Philippines [
13], Cambodia [
14], India [
15] and Thailand [
16]. In Malaysia, the public health threat posed by the zoonotic malaria parasite
P. knowlesi appears to be growing, with increasing number of human infections being reported from Peninsular Malaysia as well as Malaysian Borneo [
4,
8,
17], which highlights the need of effective measures for control as well as development of effective vaccines.
The parasite is the only human and non-human primate malaria that has a 24-h erythrocytic cycle; rapid increase in parasitaemia has been shown to be associated with the development of severe malaria in humans and is a common cause for severe and fatal malaria in Malaysian Borneo [
3,
18‐
20]. Approximately 70% of malaria cases reported from the Kapit division of Sarawak [
18] and 78% from Kudat, Sabah are due to
P. knowlesi [
8]. Recent
P. knowlesi genomic and microsatellite-based studies from Sarawak, Malaysian Borneo have shown that there are at least 3 sub-populations of the parasite and 2 of the populations were associated with long-tailed (
Macaca fascicularis) and pig-tailed (
Macaca nemestrina) macaques from Malaysian Borneo [
21‐
23]. Analysis of mitochondrial genes in
P. knowlesi isolates from patients and macaques also identified two distinct clusters which clustered geographically to Malaysian mainland and Malaysian Borneo [
24]. These studies have highlighted the complexity of
P. knowlesi infections in humans and challenges for control as well as vaccine design.
Malaria vaccine development is hindered by natural polymorphisms within blood-stage candidate antigens and, therefore, it is critical to determine the pattern of diversity, natural selection and population structure in vaccine candidates and its significance for the acquisition and effectiveness of protective immunity. For example, the protection conferred by the most advanced candidate subunit vaccine is RTS,S/AS01, which targets the
Plasmodium falciparum circumsporozoite protein (PfCSP) was found to be rapidly declining in sub-Saharan Africa [
25]. A recent study showed that polymorphisms within the merozoite invasion genes (normocyte binding protein xa and xb
, nbpxa and
nbpxb) of
P. knowlesi were linked to hyperparasitaemia and disease severity in human infections [
26]. Potential vaccine candidates like Duffy binding protein (DBP), merozoite surface protein (MSP) 1 and 3, normocyte binding protein xa have recently been studied from
P. knowlesi clinical isolates [
27‐
29].
Plasmodium knowlesi is phylogenetically closely related to
Plasmodium vivax [
30]. In
P. vivax, MSP1 is a well-known blood stage antigen which localizes on the merozoite surface and the C-terminal 19 kDa domain is responsible for binding to erythrocytes and antibodies against the C-terminal fragment of MSP1 shows parasite invasion inhibitory properties [
31,
32]. Most of the merozoite surface proteins (e.g., MSP1, MSP2, MSP4, MSP5, MSP8, and MSP10) contain 1 or 2 copies of a conserved epidermal growth factor (EGF)-like domain at the carboxyl terminal that are anchored to the membrane via glycosylphosphatidylinositol (GPI) membrane anchor [
33,
34].
Recently, a novel vaccine candidate
P. vivax merozoite surface protein 1 paralog (PvMSP1P-19; PlasmoDB accession no. PVX_099975) a glycosylphosphatidylinositol (GPI)-anchored protein which is expressed on the merozoite surface during blood-stage development and binds to erythrocytes is reported [
35]. Studies have also reported the PvMSP1-19 to be immunogenic and antigenicity has been reported in
P. vivax-infected patients [
36,
37]. The primary structure of PvMSP1P is similar to PvMSP1 and they contain a putative GPI-anchored motif and 2 epidermal growth factor (EGF)-like domains at the C terminus [
38]. The predicted molecular mass is about 215 kDa, which is similar to that of PvMSP1. These studies have also proved that the protein undergoes proteolytic processing during the invasion process similar to PvMSP1 to produce 83, 30, 38, 33 and 19 kDa domains [
35]. The C-terminal 19 kDa domain which contains the EGF-like domains are mostly conserved among all the MSPs studied to date and thus are speculated to possess conserved binding activity to host erythrocytes. Studies on genetic polymorphism of PvMSP1P C-terminal domain in worldwide isolates indicated low levels of polymorphisms and thus might serve as a potential vaccine candidate [
39]. However, no study has been done to characterize the PkMSP1P which is an ortholog gene.
In this study firstly, the domains of PkMSP1P protein were characterized based on the amino acid sequence alignment to its ortholog PvMSP1P sequence. Then the level of sequence diversity, natural selection using full-length genes at each of the domains (83, 30, 38, 33 and 19 kDa domains) were determined using 34 clinical isolates and 2 laboratory lines (along with the H-strain) of Malaysia. Based on the 19 kDa domain (PkMSP1P-19), the shared haplotypes were identified within 40 isolates and the population structure was determined based on the 42 kDa domain (PkMSP1P-42) from clinical isolates from 4 geographically different regions in Malaysia. The information obtained from this study will be helpful for future rational design and formulation of a vaccine against P. knowlesi, and will aid in the understanding of transmission dynamics of P. knowlesi within Malaysia.
Methods
PkMSP1P sequence data
PkMSP1P sequences were downloaded for 36 clinical isolates originating from Kapit, Betong and Sarikei in Malaysian Borneo, 4 long-time isolated lines from Peninsular Malaysia along with the H-strain (PKNH_0728800) (Additional file
1) [
23]. The sequence data accession numbers are listed in (Additional file
2). Of these, 36 sequences were used for characterization of the full-length PkMSP1P gene (Additional file
2). Signal peptide for the full-length PkMSP1P domain was predicted using Signal IP 3.0 and Phobious prediction software [
40,
41]. The PkMSP1P domains were characterized based on the published ortholog of PvMSP1P (PVX_099975) [
35]. Phylogenetic analysis was conducted using deduced amino acid sequences from 7 PkMSP1P full-length from Malaysian Borneo, 2 laboratory lines from Peninsular Malaysia; reference H-strain (PKNH_0728800) and the Malayan Strain (Pk1A PKNOH_S06430900) along with other ortholog members of
P. vivax Sal-1 (PVX_099975),
P. vivax P01 (PVP010728800),
Plasmodium cynomolgi (PcYB073760) and
Plasmodium ovale curtisi (PoCGH0107037800) using unrooted neighbour-joining (NJ) method also described in MEGA5. Bootstrap replicates of 1000 were used to test the robustness of the trees.
Sequence diversity and natural selection
Sequence diversity (π), defined as the average number of nucleotide differences per site between two sequences within the sequences, was determined by DnaSP v5.10 software [
42]. Number of polymorphic sites, number of synonymous and non-synonymous substitutions, haplotype diversity (Hd), number of haplotypes (h) within the
pkmsp1p sequences were also determined by DnaSP software.
To investigate departure from neutrality, Tajima’s D analysis was conducted [
43]. Under neutrality, Tajimas D is expected to be 0. Significantly, positive Tajima’s D values indicate recent population bottleneck or balancing selection, whereas negative values suggest population expansion or negative selection. The rates of synonymous (dS) and non-synonymous (dN) mutations were estimated and compared by the Z-test (
P < 0.05) in MEGA5 using the Nei and Gojobori’s method with the Jukes and Cantor (JC) correction and 1000 bootstrap replications [
44].
Haplotype network
Haplotype diversity (Hd) and number of haplotypes (H) were determined using DnaSP software. Genealogical relationships between the pkmsp1p-19 haplotypes were constructed using the median-joining method in NETWORK software (version 4.6.1.2, Fluxus Technology Ltd, Suffolk, UK).
Population genetic structure analysis
To define genetic structure of the P. knowlesi parasite population in Malaysia based on the msp1p, STRUCTURE software (version 2.3.4) was used that deploys the Bayesian model based clustering approach. The most probable number of populations (K) was determined using an admixture model. Since the 19 kDa domain is largely conserved in all Plasmodium species, the 42 kDa domain (19 and 33 kDa) was used for population structure analysis. All sample data were run for values K = 1–6, each with a total of 15 iterations, 100,000 Markov Chain Monte Carlo (MCMC) generations for each run after a burn-in of 50,000 steps. The most likely number K in the data was estimated by calculating ΔK values and identifying the K value that maximizes the log probability of data, lnP(D). The most probable K value was then calculated according to Evanno’s method by using the webpage interface STRUCTURE Harvester. The ARLEQUIN software (version 3.5.1.3, University of Berne, Berne, Switzerland) was used to compute pairwise differences (F
ST
) between populations (i.e., Sarikei, Betong, Kapit and Peninsular Malaysia) with 10,100 permutations. F
ST
is a comparison of the sum of genetic variability within and between populations on the basis of the differences in allelic frequencies. F
ST
values are interpreted as no (0), low (> 0–0.05), moderate (0.05–0.15), and high (0.15–0.25) genetic differentiation.
Discussion
The PvMSP1P-19 is a novel vaccine candidate and has low polymorphism in field samples as well as generates protective immune response in patient serum using recombinant expressed proteins [
35,
36,
39]. Thus in the present study the objective was to genetically characterize the
pkmsp1p gene and study the level of genetic diversity, natural selection acting at the full-length PkMSP1P and 19 kDa domain. Sequence alignment of 36 full-length amino acid sequences of
pkmsp1p genes from Malaysia showed that it shares approximately 70% sequence identity with its ortholog
pvmsp1p and sequence identity is high (86%) towards the C-terminal 19 kDa domain. This feature is similar to the other studies on MSPs where the conservation (low levels of polymorphisms) of the EGF domains in 19 kDa domain has been reported [
45]. The genetic diversity of the full-length
pkmsp1p was high (π = 0.00941 ± SD 0.0005) compared to its ortholog
pvmsp1p [
39]. This is probably due to host immune pressure and dimorphism in
P. knowlesi compared to its ortholog in
P. vivax. Despite having 339 SNPs, the number synonymous and nonsynonymous substitutions was not of vast difference. This was because of high number of low frequency polymorphisms (singleton sites) at each domain leading to high haplotype diversity. The 165 singleton variable sites were detected indicated that new and rare variant were present indicating population expansion however, Li and Fu’s D* and F* did not show significant negative values. Overall, the full-length gene showed significant negative selection (− 7.43,
P < 0.000) and similar significant negative selection has been recently reported for invasion genes
pknbpxa,
pkmsp3, and
pkmsp1 [
27‐
29]. The phylogenetic tree showed separation of the
P. knowlesi MSP1P isolates from Malaysian Borneo into 2 clusters while the laboratory lines from Peninsular Malaysia formed a third cluster. Studies on
P. knowlesi proteins such as the DBPαII (PkDBPαII) [
46], PkNBPXa [
29], PkAMA1 [
47] and a genomic study [
23] from Borneo have also reported bifurcation of trees, indicating dimorphism of the genes at the genomic level.
The genetic diversity at PkMSP1P-19 domain was low (π = 0.0061 ± 0.00072) compared to the full-length gene as well as the rest of the N terminal domains indicating conservation of the two EGF domains as reported in its ortholog PvMSP1P [
39]. Interestingly, the 12 cysteine residues within the two EGF domains were conserved in all
P. knowlesi isolates used in this study. Both PkMSP1P and PvMSP1P showed similar conservation of the cysteine residues indicating they might share similar protein structure and function. This is a significant finding as the binding site of PvMSP1P and PkMSP1P are within these EGF domains and antigenicity has been observed in both
P. vivax and
P. knowlesi patient serum samples (Muh et al., unpublished data). Significant negative/purifying selection was observed within the 19 kDa domain indicating functional constraints within the parasite population.
Haplotype network analysis identified 3 predominantly shared
pkmsp1p haplotypes (between Kapit, Betong and Sarikei) but no shared haplotypes with P. Malaysia. This finding is significant as similar geographical sub-population cluster was noticed for recent
P. knowlesi mitochondrial
cox 1,
ssrRNA and
csp genetic study in Malaysia [
24,
48]. Interestingly, in this study some unique haplotypes (H_1, from Peninsular Malaysia) which did not cluster within the predominant haplotypes (Fig.
5) were also noted. H_1 and H_8 from Peninsular Malaysia were distantly apart probably because these laboratory lines were isolated from distant geographical locations. Additional population structure analyses showed moderate genetic differentiation between parasite populations originating from Sarikei and Betong (Fst = 0.14,
P > 0.05) and very high between P. Malaysia and Kapit (Fst = 0.20,
P < 0.05) (Additional file
10). Results were significant with robust Bayesian structure analysis where 4 different sub-populations were identified however, higher sample number are required from Sarikei and Peninsular Malaysia for accurate determination of the population structure. A recent genomic study identified host-specific sub-populations of
P. knowlesi infections and has indicated a recombination event in the sexual stages of the parasite [
49]. These observations might indicate that humans are susceptible to infection by any of the
P. knowlesi populations circulating in these regions.
Authors’ contributions
MAA and ETH designed the study and wrote the manuscript. FM helped in data preparation and analysis. MAA performed genetic analysis. All authors contributed to writing the manuscript. All authors read and approved the final manuscript.