Background
Malaria is a major public health threat worldwide. According to the World Malaria Report, 216 million cases of malaria occurred globally in 2016, with nearly half a million deaths [
1]. The simian malaria parasite
Plasmodium knowlesi is now considered the fifth
Plasmodium species infecting humans, and a high number of cases has been reported from most Southeast Asian countries [
2‐
6]. The highest numbers of human cases due to
P. knowlesi have been reported from Malaysia [
4,
7,
8], while low numbers of cases have been reported from other Southeast Asian countries such as Singapore [
9], Myanmar [
10], Vietnam [
11], Indonesia [
12,
13], Philippines [
14], Cambodia [
15] and Thailand [
16]. Human cases of
P. knowlesi have been on the rise since 2004, and increasing numbers of cases have been reported from both Peninsular Malaysia and Malaysian Borneo [
4,
8,
17] and very recently from Indonesia [
13,
18], thus highlighting the need for effective control measures and vaccine development. The parasite has a 24-h erythrocytic cycle, and rapid increases in parasitaemia have been documented to correlate with the development of severe malaria in humans, which can be fatal [
3,
19‐
21]. Though human-to-human transmission has not been reported, approximately 70–78% of malaria cases reported from Sarawak and Sabah in Malaysian Borneo are due to
P. knowlesi [
8,
19]. Recently conducted genomic and microsatellite-based investigations on
P. knowlesi from Sarawak, Malaysian Borneo have revealed three or more sub-clusters or sub-populations of the parasite, which are associated with the two natural hosts: long-tailed (
Macaca fascicularis) and pig-tailed (
Macaca nemestrina) macaques [
22‐
24]. Humans are susceptible to infections associated with both hosts, and some infections are highly virulent, leading to severe or fatal outcomes in some patients [
3,
25]. Evolutionary marker genes such as ssrRNA and the mitochondrial gene
cox1 in
P. knowlesi isolates from patients and macaques also showed two distinct clusters, which separated geographically into the Malaysian mainland (Peninsular Malaysia) and Malaysian Borneo [
26].
Extensive sequence diversity within candidate antigens has hindered malaria vaccine development, highlighting the need to determine the level of polymorphism, natural selection and population structure of the parasite populations under study. A recent genetic association study on
P. knowlesi invasion genes
nbpxa and
nbpxb (normocyte binding protein xa and xb) showed that some SNPs were strongly associated with high parasitaemia and disease severity in human infections [
25].
Plasmodium knowlesi orthologues of known vaccine candidates such as Duffy binding protein (DBP), merozoite surface proteins (MSP) 1, 1P and 3, and normocyte binding protein xa have recently been studied in
P. knowlesi clinical isolates [
27‐
30]. Merozoite surface protein 1 (MSP1) is an important blood-stage antigen localized on the merozoite surface. The C-terminal 19 kDa domain of the antigen has been found to adhere to host erythrocytes, and antigenicity of the 19 kDa domain has been observed in patient serum [
31‐
33]. In
P. knowlesi, it is synthesized as a 200 kDa precursor protein during the asexual stages, and proteolytic cleavage of the precursor produces four polypeptides of approximately 83, 30, 38 and 42 kDa [
34]. During the invasion process, the C-terminal 42 kDa fragment is further processed into two fragments of 33 kDa (MSP1-33) and 19 kDa (MSP1-19); however, only the 19 kDa fragment remains on the merozoite surface [
35]. From an evolutionary point of view, all MSPs in
Plasmodium falciparum (e.g., MSP1, MSP2, MSP4, MSP5, MSP8, and MSP10) contain one or two copies of an epidermal growth factor (EGF)-like domain at the carboxyl terminus (19 kDa domain), which is highly conserved among the family, and are attached to the membrane via a glycosylphosphatidylinositol (GPI) anchor [
36,
37]. This conservation of the 19 kDa domain and the processing events have been observed in all human malaria species [
34]. PvMSP1-19 has been found to be immunogenic, and high antigenicity has been reported in patients infected with
Plasmodium vivax [
38].
Although
pkmsp1 encodes an important immunogenic antigen, very few studies have genetically characterized it in clinical isolates from Malaysia, especially from Malaysian Borneo, where 80% of the natural infections in humans are reported. To date, only 12 isolates from Malaysia (7 from Peninsular Malaysia and 5 from Sabah, Malaysian Borneo) have been genetically characterized at the
pkmsp1-
42 domain [
27]. Thus, in this study, 11 full-length
pkmsp-
1 sequences from Malaysia were first analysed to determine the level of diversity and natural selection at the conserved domains as demarcated by Putaporntip et al. [
39]. To determine the intra- and inter-population diversity and the relationships between the
msp alleles from isolates of varied geographical origin,
pkmsp1-
42 sequences from Malaysian Borneo (Sarawak and Sabah), Peninsular Malaysia and Thailand were obtained from the database (along with the H-strain). The level of sequence diversity, the haplotypes circulating in each region, natural selection, phylogenetic relationships and the overall population structure were determined. The results of the present study may be beneficial for the future rational design and formulation of a PkMSP1-based vaccine against
P. knowlesi, in addition to enhancing the current knowledge pertaining to transmission dynamics of
P. knowlesi within Malaysia and Thailand.
Discussion
PkMSP1-42 has been studied as a novel vaccine candidate, and the generation of a protective immune response from patient serum using recombinantly expressed proteins has been reported [
45]. However, very few clinical isolates have been genetically characterized at this domain to evaluate polymorphism at the population level, which is critical for assessing the feasibility of a vaccine candidate. Thus, the purpose of the current study was to genetically characterize the
pkmsp1 gene from Malaysia and to assess the level of genetic diversity and natural selection acting upon the full-length PkMSP1 and its 42 kDa domain. Sequence alignment of 11 full-length sequences of
pkmsp1 from Malaysia showed extensive polymorphism across the gene, mostly due to the variable regions II, IV, VI and VIII. Among the conserved domains, the C-terminal domain IX (42 kDa) had the lowest nucleotide diversity, a phenomenon observed in all MSPs, specifically in the 19 kDa domain [
27,
30,
39]. Interestingly, all of the conserved domains (I, III, V, VII and IX) exhibited high haplotype diversity, which is due to the presence of a high number of singleton sites, i.e. low-frequency polymorphisms (Si = 107). A high number of low-frequency polymorphisms has also been observed in a number of merozoite invasion genes in
P. knowlesi from clinical isolates [
22,
25,
29]. The 107 singleton variable sites detected across the full-length gene indicate that new and rare variants were present, suggesting population expansion; however, only domains V, VII and IX had negative values for Fu and Li's D* and F*. Overall, the full-length gene did not show significant values for Fu and Li's statistics, probably due to the presence of hypervariable domains. The negative selection pressure and population expansion observed in each of the conserved domains indicate that the parasite population might be under strong functional constraints.
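The diversity indices discussed above can be illustrated with a minimal Python sketch. It assumes equal-length aligned sequences held as plain strings, uses the standard definitions of nucleotide diversity (π) and Nei's haplotype diversity, and counts a site as a singleton when one of its variants occurs in exactly one sequence. The function names and the toy alignment are hypothetical and are not data or software from this study.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def singleton_sites(seqs):
    """Number of variable sites where some variant occurs in exactly one sequence."""
    count = 0
    for column in zip(*seqs):
        freqs = Counter(column)
        if len(freqs) > 1 and 1 in freqs.values():
            count += 1
    return count


# Toy alignment invented for illustration (not pkmsp1 data).
toy = ["ATGCATGC", "ATGCATGA", "ATGTATGC", "ATGCATGC"]
pi = nucleotide_diversity(toy)
hd = haplotype_diversity(toy)
si = singleton_sites(toy)
print(f"pi = {pi:.3f}, Hd = {hd:.3f}, singleton sites = {si}")
```

In the toy alignment, two sequences each carry one private variant, so haplotype diversity is high while π stays low. This is the same qualitative pattern described above for the conserved domains.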
Inter-population diversity indices based on
Pkmsp1-
42 indicated that, irrespective of the geographical origin of the parasite populations, haplotype diversities were in a similar range, implying no population-wise variation despite the high number of cases in Malaysian Borneo. Moderately higher nucleotide diversity was observed for samples originating from Peninsular Malaysia and Thailand. It is interesting to note that, despite the extensive polymorphism and high nucleotide diversity in other domains of the gene, the 42 kDa domain had low diversity at the intra-population level (π = 0.009). Similarly low levels of intra-population diversity have been observed for isolates from Thailand [
39] and for other apical proteins in
P. knowlesi [
46]. Significant negative (purifying) selection was observed within the 42 kDa domain, indicating that functional constraints were present in the parasite populations of all four geographical locations in this study. Tajima's D and Fu and Li's D* and F* values were all negative, indicating population expansion and negative natural selection within the 42 kDa domain. Among the 76 PkMSP1-42 sequences, only 25 amino acid haplotypes were identified, of which the largest cluster was from Sarawak, Malaysian Borneo (Hap 6, n = 23), indicating low variation among isolates from Sarawak compared to other regions. Comparison of amino acid and nucleotide haplotypes from each region indicated that most populations had similar numbers of nucleotide and amino acid haplotypes: Peninsular Malaysia (n = 11; 9 nucleotide haplotypes vs 9 amino acid haplotypes), Thailand (n = 23; 14 nucleotide haplotypes vs 13 amino acid haplotypes) and Sabah (n = 5; 5 nucleotide haplotypes vs 3 amino acid haplotypes). However, for Sarawak (n = 37), there were 32 nucleotide haplotypes vs 10 amino acid haplotypes. This was probably due to the higher number of singleton sites in samples from Sarawak, indicating population expansion (more strongly negative values for Fu and Li's F* and D*). It is interesting to note that polymorphism within the 19 kDa domain was limited to only one site (S178Y) with a minor allele frequency > 10%. Also, variations within the 19 kDa domain were mostly observed in isolates originating from Peninsular Malaysia and Thailand. All isolates originating from Malaysian Borneo had conserved 19 kDa domains, indicating conserved functional activity.
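To make the interpretation of negative neutrality statistics concrete, the following Python sketch computes Tajima's D from an alignment using the standard formula (Tajima, 1989). It is an illustration only: the function name and the two toy alignments are hypothetical, and the sketch does not reproduce the software or settings used in this study. An excess of rare variants, as expected after population expansion or under purifying selection, should give a negative D, whereas two haplotypes at equal frequency should give a positive D.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (requires at least 4 sequences)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites in the alignment.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # k: mean number of pairwise differences (not divided by sequence length).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments invented for illustration (not PkMSP1-42 data).
rare_variants = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT", "AAAAA"]
balanced = ["AAAAA"] * 3 + ["TTTTT"] * 3

d_rare = tajimas_d(rare_variants)    # every variant is a singleton
d_balanced = tajimas_d(balanced)     # two haplotypes at equal frequency
print(f"D (rare variants) = {d_rare:.2f}, D (balanced) = {d_balanced:.2f}")
```

Significance testing, which the statistics reported above rely on, is not covered by this sketch.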
The median-joining haplotype network analysis did not show separation of the
P. knowlesi msp1-
42 into two sub-populations as observed for other invasion genes such as
nbpxa, msp1p and dbpII, where deep dimorphism was noted due to host-associated factors [
22,
25,
30,
47,
48]. Instead, the MSP1 haplotypes revealed geographical clustering, indicating evolutionary conservation based on sample origin. A similar feature was observed in other evolutionary marker genes, including
PkssrRNA and
Pkmt [
26]. However, one haplotype from Peninsular Malaysia grouped together with haplotypes from Malaysian Borneo, signifying a historical common origin; the subsequent separation may be attributed to evolution of the parasites and the apparent sea level rise during the ice age [
26]. However, a larger number of samples from Peninsular Malaysia and Thailand would be necessary for an accurate assessment.
Population differentiation analyses also showed high genetic differentiation between parasite populations originating from Peninsular Malaysia and Malaysian Borneo, which can be attributed to the geographical separation of the populations by the South China Sea. Similarly, high FST values were observed between parasite populations from Thailand and Malaysian Borneo. However, only moderate genetic differentiation was observed between parasite populations from Thailand and Peninsular Malaysia, probably because of the shared landmass. These observations may suggest human susceptibility to infection with any one of the P. knowlesi populations circulating in these regions. It is also not known whether some individuals are more susceptible than others. However, a larger number of human and macaque samples from Peninsular Malaysia as well as Thailand would be necessary to accurately ascertain the transmission routes of P. knowlesi.
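As an illustration of how a pairwise FST value relates within-population to between-population diversity, the sketch below uses the sequence-based estimator of Hudson, Slatkin and Maddison (1992), FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between populations. The choice of estimator is an assumption made for illustration and may differ from the one used in this study; the function names and the toy populations are hypothetical.

```python
from itertools import combinations, product


def mean_pairwise_diffs(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """FST = 1 - Hw / Hb for two populations of aligned sequences.

    Each population needs at least two sequences, and the populations must
    differ at one or more sites (otherwise Hb is zero).
    """
    within = (
        mean_pairwise_diffs(combinations(pop1, 2))
        + mean_pairwise_diffs(combinations(pop2, 2))
    ) / 2
    between = mean_pairwise_diffs(product(pop1, pop2))
    return 1 - within / between


# Toy populations invented for illustration (not P. knowlesi data):
# two fixed differences between populations, one shared polymorphic site.
pop_a = ["AAAA", "AAAA", "AAAT"]
pop_b = ["TTAA", "TTAA", "TTAT"]

fst = hudson_fst(pop_a, pop_b)
print(f"FST = {fst:.3f}")
```

Values approaching 1 indicate that most variation lies between populations, as reported above for Malaysian Borneo versus the mainland populations, while values near 0 indicate little differentiation.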