A study of genetic diversity in the gene encoding the circumsporozoite protein (CSP) of Plasmodium falciparum from different transmission areas—XVI. Asembo Bay Cohort Project
Introduction
Studies of the genetic diversity of malaria parasites can be categorized into two major groups: (a) those aimed at understanding the population structure, specifically the extent and role of recombination in natural parasite populations [1], [2]; and (b) those assessing the diversity of specific genes encoding antigens [3], [4]. In addition to yielding information on evolutionary genetics, these studies answer key questions related to the development and predicted effectiveness of intervention strategies for malaria control programs.
Studies of population structure provide an overall picture of the origin, dispersal and stability of multi-locus genotypes. This information is essential in predicting the effectiveness of intervention strategies, such as the usefulness of drugs and the emergence of multi-drug resistant strains [5], the long-term impact of physical interventions such as bed nets, and the origin of genotypes that may elude the immune response elicited by vaccines.
Studies of the diversity of specific genes, on the other hand, provide information about how alleles are generated and maintained in the population. Specifically, they address the relevance of factors such as intragenic recombination and natural selection. They can also assess geographic differentiation to explain the spatial distribution of alleles at a given locus. This approach answers fundamental questions about the association of specific alleles with drug resistance [6], and about how genetic variation may affect vaccine development [7] and deployment.
As part of our program to investigate the genetic diversity of human malaria parasites, with a particular focus on genes encoding vaccine antigens, we have conducted a detailed study of the diversity of the circumsporozoite protein (CS). The CS is the predominant protein found on the surface of the sporozoite; it has approximately 420 residues and a molecular weight of 58 kDa. The CS protein can be subdivided into two non-repetitive regions (5′ and 3′ ends) and a variable central region consisting of multiple repeats of four-residue motifs [8]. There is substantial point mutation polymorphism in the 3′ region of the protein, where T-cell epitopes have been identified [9], [10]; this polymorphism has been explained as a consequence of positive natural selection by the host immune system [4], [11].
Polymorphism in the CS protein is also observed in the number of tandem repeats in the central region. The extensive diversity in the tandem repeat region is even more evident when the nucleotide sequences of the tandem repeats, the so-called repeat allotypes (RATs sensu [12]), are taken into account. It has been proposed that these repeats may be generated by sexual intragenic recombination [8]; however, slipped-strand mismatch repair during mitosis has also been suggested [12]. Molecular epidemiologists have used these polymorphisms in malarial antigens as genetic markers, even though the available data suggest that the distribution of alleles is maintained by positive natural selection [13].
We report a comprehensive study of the genetic diversity of the gene encoding the CS protein using complete and partial sequences from field isolates from Kenya, India, Cameroon, and Venezuela. We find that African isolates are more polymorphic than parasites from other geographic regions. We conclude that this uneven geographic polymorphism may have an adverse impact on the effectiveness of CS-based vaccines. We explore the linkage and recombination events among the polymorphic sites and find that putative recombination events overlap with linked sites. We discuss how this pattern is explained by the action of positive natural selection, in which case the recombination events detected are convergent mutations. To explore how the protein structure may impose restrictions on polymorphism in the number of repeats, we have simulated the stability of the structure of the tandem repeat region. Our analysis suggests that the protein structure may play an important role in the observed polymorphism in the number of CS repeats in Plasmodium falciparum.
Material and methods
A total of 48 new complete sequences are reported: 18 from Western Kenya, 11 from India (isolates collected in Delhi, Jabalpur, and Baroda), 10 from Venezuela (Bolivar and Amazonas states), and 9 from Cameroon (Yaounde). The sequences are identified by the accession numbers AF540441–AF540488. In the same analysis, we included 15 previously reported complete sequences from field isolates collected in Thailand [14], as well as complete sequences from laboratory-adapted isolates: T9/94, T9-98,
Statistical analysis
The genetic polymorphism in the 5′ and 3′ ends of the complete sequences was estimated with the statistic π, which is the average number of substitutions between any two sequences [23]. The polymorphism at the Th2R and Th3R epitopes was explored separately using π and haplotype diversity [23].
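As a minimal sketch of the two statistics named above, the following Python computes π as the mean pairwise difference per site and haplotype diversity with the usual n/(n-1) sample-size correction. The per-site normalization, the correction factor, the function names, and the toy alignment are assumptions for illustration and are not taken from the study; gaps and ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) in an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical toy alignment, not data from the study.
toy = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGAAC"]
print(nucleotide_diversity(toy))
print(haplotype_diversity(toy))
```

Dropping the division by the alignment length gives the average number of differences per sequence pair rather than per site.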
Evidence of positive natural selection was explored by comparing the rates of synonymous and nonsynonymous substitutions [24]. The numbers of synonymous and nonsynonymous substitutions were
Structure calculations for the number of repeats in the central region of the CS
We examined the effect of the protein structure on the number of repeats observed in the CS protein. A series of simulations was performed to test whether the proposed type-I β-turn structure is stable [32] and whether it can explain the range in the observed number of repeats. We simulated peptides of different lengths based on the tetrapeptide Asn–Ala–Asn–Pro, (NANP)n, where ‘n’ is the number of units, from 1 to 60. The calculations were conducted with a cut-off distance for non-bonded van der Waals
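The input series for such simulations can be sketched as follows. This Python snippet only enumerates the (NANP)n sequences for n = 1 to 60 in one-letter and three-letter codes; it does not perform any structure or energy calculation, and the function names are hypothetical.

```python
# One-letter to three-letter residue names for the residues in the NANP unit.
THREE_LETTER = {"N": "Asn", "A": "Ala", "P": "Pro"}
UNIT = "NANP"  # Asn-Ala-Asn-Pro in one-letter code


def repeat_peptide(n, unit=UNIT):
    """Return the one-letter sequence of the unit repeated n times."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def to_three_letter(seq):
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[r] for r in seq)


# One peptide per repeat count, keyed by n.
peptides = {n: repeat_peptide(n) for n in range(1, 61)}

print(to_three_letter(peptides[1]))
print(len(peptides), len(peptides[60]))
```

Each sequence would then need to be built into a three-dimensional model by whichever modelling package is used, a step not shown here.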
Results
The alignment of complete sequences includes only the 5′ and 3′ ends, since the repetitive region could not be accurately aligned. A qualitative description of the aligned complete sequences is as follows. There are 44 segregating sites in the sample of 75 complete sequences. We found two previously reported indels [17], [18]: a 30 bp insertion and a 57 bp deletion. The 30 bp insertion is found in the laboratory isolates T9/94 and Wellcome and in a field isolate from India. It is found in the
Discussion
Investigating the extent of genetic polymorphism in malarial antigens and the processes that maintain the observed diversity is a matter of epidemiological importance. Previous studies have addressed the diversity and maintenance of malarial vaccine candidates [3], [4].
Epidemiologic and immunologic studies suggest that the onset of natural immunity is affected by the complexity of infection [37], which correlates with the overall genetic diversity of the parasite population locally
Acknowledgements
This research is supported in part by grants from the ‘Consejo Venezolano de Investigaciones Científicas’ (G97000634) and The National Institutes of Health (R01 GM60740-01) to A.A. Escalante. This work was supported in part by the US Agency for International Development grant HRN-60010-A-00-4010-00 to A.A. Lal. R. Isea is supported by funds from Camara Venezolana de Fabricantes de Cerveza (CAVEFACE).
References (45)
- et al. Current views on the population structure of Plasmodium falciparum: implications for control. Parasitol. Today (1997)
- et al. Natural selection on Plasmodium surface proteins. Mol. Biochem. Parasitol. (1995)
- et al. Wild isolates of Plasmodium falciparum show extensive polymorphism in T cell epitopes of the circumsporozoite protein. Mol. Biochem. Parasitol. (1989)
- et al. Field studies of cytotoxic T lymphocytes in malaria infections: implications for malaria vaccine development. Parasitol. Today (2000)
Natural selection on polymorphic malaria antigens and the search for a vaccine. Parasitol. Today (1997)
- et al. Strain variation in the circumsporozoite protein gene of Plasmodium falciparum. Mol. Biochem. Parasitol. (1987)
Clonal variation in the Plasmodium falciparum circumsporozoite protein gene. Mol. Biochem. Parasitol. (1991)
- et al. High-throughput sequence typing of T-cell epitope polymorphisms in Plasmodium falciparum circumsporozoite protein. Mol. Biochem. Parasitol. (2000)
- et al. Conservation and heterogeneity of the glutamate-rich protein (GLURP) among field isolates and laboratory lines of Plasmodium falciparum. Mol. Biochem. Parasitol. (2000)
- et al. Evolution of protein molecules