Background
Plasmodium falciparum is the most virulent of the four species causing malaria and is responsible for most malarial deaths. The particular virulence of P. falciparum is partly due to the ability of infected erythrocytes to adhere to a variety of host receptors and avoid splenic clearance [1,2]. Unchecked growth and the accumulation of sequestered parasites in vital organs such as the brain [3] or placenta [4] are crucial elements in the pathogenesis of severe malaria [5]. CD36 is considered to be the major endothelial receptor for infected erythrocytes [6], but several other ligands have been identified, in particular ICAM-1 [7], which has been associated with cerebral malaria, and chondroitin sulfate A (CSA), which is associated with binding in the placenta and pregnancy-associated malaria (PAM) [8].
Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) is a polymorphic family of high molecular weight adhesion antigens expressed on the surface of infected erythrocytes [9]. The accumulation of antibodies against a broad repertoire of PfEMP1s is probably the functional basis for the natural acquisition of immunity to malaria [10-13].
PfEMP1 antigens are encoded by the var gene family in two exon units [9,14,15]. Exon I codes for the extracellular and variable part of the protein as well as a transmembrane region, and Exon II encodes the intracellular and relatively conserved acidic terminal segment (ATS). The most variable part of the protein contains an N-terminal segment followed by segments composed of three domain types: Duffy binding-like (DBL) domains, cysteine-rich inter-domain regions (CIDRs) and C2 domains [16]. Besides the 59 full-length var genes found in the newly sequenced genome of P. falciparum clone 3D7 [17], the complete domain structures of PfEMP1s are only available in the databases for a handful of var genes from other P. falciparum isolates. The extent to which we can extrapolate from the organisation of 3D7 var genes to the total diversity of PfEMP1 in the diverse global population of P. falciparum therefore remains somewhat uncertain.
P. falciparum genomes are estimated to contain 50 to 60 var genes. In the case of 3D7 these have been grouped into three major types based on sequence analysis of the intron and the 5' and 3' untranslated regions (UTRs) [17-19]. In a recent functional study, it was shown that the ability of CIDR domains to bind CD36 could be predicted on the basis of sequence analysis and that binding and non-binding domains fell into two separate CIDR clusters [20]. The rif genes constitute another multigene family, which has 149 members in the 3D7 genome. They encode proteins (RIFINs) exposed on the surface of infected erythrocytes [21,22]. The functions of these proteins are not known and they have not been shown to mediate binding.
With the completion of the 3D7 genome [17], it has become possible to study the complete PfEMP1 and RIFIN repertoire of a single genome. We have analysed both coding and non-coding regions of 3D7 var and rif genes and assigned the var genes to different groups. These groups appear evolutionarily conserved, possibly because selection favours 'shuffling' of gene segments within particular groups, but not exchanges between different groups. We speculate that these PfEMP1 groups have arisen as a result of diversifying selection for antigenic divergence being superimposed on strong selective constraints maintaining a particular ligand-receptor binding interaction.
Discussion
The publication of the P. falciparum genome divided the var genes into different types according to the domain structure of the encoded proteins [17]. Other groups have described semi-conserved regions upstream from the translation initiation sites, and grouped var genes on this basis [18,19,29]. We have synthesised the available information and suggest a somewhat different division of the var genes into three major groups (A-C) and two intermediate groups (B/A and B/C), which represent transitions between A, B, and C. The genes were grouped according to chromosomal location and transcription direction, domain structure of the encoded proteins, and sequence similarities in coding and non-coding regions.
Group A consists of ten genes consistently identified as a distinct group by sequence analysis. Interestingly, recombinant CIDR domains based on the group A sequences do not bind CD36, in contrast to CIDR domains produced on the basis of groups B and C [20]. Group A var genes mainly encode large PfEMP1s with a complex multi-domain structure. Nine of the group A var genes are flanked by a rif gene, which is transcribed in the opposite direction. Thus, the 5' regions of the rif and var genes merge. The fact that this organisation has been maintained in the 3D7 genome indicates that the DNA between the coding regions constitutes a functional unit, possibly regulating either recombination or transcription. If the latter is the case, the genes could be co-regulated and there might be a functional relationship between the encoded PfEMP1s and RIFINs.
The largest var group in 3D7, group B, comprises 22 genes sharing the 5' upsB region. All genes but one are located in the telomeric region. The encoded proteins typically have the characteristic four-domain structure, DBLα-CIDRα-DBLδ-CIDR2. The 13 genes of group C are centromeric. They all share the 5' upsC region, and 12 of them encode proteins with the common four-domain structure. Genes of the B/A and B/C groups have characteristics indicating that they constitute intermediate forms between groups A and B, and groups B and C, respectively. Two genes, which have previously been shown to be present in most P. falciparum genomes, did not fit into any of the groups. Compared to other var genes they appear to be unusually conserved [24,28,37], and it has been suggested that they belong to var gene subfamilies named var1 and var2, respectively [24,28].
To investigate whether the proposed grouping of 3D7 var genes could be used as a general classification of var genes, the available database sequences from other parasite isolates were analysed. Sufficient sequence data were only available for 11 genes, and with regard to the domain structure of the predicted proteins, they were not particularly representative of the PfEMP1 repertoire in 3D7. Analysis of the 5' regions allocated ten of the genes to the upsB 3D7 cluster, and they could therefore be classified as group B or group B/A var genes. Further analysis of sequence and predicted domain structure showed that all ten genes shared characteristics with at least one group B 3D7 var gene, and none of them shared characteristics with the 3D7 var genes belonging to group A. The upstream region identified the remaining gene as belonging to group C. This gene encoded a protein with a domain structure typical of 3D7 group C PfEMP1s. Thus, although the data are limited, analysis of non-3D7 var genes suggested that the proposed nomenclature could be used in a general classification of var genes.
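The upstream-region step of this comparison can be pictured as a simple lookup. The following Python sketch is an illustration only, not the procedure used in the study. It encodes just the associations stated in this Discussion (upsB with groups B and B/A, upsC with group C, upsE with var2). The names `CANDIDATE_GROUPS` and `candidate_groups` are hypothetical, and upstream classes not named in the text are deliberately left unassigned.

```python
# Candidate var gene groups implied by the 5' upstream class alone.
# Only associations stated in the text are encoded; this is an
# illustrative assumption, not the authors' classification procedure.
CANDIDATE_GROUPS = {
    "upsB": ("B", "B/A"),  # upsB genes were classified as group B or B/A
    "upsC": ("C",),        # group C genes share the upsC region
    "upsE": ("var2",),     # upsE is the upstream region of var2
}


def candidate_groups(upstream_class):
    """Return the groups compatible with an upstream class, or an
    empty tuple when the class is not covered by the table above."""
    return CANDIDATE_GROUPS.get(upstream_class, ())


if __name__ == "__main__":
    for ups in ("upsB", "upsC", "upsE"):
        print(ups, "->", candidate_groups(ups))
```

A lookup of this kind only narrows the candidates. As described above, the final assignment also drew on chromosomal location, transcription direction, domain structure and sequence similarity.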
The suggested grouping of var genes is operational and based on best judgement. It is likely that future work will change the classification and move genes between groups; nevertheless, we believe that this grouping is helpful as a starting point for understanding the evolution of the var gene repertoire and developing hypotheses about the function of the genes.
The fact that 5' regions predict var gene chromosomal organisation and domain structure, as well as sequence similarities in coding and non-coding regions several thousand bases downstream from the translation initiation site, implies that recombination, or other mechanisms of homogenizing exchange, is much more likely to occur between var genes within a group than between var genes of different groups. It can be proposed that an original ancestral var gene was duplicated and diverged into the three main types, and that each of these then diverged into the genes of each group. In this process information may also have been exchanged between genes of different groups. The data suggest that some exchange has taken place between groups B and C and that some characteristics of group A have leaked into these groups, but that characteristics from groups B and C have not gained access to group A. It is tempting to speculate that distinct chromosomal organisation patterns restrict recombination and that the conserved flanking regions serve to align genes of the same group for recombination. The fact that a putative boundary of the upstream sequence could be determined for most var genes may suggest that these sites also serve as splicing sites for insertion of larger gene fragments or whole genes.
Why then are var genes structured into different groups? By mediating parasite binding to endothelium, PfEMP1 enables the parasite to sequester and avoid filtering through the spleen. Thus, parasites expressing the PfEMP1s that are most effective in sequestering infected erythrocytes will obtain the highest growth rates. How effective a given PfEMP1 is in binding in a particular host will depend on the binding characteristics of the PfEMP1, on the ligands that are available in the host [38], and on the anti-PfEMP1 antibody repertoire in the infected individual [11,39,40]. Parasites causing severe malaria express phenotypes that are more often recognised by antibodies in children's plasma than the phenotypes expressed by parasites causing uncomplicated disease [40,41]. The phenotypes associated with severe disease also tend to be serologically cross-reactive (Nielsen et al., in preparation). Given that immunity to severe malaria is developed relatively early in life, it is possible to speculate that the most severe forms of malaria are caused by fast-growing parasites expressing PfEMP1s optimized to mediate very effective binding in non-immune hosts. To maintain effective binding these PfEMP1 types are probably functionally constrained, and consequently have tight limits to the degree to which they can vary. The fact that recombination within
var genes of group A appears to be the most constrained suggests that the PfEMP1s associated with severe malaria will be encoded by group A var genes. This hypothesis is in agreement with findings from China indicating that parasites from individuals suffering from cerebral malaria expressed high molecular weight PfEMP1s more often than parasites from cases of non-severe malaria [42], and with a study from Brazil in which expression of DBLα domains lacking 1–2 cysteine residues in DBLα homology block G was mainly found among severe malaria cases [43]. In 3D7 this is a feature of all genes of DBLα-CIDR1 group A (var gene group A).
In most endemic settings transmission does not occur continuously, but is highly seasonal and in some areas restricted to a few months of every year [44]. In such a situation the ability to establish chronic infections is important for parasite survival and transmission. Chronic human malaria infections are associated with a 'shift' in PfEMP1 expression [45], and it has been proposed that such shifts are driven by antibodies that force parasites to express PfEMP1 molecules which are less optimal for adhesion, but not recognised by cross-reactive antibodies. It is possible to speculate that PfEMP1s of groups B and C could serve this function.
In areas of high malaria endemicity, women who have acquired malaria immunity during childhood become susceptible to malaria during their first pregnancies [46] and are infected by parasites expressing antigens that mediate binding to CSA in the placenta [8]. Parasites of this phenotype can apparently only expand and establish infection in individuals carrying a placenta, and these parasites do not cross-react serologically with non-placental parasites [38]. It has recently been reported that PFL0030c is the dominant var gene transcribed in parasites selected for CSA binding and that most parasite genomes carry very similar genes, the var2 family [28]. Interestingly, the var2 upstream region (upsE) is markedly different from the 5' regions of the other var genes and appears to be conserved. The upsE region of var2 is also the only such region containing an ORF. Upstream ORFs (uORFs) are uncommon in known genomes, and are primarily described in association with genes that are under tight translational control, such as oncogenes and genes involved in cellular differentiation (reviewed by Kozak, 2002). The function of the uORF 5' of var2 remains unclear.
Authors' contributions
TL collected the sequences and performed the cluster analysis. AS did the laboratory experiments on the var2 uORF. All authors participated in the analysis and interpretation of data. TL produced the first draft. All authors contributed to writing the manuscript.