Introduction

Adaptive immunity, in which lymphocyte receptors and major histocompatibility complex (MHC) molecules play a central role in the recognition of foreign molecules, is specific to the jawed vertebrates, suggesting that it originated around 600 million years ago (MYA) (Flajnik and Kasahara 2001). Innate immunity is believed to be even more ancient, although it comprises many heterogeneous systems whose evolutionary origins are, in most cases, poorly defined. The complement system, one of the most sophisticated innate immune systems of mammals (Volanakis 1998; Walport 2001a,b), has been studied intensively from an evolutionary viewpoint because of the interest in how such a sophisticated biological reaction system was established. More than 30 years ago, phylogenetic studies of the complement system were performed mainly using hemolytic activity in body fluids as the criterion for the presence of the complement system. These studies identified complement-like hemolytic activities not only in various vertebrates but also in invertebrates (Gigli and Austen 1971). The intrinsic problem with this approach, however, was that it was difficult to distinguish hemolytic activity due to the complement system from hemolytic activities due to other factors. For example, a complement-like activity once reported in arthropod hemolymph, which could be rendered hemolytic after activation by cobra venom factor (Day et al. 1970), turned out to be due to lecithin converted to lysolecithin by phospholipase A present in the cobra venom factor preparations (Hall et al. 1972). Phylogenetic studies of the complement system were subsequently performed mainly at the protein level (Nonaka et al. 1981), leading to the identification of complement components from most classes of vertebrates, including Agnatha (Nonaka et al. 1984).
At that time, it was generally believed that the complement system was a unique property of the vertebrates, because all attempts to identify complement components in invertebrates had failed. Over the past 10 years, DNA-level analyses, including genome and EST analyses, have significantly extended our knowledge of the evolutionary origin of the complement system. The initial phase of DNA-level analysis revealed complement genes in invertebrate deuterostomes such as sea urchins (Al-Sharif et al. 1998) and ascidians (Ji et al. 1997). In contrast, no complement gene was found in the genomes of Drosophila melanogaster (Adams et al. 2000) or Caenorhabditis elegans (The C. elegans Sequencing Consortium 1998), suggesting that the complement system was established in the deuterostome lineage. However, recent reports of C3 and factor B (Bf) in the horseshoe crab (Zhu et al. 2005) and of C3 in a coral (Dishaw et al. 2005), together with analysis of the sea anemone genome, indicate that the complement system has a much more ancient origin. In this review, we present a current assessment of the evolution of the complement system, based mainly on genome and other DNA-level analyses.

Phylogeny of animals

As molecular research proceeds, the evolutionary origin of the complement system is being revealed to be increasingly ancient. Hence, following the evolutionary process of the complement system requires an understanding of a wider range of animal phylogeny. The current view of animal phylogeny and the estimated divergence times among major animal groups, based on recent molecular clock analyses (Blair and Hedges 2005a,b; Hedges et al. 2004), are summarized in Fig. 1. As shown in this figure, molecular data suggest that Eumetazoa diverged into Cnidaria and Bilateralia about 1,300 MYA. At approximately 1,000 MYA, Bilateralia diverged into Deuterostomia and Protostomia, and the latter further diverged into Ecdysozoa and Lophotrochozoa. In the Deuterostomia lineage, Chordata diverged from Echinodermata/Hemichordata around 900 MYA. Among the three chordate subphyla, Cephalochordata diverged first, at 890 MYA, and Urochordata and Vertebrata diverged at 790 MYA. From the main vertebrate lineage, Cyclostomata diverged at 650 MYA and Chondrichthyes at 530 MYA. This phylogenetic tree, however, is still not conclusive; a recent report has suggested a close relationship between Cephalochordata and Echinodermata (Delsuc et al. 2006). Adaptive immunity based on lymphocytes and MHC is present in Chondrichthyes and other jawed vertebrates, but not in Cyclostomata. Thus, adaptive immunity most probably appeared between 530 and 650 MYA, after the divergence of Cyclostomata but before that of Chondrichthyes.
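The bracketing argument in the last two sentences can be made explicit. The following Python sketch is illustrative only and is not a method used in the cited molecular clock studies: it takes the divergence times given above, assumes the trait was gained once and never lost, and returns the interval in which the trait must have arisen. The function name `origin_window` and the data layout are hypothetical.

```python
# Divergence times (MYA) of lineages branching off the line leading to mammals,
# ordered oldest to youngest, as given in the text.
BRANCHES = [
    ("Cnidaria", 1300),
    ("Protostomia", 1000),
    ("Echinodermata/Hemichordata", 900),
    ("Cephalochordata", 890),
    ("Urochordata", 790),
    ("Cyclostomata", 650),
    ("Chondrichthyes", 530),
]


def origin_window(branches, has_trait):
    """Bracket the origin of a trait that mammals possess (hypothetical helper).

    Assumes a single gain and no secondary loss: the trait arose after the most
    recent split of a lineage lacking it and before the oldest split of a lineage
    having it. Returns (lower_mya, upper_mya); upper is None when every listed
    lineage has the trait, i.e. the origin predates all listed splits.
    """
    lacking = [mya for name, mya in branches if not has_trait[name]]
    having = [mya for name, mya in branches if has_trait[name]]
    upper = min(lacking) if lacking else None
    lower = max(having) if having else 0
    if upper is not None and lower > upper:
        raise ValueError("pattern implies secondary loss; single-gain model does not apply")
    return lower, upper


# Adaptive immunity (lymphocytes + MHC): present in Chondrichthyes, absent in Cyclostomata.
adaptive = {name: name == "Chondrichthyes" for name, _ in BRANCHES}
print(origin_window(BRANCHES, adaptive))  # (530, 650)
```

Because the single-gain, no-loss assumption is violated by genes that were lost secondarily (as discussed below for several complement genes), the sketch raises an error when a presence/absence pattern is inconsistent with it rather than returning a misleading window.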

Fig. 1

Phylogenetic relationship among animals. Phylogenetic relationships among multicellular animals, elucidated by molecular clock methods based on protein sequence data, are shown. Only animal groups relevant to this review are included. The divergence times for Arthropoda/Nematoda and Mollusca/Annelida were not estimated by this method and are shown arbitrarily.

Presence and absence of the complement genes in various animal genomes

To trace the evolution of the complement system, we searched the genome data of chicken (Gallus gallus, http://www.ncbi.nlm.nih.gov/genome/guide/chicken/), clawed frog (Xenopus tropicalis, http://genome.jgi-psf.org/Xentr4/Xentr4.home.html), pufferfish (Takifugu rubripes, http://genome.jgi-psf.org/Takru4/Takru4.home.html), and sea anemone (Nematostella vectensis, http://www.stellabase.org/) for complement genes. Because five complement gene families, C3/C4/C5, Bf/C2, MASP/C1r/s, C6/C7/C8A/C8B/C9, and factor I (I), have unique domain combinations found only among complement genes in the human genome, these genes were identified solely on the basis of their predicted domain structures. For the other complement genes, however, the same domain combinations are also found in noncomplement genes. In these cases, phylogenetic tree analysis was performed to confirm the orthologous relationship between the candidate complement genes of various animals and their mammalian counterparts. Figure 2 summarizes the current status of the presence or absence of complement genes, judged from these searches, published results, and our unpublished experimental data.
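Identification by domain combination amounts to checking whether a predicted protein contains every domain that defines a family. The Python sketch below illustrates this idea for three of the five families, using simplified signatures taken from the domain compositions described in this review (Bf/C2, the terminal components, and C3/C4/C5). The domain labels, the `SIGNATURES` table, and `classify` are hypothetical; real searches would rely on domain predictions from dedicated tools rather than hand-written labels, and domain order is ignored here.

```python
from collections import Counter

# Hypothetical domain labels; signatures are simplified from the domain
# compositions described in this review, not curated database definitions.
SIGNATURES = {
    "Bf/C2": Counter({"SCR": 3, "VWA": 1, "SP": 1}),
    "C6/C7/C8A/C8B/C9": Counter({"TSP": 1, "LDLRA": 1, "MACPF": 1, "EGF": 1}),
    "C3/C4/C5": Counter({"MG": 8, "ANATO": 1, "TED": 1, "C345C": 1}),
}


def classify(domains):
    """Return the families whose required domains all occur in `domains`.

    Counter subtraction keeps only positive counts, so an empty result means
    every required domain is present at least the required number of times.
    """
    observed = Counter(domains)
    return [family for family, required in SIGNATURES.items()
            if not (required - observed)]


human_bf = ["SCR", "SCR", "SCR", "VWA", "SP"]
rca_like = ["SCR"] * 4          # SCR-only protein: no unique signature
mac_toxin = ["MACPF", "EGF"]    # MACPF + EGF only, like the sea anemone toxins
print(classify(human_bf))   # ['Bf/C2']
print(classify(rca_like))   # []
print(classify(mac_toxin))  # []
```

An SCR-only protein matches nothing, which mirrors why RCA genes cannot be identified from domain structure alone and require phylogenetic analysis instead.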

Fig. 2

Presence or absence of complement component genes in various animal groups. All complement components and related genes of human, as a representative of Mammalia, are shown, and the presence of orthologous genes reported in the other animal groups is indicated by reference numbers. Plus and minus indicate the presence and absence, respectively, of orthologous genes in the assembled genome sequences of at least one representative species of each group. Genes located outside the complement gene clusters in the phylogenetic tree, and thus showing an uncertain orthologous relationship with complement genes, are indicated in red. References cited here are: 1 Mavroidis et al. 1995; 2 Fritzinger et al. 1992; 3 Kaufman et al. 1999; 4 Kjalke et al. 1993; 5 Laursen et al. 1998; 6 Lynch et al. 2005; 7 Oshiumi et al. 2005; 8 Mahon et al. 1999; 9 Grossberger et al. 1989; 10 Mo et al. 1996; 11 Kato et al. 1995; 12 Kato et al. 1994; 13 Endo et al. 1998 and Kakinuma et al. 2003; 14 Endo et al. 1998; 15 Kunnath-Muglia et al. 1993; 16 Boshra et al. 2005; 17 Abelseth et al. 2003; 18 Samonte et al. 2002; 19 Zarkadis et al. 2001; 20 Nakao et al. 2000; 21 Kuroda et al. 2000; 22 Sato et al. 1999; 23 Sunyer et al. 1997b; 24 Sunyer et al. 1997a; 25 Sunyer et al. 1996; 26 Lambris et al. 1993; 27 Boshra et al. 2004a; 28 Wang and Secombes 2003; 29 Sambrook et al. 2003; 30 Kato et al. 2003; 31 Franchini et al. 2001; 32 Nakao et al. 2002; 33 Sunyer et al. 1998; 34 Nakao et al. 1998; 35 Gongora et al. 1998; 36 Seeger et al. 1996; 37 Kuroda et al. 1996; 38 Yano and Nakao 1994; 39 Vitved et al. 2000; 40 Nakao et al. 2001; 41 Chondrou et al. 2006; 42 Zarkadis et al. 2005; 43 Papanastasiou and Zarkadis 2005; 44 Uemura et al. 1996; 45 Katagiri et al. 1999; 46 Kazantzi et al. 2003; 47 Yeo et al. 1997; 48 Tomlinson et al. 1993; 49 Nakao et al. 2003a; 50 Kemper et al. 1998; 51 Boshra et al. 2005; 52 Boshra et al. 2004b; 53 Fujiki et al. 2003; 54 Dodds et al. 1998; 55 Terado et al. 2003; 56 Smith 1998; 57 Terado et al. 2002; 58 Ishiguro et al. 1992; 59 Nonaka et al. 1984 and Nonaka and Takahashi 1992; 60 Nonaka et al. 1994; 61 Matsushita et al. 2004; 62 Takahashi et al. 2006; 63 Song et al. 2005; 64 Kimura et al. 2004; 65 dos Remedios et al. 1999; 66 Suzuki et al. 2002; 67 Endo et al. 2003; 68 Raftos et al. 2002; 69 Marino et al. 2002; 70 Nonaka et al. 1999; 71 Yoshizaki et al. 2005; 72 Azumi et al. 2003; 73 Dehal et al. 2002; 74 Kenjo et al. 2001; 75 Sekine et al. 2001; 76 Ji et al. 1997; 77 Miyazawa and Nonaka 2004; 78 Miyazawa et al. 2001; 79 Al-Sharif et al. 1998; 80 Smith et al. 1998; 81 Zhu et al. 2005; 82 Adams et al. 2000; 83 The C. elegans Sequencing Consortium 1998; 84 Dishaw et al. 2005; *1 H. Nagumo et al., unpublished data; and *2 A. Kimura and M. Nonaka, unpublished data

Mammals, Aves, Amphibia, and Teleostei seem to have the full set of complement genes, except for factor D and members of the regulators of complement activation (RCA) family, and apart from sporadic absences such as chicken C2 and C9, properdin, amphibian C1 inhibitor, and teleost MASP-1 and MASP-3. Although these sporadic absences are most probably due to secondary loss in each animal lineage, the absence of factor D and RCA genes may reflect technical difficulties in identifying them. Factor D consists of only a serine protease domain (Volanakis and Arlaud 1998), a domain structure too simple to be used for identification. Thus, the D gene may be present in the chicken and clawed frog genomes but have been overlooked in the present analysis. Similarly, all RCA members have a simple domain structure composed of repeats of a single domain termed the short consensus repeat (SCR) (Hourcade et al. 1989). Because the primary structures of RCA SCRs are poorly conserved even among mammalian species, and many noncomplement genes are also composed of SCRs, it is difficult to identify RCA genes on the basis of their domain structure. Therefore, it is highly probable that these genes are present but not identifiable in some animal genomes. Taken together, most of the gene duplications that played a significant role in establishing the modern complement system of higher vertebrates seem to have occurred before the divergence of teleosts and tetrapods, estimated at about 500 MYA.

The lack of genome-wide information for Chondrichthyes and Agnatha makes it difficult to evaluate the evolutionary stage of their complement systems. However, earlier functional analyses of the shark and lamprey complement systems indicated that the former possesses hemolytic activity, whereas the latter lacks it (Jensen et al. 1981; Nonaka et al. 1984). Later reports on individual complement component genes supported the idea that the complement system of Chondrichthyes may be similar to that of higher vertebrates, whereas the complement system of Agnatha shows some crucial differences. Thus, the C3/C4/C5 and Bf/C2 gene duplications seem to have occurred in the jawed vertebrate lineage after the divergence of Agnatha. Moreover, no member of the C6/C7/C8A/C8B/C9 family, let alone duplications among them, has been identified in Agnatha (A. Kimura and M. Nonaka, unpublished data). Therefore, the vertebrate complement system seems to have undergone a drastic change after the divergence of Agnatha but before the divergence of Chondrichthyes. Although this point remains to be confirmed by genome analyses of species from these groups, these drastic changes in the complement system may have occurred simultaneously with the appearance of adaptive immunity.

The genome analysis of the urochordate Ciona intestinalis has demonstrated that most complement gene families are present in Urochordata and that many of them have multiple members (Azumi et al. 2003). However, these members do not show one-to-one orthologous relationships with members of the same gene families in higher vertebrates, indicating that the gene duplications within each family occurred independently in Urochordata and Vertebrata. No complement gene sequence has been reported from Hemichordata, and only fragmentary information is available for Cephalochordata and Echinodermata. The ongoing amphioxus and sea urchin genome projects should, however, shed light on the early evolution of the deuterostome complement system.

Because the first protostome genomes analyzed, those of D. melanogaster and C. elegans, contained no complement genes, the complement system was believed to be a unique property of deuterostomes. However, the recent identification of complement genes in the horseshoe crab (Zhu et al. 2005) and in Cnidaria (Dishaw et al. 2005) has indicated that the complement system is of extremely ancient origin. Therefore, the absence of complement genes in D. melanogaster and C. elegans seems to be due to secondary loss. These two model animals have very short generation times, and it is tempting to speculate that their genomes were streamlined, eliminating the complement genes in the process.

The sea anemone (N. vectensis) genome contains only two complement genes, C3 and Bf. This suggests that the sea anemone complement system is simple, composed of only two central components. However, we cannot rule out the possibility that other complement genes are present in sea anemone but are too divergent to be detected by the Basic Local Alignment Search Tool (BLAST).

In the following sections, we discuss the individual evolution of each complement gene family.

C3/C4/C5

In contrast to the other complement components, C3, C4, and C5 had long been considered unique in lacking an obvious domain structure. The recent elucidation of the crystal structure of C3 (Janssen et al. 2005), however, has revealed that human C3 is composed of 13 domains: eight macroglobulin domains, a linker domain, an anaphylatoxin domain, a CUB domain, a thioester-containing domain, and a C345C domain. Although the three-dimensional structures of the eight macroglobulin domains are similar to each other, their amino acid sequences show almost no similarity, which explains why this repeating structure was not recognized until the crystal structure was solved. The primary structures of complement components C3, C4, and C5 show weak but significant similarity to those of the serum protease inhibitor alpha2-macroglobulin (A2M) and the glycosylphosphatidylinositol-anchored cell surface molecule CD109 (Solomon et al. 2004; Sottrup-Jensen et al. 1985). Moreover, these proteins share a unique structural feature, an intramolecular thioester bond, except for C5, which is believed to have lost it secondarily. The family composed of these genes is therefore called the thioester-containing protein (TEP) gene family. The domain structure of C3 suggests that the ancestral TEP molecule had a simple repeating structure composed of eight macroglobulin domains and that the other domains were inserted later.

An increasing number of TEP family genes are being identified in various animal phyla, making it clearer that this family is divided into two subfamilies: the C3 subfamily, comprising C3, C4, and C5, and the A2M subfamily, comprising A2M, CD109, and insect TEP. Because only members of the latter subfamily were identified in the genomes of D. melanogaster and C. elegans, whereas members of both subfamilies were identified in all deuterostomes analyzed, the C3 subfamily was considered to have been established by gene duplication from A2M in the deuterostome lineage (Nonaka 2001). However, the recent identification of C3 subfamily members in the horseshoe crab, an arthropod (Zhu et al. 2005), and in a cnidarian coral (Dishaw et al. 2005) has indicated that the origin of the C3 gene can be traced back to before the divergence of Cnidaria and Bilateralia, estimated at about 1,300 MYA. Moreover, members of both the C3 and A2M subfamilies are present in the Nematostella genome (N. vectensis, http://www.stellabase.org/). Thus, the emergence of TEP molecules and their differentiation into the C3 and A2M subfamilies seem to have predated the divergence of Cnidaria and Bilateralia. The evolutionary origin of the TEP gene itself remains to be clarified; our preliminary reverse transcriptase polymerase chain reaction search for TEP genes in sponges did not detect any (S. Sugimoto and M. Nonaka, unpublished data). The presence of TEP family genes in prokaryotes has been reported (Budd et al. 2004). However, the distribution of TEP genes among bacteria does not fit their phylogeny, leading the authors to conclude that the bacterial genes were acquired secondarily by horizontal gene transfer from eukaryotes.

At least one A2M subfamily member has been identified in every eumetazoan searched for TEP genes, whereas C3 subfamily members have been identified only in deuterostomes, one protostome (the horseshoe crab), and cnidarians. Thus, the C3 gene, which appeared before the Cnidaria/Bilateralia divergence, seems to have been lost many times at various stages of protostome evolution. In the deuterostome lineage, the C3 gene multiplied independently at least twice, once in the urochordate lineage and once in the vertebrate lineage. In the vertebrate lineage, the multiplication that gave rise to C3, C4, and C5 occurred before the emergence of cartilaginous fish, because all three genes are present in sharks (Terado et al. 2003; H. Nagumo et al., unpublished data). In contrast, it is not clear whether this multiplication occurred before or after the divergence of Agnatha. In phylogenetic tree analysis, the agnathan genes isolated from lamprey (Nonaka and Takahashi 1992) and hagfish (Ishiguro et al. 1992) fall within the C3 clade rather than outside the C3/C4/C5 cluster, which would suggest that the multiplication preceded the divergence of Agnatha; yet no additional member of this gene family has been identified in Agnatha.

Bf/C2

The domain structure of this gene family, composed of three SCR domains, a von Willebrand factor type A domain, and a serine protease domain, is unique among higher vertebrate genes. Thus, there is no doubt that the genes with essentially the same domain structure found in horseshoe crab and sea anemone are orthologs of mammalian Bf and C2, indicating that the origin of this gene family predates the divergence of Cnidaria and Bilateralia. At least one member of this family has been identified in every deuterostome analyzed so far. In some cases, extra domains were added at the N terminus: the ascidian Bf has extra SCR and low-density lipoprotein receptor (LDLR) domains (Yoshizaki et al. 2005), and the sea urchin Bf has an extra SCR domain (Smith et al. 1998). Both Bf and C2 have been reported in amphibians (Ohta et al. 2006) and mammals, indicating that the Bf/C2 gene duplication predated the amphibian/mammalian divergence. On the other hand, lamprey Bf (Nonaka et al. 1994) and invertebrate Bf (Smith et al. 1998; Yoshizaki et al. 2005) are located outside the jawed vertebrate Bf and C2 clusters in the phylogenetic tree, suggesting that the Bf/C2 gene duplication occurred in the jawed vertebrate lineage. However, the Bf/C2 genes of bony and cartilaginous fish (Kuroda et al. 1996; Nakao et al. 1998; Seeger et al. 1996; Terado et al. 2001) show almost the same degree of similarity to tetrapod Bf and to tetrapod C2, making it difficult to define the timing of the Bf/C2 gene duplication more precisely.

C1q/MBP/ficolin

All these molecules have a collagen domain at the N terminus and their own characteristic globular domain at the C terminus. Their overall domain structure is thus relatively simple, consisting of only two domains, and it is shared not only by the complement components but also by a number of noncomplement proteins in mammals. Phylogenetic tree analyses are therefore required to assess the orthologous relationships between mammalian and nonmammalian genes. Phylogenetic tree analysis of C1q and related genes has indicated that the lamprey C1q gene forms a clade with the higher vertebrate C1qA, C1qB, and C1qC genes (Matsushita et al. 2004). In contrast, the sea urchin and ascidian C1q-like genes are located outside this clade, together with related mammalian and fish genes. These results indicate that C1q most probably emerged at an early stage of vertebrate evolution, before the establishment of adaptive immunity and the emergence of immunoglobulin. Thus, the original C1q seems to have recognized foreign molecules independently of immunoglobulins.

The lamprey mannan-binding protein (MBP) genes have been reported recently (Takahashi et al. 2006). Together with previous reports on MBP in mammals, birds, and teleosts, this report indicates that the origin of MBP can be traced back to an early stage of vertebrate evolution. However, the mammalian and bird lung surfactant protein genes, SP-A and SP-D, seem to have diverged from MBP after the divergence of Agnatha. Ascidian glucose-binding lectin and MBPs are located outside the cluster containing vertebrate MBP, surfactant proteins, and other collectins, which makes the orthologous relationship between the vertebrate and ascidian MBP genes doubtful.

In contrast to C1q and MBP, ficolin orthologs have so far been identified only in tetrapods. In the phylogenetic tree, genes with the same domain structure reported in teleosts, ascidians, and sea urchins group together with nonficolin genes of higher vertebrates. Thus, ficolin seems to be of much more recent origin than C1q or MBP.

MASP-1, MASP-2, MASP-3, C1r, and C1s

The evolution of this gene family has been reviewed several times because of its unique and interesting history, which includes gene duplication, retrotransposition, and modification of the serine protease-encoding regions (Fujita 2002; Fujita et al. 2004; Nonaka and Miyazawa 2002; Nonaka and Yoshizaki 2004). Here, therefore, we discuss only one of the major evolutionary events in this gene family: the origin of the MASP-2, MASP-3, C1r, and C1s type genes. Structural comparison of the mammalian and various chordate MASP-1, MASP-3, MASP-2, C1r, and C1s genes suggests that the ancestral MASP-2, MASP-3, C1r, and C1s type gene was generated by the retrotranspositional insertion of a new serine protease-encoding exon into an ancient MASP-1 type gene. Only MASP-1 type genes are present in ascidians, whereas both the MASP-1 type and the MASP-2, MASP-3, C1r, and C1s type genes are present in amphioxus. Under the earlier view of animal phylogeny, in which cephalochordates and vertebrates are more closely related to each other than either is to urochordates, this distribution suggested that the retrotransposition occurred in the lineage leading to vertebrates after the divergence of urochordates but before the divergence of cephalochordates. Following the revision of chordate phylogeny, however, the new interpretation is that this retrotransposition occurred in the common ancestor of chordates and that urochordates secondarily lost the MASP-2, MASP-3, C1r, and C1s type genes. Likewise, the MASP-1 type gene seems to have been lost multiple times, because lamprey, shark, carp, and chicken appear to have only the MASP-2, MASP-3, C1r, and C1s type genes. The evolutionary origin of the C1r and C1s genes remains to be clarified, although functional evidence in shark (Jensen et al. 1981) and functional and molecular evidence in carp (Nakao et al. 2003b) suggest that the C1r and C1s genes were established in the common ancestor of the jawed vertebrates.
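The reinterpretation above follows the logic of Dollo parsimony, a standard way of reasoning about presence/absence data in which a gene may be gained only once but lost any number of times. The authors do not state that they used this method; the Python sketch below merely shows how swapping the tree topology changes the inferred history. The three-taxon trees, the taxon labels, and the helper names (`gain_node`, `losses`, `dollo`) are simplifications introduced for illustration.

```python
def leaves(tree):
    """Taxa at the tips of a tree written as nested tuples of taxon names."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def gain_node(tree, present):
    """Smallest subtree containing every taxon that has the gene (single gain)."""
    if not present:
        raise ValueError("at least one taxon must carry the gene")
    if isinstance(tree, tuple):
        for child in tree:
            if present <= leaves(child):
                return gain_node(child, present)
    return tree


def losses(tree, present):
    """Maximal subtrees whose tips all lack the gene; each implies one loss."""
    if not (leaves(tree) & present):
        return [tree]
    if isinstance(tree, str):
        return []
    return [lost for child in tree for lost in losses(child, present)]


def dollo(tree, present):
    """Dollo parsimony: one gain at the gain node, then independent losses below it."""
    origin = gain_node(tree, present)
    return origin, losses(origin, present)


# Taxa carrying MASP-2/MASP-3/C1r/C1s type genes, per the text (simplified).
present = {"amphioxus", "vertebrate"}

old_tree = ("ascidian", ("amphioxus", "vertebrate"))   # cephalochordates sister to vertebrates
new_tree = ("amphioxus", ("ascidian", "vertebrate"))   # urochordates sister to vertebrates

print(dollo(old_tree, present))  # (('amphioxus', 'vertebrate'), [])
print(dollo(new_tree, present))  # (('amphioxus', ('ascidian', 'vertebrate')), ['ascidian'])
```

Applied to a toy tree of Nematostella, horseshoe crab, Drosophila, C. elegans, Ciona, and human, with C3 present in Nematostella, horseshoe crab, Ciona, and human, the same functions place the gain at the root and infer independent losses in Drosophila and C. elegans, consistent with the repeated protostome losses of C3 described earlier.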

Terminal components

The terminal complement components (TCCs) and C5b assemble to form the membrane attack complex (MAC), which forms pores in the plasma membrane of the target cell, disturbs the membrane potential, and finally leads to cell lysis. The mammalian TCCs, C6, C7, C8A, C8B, and C9, share a unique domain structure composed of the TSP (thrombospondin type I) domain, the LDLR class A domain, the membrane attack complex/perforin (MACPF) domain, and the epidermal growth factor (EGF)-like domain, while C6 and C7 have additional domains at their C termini: the complement control protein (CCP) domain and the factor I/membrane attack complex (FIM) domain. All TCC genes are present in the mammalian, avian, and amphibian genomic sequences, except for the avian C9 gene, which is not found in the draft genome sequence of chicken (Fig. 2). Although the origin of the TCC genes can be traced back to before the divergence of urochordates, cephalochordates, and vertebrates, the gene duplications among the C6, C7, C8, and C9 genes seem to have occurred in the vertebrate lineage after its divergence from urochordates and cephalochordates. Interestingly, the primitive ascidian and amphioxus TCCs may not be activated through the complement system, because they lack both the FIM domain, which is responsible for the interaction with the C345C domain of C5 (Thai and Ogata 2004), and the CCP domain, which potentially interacts with other complement molecules. Thus, although the orthologous relationship between ascidian/amphioxus TCCs and mammalian TCCs is well supported by their domain structures, the biological function and activation mechanism of ascidian and amphioxus TCCs could be quite different. Despite an earlier report on the presence of TCCs in sharks, no shark TCC gene sequence has been published to date, and there is no information at all on the TCCs of cyclostomes. Identification and structural characterization of the TCC genes in these animals will clarify the evolution of the TCC genes involved in MAC formation.

On the other hand, proteins containing the MACPF domain but lacking the other TCC-specific domains are found in organisms from a broad range of phyla and even kingdoms, although many of them are merely predicted genes from draft genomes and have no known function. They include those of (1) invertebrates: sea urchin (Haag et al. 1999), abalones (Mah et al. 2004), venomous sea anemone (Nagai et al. 2002; Oshiro et al. 2004), and Drosophila melanogaster (Martin et al. 1994); (2) protozoans: the malarial parasite Plasmodium (Kaiser et al. 2004), the bovine parasite Theileria annulata, and Tetrahymena thermophila; (3) plants: Arabidopsis thaliana (Morita-Yamamuro et al. 2005) and Oryza sativa; (4) (pathogenic) fungi: Emericella nidulans; and (5) bacteria: Chlamydia (Ponting 1999), a luminescent bacterium, and an intestinal bacterium.

Some of these MACPF domain-containing molecules are known to have a toxic function or are implicated in pathogenesis or developmental pathways. In addition, the vertebrate astrotactin proteins, composed of one MACPF, one fibronectin type 3, and three EGF-like domains, are implicated in neuronal migration along glial fibers (Zheng et al. 1996). Among these non-TCC MACPF molecules, the toxins of the venomous sea anemone, which possess MACPF and EGF domains and a very high hemolytic potential, are possibly the closest to the TCCs. However, the mechanisms that prevent damage to the host are quite different: the hemolytic toxins of sea anemone are enclosed in the nematocyst and are released only upon stinging the target, whereas the TCCs are serum proteins whose hemolytic activity is regulated through interactions with the complement system mediated by the additional TCC-specific domains.

Taken together, TCC molecules seem to have been tuned in the common ancestor of chordates for extremely effective, targeted, and regulated hemolytic function by the addition of extra domains to the MACPF domain.

Other complement components

As shown in Fig. 2, most of the other component genes are present in the teleost genomes but not in the ascidian genome, suggesting either that these genes emerged in the vertebrate lineage or that they evolve too rapidly for their ascidian counterparts to be detected by BLAST searches using vertebrate sequences as queries. The absence of genome-wide information for cartilaginous fish and Agnatha prevents a more precise definition of the evolutionary origin of these genes. For the RCA family, the complement regulators and receptors composed of SCR domains, a structural and functional counterpart has been reported from lamprey (Kimura et al. 2004), whereas the biological function of the structural orthologs in ascidians remains to be clarified (Azumi et al. 2003). For the CR3 and CR4 genes, which encode integrin complement receptors composed of alpha and beta chains, the presence of structural orthologs is confirmed in the draft genome of X. tropicalis, although four alpha-chain genes show similar levels of similarity to both CR3 and CR4. In contrast, ascidian genes whose products have been shown to function as a C3 receptor do not show an orthologous relationship with their mammalian functional counterparts (Miyazawa et al. 2001; Miyazawa and Nonaka 2004). Both the alpha and beta chain genes seem to have expanded in the vertebrate lineage after the divergence of urochordates, suggesting that the functional and structural diversification of integrins occurred in the vertebrate lineage, although C3 was one of their original ligands.

Evolutionarily conserved linkage

Perhaps the most intriguing genetic linkage among the mammalian complement genes is that of the C4, Bf, and C2 genes within the MHC (Carroll et al. 1984; Chaplin et al. 1983). The X. tropicalis genome analysis has indicated that these genes are also tightly linked to each other in the frog MHC (Ohta et al. 2006). Although the C4 and Bf/C2 genes are not linked to each other or to MHC class I or II genes in teleosts (Kuroda et al. 1996, 2000; Samonte et al. 2002), this may be due to teleost-specific extensive genomic rearrangement; indeed, the shark C4 and Bf/C2 genes are linked to each other in the shark MHC (Terado et al. 2003). Thus, the basic genomic structure of the MHC complement gene region seems to have been established early in the evolution of jawed vertebrates. Using the complete genome information of the ascidian urochordate C. intestinalis, the possibility was examined that this linkage is more ancient, that is, that the common ancestor of C3, C4, and C5 and the common ancestor of Bf and C2 were linked to each other before the establishment of the MHC. C. intestinalis has two C3 genes on two different chromosomes and three Bf genes arranged in tandem on yet another chromosome (Yoshizaki et al. 2005). Because neither C3 gene is linked to the Bf genes, the linkage between the C4 and Bf/C2 genes was most likely established in the jawed vertebrate lineage, concurrently with the establishment of the classical pathway or the adaptive immune system. It is tempting to speculate that the close linkage between these complement genes played some role in establishing the classical pathway by promoting their coevolution. Because the gene duplication that gave rise to the Bf and C2 genes was most probably a tandem duplication, the Bf/C2 and C3/C4/C5 gene duplications are considered to be independent events. It is likely that these duplications occurred in the vertebrate lineage after the divergence of cyclostomes but before the divergence of cartilaginous fish, and that the linkage between one of the duplicated C3/C4/C5 genes and the Bf and C2 genes, as well as the linkage of these genes to the MHC class I and II genes, was then established before the emergence of cartilaginous fish. Because the MHC was established just before the divergence of cartilaginous fish and higher vertebrates (Flajnik and Kasahara 2001), the C4, Bf, and C2 genes seem to be original members of the MHC.

Another notable linkage among the mammalian complement genes involves the RCA genes (Hourcade et al. 1992). The human RCA genes, composed of SCR domains, are tightly clustered on the long arm of chromosome 1, at 1q32, suggesting that they were generated by recurrent tandem duplications. Similar clustering of the RCA genes is also found in chicken (Oshiumi et al. 2005) and frog, although the frog cluster contains only two genes. No clear linkage between SCR genes was observed in the fugu genome. The RCA genes evolve rapidly in both primary structure and the number of SCR domains, and it is not easy to determine orthologous relationships between genes from different animal classes. Even between human and mouse, some genes in one species lack a counterpart in the other. Despite the difficulty in tracing the lineages of the RCA genes, it is conceivable that at least one round of tandem duplication predated the emergence of Amphibia, because X. tropicalis has linked C4BP and DAF genes. These two genes are considered to be the founding members of the vertebrate RCA gene cluster.

The other linkages between complement genes recognized in the mammalian genome are those between C6 and C7, between C8A and C8B, and between C1r and C1s. All of these linkages are conserved in the chicken, frog, and fugu genomes, with only a few tandem duplications (Hosa C1r, Xetr C1r, Gaga C7, and Xetr C6) and one inversion (Taru C1s) observed (Fig. 3). The high degree of conservation of these linkages, even in teleosts, which are believed to have undergone extensive genome rearrangement, suggests the presence of selective pressure to keep these genes together, most probably because it facilitates the coevolution of the linked genes. Interestingly, the sizes of the genes and of the intergenic regions vary among species roughly in proportion to genome size, despite the high degree of conservation of the basic gene organization.

Fig. 3

Evolutionary conservation of genetic linkages between complement genes. The genomic organization of four sets of linked complement genes in four species, human (Hosa), chicken (Gaga), clawed frog (Xetr), and fugu (Taru), is shown to scale: a C6 and C7, b C8A and C8B, c C1r and C1s, and d RCA genes. Note that the relative orientations of these genes are perfectly conserved except for the fish C1s gene.

Conclusion

The current view of the evolutionary history of the complement system is summarized in Fig. 4. First, the primitive complement system, most likely composed of C3 and Bf and thus similar to the mammalian alternative pathway, emerged in the common ancestor of Cnidaria and Bilateria more than 1,300 MYA. Structural features of the cnidarian genes suggest that the ancestral C3 was proteolytically activated by Bf and that it formed covalent bonds with nonself molecules using its intramolecular thioester bond. Whereas the C3 and Bf genes were retained by deuterostomes, they were lost independently many times in the protostome lineages. Second, with the emergence of chordates (900 MYA), the MASP, MBL, and ficolin genes were recruited into the complement system, establishing the lectin pathway. Finally, vertebrate-specific complement gene duplications, such as those that generated C3/C4/C5, Bf/C2, and MASP/C1r/s, occurred before the emergence of cartilaginous fish about 600 MYA, most probably contributing to the establishment of the third activation pathway, the classical pathway. Thus, the complement classical pathway seems to have been established at the same time as the lymphocyte-MHC-based adaptive immune system appeared. Ancestral TCC genes appear to have been recruited into the complement system and duplicated to give rise to C6, C7, C8A, C8B, and C9 before the appearance of the jawed vertebrates, although the timing of these events still needs to be clarified. The linkages between certain complement genes appear to have played a role in establishing the modern complement system by facilitating the coevolution of the linked genes.

Fig. 4

Evolutionary processes of the complement system. The evolutionary origins of the three complement activation pathways are shown by the gray arrows. The origin and evolution of the major gene families of the complement system are shown by the colored arrows. The timings of the gene duplications that possibly contributed to the establishment of the classical pathway are shown by the double-headed arrows. Because the presence of the classical pathway has been functionally demonstrated in sharks, it is likely that the Bf/C2 and MASP/C1r/s gene duplications occurred before the emergence of cartilaginous fish.