Abstract
In the sequences released by the Arabidopsis Genome Initiative (AGI), we discovered a new and unexpectedly large family of orphan genes (127 genes by 01.08.99), named AtPCMP. The distribution of the AtPCMP genes on the five chromosomes suggests that the genome of Arabidopsis thaliana contains more than 200 genes of this family (1% of the whole genome). The deduced AtPCMP proteins are characterized by a surprising combinatorial organization of sequence motifs. The amino-terminal domain consists of a succession of three conserved motifs which generate considerable diversity. The proteins are classified into three subfamilies based on the length and nature of their carboxy-terminal domain, which comprises 1–6 motifs. All the motifs characterized are highly conserved in both sequence and spacing. A specific signature of this large family is defined. The presence of ESTs in databases and the detection of clones in A. thaliana cDNA libraries indicate that most of the genes of this family are expressed. The absence of similar sequences outside the plant kingdom strongly suggests that this unusually large orphan family is unique to plants. The features, genesis, potential function and evolution of this combinatorial and modular plant protein family are discussed.
Cite this article
Aubourg, S., Boudet, N., Kreis, M. et al. In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol 42, 603–613 (2000). https://doi.org/10.1023/A:1006352315928
DOI: https://doi.org/10.1023/A:1006352315928