
In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants

Plant Molecular Biology

Abstract

In the sequences released by the Arabidopsis Genome Initiative (AGI), we discovered a new and unexpectedly large family of orphan genes, named AtPCMP (127 genes as of 01.08.99). The distribution of the AtPCMP genes on the five chromosomes suggests that the genome of Arabidopsis thaliana contains more than 200 genes of this family (1% of the whole genome). The deduced AtPCMP proteins are characterized by a surprising combinatorial organization of sequence motifs. The amino-terminal domain consists of a succession of three conserved motifs, which generates considerable diversity. The proteins are classified into three subfamilies based on the length and nature of their carboxy-terminal domain, which comprises 1–6 motifs. All the motifs characterized are highly conserved in both sequence and spacing. A specific signature of this large family is defined. The presence of ESTs in databases and the detection of clones in A. thaliana cDNA libraries indicate that most of the genes of this family are expressed. The absence of similar sequences outside the plant kingdom strongly suggests that this unusually large orphan family is unique to plants. The features, genesis, potential function and evolution of this combinatorial and modular plant protein family are discussed.




Cite this article

Aubourg, S., Boudet, N., Kreis, M. et al. In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol 42, 603–613 (2000). https://doi.org/10.1023/A:1006352315928


  • DOI: https://doi.org/10.1023/A:1006352315928
