Small open reading frames: Not so small anymore

Richelle Sopko¹ and Brenda Andrews¹,²,³

¹ Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada M5S 1A8
² Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5G 1L6

This extract was created in the absence of an abstract.

Today, nearly 10 years after the publication of the complete sequence of the Saccharomyces cerevisiae genome, the total number of genes in this organism is largely considered resolved and currently stands at 5782 (http://www.yeastgenome.org/cache/genomeSnapshot.html). This gene sequence information has led to the construction of numerous yeast strain and plasmid collections, including the two-hybrid (Ito et al. 2000, 2001; Uetz et al. 2000), viable haploid deletion (Winzeler et al. 1999; Giaever et al. 2002), titratable promoter allele (Mnaimneh et al. 2004), and chromosomally tagged green fluorescent protein and TAP fusion libraries (Ghaemmaghami et al. 2003; Huh et al. 2003). Because the creation of these collections relied on the accuracy of gene annotation at the time of library construction, each library has limitations. A major limitation for most of them is that the initial annotation of the S. cerevisiae genome included only those regions consisting of at least 100 contiguous codons; small open reading frames (sORFs) encoding functional proteins were therefore largely missed (Goffeau et al. 1996) and were considered only later, following detection of their expression (Olivas et al. 1997; Velculescu et al. 1997; Kumar et al. 2002; Oshiro et al. 2002; Kessler et al. 2003). As a result, the phenotypic consequences of gene disruption have not been assessed for more than half of these sORFs, and sORFs are underrepresented in genomic libraries and other collections.
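The effect of a length cutoff on annotation can be illustrated with a minimal sketch. It is not the procedure used in the original genome annotation; it assumes a simplified ORF definition (an ATG followed in frame by the first standard stop codon, forward strand only), and the function name `find_orfs`, its `min_codons` parameter, and the toy sequence are hypothetical. Codon counts here exclude the stop codon, which is also an assumption.

```python
# Standard nuclear stop codons (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for ATG-to-stop ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the span includes the stop
    codon. n_codons counts coding codons (ATG included, stop excluded).
    ORFs shorter than min_codons are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # resume scanning after this stop
    return orfs


# Toy sequence (hypothetical): a 50-codon ORF preceded by two bases.
toy = "CC" + "ATG" + "GCT" * 49 + "TAA"

missed = find_orfs(toy)                 # default 100-codon cutoff
found = find_orfs(toy, min_codons=1)    # no effective cutoff
```

With the default cutoff the 50-codon frame is not reported, whereas lowering `min_codons` recovers it. The trade-off is that a low cutoff also reports many short frames that arise by chance, which is one reason expression or homology evidence is needed to support individual sORFs.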

In this issue of Genome Research, Kastenmayer et al. (2006) provide the first systematic analysis of the prevalence of sORFs in a eukaryotic genome. Of the 299 currently recognized sORFs in the S. cerevisiae genome, they discovered that more than half (170) have been annotated since the genome was sequenced. Of these, 12% were identified by homology, and an even greater fraction (74%) was identified by combining both homology and …
