Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters

Daehyun Baek1,3, Colleen Davis2, Brent Ewing2, David Gordon2, and Phil Green2,3

1 Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA
2 Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA

Abstract

Recent studies suggest that surprisingly many mammalian genes have alternative promoters (APs); however, their biological roles, and the characteristics that distinguish them from single promoters (SPs), remain poorly understood. We constructed a large data set of evolutionarily conserved promoters and used it to identify sequence features, functional associations, and expression patterns that differ by promoter type. The four promoter categories (CpG-rich APs, CpG-poor APs, CpG-rich SPs, and CpG-poor SPs) each show characteristic strengths and patterns of sequence conservation, frequencies of putative transcription-related motifs, and tissue and developmental stage expression preferences. APs display substantially higher sequence conservation than SPs, and CpG-poor promoters display higher conservation than CpG-rich promoters. Among CpG-poor promoters, APs and SPs show sharply contrasting developmental stage preferences and TATA box frequencies. We developed a discriminator to computationally predict promoter type, verified its accuracy through experimental tests that incorporate a novel method for deconvolving mixed sequence traces, and used it to find several new APs. The discriminator predicts that almost half of all mammalian genes have evolutionarily conserved APs. This high frequency of APs, together with the strong purifying selection maintaining them, implies that APs play a crucial role in expanding the expression diversity of the mammalian genome.

Footnotes

  • 3 Corresponding authors.

    3 E-mail phg{at}u.washington.edu; fax (206) 685-9720.

    3 E-mail baek{at}u.washington.edu; fax (206) 685-9720.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5872707

    • Received August 16, 2006.
    • Accepted November 29, 2006.
  • Freely available online through the Genome Research Open Access option.
