A species-generalized probabilistic model-based definition of CpG islands

Mammalian Genome

Abstract

The DNA of most vertebrates is depleted in CpG dinucleotides, the target for DNA methylation. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). CGI have been useful for marking functionally relevant epigenetic loci in genome studies. For example, CGI are enriched in the promoters of vertebrate genes and are thought to play an important role in regulation. Currently, CGI are defined algorithmically as regions with an observed-to-expected ratio (O/E) of CpG greater than 0.6, a G+C content greater than 0.5, and usually, but not necessarily, a length above a set minimum. Here we find that the current definition leaves out important CpG clusters that are associated with epigenetic marks and relevant to development and disease, and that it does not apply at all to nonvertebrate genomes. We propose an alternative hidden Markov model-based approach that solves these problems. We fit our model to genomes from 30 species, and the results support a new epigenomic view of the development of DNA methylation in species diversity and evolution. The O/E of CpG in islands and nonislands segregated closely with phylogeny and showed substantial loss in both groups in animals of greater complexity, while the difference in CpG O/E between island and nonisland compartments remained nearly constant. Lists of CGI for some species are available at http://www.rafalab.org.
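
The conventional threshold-based definition mentioned above can be sketched in a few lines of Python. This is only an illustration of the criteria as stated in the abstract, not the authors' method or code. The function names are hypothetical, the O/E formula used is the commonly cited one (observed CpG count divided by the count expected if C and G occurred independently), and the 200-base minimum length is an assumed default, since the abstract does not give a specific length.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C content, CpG observed-to-expected ratio) for a DNA string.

    Hypothetical helper for illustration; not taken from the article.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so a non-overlapping count is exact.
    cpg = s.count("CG")
    gc_content = (c + g) / n
    # Expected CpG count under independence is c * g / n,
    # so O/E = cpg / (c * g / n) = cpg * n / (c * g).
    oe = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, oe


def meets_conventional_criteria(seq: str,
                                min_oe: float = 0.6,
                                min_gc: float = 0.5,
                                min_len: int = 200) -> bool:
    """Apply the threshold definition: O/E > 0.6 and G+C content > 0.5.

    min_len=200 is an assumption; the abstract only says "a certain length".
    """
    gc_content, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and oe > min_oe
```

For example, `meets_conventional_criteria("CG" * 100)` passes all three thresholds, while an AT-only sequence of the same length fails. A fixed-cutoff rule of this kind is what the abstract contrasts with the proposed hidden Markov model-based approach.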

Acknowledgments

NIH grants P50HG003233 and R01GM083084 supported this work. We also thank Harris Jaffee and Brian Caffo for their input and the reviewers for useful comments that reshaped the manuscript.

Author information

Corresponding authors

Correspondence to Rafael A. Irizarry or Andrew P. Feinberg.

About this article

Cite this article

Irizarry, R.A., Wu, H. & Feinberg, A.P. A species-generalized probabilistic model-based definition of CpG islands. Mamm Genome 20, 674–680 (2009). https://doi.org/10.1007/s00335-009-9222-5

  • DOI: https://doi.org/10.1007/s00335-009-9222-5
