Abstract
It is understood that DNA and amino acid substitution rates are highly sequence context-dependent: for example, C→T substitutions in vertebrates may occur much more frequently at CpG sites, and cysteine substitution rates may depend on whether the surrounding sequence context supports participation in a disulfide bond. Many applications, including phylogenetic inference and the identification of amino acid sequence positions involved in functional specificity, rely on quantitative models of nucleotide or amino acid substitution. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Based on maximum likelihood calculations, we report relative mutation rates for the 96 classes of mutations of the form 5′αβγ3′ → 5′αδγ3′, where α, β, γ, and δ are nucleotides and β ≠ δ. Our results confirm that C→T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. Despite the qualitative similarity of mutation rates among different genomic regions, we find statistically significant differences between them.
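The 96 mutation classes and the four-class model can be made concrete with a short enumeration. This is an illustrative sketch, not the authors' code, and it rests on assumptions not stated in the abstract: each of the 192 possible triplet substitutions (4 × 4 × 4 × 3) is pooled with its reverse complement, each pooled class is written with a pyrimidine (C or T) as the central base β, and a CpG site means β = C immediately followed by γ = G. The function names (`canonical`, `all_classes`, `four_class`, `model_class_sizes`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Transitions: purine <-> purine or pyrimidine <-> pyrimidine.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def canonical(context: str, new_base: str) -> tuple[str, str]:
    """Write 5'αβγ3' -> 5'αδγ3' in strand-collapsed form.

    Assumption (not from the source): a substitution and its reverse
    complement are pooled, and the class is named by the strand on which
    the central base β is a pyrimidine (C or T).
    """
    if context[1] in "CT":
        return context, new_base
    return reverse_complement(context), COMPLEMENT[new_base]


def all_classes() -> list[tuple[str, str]]:
    """Enumerate every (triplet, new central base) class after pooling strands."""
    classes = set()
    for alpha, beta, gamma in product(BASES, repeat=3):
        for delta in BASES:
            if delta != beta:
                classes.add(canonical(alpha + beta + gamma, delta))
    return sorted(classes)


def four_class(context: str, new_base: str) -> str:
    """Assign a substitution to one of the four categories of the model."""
    context, new_base = canonical(context, new_base)
    # Assumed CpG definition: central C immediately followed (3') by G.
    is_cpg = context[1] == "C" and context[2] == "G"
    kind = "transition" if (context[1], new_base) in TRANSITIONS else "transversion"
    return ("CpG " if is_cpg else "non-CpG ") + kind


def model_class_sizes() -> Counter:
    """Count how many of the enumerated classes fall into each category."""
    return Counter(four_class(ctx, d) for ctx, d in all_classes())


if __name__ == "__main__":
    print(len(all_classes()))
    print(model_class_sizes())
```

Under these assumptions the enumeration yields exactly 96 classes (no triplet is its own reverse complement, so pooling halves 192), split into 4 CpG transitions, 8 CpG transversions, 28 non-CpG transitions, and 56 non-CpG transversions. A G→A change at a CpG site maps to the corresponding C→T class on the opposite strand.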
Acknowledgments
Dr. Eric Green of NISC generously provided us with the completed sequence contigs of both targets. W. Zhang was supported by a Graduate Research Fellowship from a Vermont EPSCoR grant awarded by the U.S. Department of Energy. The computational studies used infrastructure provided by the Vermont Cancer Center, the Vermont Genetics Network, the DOE EPSCoR initiative in structural and computational biology, and the Vermont Advanced Computing Center.
Additional information
Reviewing Editor: Dr. Brian Morten
About this article
Cite this article
Zhang, W., Bouffard, G.G., Wallace, S.S. et al. Estimation of DNA Sequence Context-dependent Mutation Rates Using Primate Genomic Sequences. J Mol Evol 65, 207–214 (2007). https://doi.org/10.1007/s00239-007-9000-5