A Broad Overview of Computational Methods for Predicting the Pathophysiological Effects of Non-synonymous Variants

Castellana, Stefano; Fusilli, Caterina; Mazza, Tommaso

doi:10.1007/978-1-4939-3572-7_22


Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)


Abstract

Next-generation sequencing has provided extraordinary opportunities to investigate the extensive genetic variability of humans. It has helped identify several kinds of deviations from the wild-type reference genome sequence and explain the onset of several pathogenic phenotypes and disease susceptibilities. In this context, distinguishing pathogenic from functionally neutral amino acid changes is a task as useful as it is complex, expensive, and time-consuming.
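Before any pathogenicity predictor is applied, one must first establish whether a coding single-nucleotide variant changes the encoded amino acid at all. The following is a minimal illustrative sketch, not taken from the chapter, that classifies a substitution within a single codon using the standard genetic code (NCBI translation table 1). The function name `classify_snv` and its arguments are hypothetical, and the sketch assumes the codon is given on the coding strand with the substituted position counted from 0.

```python
# Standard genetic code (NCBI translation table 1), with codons enumerated
# in the canonical T, C, A, G order for the first, second and third base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution inside one codon (hypothetical helper).

    codon: reference codon on the coding strand, e.g. "GAG"
    pos:   0-based position of the substituted base within the codon
    alt:   alternative base
    """
    codon, alt = codon.upper(), alt.upper()
    if (len(codon) != 3 or not set(codon) <= set(BASES)
            or not 0 <= pos < 3 or alt not in BASES or codon[pos] == alt):
        raise ValueError("expected a DNA codon, a position in 0..2 and a different alt base")
    mutant = codon[:pos] + alt + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"          # same amino acid: not a non-synonymous variant
    if alt_aa == "*":
        return "nonsense"            # premature stop codon
    if ref_aa == "*":
        return "stop-loss"           # stop codon replaced by an amino acid
    return f"missense {ref_aa}>{alt_aa}"  # amino acid change, the main target of predictors


if __name__ == "__main__":
    print(classify_snv("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
    print(classify_snv("CTG", 0, "T"))  # CTG -> TTG, both Leu
```

Only the missense, nonsense and stop-loss outcomes are non-synonymous; the first example is the Glu-to-Val codon change underlying the sickle-cell allele of HBB.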

Here, we present an exhaustive and up-to-date survey of the algorithms and software packages developed to estimate the putative pathogenicity of mutations, along with a description of the most popular mutation datasets that these tools have used as training sets. Finally, we describe software for the prediction of cancer-related mutations.
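Because such tools are trained and benchmarked on labeled mutation datasets, their performance is commonly summarized from a confusion matrix of predicted versus annotated classes. The sketch below is an illustration under stated assumptions, not a procedure from the chapter: the function name `confusion_metrics` is hypothetical, scores are assumed to increase with predicted pathogenicity, and the 0.5 threshold is arbitrary (some tools, such as SIFT, use the opposite direction, with low scores meaning deleterious). Scoring a tool on variants it was trained on inflates these figures, so the benchmark set should be disjoint from the training set.

```python
import math
from typing import Sequence


def confusion_metrics(scores: Sequence[float], labels: Sequence[bool],
                      threshold: float = 0.5) -> dict:
    """Binary-classification metrics for a pathogenicity predictor (hypothetical helper).

    scores: predictor outputs, assumed higher = more likely pathogenic
    labels: True for variants annotated as pathogenic in the benchmark set
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, is_pathogenic in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_pathogenic:
            tp += 1
        elif predicted:
            fp += 1
        elif is_pathogenic:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(labels) if labels else float("nan")
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    m = confusion_metrics([0.9, 0.8, 0.3, 0.6, 0.1, 0.2],
                          [True, True, True, False, False, False])
    print(m)
```

In this toy example two of three pathogenic and two of three neutral variants are called correctly, giving a sensitivity and specificity of 2/3 and an MCC of 1/3.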



Author information

Authors and Affiliations

IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini 1, 71013, San Giovanni Rotondo (FG), Italy
Stefano Castellana
IRCCS Casa Sollievo della Sofferenza, Viale Regina Margherita 261, 00198, Rome, Italy
Caterina Fusilli & Tommaso Mazza


Corresponding author

Correspondence to Tommaso Mazza.

Editor information

Editors and Affiliations

Max F. Perutz Laboratories GmbH, Universität Wien, Wien, Austria
Oliviero Carugo
Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Frank Eisenhaber


About this protocol

Cite this protocol

Castellana, S., Fusilli, C., Mazza, T. (2016). A Broad Overview of Computational Methods for Predicting the Pathophysiological Effects of Non-synonymous Variants. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_22


DOI: https://doi.org/10.1007/978-1-4939-3572-7_22
Published: 27 April 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3570-3
Online ISBN: 978-1-4939-3572-7
eBook Packages: Springer Protocols
