Statistical Analysis Principles for Omics Data

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 719)

Abstract

In Omics experiments, thousands of hypotheses are typically tested simultaneously, each based on very few independent replicates. Traditional tests such as the t-test have been shown to perform poorly with this type of data. Furthermore, considering many hypotheses at once, each prone to a decision error, requires powerful adjustments for multiple testing. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic, which have been developed to evaluate hypotheses in typical Omics experiments. We also introduce the multiple testing problem and discuss some state-of-the-art procedures to address it. The presented test statistics are compared in an analysis of a microarray experiment on tissue samples from two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
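
The abstract names three test statistics without giving their form. The following is a minimal sketch, not the chapter's accompanying R code, showing one common textbook form of each for a single gene measured in two groups. Assumptions: all function names are illustrative; the SAM fudge factor `s0` and the moderated-t prior (`d0`, `s0_sq`) are passed in by hand, whereas SAM and the empirical Bayes approach of limma estimate them from all genes; p-values, permutation-based error rates, and tie handling in rank products are left out.

```python
import math
from statistics import mean, variance


def pooled_variance(x, y):
    """Pooled within-group variance and its residual degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    return s2, df


def ordinary_t(x, y):
    """Classical two-sample t-statistic assuming equal variances."""
    s2, _ = pooled_variance(x, y)
    return (mean(x) - mean(y)) / math.sqrt(s2 * (1 / len(x) + 1 / len(y)))


def sam_d(x, y, s0):
    """SAM-type statistic: a fudge factor s0 is added to the standard error.

    s0 is supplied by hand here (assumption); SAM derives it from all genes."""
    s2, _ = pooled_variance(x, y)
    se = math.sqrt(s2 * (1 / len(x) + 1 / len(y)))
    return (mean(x) - mean(y)) / (se + s0)


def moderated_t(x, y, d0, s0_sq):
    """Moderated t-statistic: the gene-wise variance is shrunk towards a prior
    variance s0_sq that carries d0 prior degrees of freedom.

    d0 and s0_sq are supplied by hand here (assumption); in practice they are
    estimated by empirical Bayes across all genes. Under the model the
    statistic is referred to a t distribution with d0 + df degrees of freedom."""
    s2, df = pooled_variance(x, y)
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    return (mean(x) - mean(y)) / math.sqrt(s2_post * (1 / len(x) + 1 / len(y)))


def rank_products(group_a, group_b):
    """Rank product per gene for up-regulation in group_a.

    group_a, group_b: lists of samples; each sample is a list of log-scale
    expression values, one per gene, in the same gene order. Every sample of
    A is compared with every sample of B; in each comparison genes are ranked
    by log fold change (rank 1 = largest increase), and the rank product is
    the geometric mean of a gene's ranks. Ties keep gene order (simplification)."""
    n_genes = len(group_a[0])
    log_rank_sum = [0.0] * n_genes
    n_comparisons = 0
    for sa in group_a:
        for sb in group_b:
            lfc = [sa[g] - sb[g] for g in range(n_genes)]
            order = sorted(range(n_genes), key=lambda g: lfc[g], reverse=True)
            for rank, g in enumerate(order, start=1):
                log_rank_sum[g] += math.log(rank)
            n_comparisons += 1
    return [math.exp(s / n_comparisons) for s in log_rank_sum]
```

With `x = [1.0, 2.0, 3.0]` and `y = [4.0, 5.0, 6.0]`, setting `s0 = 0` in `sam_d`, or a prior variance equal to the pooled sample variance in `moderated_t`, recovers the ordinary t-statistic. A positive `s0` or a larger prior variance pulls the statistic towards zero, which damps genes whose variance estimate is small merely by chance when there are few replicates.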

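The abstract refers to procedures for the multiple testing problem without naming them. As one widely used example, and not necessarily the chapter's choice, the following sketch implements the Benjamini-Hochberg (1995) step-up adjustment that controls the false discovery rate; the function names are illustrative. In R, the same adjustment is available as `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order.

    For the hypothesis with the k-th smallest of m p-values, the adjusted value
    is min over j >= k of p_(j) * m / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each value is the minimum over j >= k.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def reject_at_fdr(pvalues, q=0.05):
    """Flag the hypotheses rejected when the FDR is controlled at level q."""
    return [p_adj <= q for p_adj in benjamini_hochberg(pvalues)]
```

For the p-values 0.01, 0.04, 0.03 and 0.2, the adjusted values are 0.04, about 0.0533, about 0.0533 and 0.2, so only the first hypothesis is rejected at an FDR of 0.05.
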
Acknowledgments

This work was partially supported by the European Union FP7 project “SysKid,” project number 241544.

Author information

Corresponding author

Correspondence to Georg Heinze.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Dunkler, D., Sánchez-Cabo, F., Heinze, G. (2011). Statistical Analysis Principles for Omics Data. In: Mayer, B. (ed.) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_5

  • DOI: https://doi.org/10.1007/978-1-61779-027-0_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-026-3

  • Online ISBN: 978-1-61779-027-0

  • eBook Packages: Springer Protocols