Statistical Analysis Principles for Omics Data

Dunkler, Daniela; Sánchez-Cabo, Fátima; Heinze, Georg

doi:10.1007/978-1-61779-027-0_5

Protocol
First Online: 01 January 2011

Part of the book series: Methods in Molecular Biology (MIMB, volume 719)

Abstract

In Omics experiments, typically thousands of hypotheses are tested simultaneously, each based on very few independent replicates. Traditional tests like the t-test were shown to perform poorly with this new type of data. Furthermore, simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for this multiple testing situation. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address this issue. The presented test statistics are subjected to a comparative analysis of a microarray experiment comparing tissue samples of two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying, commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
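As a rough illustration of the quantities the abstract names, here is a minimal, dependency-free Python sketch for a two-group comparison. It is not the authors' R code. Several assumptions apply. The function names are hypothetical. The SAM fudge factor `s0` and the moderated-t hyperparameters `d0` and `prior_var` are passed in as fixed numbers, whereas SAM and limma estimate them from all genes. The rank product returns only the statistic and omits the permutation step used to assess significance.

```python
import math
from statistics import mean


def _pooled_stats(a, b):
    """Return (mean difference, pooled residual variance, residual df) for one gene."""
    m1, m2 = mean(a), mean(b)
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    df = len(a) + len(b) - 2
    return m1 - m2, ss / df, df


def sam_d(a, b, s0):
    """SAM-type statistic: mean difference / (standard error + fudge factor s0).

    Assumption: s0 is a fixed input here; SAM itself chooses it from all genes.
    """
    diff, var, _ = _pooled_stats(a, b)
    se = math.sqrt(var * (1 / len(a) + 1 / len(b)))
    return diff / (se + s0)


def moderated_t(a, b, d0, prior_var):
    """Moderated t: the gene's variance is shrunk toward prior_var with d0 prior df.

    Assumption: d0 and prior_var are fixed inputs; limma estimates them across
    all genes by empirical Bayes. The result is compared with a t distribution
    on d0 + df degrees of freedom.
    """
    diff, var, df = _pooled_stats(a, b)
    post_var = (d0 * prior_var + df * var) / (d0 + df)
    return diff / math.sqrt(post_var * (1 / len(a) + 1 / len(b)))


def rank_product(log_ratios):
    """Geometric mean of per-comparison ranks (rank 1 = strongest up-regulation).

    log_ratios: one list per gene holding k log fold-changes, one per replicate
    comparison. Ties are broken by gene order (stable sort). No permutation
    p-values are computed.
    """
    n_genes, k = len(log_ratios), len(log_ratios[0])
    log_rank_sum = [0.0] * n_genes
    for j in range(k):
        # Rank genes within comparison j, largest log fold-change first.
        order = sorted(range(n_genes), key=lambda g: log_ratios[g][j], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_rank_sum[g] += math.log(rank)
    return [math.exp(s / k) for s in log_rank_sum]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In R, `p.adjust(p, method = "BH")` computes the same adjustment as `benjamini_hochberg`. Setting `s0 = 0` in `sam_d`, or `prior_var` equal to the gene's own variance in `moderated_t`, recovers the ordinary two-sample t-statistic.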

Acknowledgments

This work was partially supported by the European Union FP7 project “SysKid,” project number 241544.

Author information

Authors and Affiliations

Section of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
Daniela Dunkler & Georg Heinze
Genomics Unit, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
Fátima Sánchez-Cabo

Corresponding author

Correspondence to Georg Heinze.

Editor information

Editors and Affiliations

emergentec biodevelopment GmbH, Gersthofer Strasse 29-31, Vienna, 1180, Austria
Bernd Mayer

About this protocol

Cite this protocol

Dunkler, D., Sánchez-Cabo, F., Heinze, G. (2011). Statistical Analysis Principles for Omics Data. In: Mayer, B. (ed.) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_5

DOI: https://doi.org/10.1007/978-1-61779-027-0_5
Published: 29 January 2011
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols
