Abstract
In Omics experiments, thousands of hypotheses are typically tested simultaneously, each based on very few independent replicates. Traditional tests such as the t-test have been shown to perform poorly with this type of data. Furthermore, the simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for multiple testing. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic, which were developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address it. The presented test statistics are compared in an analysis of a microarray experiment on tissue samples from two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
Acknowledgments
This work was partially supported by the European Union FP7 project “SysKid,” project number 241544.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Dunkler, D., Sánchez-Cabo, F., Heinze, G. (2011). Statistical Analysis Principles for Omics Data. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_5
DOI: https://doi.org/10.1007/978-1-61779-027-0_5
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols