Regularised PCA to denoise and visualise data

Abstract

Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model, in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we suggest a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension. The regularisation term is analytically derived using asymptotic results and can also be justified from a Bayesian treatment of the model. Regularised PCA provides promising results in terms of recovery of the true signal and of graphical outputs, in comparison with classical PCA and with a soft thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important in the case of very noisy data.
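
To make the shrinkage concrete, here is a minimal sketch in Python/NumPy of the kind of singular value shrinkage the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the function name regularised_pca, the user-supplied number of dimensions S, and the simplified noise-variance estimate (the mean of the eigenvalues of the discarded dimensions) are assumptions made here for clarity; the paper derives its regularisation term analytically from asymptotic results.

    import numpy as np

    def regularised_pca(X, S):
        # Hypothetical illustration of regularised PCA; not the authors' code.
        # Centre the data, as in standard PCA.
        mean = X.mean(axis=0)
        U, d, Vt = np.linalg.svd(X - mean, full_matrices=False)
        # Variance carried by each dimension (eigenvalues).
        lam = d ** 2 / X.shape[0]
        # Assumed simplification: estimate the noise variance as the mean of
        # the eigenvalues of the discarded dimensions.
        sigma2 = lam[S:].mean()
        # Multiply each retained singular value by an estimate of the ratio of
        # signal variance to total variance of its dimension, as in the abstract.
        shrink = np.clip((lam[:S] - sigma2) / lam[:S], 0.0, None)
        return mean + U[:, :S] @ np.diag(d[:S] * shrink) @ Vt[:S]

By contrast, the soft thresholding strategy mentioned above shrinks every singular value by the same additive amount, replacing each d_s with max(d_s - t, 0) for a global threshold t, rather than by a per-dimension variance ratio.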



Author information

Correspondence to Julie Josse.

About this article

Cite this article

Verbanck, M., Josse, J. & Husson, F. Regularised PCA to denoise and visualise data. Stat Comput 25, 471–486 (2015). https://doi.org/10.1007/s11222-013-9444-y
