Successful strategies for human microbiome data generation, storage and analyses

Holmes, Susan

doi:10.1007/s12038-019-9934-y

Review
Published: 20 September 2019

Journal of Biosciences, volume 44, article number 111 (2019)

Abstract

Current interest in the clinical potential of new tools for improving human health is now focused on techniques for studying the human microbiome and its interaction with environmental and clinical covariates. This review outlines statistical strategies that have been developed in past studies and can inform the successful design and analysis of controlled perturbation experiments performed in the human microbiome. We carefully outline what the data are, their imperfections, and how we need to transform, decontaminate and denoise them. We show how to identify the important unknown parameters and how we can leverage the variability we see to produce efficient models for prediction and uncertainty quantification. We encourage a reproducible strategy that builds on best-practice principles that can be adapted for effective experimental design and reproducible workflows. Nonparametric, data-driven denoising strategies already provide the best strain identification and decontamination methods. Data-driven models can be combined with uncertainty quantification to provide reproducible aids to decision making in the clinical context, as long as careful, separate, registered confirmatory testing is undertaken. Here we provide guidelines for effective longitudinal studies and their analyses. Lessons learned along the way are that visualizations at every step can pinpoint problems and outliers, and that normalization and filtering improve power in downstream testing. We recommend collecting and binding the metadata and covariates to sample descriptors and recording complete computer scripts in an R markdown supplement, which can reduce opportunities for human error and enable collaborators and readers to replicate all the steps of the study. Finally, we note that optimizing the bioinformatic and statistical workflow involves adopting a wait-and-see approach that is particularly effective in cases where features such as mass spectrometry peaks and metagenomic tables can only be partially annotated.
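
As a rough illustration of two recommendations in the abstract, binding covariates to sample descriptors and filtering before downstream testing, the following minimal Python sketch may help. It is not the author's workflow, which is described as R-based. The function names, the toy sample and taxon identifiers, the prevalence threshold and the choice of an inverse hyperbolic sine transformation are all assumptions made for this example.

```python
from math import asinh


def bind_metadata(counts, metadata):
    """Attach covariates to each sample's counts; fail loudly on ID mismatches."""
    unmatched = sorted(set(counts) ^ set(metadata))
    if unmatched:
        raise ValueError(f"sample IDs without a match: {unmatched}")
    return {
        sid: {"covariates": metadata[sid], "counts": counts[sid]}
        for sid in counts
    }


def prevalence_filter(counts, min_samples):
    """Keep taxa observed (count > 0) in at least `min_samples` samples."""
    taxa = {taxon for sample in counts.values() for taxon in sample}
    keep = {
        taxon
        for taxon in taxa
        if sum(1 for s in counts.values() if s.get(taxon, 0) > 0) >= min_samples
    }
    return {
        sid: {t: c for t, c in sample.items() if t in keep}
        for sid, sample in counts.items()
    }


def asinh_transform(counts):
    """Apply asinh to every count (an assumed choice; zero counts stay at zero)."""
    return {
        sid: {t: asinh(c) for t, c in sample.items()}
        for sid, sample in counts.items()
    }


# Hypothetical toy data: three samples, three taxa.
counts = {
    "S1": {"taxonA": 10, "taxonB": 0, "taxonC": 3},
    "S2": {"taxonA": 7, "taxonB": 1, "taxonC": 0},
    "S3": {"taxonA": 0, "taxonB": 0, "taxonC": 5},
}
metadata = {
    "S1": {"subject": "P1", "day": 0},
    "S2": {"subject": "P1", "day": 7},
    "S3": {"subject": "P2", "day": 0},
}

filtered = prevalence_filter(counts, min_samples=2)  # taxonB is dropped
transformed = asinh_transform(filtered)
bound = bind_metadata(transformed, metadata)
```

Keeping the covariates and the counts in one object, and raising an error when sample identifiers do not line up, is one way to reduce the kind of human error the abstract warns about.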

Notes

See Don Knuth’s famous quote that ‘premature optimization is the root of all evil in computer programming’.

Acknowledgements

The work was partly supported by NIH Grant AI112401. The author is thankful to Dr. Yogesh Shouche and the team at ICMR2018 for the opportunity to provide this short personal review of the challenges in designing and analyzing microbiome studies.

Author information

Authors and Affiliations

Statistics Department, Sequoia Hall, Stanford, CA, 94305, USA
Susan Holmes

Corresponding author

Correspondence to Susan Holmes.

About this article

Cite this article

Holmes, S. Successful strategies for human microbiome data generation, storage and analyses. J Biosci 44, 111 (2019). https://doi.org/10.1007/s12038-019-9934-y
