Clustering

Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1526))

Abstract

Clustering techniques are used to arrange genes in some natural way, that is, to organize genes into groups or clusters with similar behavior across relevant tissue samples (or cell lines). These techniques can also be applied to tissues rather than genes. Methods such as hierarchical agglomerative clustering, k-means clustering, the self-organizing map, and model-based methods have been used. Here we focus on mixtures of normals to provide a model-based clustering of tissue samples (gene signatures) and of gene profiles, including time-course gene expression data.
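As a rough illustration of what clustering with a mixture of normals involves, the sketch below fits a univariate normal mixture with the EM algorithm, a standard way to fit such models, and assigns each observation to the component with the highest posterior probability. It is not the chapter's implementation: the function names (`fit_normal_mixture`, `hard_labels`), the quantile-based starting values, the variance floor, and the simulated two-group data are all assumptions made for this example, and real expression data would call for multivariate components.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Posterior probability of each component for every observation."""
    post = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        post.append([d / total for d in dens])
    return post


def fit_normal_mixture(data, g=2, n_iter=200):
    """Fit a g-component univariate normal mixture by EM (illustrative only).

    Returns (weights, means, variances, posteriors).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed starting values: evenly spaced sample quantiles as means,
    # the overall sample variance for every component, equal weights.
    means = [ordered[int((i + 0.5) * n / g)] for i in range(g)]
    centre = sum(data) / n
    pooled = sum((x - centre) ** 2 for x in data) / n
    variances = [pooled] * g
    weights = [1.0 / g] * g

    for _ in range(n_iter):
        post = e_step(data, weights, means, variances)
        # M-step: weighted updates of proportions, means, and variances.
        for i in range(g):
            n_i = max(sum(p[i] for p in post), 1e-12)
            weights[i] = n_i / n
            means[i] = sum(p[i] * x for p, x in zip(post, data)) / n_i
            var = sum(p[i] * (x - means[i]) ** 2 for p, x in zip(post, data)) / n_i
            variances[i] = max(var, 1e-6)  # floor to avoid degenerate components

    # Posteriors under the final parameter estimates.
    post = e_step(data, weights, means, variances)
    return weights, means, variances, post


def hard_labels(post):
    """Assign each observation to its most probable component."""
    return [max(range(len(p)), key=lambda i: p[i]) for p in post]


if __name__ == "__main__":
    # Simulated data standing in for one expression measurement on two groups.
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    data += [rng.gauss(5.0, 1.0) for _ in range(100)]
    weights, means, variances, post = fit_normal_mixture(data, g=2)
    print("estimated means:", means)
    print("estimated mixing proportions:", weights)
```

The posterior probabilities give a soft (probabilistic) clustering; `hard_labels` turns them into an outright partition when one is needed.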

Author information

Corresponding author

Correspondence to G. J. McLachlan.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

McLachlan, G.J., Bean, R.W., Ng, S.K. (2017). Clustering. In: Keith, J. (ed) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_19

  • DOI: https://doi.org/10.1007/978-1-4939-6613-4_19

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6611-0

  • Online ISBN: 978-1-4939-6613-4

  • eBook Packages: Springer Protocols
