Clustering

McLachlan, G. J.; Bean, R. W.; Ng, S. K.

doi:10.1007/978-1-4939-6613-4_19

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

Clustering techniques are used to arrange genes in some natural way, that is, to organize genes into groups or clusters with similar behavior across relevant tissue samples (or cell lines). These techniques can also be applied to tissues rather than genes. Methods such as hierarchical agglomerative clustering, k-means clustering, the self-organizing map, and model-based methods have been used. Here we focus on mixtures of normals to provide a model-based clustering of tissue samples (gene signatures) and of gene profiles, including time-course gene expression data.
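
As a rough illustration of what clustering with a mixture of normals involves, the sketch below fits a two-component univariate normal mixture by the EM algorithm and assigns each observation to the component with the larger posterior probability. It is a minimal sketch, not the authors' implementation: the function names, the starting values, the variance floor, and the eight expression values are all assumptions made for the example, and real applications would use multivariate components and a principled choice of the number of components.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(values, n_iter=200, var_floor=1e-6):
    """Fit a two-component univariate normal mixture by EM (illustrative only).

    Returns (weights, means, variances, posteriors), where posteriors[i][k]
    is the estimated probability that values[i] belongs to component k.
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, var_floor)
    # Assumed starting values: means at the data extremes, common variance.
    weights = [0.5, 0.5]
    means = [min(values), max(values)]
    variances = [overall_var, overall_var]
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        posteriors = []
        for v in values:
            dens = [w * normal_pdf(v, m, s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens)
            if total == 0.0:
                # Guard against numerical underflow for extreme points.
                posteriors.append([0.5, 0.5])
            else:
                posteriors.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means, and variances.
        for k in range(2):
            nk = sum(p[k] for p in posteriors)
            if nk == 0.0:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(p[k] * v for p, v in zip(posteriors, values)) / nk
            spread = sum(p[k] * (v - means[k]) ** 2
                         for p, v in zip(posteriors, values)) / nk
            variances[k] = max(spread, var_floor)
    return weights, means, variances, posteriors


# Hypothetical log-expression values of one gene across eight tissue samples.
expression = [-1.2, -0.9, -1.1, -0.8, 2.1, 1.9, 2.3, 2.0]
weights, means, variances, posteriors = fit_two_component_mixture(expression)

# Assign each tissue sample to the component with the larger posterior.
labels = [0 if p[0] >= p[1] else 1 for p in posteriors]
print(labels)
```

The posterior probabilities give a soft (probabilistic) clustering; thresholding them, as in the last step, turns it into a hard partition of the samples.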


Author information

Authors and Affiliations

Department of Mathematics, The University of Queensland, Brisbane, QLD, Australia
G. J. McLachlan
Department of Health, The University of Queensland, Brisbane, QLD, Australia
R. W. Bean
School of Medicine, Griffith Health Institute, Griffith University, Brisbane, QLD, Australia
S. K. Ng

Corresponding author

Correspondence to G. J. McLachlan.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Jonathan M. Keith

Cite this protocol

McLachlan, G.J., Bean, R.W., Ng, S.K. (2017). Clustering. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_19

DOI: https://doi.org/10.1007/978-1-4939-6613-4_19
Published: 29 November 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6611-0
Online ISBN: 978-1-4939-6613-4
eBook Packages: Springer Protocols
