Finite mixtures of multivariate skew t-distributions: some recent and new results

Lee, Sharon; McLachlan, Geoffrey J.

doi:10.1007/s11222-012-9362-4

Published: 20 October 2012

Volume 24, pages 181–202, (2014)

Statistics and Computing


Abstract

Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy-tailed behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for computing the maximum likelihood (ML) estimates of the parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for ‘restricted’ characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the ‘unrestricted’ models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is achieved by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, as in the restricted case; these moments can in turn be expressed in terms of the distribution function of the non-truncated central t-distribution, for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture through applications to three real datasets.


References

Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
Arellano-Valle, R., Bolfarine, H., Lachos, V.: Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 34(6), 663–682 (2007)
Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006)
Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005)
Arnold, B.C., Beaver, R.J.: Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11, 7–54 (2002)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32, 159–188 (2005)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc., Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Banfield, J.D., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
Basso, R.M., Lachos, V.H., Cabral, C.R.B., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)
Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, New York (1999)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Brinkman, R., Gasparetto, M., Lee, S.J., Ribickas, A., Perkins, J., Janssen, W., Smiley, R., Smith, C.: High content flow cytometry and temporal data analysis for defining a cellular signature of graft versus host disease. Biol. Blood Marrow Transplant. 13, 691–700 (2007)
Cabral, C., Bolfarine, H., Pereira, J.: Bayesian density estimation using skew student-t-normal mixtures. Comput. Stat. Data Anal. 52, 5075–5090 (2008)
Cabral, C., Lachos, V., Prates, M.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56, 126–142 (2012)
Dempster, A., Laird, N.M., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39, 1–38 (1977)
Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)
Fraley, C., Raftery, A.E.: How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1999)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)
Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11, 317–336 (2010)
Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput. Graph. Stat. 11, 950–971 (2002)
Gómez, H., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395–407 (2007)
González-Farías, G., Domínguez-Molina, J.A., Gupta, A.K.: Additive properties of skew normal random vectors. J. Stat. Plan. Inference 126, 521–534 (2004)
Green, P.J.: On use of the EM algorithm for penalized likelihood estimation. J. R. Stat. Soc., Ser. B 52, 443–452 (1990)
Gupta, A.K.: Multivariate skew-t distribution. Statistics 37, 359–363 (2003)
Ho, H., Lin, T., Chen, H., Wang, W.: Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 142, 25–40 (2012a)
Ho, H., Pyne, S., Lin, T.: Maximum likelihood inference for mixtures of skew student-t-normal distributions through practical EM-type algorithms. Stat. Comput. 22, 287–299 (2012b)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
Karlis, D., Xekalaki, E.: Choosing initial values for the EM algorithm for finite mixtures. Comput. Stat. Data Anal. 41, 577–590 (2003)
Kotz, S., Nadarajah, S.: Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge (2004)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew normal independent linear mixed models. Stat. Sin. 20, 303–322 (2010)
Lee, S., McLachlan, G.: On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm (2011). arXiv:1109.4706 [stat.ME]
Lin, T.I.: Maximum likelihood estimation for multivariate skew-normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distribution. Stat. Comput. 20, 343–356 (2010)
Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew-t distribution. Stat. Comput. 17, 81–92 (2007a)
Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)
Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward (1995)
Liseo, B., Loperfido, N.: A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 61, 395–401 (2003)
Liu, C., Rubin, D.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)
Maier, L.M., Anderson, D.E., De Jager, P.L., Wicker, L., Hafler, D.A.: Allelic variant in CTLA4 alters T cell phosphorylation patterns. Proc. Natl. Acad. Sci. USA 104, 18607–18612 (2007)
McLachlan, G., Peel, D.: Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.) Lecture Notes in Computer Science, vol. 1451, pp. 658–666. Springer, Berlin (1998)
McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Dekker, New York (1988)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics (2000)
O’Hagan, A.: Bayes estimation of a convex quadratic. Biometrika 60, 565–571 (1973)
O’Hagan, A.: Moments of the truncated multivariate-t distribution (1976). http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF
O’Hagan, A., Murphy, T., Gormley, I.: Computational aspects of fitting mixture models via the expectation-maximization algorithm. Comput. Stat. Data Anal. 56, 3843–3864 (2012)
Peel, D., McLachlan, G.: Robust mixture modelling using the t distribution. Stat. Comput. 10, 339–348 (2000)
Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirow, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)
Sahu, S., Dey, D., Branco, M.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003). Erratum: Can. J. Stat. 37, 301–302 (2009)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Vrbik, I., McNicholas, P.: Analytic calculations for the EM algorithm for multivariate skew t-mixture models. Stat. Probab. Lett. 82, 1169–1174 (2012)
Wang, K.: EMMIX-skew: EM algorithm for mixture of multivariate skew normal/t distributions (2009). http://www.maths.uq.edu.au/gjm/mix_soft/EMMIX-skew, R package version 1.0-12
Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M., Lovell, B., Maeder, A. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526–531. IEEE Comput. Soc., Los Alamitos (2009)


Acknowledgements

We would like to thank Professor Seung-Gu Kim for comments and corrections, and Drs. Kui (Sam) Wang and Saumyadipta Pyne for their helpful discussions on this topic.

Author information

Authors and Affiliations

Department of Mathematics, University of Queensland, St Lucia, 4072, Australia
Sharon Lee & Geoffrey J. McLachlan


Corresponding author

Correspondence to Geoffrey J. McLachlan.

Appendices

Appendix A: The truncated multivariate t-distribution

In this appendix, we briefly describe the truncated multivariate t-distribution and provide some formulas for computing its moments, following the approach of Lee and McLachlan (2011). These expressions are crucial for the swift evaluation of the conditional expectations on the E-step for the FM-uMST model discussed in Sect. 5. An alternative description is given by Ho et al. (2012a), which provides equivalent expressions for the doubly truncated case.

Let $\boldsymbol{X}$ be a $p$-dimensional random vector having a multivariate t-distribution with location vector $\boldsymbol{\mu}$, scale matrix $\boldsymbol{\varSigma}$, and $\nu$ degrees of freedom. Truncating $\boldsymbol{X}$ to the region $\mathbb{A} = \{\boldsymbol{x} \in \mathbb{R}^{p} : \boldsymbol{x} \geq \boldsymbol{a}\}$, where $\boldsymbol{x} \geq \boldsymbol{a}$ means that each element $x_i = (\boldsymbol{x})_i$ is greater than or equal to $a_i = (\boldsymbol{a})_i$ for $i = 1, \ldots, p$, results in a left-truncated t-distribution whose density is given by

$$ f_{\mathbb{A}}(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\varSigma}, \nu) = T_{p,\nu}^{-1} (\boldsymbol{\mu}-\boldsymbol{a};\boldsymbol{0},\boldsymbol{\varSigma} ) \, t_{p,\nu} (\boldsymbol{x};\boldsymbol{\mu}, \boldsymbol{\varSigma} ),\quad \boldsymbol{x} \in\mathbb{A}. $$

(69)
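To make (69) concrete, the sketch below evaluates the truncated density numerically, assuming only NumPy. The normalizing constant is the probability $P(\boldsymbol{X} \geq \boldsymbol{a})$; here it is estimated by plain Monte Carlo rather than by the Genz-type routines used in the paper, and the function names are illustrative only.

```python
import math

import numpy as np


def mvt_logpdf(x, mu, sigma, nu):
    """Log density of the p-variate t-distribution t_{p,nu}(x; mu, Sigma), row-wise."""
    x = np.atleast_2d(x)
    p = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) for each row of x.
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    log_const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
                 - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return log_const - 0.5 * (nu + p) * np.log1p(delta / nu)


def sample_mvt(mu, sigma, nu, n, rng):
    """Draw n t-variates as mu + Z / sqrt(W), Z ~ N_p(0, Sigma), W ~ chi^2_nu / nu."""
    z = rng.multivariate_normal(np.zeros(mu.shape[0]), sigma, size=n)
    w = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
    return mu + z / np.sqrt(w)[:, None]


def truncated_mvt_pdf(x, mu, sigma, nu, a, n_mc=200_000, seed=0):
    """Density (69) on A = {x >= a}; the constant c = P(X >= a) is a Monte Carlo estimate."""
    rng = np.random.default_rng(seed)
    c = np.mean(np.all(sample_mvt(mu, sigma, nu, n_mc, rng) >= a, axis=1))
    x = np.atleast_2d(x)
    inside = np.all(x >= a, axis=1)
    dens = np.exp(mvt_logpdf(x, mu, sigma, nu)) / c
    return np.where(inside, dens, 0.0), c
```

For $\boldsymbol{\mu} = \boldsymbol{a} = \boldsymbol{0}$ and a bivariate scale matrix with correlation $\rho$, the estimated constant should be close to the orthant probability $1/4 + \arcsin(\rho)/(2\pi)$, which holds for any centred elliptical distribution and gives a quick sanity check.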

For a random vector $\boldsymbol{X}$ with density (69), we write $\boldsymbol{X} \sim tt_{p,\nu} (\boldsymbol{\mu}, \boldsymbol{\varSigma}; \mathbb{A})$. For our purposes, we are concerned with the first two moments of $\boldsymbol{X}$, namely $E(\boldsymbol{X})$ and $E(\boldsymbol{X}\boldsymbol{X}^T)$. Explicit formulas for the univariate truncated central t-distribution $tt_{1,\nu}(0, \sigma^{2}; \mathbb{A})$ were provided by O’Hagan (1973), who expressed the moments in terms of the non-truncated t-distribution. The multivariate case was studied in O’Hagan (1976), again for the central case only. Here we describe a generalization of the results in O’Hagan (1976) to the multivariate non-central case and express them in a form suitable for undertaking the E-step in the direct application of the EM algorithm to the fitting of mixtures of MST distributions.
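Closed-form moments are easy to get subtly wrong, so a brute-force reference can be useful. The following sketch, which assumes NumPy and uses simple rejection sampling (not the method of this paper), estimates $E(\boldsymbol{X})$ and $E(\boldsymbol{X}\boldsymbol{X}^T)$ for $\boldsymbol{X} \sim tt_{p,\nu}(\boldsymbol{\mu}, \boldsymbol{\varSigma}; \mathbb{A})$. The function name is hypothetical, and the approach becomes slow when $P(\boldsymbol{X} \in \mathbb{A})$ is small.

```python
import numpy as np


def truncated_mvt_moments_mc(mu, sigma, nu, a, n_keep=100_000, seed=1, max_rounds=1_000):
    """Monte Carlo estimates of E(X) and E(X X^T) for X ~ tt_{p,nu}(mu, Sigma; {x >= a}).

    Rejection sampling: draw from the untruncated t and keep draws with x >= a.
    Raises if no draw is accepted within max_rounds batches.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    rng = np.random.default_rng(seed)
    kept, n_kept = [], 0
    for _ in range(max_rounds):
        # t-variate as a normal vector divided by sqrt of an independent gamma weight (mean 1).
        z = rng.multivariate_normal(np.zeros(p), sigma, size=n_keep)
        w = rng.gamma(shape=nu / 2, scale=2 / nu, size=n_keep)
        x = mu + z / np.sqrt(w)[:, None]
        x = x[np.all(x >= a, axis=1)]
        kept.append(x)
        n_kept += x.shape[0]
        if n_kept >= n_keep:
            break
    if n_kept == 0:
        raise RuntimeError("no draws fell in the truncation region")
    x = np.concatenate(kept)[:n_keep]
    return x.mean(axis=0), x.T @ x / x.shape[0]
```

As a check, for the standard univariate case truncated at zero, $E(X^2)$ must equal the untruncated value $\nu/(\nu-2)$ by symmetry.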

Before presenting the expressions, it will be convenient to introduce some notation. Let $\boldsymbol{x}$ be a vector; then

$x_i$ denotes the $i$th element,
$\boldsymbol{x}_{ij}$ is the two-dimensional vector with elements $x_i$ and $x_j$,
$\boldsymbol{x}_{-i}$ is the $(p-1)$-dimensional vector with the $i$th element removed, and
$\boldsymbol{x}_{-ij}$ is the $(p-2)$-dimensional vector with the $i$th and $j$th elements removed.

For a matrix $\boldsymbol{X}$, let

$x_{ij}$ denote the $ij$th element,
$\boldsymbol{X}_{ij}$ denote the $2 \times 2$ matrix consisting of the elements $x_{ii}$, $x_{ij}$, $x_{ji}$ and $x_{jj}$,
$\boldsymbol{X}_{-i}$ be the matrix obtained by removing the $i$th row and column from $\boldsymbol{X}$,
$\boldsymbol{X}_{-ij}$ be the $(p-2) \times (p-2)$ matrix obtained by removing the $i$th and $j$th rows and columns from $\boldsymbol{X}$, and
$\boldsymbol{X}_{(ij)}$ be the $i$th and $j$th columns of $\boldsymbol{X}$ with the elements of $\boldsymbol{X}_{ij}$ removed, yielding a $(p-2) \times 2$ matrix.
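The index operations above map directly onto NumPy fancy indexing. The helpers below are hypothetical names introduced only for illustration, and they use zero-based indices whereas the text counts from 1.

```python
import numpy as np

# Zero-based counterparts of the vector and matrix notation (the text counts from 1).


def pair(v, i, j):
    """x_{ij}: the two-dimensional vector (x_i, x_j)."""
    return v[[i, j]]


def drop(v, *idx):
    """x_{-i} or x_{-ij}: the vector with the listed elements removed."""
    return np.delete(v, list(idx))


def sub2(m, i, j):
    """X_{ij}: the 2 x 2 matrix [[x_ii, x_ij], [x_ji, x_jj]]."""
    return m[np.ix_([i, j], [i, j])]


def drop_rc(m, *idx):
    """X_{-i} or X_{-ij}: remove the listed rows and the same columns."""
    keep = np.setdiff1d(np.arange(m.shape[0]), list(idx))
    return m[np.ix_(keep, keep)]


def cols_ij(m, i, j):
    """X_{(ij)}: columns i and j of X with rows i and j removed, a (p-2) x 2 matrix."""
    keep = np.setdiff1d(np.arange(m.shape[0]), [i, j])
    return m[np.ix_(keep, [i, j])]
```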

We now proceed to the expressions for the first two moments of X.

One can show that the first moment of (69) is

$$ E (\boldsymbol{X} ) = \boldsymbol{\mu}+ \boldsymbol {\epsilon}, $$

(70)

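where $\boldsymbol{\epsilon} = c^{-1} \boldsymbol{\varSigma} \boldsymbol{\xi}$ and $c = T_{p,\nu}(\boldsymbol{\mu}-\boldsymbol{a}; \boldsymbol{0}, \boldsymbol{\varSigma})$, and $\boldsymbol{\xi}$ is a $p \times 1$ vector with elements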

for i=1,…,p, and where

The second moment is given by

(71)

where $\boldsymbol{H}$ is a $p \times p$ matrix with off-diagonal elements

and diagonal elements,

It is worth noting that evaluation of the expressions (70) and (71) relies on algorithms for computing the multivariate central t-distribution function, for which highly efficient procedures are readily available in many statistical packages. For example, an implementation of Genz’s algorithm (Genz and Bretz 2002; Kotz and Nadarajah 2004) is provided by the mvtnorm package for R, available from CRAN.
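In the univariate case the first two moments can be worked out directly, which shows in miniature why only the non-truncated t distribution function is needed. The sketch below assumes SciPy (`scipy.stats.t`, `scipy.integrate.quad`). The identities in the docstring were obtained by integrating the standard t density by parts; they are not copied from the multivariate expressions above, although the constant `c` coincides with $c$ in (70) when $p = 1$. The second moment requires $\nu > 2$, and the function names are illustrative.

```python
import numpy as np
from scipy import integrate, stats


def truncated_t_moments_1d(mu, sigma2, nu, a):
    """E(X) and E(X^2) for X ~ tt_{1,nu}(mu, sigma2; [a, inf)), assuming nu > 2.

    With Y = (X - mu)/sigma, b = (a - mu)/sigma, f_nu the standard t density and
    S_nu its survival function, integration by parts gives
      E(Y 1{Y >= b})   = g,  g = f_nu(b) (nu + b^2) / (nu - 1)
      E(Y^2 1{Y >= b}) = b g + nu / (nu - 2) * S_{nu-2}(b sqrt((nu - 2) / nu)).
    """
    sigma = np.sqrt(sigma2)
    b = (a - mu) / sigma
    c = stats.t.sf(b, nu)  # P(X >= a), i.e. the constant c in (70) for p = 1
    g = stats.t.pdf(b, nu) * (nu + b**2) / (nu - 1)
    ey = g / c
    ey2 = (b * g + nu / (nu - 2) * stats.t.sf(b * np.sqrt((nu - 2) / nu), nu - 2)) / c
    # Back-transform X = mu + sigma * Y.
    return mu + sigma * ey, mu**2 + 2 * mu * sigma * ey + sigma2 * ey2


def truncated_t_moments_1d_quad(mu, sigma2, nu, a):
    """Brute-force check of the same moments by numerical integration."""
    scale = np.sqrt(sigma2)
    c = stats.t.sf(a, nu, loc=mu, scale=scale)
    m1 = integrate.quad(lambda x: x * stats.t.pdf(x, nu, loc=mu, scale=scale), a, np.inf)[0]
    m2 = integrate.quad(lambda x: x**2 * stats.t.pdf(x, nu, loc=mu, scale=scale), a, np.inf)[0]
    return m1 / c, m2 / c
```

The quadrature version is only a cross-check; the closed form needs two evaluations of the t survival function and one of its density.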

Appendix B: E-step for uMST

Derivations of $e_{1,j}^{(k)}$, $\boldsymbol{e}_{3,j}^{(k)}$ and $\boldsymbol{e}_{4,j}^{(k)}$ are detailed as follows.

B.1 Calculation of $e_{1,j}^{(k)}$

For the calculation of the expectation $e_{1,j}^{(k)}$, we note that the conditional density of $W_j$ given $\boldsymbol{y}_j$ is given by

(72)

where

and $\boldsymbol{0}$ is the zero vector of appropriate dimension.

The conditional expectation $E_{\boldsymbol{\theta}^{(k)}}\{\log(W_{j}) \mid\boldsymbol {y}_{j}\}$ can be reduced to

(73)

where

$$\boldsymbol{y}_{2j}^{(k)} = \boldsymbol{q}_j^{(k)} \sqrt{\frac{\nu ^{(k)}+p+2}{\nu^{(k)}+d^{(k)}(\boldsymbol{y}_j)}}, $$

and where the last term S is given by

(74)

and $S_{1,j}^{(k)}$ is an integral given by

(75)

Combining (73) and (74), $e_{1,j}^{(k)}$ can be reduced to

(76)

We note that the term $S_{1,j}^{(k)}$ will be very small in practice, since it would be zero if we adopted a one-step-late (OSL) EM algorithm (Green 1990). In that case there would be no need to calculate the multiple integral $S_{1,j}^{(k)}$ in (74), and $e_{1,j}^{(k)}$ reduces to

(77)

B.2 Calculation of $\boldsymbol{e}_{3,j}^{(k)}$ and $\boldsymbol{e}_{4,j}^{(k)}$

To obtain $\boldsymbol{e}_{3,j}^{(k)}$ and $\boldsymbol{e}_{4,j}^{(k)}$, first note that the joint density of $\boldsymbol{y}_j$, $\boldsymbol{u}_j$ and $w_j$ is given by

(78)

Using Bayes’ rule, the conditional density of $\boldsymbol{u}_j$ and $w_j$ given $\boldsymbol{y}_j$ can be written as

(79)

From (79), standard conditional expectation calculations yield

(80)

where $\boldsymbol{X}_j$ is a $p$-dimensional t-variate truncated to the positive orthant $\mathbb{R}^+$, which is conditionally distributed as

$$ \boldsymbol{X}_j \mid\boldsymbol{y}_j \sim tt_{p,\nu^{(k)}+p+2} \biggl(\boldsymbol{q}_j^{(k)}, \biggl( \frac{\nu^{(k)}+d^{(k)}(\boldsymbol{y}_j)}{\nu^{(k)}+p+2} \biggr) \boldsymbol{\varLambda}^{(k)}; \mathbb{R}^+ \biggr). $$

(81)

Analogously, $\boldsymbol{e}_{4,j}^{(k)}$ can be reduced to

$$ \boldsymbol{e}_{4,j}^{(k)} = e_{2,j}^{(k)} E \bigl(\boldsymbol{X}_j \boldsymbol{X}_j^T \mid \boldsymbol{y}_j\bigr). $$

(82)

The truncated moments $E(\boldsymbol{X}_j \mid \boldsymbol{y}_j)$ and $E(\boldsymbol{X}_{j} \boldsymbol{X}_{j}^{T} \mid \boldsymbol{y}_{j})$ can be evaluated swiftly using the expressions (70) and (71) given in Appendix A.
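A rough sketch of how (81) and (82) combine for a single observation, assuming NumPy. The scalar $e_{2,j}^{(k)}$ and the quantities $\boldsymbol{q}_j^{(k)}$, $\boldsymbol{\varLambda}^{(k)}$ and $d^{(k)}(\boldsymbol{y}_j)$ are taken as already computed; the form $\boldsymbol{e}_{3,j}^{(k)} = e_{2,j}^{(k)} E(\boldsymbol{X}_j \mid \boldsymbol{y}_j)$ is assumed by analogy with (82). The truncated moments are estimated here by rejection sampling purely for illustration, whereas a practical implementation would use (70) and (71). The function name `umst_e3_e4` is hypothetical.

```python
import numpy as np


def umst_e3_e4(e2, q, lam, nu, d, n_keep=100_000, seed=2, max_rounds=1_000):
    """Illustrative e_3 = e_2 E(X | y) and e_4 = e_2 E(X X^T | y) for one observation.

    X | y ~ tt_{p, nu+p+2}(q, ((nu + d)/(nu + p + 2)) Lambda; R^+), as in (81).
    e2, q, lam (Lambda), nu and d (= d(y)) are assumed to be computed already.
    The truncated moments are estimated by rejection sampling, not by (70)-(71).
    """
    q = np.asarray(q, dtype=float)
    p = q.shape[0]
    dof = nu + p + 2
    scale = (nu + d) / dof * np.asarray(lam, dtype=float)
    rng = np.random.default_rng(seed)
    kept, n_kept = [], 0
    for _ in range(max_rounds):
        z = rng.multivariate_normal(np.zeros(p), scale, size=n_keep)
        w = rng.gamma(shape=dof / 2, scale=2 / dof, size=n_keep)
        x = q + z / np.sqrt(w)[:, None]
        x = x[np.all(x >= 0.0, axis=1)]  # keep draws in the positive orthant
        kept.append(x)
        n_kept += x.shape[0]
        if n_kept >= n_keep:
            break
    if n_kept == 0:
        raise RuntimeError("no draws fell in the positive orthant")
    x = np.concatenate(kept)[:n_keep]
    ex = x.mean(axis=0)
    exx = x.T @ x / x.shape[0]
    return e2 * ex, e2 * exx
```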

Appendix C: E-step for FM-uMST

The four conditional expectations $e_{1,hj}^{(k)}$, $e_{2,hj}^{(k)}$, $\boldsymbol{e}_{3,hj}^{(k)}$, and $\boldsymbol{e}_{4,hj}^{(k)}$ involved in the E-step are given by

(83)

(84)

(85)

(86)

where $S_{1,hj}^{(k)}$ is a scalar defined by

(87)

and $\boldsymbol{X}_{hj}$ is a truncated $p$-dimensional t-variate given by

$$\boldsymbol{X}_{hj} \mid\boldsymbol{y}_j \sim tt_{p,\nu_h^{(k)}+p+2} \biggl(\boldsymbol{q}_{hj}^{(k)}, \biggl( \frac{\nu_h^{(k)}+d_h^{(k)}(\boldsymbol{y}_j)}{ \nu_h^{(k)}+p+2} \biggr)\boldsymbol{\varLambda}_h^{(k)}; \mathbb{R}^+ \biggr). $$

The first two moments of $\boldsymbol{X}_{hj}$ can be expressed, via results (70) and (71), in terms of the parameters $\boldsymbol{q}_{hj}^{(k)}$, $d_{h}^{(k)}(\boldsymbol{y}_{j})$, $\boldsymbol{\varLambda}_{h}^{(k)}$ and $\nu_{h}^{(k)}$. It is worth emphasizing that computation of $\boldsymbol{e}_{3,hj}^{(k)}$ and $\boldsymbol{e}_{4,hj}^{(k)}$ depends on algorithms for evaluating the multivariate t-distribution function, for which fast procedures are available.
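In the mixture case each component $h$ carries its own parameters, so the truncated t-variates $\boldsymbol{X}_{hj}$ differ only in their location, scale matrix and degrees of freedom. The vectorized NumPy sketch below, with hypothetical array names and shapes, builds these for all $g$ components at once for a fixed observation $j$; the resulting triples would then be passed to a routine implementing (70) and (71).

```python
import numpy as np


def component_truncated_t_params(q, lam, nu, d):
    """Location, scale and dof of X_hj | y_j for every component h (fixed j).

    q:   (g, p) array whose rows are q_hj
    lam: (g, p, p) array of Lambda_h
    nu:  (g,) array of nu_h
    d:   (g,) array of d_h(y_j)
    """
    q = np.asarray(q, dtype=float)
    lam = np.asarray(lam, dtype=float)
    nu = np.asarray(nu, dtype=float)
    d = np.asarray(d, dtype=float)
    p = q.shape[1]
    dof = nu + p + 2                      # nu_h + p + 2
    factor = (nu + d) / dof               # (nu_h + d_h(y_j)) / (nu_h + p + 2)
    scales = factor[:, None, None] * lam  # broadcast over the g scale matrices
    return q, scales, dof
```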


About this article

Cite this article

Lee, S., McLachlan, G.J. Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat Comput 24, 181–202 (2014). https://doi.org/10.1007/s11222-012-9362-4


Received: 20 January 2012
Accepted: 06 October 2012
Published: 20 October 2012
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11222-012-9362-4
