Abstract
Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for the computation of the maximum likelihood (ML) estimates for the model parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for ‘restricted’ characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the ‘unrestricted’ models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is achieved by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, similar to the restricted case, which subsequently can be expressed in terms of the non-truncated form of the central t-distribution function for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture through applications to three real datasets.
References
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
Arellano-Valle, R., Bolfarine, H., Lachos, V.: Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 34(6), 663–682 (2007)
Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006)
Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005)
Arnold, B.C., Beaver, R.J.: Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11, 7–54 (2002)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32, 159–188 (2005)
Azzalini, A., Capitanio, A.: Distribution generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. R. Stat. Soc., Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Banfield, J.D., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
Basso, R.M., Lachos, V.H., Cabral, C.R.B., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)
Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, New York (1999)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Brinkman, R., Gaspareto, M., Lee, S.J., Ribickas, A., Perkins, J., Janssen, W., Smiley, R., Smith, C.: High content flow cytometry and temporal data analysis for defining a cellular signature of graft versus host disease. Biol. Blood Marrow Transplant. 13, 691–700 (2007)
Cabral, C., Bolfarine, H., Pereira, J.: Bayesian density estimation using skew student-t-normal mixtures. Comput. Stat. Data Anal. 52, 5075–5090 (2008)
Cabral, C., Lachos, V., Prates, M.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56, 126–142 (2012)
Dempster, A., Laird, N.M., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39, 1–38 (1977)
Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)
Fraley, C., Raftery, A.E.: How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1999)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)
Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11, 317–336 (2010)
Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput. Graph. Stat. 11, 950–971 (2002)
Gómez, H., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395–407 (2007)
González-Farías, G., Domínguez-Molina, J.A., Gupta, A.K.: Additive properties of skew normal random vectors. J. Stat. Plan. Inference 126, 521–534 (2004)
Green, P.J.: On use of the EM algorithm for penalized likelihood estimation. J. R. Stat. Soc., Ser. B 52, 443–452 (1990)
Gupta, A.K.: Multivariate skew-t distribution. Statistics 37, 359–363 (2003)
Ho, H., Lin, T., Chen, H., Wang, W.: Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 142, 25–40 (2012a)
Ho, H., Pyne, S., Lin, T.: Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat. Comput. 22, 287–299 (2012b)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
Karlis, D., Xekalaki, E.: Choosing initial values for the EM algorithm for finite mixtures. Comput. Stat. Data Anal. 41, 577–590 (2003)
Kotz, S., Nadarajah, S.: Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge (2004)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew normal independent linear mixed models. Stat. Sin. 20, 303–322 (2010)
Lee, S., McLachlan, G.: On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm (2011). arXiv:1109.4706 [stat.ME]
Lin, T.I.: Maximum likelihood estimation for multivariate skew-normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distribution. Stat. Comput. 20, 343–356 (2010)
Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew-t distribution. Stat. Comput. 17, 81–92 (2007a)
Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)
Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward (1995)
Liseo, B., Loperfido, N.: A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 61, 395–401 (2003)
Liu, C., Rubin, D.: The ECME algorithm: a simple extension of the EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)
Maier, L.M., Anderson, D.E., De Jager, P.L., Wicker, L., Hafler, D.A.: Allelic variant in CTLA4 alters T cell phosphorylation patterns. Proc. Natl. Acad. Sci. USA 104, 18607–18612 (2007)
McLachlan, G., Peel, D.: Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.) Lecture Notes in Computer Science, vol. 1451, pp. 658–666. Springer, Berlin (1998)
McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Dekker, New York (1988)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics (2000)
O’Hagan, A.: Bayes estimation of a convex quadratic. Biometrika 60, 565–571 (1973)
O’Hagan, A.: Moments of the truncated multivariate-t distribution (1976). http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF
O’Hagan, A., Murphy, T., Gormley, I.: Computational aspects of fitting mixture models via the expectation-maximization algorithm. Comput. Stat. Data Anal. 56, 3843–3864 (2012)
Peel, D., McLachlan, G.: Robust mixture modelling using the t distribution. Stat. Comput. 10, 339–348 (2000)
Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirov, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)
Sahu, S., Dey, D., Branco, M.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003). Erratum: Can. J. Stat. 37, 301–302 (2009)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Vrbik, I., McNicholas, P.: Analytic calculations for the em algorithm for multivariate skew t-mixture models. Stat. Probab. Lett. 82, 1169–1174 (2012)
Wang, K.: EMMIX-skew: EM algorithm for mixture of multivariate skew normal/t distributions (2009). http://www.maths.uq.edu.au/gjm/mix_soft/EMMIX-skew, R package version 1.0-12
Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M., Lovell, B., Maeder, A. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526–531. IEEE Comput. Soc., Los Alamitos (2009)
Acknowledgements
We would like to thank Professor Seung-Gu Kim for comments and corrections, and Drs. Kui (Sam) Wang and Saumyadipta Pyne for their helpful discussions on this topic.
Appendices
Appendix A: The truncated multivariate t-distribution
In this appendix, we briefly describe the truncated multivariate t-distribution and provide formulas for computing its moments, following the approach of Lee and McLachlan (2011). These expressions are crucial for the fast evaluation of the conditional expectations on the E-step of the FM-uMST model discussed in Sect. 5. An alternative description is given by Ho et al. (2012a), who provide equivalent expressions for the doubly truncated case.
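Let \(\boldsymbol{X}\) be a p-dimensional random vector having a multivariate t-distribution with location vector \(\boldsymbol{\mu}\), scale matrix \(\boldsymbol{\varSigma}\), and \(\nu\) degrees of freedom. Truncating \(\boldsymbol{X}\) to the region \(\mathbb{A} = \{\boldsymbol{x} \geq \boldsymbol{a}\}\), where \(\boldsymbol{a} \in \mathbb{R}^{p}\) and \(\boldsymbol{x} \geq \boldsymbol{a}\) means that each element \(x_i = (\boldsymbol{x})_i\) is greater than or equal to \(a_i = (\boldsymbol{a})_i\) for \(i = 1, \dots, p\), results in a left-truncated t-distribution whose density is given by

\[ f(\boldsymbol{x}) = c^{-1}\, t_{p,\nu}(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\varSigma}), \qquad \boldsymbol{x} \in \mathbb{A}, \qquad (69) \]

where \(t_{p,\nu}(\cdot\,; \boldsymbol{\mu}, \boldsymbol{\varSigma})\) is the density of the non-truncated distribution and \(c = P(\boldsymbol{X} \in \mathbb{A}) = T_{p,\nu}(\boldsymbol{\mu}-\boldsymbol{a}; \boldsymbol{0}, \boldsymbol{\varSigma})\) is the normalizing constant.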
For a random vector \(\boldsymbol{X}\) with density (69), we write \(\boldsymbol{X} \sim tt_{p,\nu} (\boldsymbol{\mu}, \boldsymbol{\varSigma}; \mathbb{A})\). For our purposes, we are concerned with the first two moments of \(\boldsymbol{X}\), namely \(E(\boldsymbol{X})\) and \(E(\boldsymbol{X}\boldsymbol{X}^{T})\). Explicit formulas for the truncated central t-distribution in the univariate case \(tt_{1,\nu}(0, \sigma^{2}; \mathbb{A})\) were provided by O’Hagan (1973), who expressed the moments in terms of the non-truncated t-distribution. The multivariate case was studied in O’Hagan (1976), but again only for the central case. Here we describe a generalization of the results in O’Hagan (1976) to the multivariate non-central case and express them in a form suitable for the E-step of the EM algorithm when it is applied directly to the fitting of mixtures of MST distributions.
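To make the univariate central case concrete, here is a small sketch of one closed form for the truncated mean. It rests on a standard identity rather than on O’Hagan’s own expressions: for the standard t density \(f_\nu\), differentiation gives \(\frac{d}{dy}\{(\nu + y^2) f_\nu(y)\} = -(\nu - 1)\, y f_\nu(y)\), so for \(\nu > 1\) and \(\alpha = a/\sigma\), \(E(X) = \sigma(\nu + \alpha^2) f_\nu(\alpha) / \{(\nu - 1)(1 - F_\nu(\alpha))\}\) for \(X \sim tt_{1,\nu}(0, \sigma^2; \{x \geq a\})\). The function names are hypothetical, and the tail probability \(1 - F_\nu(\alpha)\) is computed here by simple Simpson quadrature as a stand-in for a proper t-distribution function routine.

```python
import math


def t_pdf(y, nu):
    """Density f_nu(y) of the central, unit-scale Student t with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(y * y / nu))


def upper_tail(g, alpha, n=4000):
    """Approximate the integral of g over [alpha, inf) by composite Simpson's rule.

    Uses the substitution y = alpha + u / (1 - u), u in [0, 1). The transformed
    integrand is set to 0 at u = 1, which is its limit when g(y) decays faster
    than y**-2 (true for f_nu when nu > 1, and for y * f_nu when nu > 2).
    """
    h = 1.0 / n  # n must be even for Simpson's rule
    total = 0.0
    for k in range(n + 1):
        u = k * h
        val = 0.0 if k == n else g(alpha + u / (1.0 - u)) / (1.0 - u) ** 2
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * val
    return total * h / 3.0


def truncated_t_mean(a, sigma, nu):
    """E(X) for X ~ tt_{1,nu}(0, sigma^2; {x >= a}) via the closed form above (nu > 1)."""
    alpha = a / sigma
    # Normalizing constant P(X >= a) = 1 - F_nu(alpha).
    tail = upper_tail(lambda y: t_pdf(y, nu), alpha)
    return sigma * (nu + alpha ** 2) * t_pdf(alpha, nu) / ((nu - 1) * tail)
```

As a sanity check, `truncated_t_mean(0.0, 1.0, 3.0)` should be close to \(6/(\pi\sqrt{3}) \approx 1.1027\), and for \(\nu > 2\) the result should agree with the ratio of the quadratures of \(y f_\nu(y)\) and \(f_\nu(y)\) over the tail.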
Before presenting the expressions, it is convenient to introduce some notation. For a vector \(\boldsymbol{x}\), let
- \(x_i\) denote the ith element,
- \(\boldsymbol{x}_{ij}\) denote the two-dimensional vector with elements \(x_i\) and \(x_j\),
- \(\boldsymbol{x}_{-i}\) denote the \((p-1)\)-dimensional vector with the ith element removed, and
- \(\boldsymbol{x}_{-ij}\) denote the \((p-2)\)-dimensional vector with the ith and jth elements removed.
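For a matrix \(\boldsymbol{X}\), let
- \(x_{ij}\) denote the ijth element,
- \(\boldsymbol{X}_{ij}\) denote the \(2 \times 2\) matrix consisting of the elements \(x_{ii}\), \(x_{ij}\), \(x_{ji}\) and \(x_{jj}\),
- \(\boldsymbol{X}_{-i}\) denote the matrix obtained by removing the ith row and column from \(\boldsymbol{X}\),
- \(\boldsymbol{X}_{-ij}\) denote the \((p-2) \times (p-2)\) matrix obtained by removing the ith and jth rows and columns from \(\boldsymbol{X}\), and
- \(\boldsymbol{X}_{(ij)}\) denote the \((p-2) \times 2\) matrix formed by the ith and jth columns of \(\boldsymbol{X}\) with the elements of \(\boldsymbol{X}_{ij}\) removed.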
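The sub-vector and sub-matrix notation above can be mirrored by a few small helpers, which may help when translating the moment formulas into code. This is a hypothetical sketch: the function names are ours, and Python indices start at 0 whereas the text counts from 1.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def vec_pair(x: Sequence[float], i: int, j: int) -> Vector:
    """x_{ij}: the two-dimensional vector (x_i, x_j)."""
    return [x[i], x[j]]


def vec_drop(x: Sequence[float], *idx: int) -> Vector:
    """x_{-i} (one index) or x_{-ij} (two indices): x with those elements removed."""
    return [v for k, v in enumerate(x) if k not in idx]


def mat_block(X: Matrix, i: int, j: int) -> Matrix:
    """X_{ij}: the 2 x 2 matrix [[x_ii, x_ij], [x_ji, x_jj]]."""
    return [[X[i][i], X[i][j]], [X[j][i], X[j][j]]]


def mat_drop(X: Matrix, *idx: int) -> Matrix:
    """X_{-i} or X_{-ij}: X with the listed rows and columns removed."""
    keep = [k for k in range(len(X)) if k not in idx]
    return [[X[r][c] for c in keep] for r in keep]


def mat_cols(X: Matrix, i: int, j: int) -> Matrix:
    """X_{(ij)}: columns i and j of X without rows i and j, a (p-2) x 2 matrix."""
    return [[X[r][i], X[r][j]] for r in range(len(X)) if r not in (i, j)]
```

For example, with `X = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]`, `mat_cols(X, 0, 2)` returns `[[21, 23]]` and `mat_drop(X, 0, 2)` returns `[[22]]`.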
We now proceed to the expressions for the first two moments of X.
One can show that the first moment of (69) is
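where \(\boldsymbol{\epsilon} = c^{-1}\boldsymbol{\varSigma}\boldsymbol{\xi}\) and \(c = T_{p,\nu}(\boldsymbol{\mu}-\boldsymbol{a}; \boldsymbol{0}, \boldsymbol{\varSigma})\), and \(\boldsymbol{\xi}\) is a \(p \times 1\) vector with elements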
for i=1,…,p, and where
The second moment is given by
where H is a p×p matrix with off-diagonal elements
and diagonal elements,
It is worth noting that evaluation of the expressions (70) and (71) relies on algorithms for computing the multivariate central t-distribution function, for which highly efficient procedures are readily available in many statistical packages. For example, an implementation of Genz’s algorithm (Genz and Bretz 2002; Kotz and Nadarajah 2004) is provided by the R package mvtnorm, available from CRAN.
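When implementing (70) and (71), a brute-force Monte Carlo check can be useful. The sketch below is plain rejection sampling, not Genz’s algorithm and not the closed-form expressions. It assumes the usual representation \(\boldsymbol{X} = \boldsymbol{\mu} + \boldsymbol{Z}/\sqrt{W}\) with \(\boldsymbol{Z} \sim N_p(\boldsymbol{0}, \boldsymbol{\varSigma})\) and \(W \sim \text{Gamma}(\nu/2, \nu/2)\). By the symmetry of the t-distribution about its location, the acceptance rate estimates \(c = P(\boldsymbol{X} \geq \boldsymbol{a}) = T_{p,\nu}(\boldsymbol{\mu}-\boldsymbol{a}; \boldsymbol{0}, \boldsymbol{\varSigma})\), the same quantity mvtnorm’s `pmvt` computes deterministically. The function names are hypothetical, and the estimates carry Monte Carlo error.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def draw_mvt(mu, L, nu, rng):
    """One draw from t_{p,nu}(mu, Sigma) as mu + L z / sqrt(w), w ~ Gamma(shape nu/2, rate nu/2)."""
    p = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)  # random.gammavariate takes shape and *scale*
    scale = 1.0 / math.sqrt(w)
    return [mu[i] + scale * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]


def truncated_t_mc(mu, Sigma, nu, a, n=200_000, seed=1):
    """Monte Carlo estimates of c = P(X >= a), E(X) and E(X X^T) for X ~ tt_{p,nu}(mu, Sigma; A)."""
    rng = random.Random(seed)
    L = cholesky(Sigma)
    p = len(mu)
    kept = 0
    m1 = [0.0] * p
    m2 = [[0.0] * p for _ in range(p)]
    for _ in range(n):
        x = draw_mvt(mu, L, nu, rng)
        if all(x[i] >= a[i] for i in range(p)):  # rejection step: keep only draws in A
            kept += 1
            for i in range(p):
                m1[i] += x[i]
                for j in range(p):
                    m2[i][j] += x[i] * x[j]
    if kept == 0:
        raise ValueError("no draws fell in the truncation region; increase n")
    c = kept / n
    return c, [v / kept for v in m1], [[v / kept for v in row] for row in m2]
```

Rejection sampling becomes very inefficient when \(c\) is small (for example, when \(\boldsymbol{a}\) lies far in the tail), which is one reason the closed-form moments in terms of the t-distribution function are preferable on the E-step.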
Appendix B: E-step for uMST
Derivations of \(e_{1,j}^{(k)}\), \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\) are detailed as follows.
B.1 Calculation of \(e_{1,j}^{(k)}\)
Concerning the calculation of the expectation \(e_{1,j}^{(k)}\), the conditional density of \(W_j\) given \(\boldsymbol{y}_j\) is given by
where
and \(\boldsymbol{0}\) is the zero vector of appropriate dimension.
The conditional expectation \(E_{\boldsymbol{\theta}^{(k)}}\{\log(W_{j}) \mid\boldsymbol {y}_{j}\}\) can be reduced to
where
and where the last term S is given by
and \(S_{1,j}^{(k)}\) is an integral given by
Combining (73) and (74), \(e_{1,j}^{(k)}\) can be reduced to
We note that the term \(S_{1,j}^{(k)}\) will be very small in practice, since it would be zero if we adopted a one-step-late (OSL) EM algorithm (Green 1990). In that case, there would be no need to calculate the multiple integral \(S_{1,j}^{(k)}\) in (74). Hence, \(e_{1,j}^{(k)}\) can be reduced to
B.2 Calculation of \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\)
To obtain \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\), first note that the joint density of \(\boldsymbol{y}_j\), \(\boldsymbol{u}_j\), and \(w_j\) is given by
Using Bayes’ rule, the conditional density of \(\boldsymbol{u}_j\) and \(w_j\) given \(\boldsymbol{y}_j\) can be written as
From (79), standard conditional expectation calculations yield
where \(\boldsymbol{X}_j\) is a p-dimensional t-variate truncated to the positive orthant \(\mathbb{R}_{+}^{p}\), which is conditionally distributed as
Analogously, \(\boldsymbol{e}_{4,j}^{(k)}\) can be reduced to
The truncated moments \(E(\boldsymbol{X}_j \mid \boldsymbol{y}_j)\) and \(E(\boldsymbol{X}_{j} \boldsymbol{X}_{j}^{T} \mid\boldsymbol{y}_{j})\) can be evaluated quickly using the expressions (70) and (71) given in Appendix A.
Appendix C: E-step for FM-uMST
The four conditional expectations \(e_{1,hj}^{(k)}\), \(e_{2,hj}^{(k)}\), \(\boldsymbol{e}_{3,hj}^{(k)}\), and \(\boldsymbol{e}_{4,hj}^{(k)}\) involved in the E-step are given by
where \(S_{1,hj}^{(k)}\) is a scalar defined by
and \(\boldsymbol{X}_{hj}\) is a truncated p-dimensional t-variate given by
The first two moments of \(\boldsymbol{X}_{hj}\) can be expressed implicitly in terms of the parameters \(\boldsymbol{q}_{hj}^{(k)}\), \(d_{h}^{(k)}(\boldsymbol{y}_{j})\), \(\boldsymbol{\varLambda}_{h}^{(k)}\), \(\nu_{h}^{(k)}\) using results (70) and (71). It is worth emphasizing that computation of \(\boldsymbol{e}_{3,hj}^{(k)}\) and \(\boldsymbol{e}_{4,hj}^{(k)}\) depends on algorithms for evaluating the multivariate t-distribution function, for which fast procedures are available.
About this article
Cite this article
Lee, S., McLachlan, G.J. Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat Comput 24, 181–202 (2014). https://doi.org/10.1007/s11222-012-9362-4