Published in: Prevention Science, Issue 6/2020

18 April 2020

Finite Mixture Models with Student t Distributions: an Applied Example

Author: Albert J. Burgess-Hull


Abstract

The use of finite mixture modeling (FMM) to identify unobservable or latent groupings of individuals within a population has increased rapidly in applied prevention research. However, many prevention scientists are still unaware of the statistical assumptions underlying FMM. In particular, finite mixture models (FMMs) typically assume that the observed indicator variables are normally distributed within each latent subgroup (i.e., within-class normality). This assumption is rarely met in applied psychological and prevention research, and violating it when fitting an FMM can lead to the identification of spurious subgroups and/or biased parameter estimates. Although new methods have been developed that relax the within-class normality assumption when fitting an FMM, prevention scientists continue to rely on FMM methods that assume within-class normality. The purpose of the current article is to introduce prevention researchers to an FMM method for heavy-tailed data: FMM with Student t distributions. We begin by reviewing the distributional assumptions that underlie FMM and the limitations of FMM with normal distributions. Next, we introduce FMM with Student t distributions and show, step by step, the analytic and substantive results of fitting an FMM with normal and Student t distributions to data from a smoking-cessation trial. Finally, we extend the results of the applied example to draw conclusions about the use of FMM with Student t distributions in applied settings and to provide guidelines for researchers who wish to use these methods in their own research.
Footnotes
1
These data are part of a larger simulation study conducted to highlight the potential dangers of fitting an FMM-n to non-normally distributed datasets. The results of this study are available in the online supplemental material.
 
2
Soft-randomization scheme: uses initialization values for the ECM algorithm between 0 and 1. Hard-randomization scheme: uses initialization values of either 0 or 1 for the ECM algorithm.
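As a rough illustration of the two schemes, the sketch below assumes that the initialization values are per-person subgroup membership weights; the footnote does not state which quantities are initialized, so this is an assumption. The function names `soft_init` and `hard_init` are hypothetical and do not come from the article or its supplemental material.

```python
import random


def soft_init(n, k, rng):
    """Soft scheme (assumed form): each person gets k random weights
    strictly between 0 and 1, rescaled so that each row sums to 1."""
    rows = []
    for _ in range(n):
        # 1.0 - rng.random() lies in (0, 1], which avoids a zero row sum.
        weights = [1.0 - rng.random() for _ in range(k)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def hard_init(n, k, rng):
    """Hard scheme (assumed form): each person is placed in exactly one
    randomly chosen subgroup, so every weight is either 0 or 1."""
    rows = []
    for _ in range(n):
        row = [0.0] * k
        row[rng.randrange(k)] = 1.0
        rows.append(row)
    return rows


rng = random.Random(2020)  # fixed seed so that runs are repeatable
soft_start = soft_init(n=5, k=3, rng=rng)
hard_start = hard_init(n=5, k=3, rng=rng)
```

Either matrix could then serve as the starting membership weights for an iterative estimation algorithm such as ECM; running several random starts and keeping the best solution is a common safeguard against local optima.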
 
3
The k-means initialization procedure derives initialization values from a k-means clustering procedure and uses the parameter estimates derived from the k-means analysis as starting values in the ECM algorithm.
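The idea can be sketched in plain Python as follows. This is a minimal illustration, not the procedure used in the article: it runs a basic Lloyd-style k-means and then summarizes each cluster by its proportion and mean, on the assumption that these are among the starting values passed to the ECM algorithm. The names `kmeans_labels` and `starting_values` are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, rng, max_iter=100):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def starting_values(points, labels, k):
    """Cluster proportions and means, usable as mixture starting values."""
    n = len(points)
    proportions, means = [], []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        proportions.append(len(members) / n)
        means.append(
            [sum(col) / len(members) for col in zip(*members)] if members else None
        )
    return proportions, means


data = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
labels = kmeans_labels(data, k=2, rng=random.Random(1))
proportions, means = starting_values(data, labels, k=2)
```

Starting values for scale and degrees-of-freedom parameters are left out here because the footnote does not describe how they are derived.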
 
4
A model’s entropy is an aggregate measure of a model’s classification uncertainty and is derived from each individual’s posterior probability of membership in a particular subgroup. Entropy scores range from 0.00 to 1.00, with higher values (> .80) indicating that there is adequate separation between the identified subgroups (Asparouhov and Muthén 2018). R code to derive entropy scores from a fitted FMM is available in the online supplemental material.
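One widely used form of this measure is the relative entropy, E = 1 - [sum over individuals i and subgroups k of (-p_ik ln p_ik)] / (n ln K), where p_ik is individual i's posterior probability of membership in subgroup k. The Python sketch below assumes that formula; it is an independent illustration, not the R code from the supplemental material, and the function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy of a classification.

    posteriors: list of rows, one per individual, each holding that
    individual's K posterior membership probabilities (summing to 1).
    Returns a value between 0 (no separation) and 1 (perfect separation).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("entropy needs at least two subgroups")
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))


# Two individuals classified with certainty, two fairly well classified.
example = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.15, 0.85]]
score = relative_entropy(example)
```

When every individual has a posterior probability of 1 for a single subgroup the score is 1, and when all posterior probabilities equal 1/K the score is 0.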
 
5
To assign each individual to a unique subgroup, we took a “classify-analyze” approach in which participants were assigned to the subgroup for which their posterior probability was highest. This approach was deemed appropriate because the majority of the identified models had an entropy ≥ 0.90 (Clark and Muthén 2009).
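The assignment step of this approach can be sketched in a few lines of Python. The function name `modal_assignment` and the example probabilities are hypothetical and are used only for illustration.

```python
def modal_assignment(posteriors):
    """Assign each individual to the subgroup with the highest
    posterior probability (ties go to the lowest subgroup index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


# Rows are individuals, columns are subgroups (made-up probabilities).
posteriors = [
    [0.92, 0.05, 0.03],
    [0.10, 0.85, 0.05],
    [0.02, 0.08, 0.90],
]
subgroup = modal_assignment(posteriors)
```

The resulting labels are then treated as an observed grouping variable in follow-up analyses. Because this ignores the remaining classification uncertainty, it is usually reserved for models with high entropy, as the footnote notes.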
 
References
Andrews, J. L., McNicholas, P. D., & Subedi, S. (2011). Model-based classification via mixtures of multivariate t-distributions. Computational Statistics & Data Analysis, 55, 520–529.
Asparouhov, T., & Muthén, B. (2016). Structural equation models and mixture models with continuous nonnormal skewed distributions. Structural Equation Modeling: A Multidisciplinary Journal, 23, 1–19.
Bauer, D. J. (2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42, 757–786.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300.
Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2013). Skewness and kurtosis in real data samples. Methodology, 9, 78–84.
Bonanno, G. A., & Mancini, A. D. (2012). Beyond resilience and PTSD: Mapping the heterogeneity of responses to potential trauma. Psychological Trauma: Theory, Research, Practice, and Policy, 4, 74–83. https://doi.org/10.1037/a0017829.
Bonanno, G. A., Ho, S. M. Y., Chan, J. C. K., Kwong, R. S. Y., Cheung, C. K. Y., Wong, C. P. Y., & Wong, V. C. W. (2008). Psychological resilience and dysfunction among hospitalized survivors of the SARS epidemic in Hong Kong: A latent class approach. Health Psychology, 27, 659–667. https://doi.org/10.1037/0278-6133.27.5.659.
Burgess-Hull, A. J., Roberts, L. J., Piper, M. E., & Baker, T. B. (2018). The social networks of smokers attempting to quit: An empirically derived and validated classification. Psychology of Addictive Behaviors, 32, 64–75. https://doi.org/10.1037/adb0000336.
Cudeck, R., & Henly, S. J. (2003). A realistic perspective on pattern representation in growth data: Comment on Bauer and Curran (2003). Psychological Methods, 8, 378–383.
Forster, M. R. (2000). Key concepts in model selection: Performance and generalizability. Journal of Mathematical Psychology, 44, 205–231.
Fraley, C., & Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal, 41, 586–588.
Gerogiannis, D., Nikou, C., & Likas, A. (2009). The mixtures of Student’s t-distributions as a robust framework for rigid registration. Image and Vision Computing, 27, 1285–1294.
Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 64, 53–62.
Jackson, K. M., Sher, K. J., & Wood, P. K. (2000). Trajectories of concurrent substance use disorders: A developmental, typological approach to comorbidity. Alcoholism: Clinical and Experimental Research, 24, 902–913.
Krueger, R. F., Markon, K. E., Patrick, C. J., & Iacono, W. G. (2005). Externalizing psychopathology in adulthood: A dimensional-spectrum conceptualization and its implications for DSM-V. Journal of Abnormal Psychology, 114, 537.
Lange, K. L., Little, R. J., & Taylor, J. M. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881–896.
Lanza, S. T., & Rhoades, B. L. (2013). Latent class analysis: An alternative perspective on subgroup analysis in prevention and treatment. Prevention Science, 14, 157–168.
Lee, S. X., & McLachlan, G. J. (2013). On mixtures of skew normal and skew t-distributions. Advances in Data Analysis and Classification, 7, 241–266.
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18, 50–60.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. Wiley.
McLachlan, G. J., & Peel, D. (1998). Robust cluster analysis via mixtures of multivariate t-distributions. In A. Amin, D. Dori, P. Pudil, & H. Freeman (Eds.), Advances in pattern recognition. SSPR/SPR 1998 (pp. 658–666). Berlin, Heidelberg: Springer.
McNicholas, P. D., & Subedi, S. (2012). Clustering gene expression time course data using mixtures of multivariate t-distributions. Journal of Statistical Planning and Inference, 142, 1114–1127.
Muthén, B. (2003). Statistical and substantive checking in growth mixture modeling: Comment on Bauer and Curran (2003). Psychological Methods, 8, 369–377.
Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus User’s Guide (Eighth ed.). Los Angeles, CA: Muthén & Muthén.
Nagin, D. S., & Tremblay, R. E. (2005). Developmental trajectory groups: Fact or a useful statistical fiction? Criminology, 43, 873–904.
Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14, 535–569.
Piper, M. E., Smith, S. S., Schlam, T. R., Fiore, M. C., Jorenby, D. E., Fraser, D., & Baker, T. B. (2009). A randomized placebo-controlled clinical trial of 5 smoking cessation pharmacotherapies. Archives of General Psychiatry, 66, 1253–1262.
Piper, M. E., Cook, J. W., Schlam, T. R., Jorenby, D. E., Smith, S. S., Bolt, D. M., & Loh, W. Y. (2010). Gender, race, and education differences in abstinence rates among participants in two randomized smoking cessation trials. Nicotine & Tobacco Research, 12, 647–657.
Posada, D., & Buckley, T. R. (2004). Model selection and model averaging in phylogenetics: Advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53, 793–808.
Rocke, D. M., & Woodruff, D. L. (1997). Robust estimation of multivariate location and shape. Journal of Statistical Planning and Inference, 57, 245–255.
Sampson, R. J., & Laub, J. H. (2005). Seductions of method: Rejoinder to Nagin and Tremblay's “Developmental trajectory groups: Fact or fiction?”. Criminology, 43, 905–913.
Tofighi, D., & Enders, C. K. (2008). Identifying the correct number of classes in growth mixture models. In Advances in Latent Variable Mixture Models (pp. 317–341). Information Age Publishing.
Van Horn, M. L., Smith, J., Fagan, A. A., Jaki, T., Feaster, D. J., Masyn, K., et al. (2012). Not quite normal: Consequences of violating the assumption of normality in regression mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 19, 227–249.
Vermunt, J., & Magidson, J. (2002). Latent class cluster analysis. In J. Hagenaars & A. McCutcheon (Eds.), Applied latent class analysis (pp. 89–106).
Vrbik, I., & McNicholas, P. D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics & Data Analysis, 71, 196–210.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society, 307–333.
Metadata
Title: Finite Mixture Models with Student t Distributions: an Applied Example
Author: Albert J. Burgess-Hull
Publication date: 18 April 2020
Publisher: Springer US
Published in: Prevention Science, Issue 6/2020
Print ISSN: 1389-4986
Electronic ISSN: 1573-6695
DOI: https://doi.org/10.1007/s11121-020-01109-3
