Latent Class Analysis: An example for reporting results
Introduction
Latent classes are unobserved, or latent, segments. Participants, or more generally, cases, within the same latent class are considered homogeneous based on certain pieces of information. Latent Class Analysis (LCA) was developed over 60 years ago as a way to characterize latent variables while analyzing dichotomous items.1 During the past several years, it has expanded to allow for all types of data. In the literature, LCA is referred to in different ways. It has been called Latent Structure Analysis,2 Mixture Likelihood Clustering,3, 4 Model Based Clustering,5, 6, 7 Mixture-model Clustering,8 Bayesian Classification,9 and Latent Class Cluster Analysis.10
LCA allows researchers to create or characterize a multidimensional discrete latent variable based on a cross-classification of two or more observed categorical variables.2 Because of the categorical nature of the latent classes, LCA is different from other latent approaches such as factor analysis and structural equation modeling. LCA makes it possible to develop typologies for understanding and, if desired, can be used in predictive models. In addition to analyzing relationships in categorical data, it can use both numeric and non-numeric data.
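The cross-classification idea can be written compactly. The notation below is an assumption added for illustration and does not come from the article: J observed categorical items y_1, ..., y_J, C latent classes with proportions \pi_c, and the usual local independence assumption (items are independent within a class).

```latex
% Latent class model under local independence (notation assumed for illustration)
\[
  P(y_1, \dots, y_J) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} P(y_j \mid c),
  \qquad \sum_{c=1}^{C} \pi_c = 1 .
\]
% Posterior probability that a case with responses y belongs to class c
\[
  P(c \mid y_1, \dots, y_J) \;=\;
  \frac{\pi_c \prod_{j=1}^{J} P(y_j \mid c)}
       {\sum_{c'=1}^{C} \pi_{c'} \prod_{j=1}^{J} P(y_j \mid c')} .
\]
```

The second expression is what allows each case to be assigned to its most probable class once the model has been estimated.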
Classes or clusters are a common need for researchers and data analysts in general. K-means is a commonly used technique, but it has several problems. The potential difficulties include sensitivity to outliers (extreme values that can skew the results), the need for interval or ratio data (distance calculations only make sense when the numbers can meaningfully be added and subtracted), and some concerns about the order in which the data are assembled.11 In some cases, data may not be appropriate for the K-means method. More fundamentally, the stability of clusters cannot be assumed because traditionally there has been no objective set of criteria to judge the suitability of solutions. K-means will always produce a solution, and some of those solutions are likely to fit your expectations.
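The outlier problem can be seen in a toy example. The following is a minimal sketch, not taken from the article: a plain one-dimensional K-means (Lloyd's algorithm) with hypothetical data and hypothetical starting centers at the smallest and largest values. The function name `kmeans_1d` is an assumption for illustration.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Plain Lloyd's algorithm on one-dimensional data."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its members.
        new_centers = [
            sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: two tight groups, then the same data plus one extreme value.
clean = [1, 2, 3, 10, 11, 12]
with_outlier = clean + [100]

centers_clean, clusters_clean = kmeans_1d(clean, [min(clean), max(clean)])
centers_out, clusters_out = kmeans_1d(
    with_outlier, [min(with_outlier), max(with_outlier)]
)

print("clean:       ", centers_clean, clusters_clean)
print("with outlier:", centers_out, clusters_out)
```

In this toy case the single extreme value takes a cluster of its own and the two real groups are merged, and K-means still returns a solution without any signal that something is wrong.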
LCA overcomes the problems with K-means clustering described above. The increase in computing power in the 1990s made LCA a very efficient technique. The best way to distinguish between LCA and cluster analysis techniques is to note that LCA is model-based and cluster analysis is not. “Model-based” means that a statistical model is assumed for the population from which the data were gathered.11 Both cluster analysis and LCA seek divisions that maximize the between-cluster differences and minimize the within-cluster differences. But in traditional cluster analysis this decision is arbitrary or subjective. In LCA, a statistical model allows the comparison to be statistically tested, so that the decision to adopt a particular model is less subjective, or at least has some grounding for comparison. In addition, the items used in the analysis do not need to have the same scale or equal variances. Finally, LCA allows for the examination of the residuals between items used in the analysis. In other words, LCA is useful for examining the data that do not fit the model or models, thus allowing the analyst to judge the overall quality.
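To make "model-based" concrete, the following is a minimal sketch and not the procedure used in this article: a basic EM loop for binary items, with BIC as one common criterion for comparing the number of classes. The simulated data, the function names (`e_step`, `fit_lca`, `bic`), the starting values, and the iteration count are all assumptions for illustration. Dedicated LCA software adds multiple random starts, convergence checks, and support for other variable types.

```python
import math
import random


def e_step(data, sizes, probs):
    """Return posterior class probabilities per case and the log-likelihood."""
    post, log_lik = [], 0.0
    for row in data:
        joint = []
        for size, class_probs in zip(sizes, probs):
            p = size
            for y, pj in zip(row, class_probs):
                p *= pj if y else 1.0 - pj
            joint.append(p)
        total = sum(joint)
        log_lik += math.log(total)
        post.append([p / total for p in joint])
    return post, log_lik


def fit_lca(data, n_classes, n_iter=100, seed=0):
    """Fit a latent class model to binary items with a fixed number of EM steps."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    sizes = [1.0 / n_classes] * n_classes
    # Random starting values for P(item = 1 | class), kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        post, _ll = e_step(data, sizes, probs)
        # M-step: update class sizes and item-response probabilities.
        for c in range(n_classes):
            weight = max(sum(r[c] for r in post), 1e-12)
            sizes[c] = weight / n
            for j in range(n_items):
                num = sum(r[c] * row[j] for r, row in zip(post, data))
                probs[c][j] = min(max(num / weight, 1e-6), 1.0 - 1e-6)
    # Log-likelihood evaluated at the final parameter values.
    _, log_lik = e_step(data, sizes, probs)
    return sizes, probs, log_lik


def bic(log_lik, n_classes, n_items, n):
    """BIC with (C - 1) class sizes plus C * J item probabilities as parameters."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * log_lik + n_params * math.log(n)


# Simulated (hypothetical) data: two equally sized classes, five binary items.
data_rng = random.Random(42)
data = []
for _ in range(300):
    p_yes = 0.9 if data_rng.random() < 0.5 else 0.1
    data.append([1 if data_rng.random() < p_yes else 0 for _ in range(5)])

results = {k: fit_lca(data, k) for k in (1, 2, 3)}
bics = {k: bic(results[k][2], k, 5, len(data)) for k in results}
for k in sorted(bics):
    print(k, "classes: log-likelihood", round(results[k][2], 1),
          "BIC", round(bics[k], 1))
```

Because every candidate model has a likelihood, models with different numbers of classes can be compared on the same scale, which is the grounding that traditional cluster analysis lacks.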
A simulation study compared K-means analysis and LCA against discriminant function analysis with known groups, a method generally considered to be the “gold standard,” to test how well variables predict group membership.10 In the study, group membership was known in advance and the authors applied the three methods to the data to examine classification performance. They argued that they used data that favored K-means analysis. Even so, the results of the comparison showed that K-means had an 8% misclassification rate versus 1.3% for LCA.10
The remainder of this article focuses on procedures for running LC models, decision points, and what should be included in a manuscript for review.
Sample
These data come from a larger data set with military and non-military personnel. The full data set involved 1100 individuals who completed several surveys (e.g., Reading Interests, Cognition, along with some pattern recognition tasks, such as the Raven's Matrices). The data were never published and are quite old now, but did provide the foundation for other studies. For this example, there are five hundred participants who provided demographic data, some health history data, and depression data.
Basic reporting guidelines
1. Show evaluative information for all models tested, all of it.
   a. Covariance Matrix (Table 7)
   b. Profiles
   c. Percentages in Each Class
   d. Evaluative criteria used for choosing a specific model
   e. Residual analysis
2. Explain reasoning choices for choosing a specific model
3. Explain choices related to “fixing” a bivariate residual relationship to zero.
4. Provide software used and the version number. Software changes and you must let your reader know.
5. Submit the traditional descriptive and frequency data along with correlation
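Some of the reported quantities, such as the percentages in each class, can be derived from the posterior class-membership probabilities that LCA software typically saves for each case. The sketch below is illustrative only: the posterior values are hypothetical, and the function name `summarize_posteriors` is an assumption rather than part of any particular package.

```python
def summarize_posteriors(posteriors):
    """Summarize an N x C table of posterior class-membership probabilities."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    # Estimated class sizes: the average posterior probability per class.
    class_sizes = [sum(row[c] for row in posteriors) / n for c in range(n_classes)]
    # Modal assignment: each case goes to its most probable class.
    modal = [max(range(n_classes), key=lambda c: row[c]) for row in posteriors]
    modal_shares = [modal.count(c) / n for c in range(n_classes)]
    # Expected classification error: average probability that the modal
    # assignment is wrong.
    class_error = sum(1.0 - max(row) for row in posteriors) / n
    return class_sizes, modal_shares, class_error


# Hypothetical posteriors for four cases and two classes.
posteriors = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.6, 0.4],
]
class_sizes, modal_shares, class_error = summarize_posteriors(posteriors)
print("estimated class sizes:", class_sizes)
print("share by modal assignment:", modal_shares)
print("expected classification error:", class_error)
```

Reporting both the estimated class sizes and the shares under modal assignment lets a reader see how much the two differ, which is one indication of how cleanly the classes separate.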
LC-regression
If examining classes is the starting point but the goal is to predict an outcome based on the classes, then a latent class regression analysis is needed. One could create the classes, save each individual's class designation, and then use that designation in a linear or logistic regression, or one could run the regression model all at once. The example below uses the same data and variables, with one switch: CESD-Category is now the outcome variable.
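The "save the classes, then regress" route can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis reported in this article: the class assignments and the binary outcome are hypothetical, and `outcome_by_class` is an illustrative name. With class membership as the only predictor, a logistic regression with class indicators reproduces these within-class proportions, so the log-odds computed here correspond to what such a model would estimate.

```python
import math
from collections import defaultdict


def outcome_by_class(classes, outcomes):
    """Proportion and log-odds of a binary outcome within each saved class."""
    counts = defaultdict(lambda: [0, 0])  # class -> [n cases, n with outcome 1]
    for c, y in zip(classes, outcomes):
        counts[c][0] += 1
        counts[c][1] += y
    summary = {}
    for c, (n, events) in sorted(counts.items()):
        p = events / n
        # Log-odds are undefined when a class has no events or only events.
        log_odds = math.log(p / (1 - p)) if 0 < p < 1 else None
        summary[c] = {"n": n, "proportion": p, "log_odds": log_odds}
    return summary


# Hypothetical saved class designations and a hypothetical binary outcome.
saved_class = [1, 1, 1, 1, 2, 2, 2, 2]
outcome = [1, 1, 1, 0, 0, 0, 1, 0]

summary = outcome_by_class(saved_class, outcome)
for c, stats in summary.items():
    print("class", c, stats)
```

Treating the saved class as if it were observed ignores classification uncertainty, which is one reason to consider estimating the classes and the regression in a single model.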
The evaluation of the models begins
Conclusions
LCA and the advancements within LCA have provided a powerful alternative to previously used statistical techniques such as K-means for clustering. The ability to use a wide variety of data with different variances, along with regression and factor analysis options, makes latent class analysis an appealing option for many researchers. With the improvements in ease of use and in understanding of the technique, I expect many more LC analyses to be published in the near future.
References
- Latent Class Analysis. Quantitative Applications in the Social Sciences Series (1987)
- et al. Latent Structure Analysis (1968)
- et al. Mixture Models: Inference and Application to Clustering (1988)
- Cluster Analysis (1993)
- et al. Model-based Gaussian and non-Gaussian clustering. Biometrics (1993)
- et al. Inference in model based clustering. Statistics Comput (1997)
- et al. MCLUST: Software for Model-based Cluster and Discriminant Analysis (1998)
- et al. The EMMIX software for the fitting of mixtures of normal and t-components. J Stat Soft (1999)