The online version of this article (doi:10.1186/1471-2288-14-32) contains supplementary material, which is available to authorized users.
The authors declare that they have no competing interests.
RH developed the plot methodology; MJ developed the modelling strategy and wrote the manuscript; GM and AD oversaw the project, provided supervision as well as critical input into the design and implementation; all authors reviewed and/or revised the manuscript and have approved it for submission.
Graphical techniques can provide visually compelling insights into complex data patterns. In this paper we present a type of lasagne plot showing changes in categorical variables for participants measured at regular intervals over time and propose statistical models to estimate distributions of marginal and transitional probabilities.
The plot uses stacked bars to show the distribution of a categorical variable at each time interval, with different colours depicting different categories and changes in colour showing the trajectories of participants over time. The models are based on nominal logistic regression, which is appropriate for both ordinal and nominal categorical variables. To illustrate the plots and models, we analyse data on smoking status, body mass index (BMI) and physical activity level from a longitudinal study on women’s health. To estimate marginal distributions we fit survey wave as an explanatory variable, whereas for transitional distributions we fit the status of participants (e.g. smoking status) at previous surveys.
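As a rough sketch of the two model types just described, the baseline-category (nominal) logistic forms could be written as below. The notation is an assumption made for illustration and is not taken from the paper: $Y_{it}$ is the category of participant $i$ at wave $t$, category 1 is the reference, and $J$ is the number of categories. Entering wave as a single linear term and using only the immediately preceding survey are also assumptions; wave could equally be fitted with indicator variables, and status at earlier surveys could be added as further terms.

```latex
% Marginal model: survey wave t as the explanatory variable (linear term assumed)
\log\frac{\Pr(Y_{it}=j)}{\Pr(Y_{it}=1)} = \beta_{0j} + \beta_{1j}\,t,
\qquad j = 2,\dots,J

% Transitional model: status at the previous survey as the explanatory variable
\log\frac{\Pr(Y_{it}=j \mid Y_{i,t-1})}{\Pr(Y_{it}=1 \mid Y_{i,t-1})}
  = \alpha_{j} + \sum_{k=2}^{J} \gamma_{jk}\,\mathbf{1}[Y_{i,t-1}=k],
\qquad j = 2,\dots,J
```

In both cases the category probabilities follow by exponentiating the linear predictors and normalising so that they sum to one over the $J$ categories.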
For the illustrative data, the marginal models showed BMI increasing, physical activity decreasing and smoking decreasing linearly over time at the population level. The plots and transition models showed smoking status to be highly predictable for individuals, whereas BMI was only moderately predictable and physical activity was virtually unpredictable. Most of the predictive power came from participant status at the previous survey. Predicted probabilities from the models mostly agreed with observed probabilities, indicating adequate goodness-of-fit.
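The observed marginal and transitional probabilities against which model predictions are compared can be tabulated directly from the data. The following is a minimal sketch using only the Python standard library. The toy records, the category labels and the function names `marginal_distribution` and `transition_proportions` are hypothetical and chosen for illustration; they are not the study data or the authors' code.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one sequence of smoking statuses per participant,
# one entry per survey wave (wave index 0 is the first survey).
records = {
    "p1": ["never", "never", "never"],
    "p2": ["current", "current", "ex"],
    "p3": ["ex", "ex", "ex"],
    "p4": ["current", "ex", "ex"],
}


def marginal_distribution(records, wave):
    """Observed proportion of participants in each category at one wave."""
    counts = Counter(seq[wave] for seq in records.values())
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def transition_proportions(records):
    """Observed proportion moving from each status to each status at the
    next wave, pooled over all pairs of consecutive waves."""
    counts = defaultdict(Counter)
    for seq in records.values():
        for previous, current in zip(seq, seq[1:]):
            counts[previous][current] += 1
    return {
        previous: {current: n / sum(row.values()) for current, n in row.items()}
        for previous, row in counts.items()
    }


if __name__ == "__main__":
    print(marginal_distribution(records, 0))
    print(transition_proportions(records))
```

Each row of the transition table sums to one, so a row dominated by its diagonal entry corresponds to a status that is highly predictable from the previous survey.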
The proposed form of lasagne plot provides a simple visual aid to show transitions in categorical variables over time in longitudinal studies. The suggested models complement the plot and allow formal testing and estimation of marginal and transitional distributions. These simple tools can provide valuable insights into categorical data on individuals measured at regular intervals over time.
Additional file 2: Figure S1: Probability tree diagram for BMI group with observed and estimated transitional probabilities and 95% confidence intervals in brackets. (DOCX 78 KB)
Additional file 3: Figure S2: Mosaic plot of smoking status at survey wave 1 compared to wave 2. (DOCX 27 KB)
Additional file 4: Figure S3: Parallel sets diagram of smoking status transitions from survey waves 1 to 5. (DOCX 215 KB)
Additional file 5: Figure S4: Plot and marginal distribution table of body mass index group with a missing category over survey wave for the Australian Longitudinal Survey of Women’s Health. (DOCX 54 KB)
Additional file 6: Figure S5: Plot and marginal distribution table of smoking status over survey wave for ex-smokers at survey wave 2. (DOCX 46 KB)
Additional file 7: Figure S6: Plot and marginal distribution table of smoking status over survey wave for current smokers at survey wave 2. (DOCX 47 KB)
- Visualising and modelling changes in categorical variables in longitudinal studies
Gita D Mishra
- BioMed Central