How Low Can You Go?
An Investigation of the Influence of Sample Size and Model Complexity on Point and Interval Estimates in Two-Level Linear Models
Abstract
Although general sample size guidelines have been suggested for estimating multilevel models, they generalize to only a relatively limited number of data conditions and model structures, neither of which is very feasible for the applied researcher. In an effort to expand our understanding of two-level multilevel models under less-than-ideal conditions, Monte Carlo methods, implemented in SAS/IML, were used to examine model convergence rates, parameter point estimates (statistical bias), parameter interval estimates (confidence interval accuracy and precision), and both Type I error control and statistical power of tests associated with the fixed effects from linear two-level models estimated with PROC MIXED. These outcomes were analyzed as a function of (a) level-1 sample size, (b) level-2 sample size, (c) intercept variance, (d) slope variance, (e) collinearity, and (f) model complexity. Bias was minimal across nearly all conditions simulated. The 95% confidence interval coverage and Type I error rates tended to be slightly conservative. Statistical power was related to sample size and to the level of the fixed effect; power was higher with larger sample sizes and for level-1 fixed effects.
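As a rough illustration of how bias and confidence interval coverage can be tallied in a Monte Carlo study of this kind, the sketch below simulates a balanced two-level random-intercept model in Python. It is not the authors' SAS/IML and PROC MIXED code. The function names, parameter values, and the shortcut of estimating the fixed intercept as the mean of the cluster means (reasonable only because the simulated design is balanced) are all assumptions made for illustration.

```python
import random
import statistics


def simulate_once(rng, n_clusters, cluster_size, gamma00, tau00, sigma2, t_crit):
    """Generate one balanced two-level data set and summarize the intercept.

    Model (hypothetical values): y_ij = gamma00 + u_j + e_ij,
    with u_j ~ N(0, tau00) and e_ij ~ N(0, sigma2).
    Returns the point estimate and whether the 95% CI covers gamma00.
    """
    cluster_means = []
    for _ in range(n_clusters):
        u_j = rng.gauss(0.0, tau00 ** 0.5)  # level-2 random intercept
        ys = [gamma00 + u_j + rng.gauss(0.0, sigma2 ** 0.5)
              for _ in range(cluster_size)]
        cluster_means.append(statistics.fmean(ys))

    # With equal cluster sizes, the cluster means are independent and
    # identically distributed, so a one-sample t interval on them applies.
    estimate = statistics.fmean(cluster_means)
    std_err = statistics.stdev(cluster_means) / n_clusters ** 0.5
    lower = estimate - t_crit * std_err
    upper = estimate + t_crit * std_err
    return estimate, (lower <= gamma00 <= upper)


def run_simulation(n_reps=2000, n_clusters=30, cluster_size=5,
                   gamma00=1.0, tau00=0.3, sigma2=1.0,
                   t_crit=2.045, seed=12345):
    """Replicate the simulation and report bias and CI coverage.

    t_crit is the 0.975 quantile of t with n_clusters - 1 = 29 degrees of
    freedom; it must be changed if n_clusters is changed.
    """
    rng = random.Random(seed)
    estimates = []
    n_covered = 0
    for _ in range(n_reps):
        estimate, covered = simulate_once(
            rng, n_clusters, cluster_size, gamma00, tau00, sigma2, t_crit)
        estimates.append(estimate)
        n_covered += covered
    return {
        "bias": statistics.fmean(estimates) - gamma00,
        "coverage": n_covered / n_reps,
    }


if __name__ == "__main__":
    print(run_simulation())
```

Varying `n_clusters` and `cluster_size` across a grid, and repeating the tally in each cell, mirrors the general structure of a sample size simulation; a study of slope variance, collinearity, or model complexity would need a full mixed-model fit rather than this shortcut.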