Comparing Scaling Methods and Software in Real Data
To examine the performance of the various scaling methods, I fit two series of MLM. I chose these models because they represent the basic models presented by major texts on MLM (e.g., Raudenbush and Bryk [16]), because they form the building blocks for more complicated models, and because the models in each series represent the types of models analysts typically explore in MLM. [16,28] I used publicly available data from the 2005–2006 National Survey of Children with Special Health Care Needs (NS-CSHCN: downloadable at ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/slaits_cshcn_survey/2005_2006/Datasets/), sponsored by the Maternal and Child Health Bureau (MCHB) and conducted by the National Center for Health Statistics (NCHS). Within each state and Washington DC (hereafter state includes Washington DC), this survey used random digit dialing and collected data on approximately 750 children with special health care needs (CSHCN). It represents a "classic" two-level design: CSHCN (level-1) nested within states (level-2). Given that the survey design specified approximately equal sample sizes for each state (n ≅ 750 for each state), children in smaller states had a greater probability of selection. Likewise, in households with multiple children, one randomly selected CSHCN served as the subject. Thus, CSHCN in smaller families had a greater probability of selection. Level-1 design weights account for these unequal selection probabilities, adjust for other design issues (e.g., nonresponse), and weight the data to make it representative of the CSHCN in the US. The NS-CSHCN sampled each state with certainty. Thus, states were not selected with unequal probability and do not need weights. As described, the level-1 weights account for unequal probability of selection given different population sizes within states. Thus, I left level-2 unweighted. See Blumberg et al. [27] for complete details.
The first series of MLM I estimated examines a continuous outcome (the number of months CSHCN go without insurance) as a function of a level-1 predictor (family income relative to poverty level, hereafter labeled simply "family income") and a level-2 predictor (the proportion of families in the state with an income no greater than twice the US federal poverty level (i.e., 200% poverty level), hereafter labeled simply "proportion of families in poverty"). The second series of MLM examines a categorical outcome (whether a CSHCN went uninsured at any time in the previous 12 months) as a function of a level-1 predictor (family income) and a level-2 predictor (proportion of families in poverty). For both series, I fit six models: 1) an unconditional model, 2) a level-1 predictor only model specifying the level-1 slope as fixed, 3) a level-1 predictor only model that allowed the level-1 slope to vary across the states (level-2), 4) a level-2 predictor only model, 5) a model including level-1 and level-2 predictors but no cross-level interaction, and 6) a model including level-1 and level-2 predictors and a cross-level interaction. For each series of analyses (continuous and categorical), the unconditional (empty) model examines whether the outcome (average number of months uninsured or odds of going without insurance) varies across states. The level-1 predictor only model asks whether family income predicts the outcome, while the level-2 predictor only model investigates whether the proportion of families in poverty in a state affects the outcome. The model including level-1 and level-2 predictors investigates the contributions of level-1 and level-2 predictors simultaneously, but does not include a cross-level interaction. Among other questions, it asks whether a relationship between family income and the outcome exists, controlling for the effects of the proportion of families in poverty in the state.
The final model investigates the level-1 and level-2 predictors simultaneously and includes a cross-level interaction. This model asks several questions as well, including whether the relationship between family income and months without insurance differs according to the proportion of families in poverty in a state. For each series, all models allowed the intercept to vary across the states. Appendix C presents traditional MLM equations for each model I estimated.
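For orientation, the six models in the continuous series can be sketched in conventional combined-form MLM notation. This is an illustrative sketch, not a reproduction of Appendix C: the symbols are assumptions introduced here, with Y_ij the months uninsured for child i in state j, X_ij family income, W_j the proportion of families in poverty, gamma the fixed effects, u_0j and u_1j the state-level random effects, and r_ij the level-1 residual. The text states that the income slope varies across states only for model 3, so the sketch shows a random slope only there; any centering of predictors is also left unspecified.

```latex
\begin{align*}
\text{Model 1:}\quad Y_{ij} &= \gamma_{00} + u_{0j} + r_{ij} \\
\text{Model 2:}\quad Y_{ij} &= \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + r_{ij} \\
\text{Model 3:}\quad Y_{ij} &= \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij} \\
\text{Model 4:}\quad Y_{ij} &= \gamma_{00} + \gamma_{01} W_{j} + u_{0j} + r_{ij} \\
\text{Model 5:}\quad Y_{ij} &= \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} W_{j} + u_{0j} + r_{ij} \\
\text{Model 6:}\quad Y_{ij} &= \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} W_{j} + \gamma_{11} X_{ij} W_{j} + u_{0j} + r_{ij}
\end{align*}
```

For the categorical series, the same linear predictors would apply to the log-odds of going uninsured, log[p_ij / (1 - p_ij)], with no separate level-1 residual term. In model 6, gamma_11 is the cross-level interaction: it captures how the income slope changes with the state's proportion of families in poverty.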
For each series I fit the models in Mplus, MLwiN, and GLLAMM using unweighted data, scaling method A, and scaling method B. For Mplus, I used MLR for both the continuous and categorical analyses. MLR delivers maximum likelihood parameter estimates with robust standard errors computed using a sandwich estimator. For categorical outcomes, MLR uses numerical integration with adaptive quadrature and 15 integration points per dimension. [21] For MLwiN, I used iterative generalized least squares (IGLS) estimation for the continuous outcome. With categorical outcomes, MLwiN uses a quasi-likelihood procedure that applies a Taylor series-based linearization to transform discrete responses into a continuous model that is then estimated using IGLS or restricted IGLS (RIGLS). MLwiN uses either marginal quasi-likelihood (MQL) or predictive (penalized) quasi-likelihood (PQL) to approximate the linear transformation. [22] Rasbash et al. [22] suggest adopting a two-step process employing MQL to generate starting values and PQL to arrive at the final estimates. I followed this procedure. [22] I first estimated each categorical model with 1st order MQL estimation and IGLS to obtain starting values. I then used the 1st order MQL estimates as starting values for 2nd order PQL estimation and IGLS to obtain final values. For both continuous and categorical outcomes in MLwiN, I requested robust standard errors. For all GLLAMM models, I initially used adaptive quadrature with 8 quadrature points. Consistent with Rabe-Hesketh et al.'s recommendation, [23] I subsequently refit the models using 16 quadrature points to check whether the estimates remained consistent. In almost all cases, the results were nearly identical. In the two instances where I obtained discrepant values, I continued increasing the quadrature points until the estimates stabilized. For all models, I requested robust standard errors, which GLLAMM computes using a sandwich estimator. [23,29] Finally, Appendix A presents the details for creating these datasets and gives code in SAS and Stata to create scaled weights using both methods, and Appendix B gives the equations used to scale the weights. Appendix D provides a brief description of the original weights. For complete details about the weights, readers should review Blumberg et al. [27]
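As a rough illustration of what scaling the level-1 weights involves, the following Python sketch rescales weights within each state. It is not the SAS or Stata code from Appendix A, and the definitions are assumptions stated here rather than taken from Appendix B: method A is assumed to rescale the weights so they sum to the state's actual sample size, and method B so they sum to the state's effective sample size, as these labels are commonly used in the weight-scaling literature. The function name `scale_weights` and its arguments are hypothetical.

```python
from collections import defaultdict


def scale_weights(weights, clusters, method="A"):
    """Rescale level-1 design weights within each cluster (e.g., state).

    Assumed definitions (see lead-in; Appendix B is authoritative):
      A: w * n_j / sum(w)         -> scaled weights sum to n_j
      B: w * sum(w) / sum(w**2)   -> scaled weights sum to the
                                     effective sample size of cluster j
    """
    if method not in ("A", "B"):
        raise ValueError("method must be 'A' or 'B'")
    if len(weights) != len(clusters):
        raise ValueError("weights and clusters must have equal length")

    # Accumulate per-cluster totals in a single pass.
    sum_w = defaultdict(float)   # sum of raw weights in cluster j
    sum_w2 = defaultdict(float)  # sum of squared raw weights in cluster j
    n = defaultdict(int)         # number of sampled units in cluster j
    for w, c in zip(weights, clusters):
        sum_w[c] += w
        sum_w2[c] += w * w
        n[c] += 1

    # Apply the cluster-specific scaling factor to every unit.
    scaled = []
    for w, c in zip(weights, clusters):
        if method == "A":
            factor = n[c] / sum_w[c]
        else:
            factor = sum_w[c] / sum_w2[c]
        scaled.append(w * factor)
    return scaled
```

For example, calling `scale_weights(raw, state, "A")` and `scale_weights(raw, state, "B")` on the same raw weights would yield the two sets of scaled weights that, alongside the unweighted data, define the three conditions compared across Mplus, MLwiN, and GLLAMM. When all weights within a state are equal, both assumed methods return weights of 1 for that state.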