Multilevel Analysis, Hierarchical Linear Models

The term “Multilevel Analysis” is mostly used interchangeably with “Hierarchical Linear Modeling,” although strictly speaking these terms are distinct. Multilevel Analysis may be understood to refer broadly to the methodology of research questions and data structures that involve more than one type of unit. It originated in studies involving several levels of aggregation, such as individuals and counties, or pupils, classrooms, and schools. Robinson’s (1950) discussion of the ecological fallacy, in which associations between variables at one level of aggregation are mistakenly regarded as evidence for associations at a different aggregation level (see Alker 1969 for an extensive review), sparked interest in how to analyze data that include several aggregation levels. This situation arises as a matter of course in educational research, and studies of the contributions made by different sources of variation such as students, teachers, classroom composition, school organization, etc., were seminal in the development of statistical methodology in the 1980s (see the review in Chap. 1 of de Leeuw and Meijer 2008). The basic idea is that studying the simultaneous effects of variables at the levels of students, teachers, classrooms, etc., on student achievement requires regression-type models that contain error terms for each of those levels separately; this is similar to the mixed effects models studied in the traditional linear models literature such as Scheffé (1959).

The prototypical statistical model that expresses this is the Hierarchical Linear Model, which is a mixed effects regression model for nested designs. In the two-level situation – applicable, e.g., to a study of students in classrooms – it can be expressed as follows. The more detailed level (students) is called the lower level, or level 1; the grouping level (classrooms) is called the higher level, or level 2. Highlighting the distinction with regular regression models, the terminology speaks of units rather than cases, and there are specific types of unit at each level. In our example, the level-1 units, students, are denoted by \(i\) and the level-2 units, classrooms, by \(j\). Level-1 units are nested in level-2 units (each student is a member of exactly one classroom), and the data structure is allowed to be unbalanced, so that \(j\) runs from 1 to \(N\) while \(i\) runs, for a given \(j\), from 1 to \(n_{j}\). The basic two-level hierarchical linear model can be expressed as

$$Y_{ij} = \beta_{0} + \sum\limits_{h=1}^{r} \beta_{h}\, x_{hij} + U_{0j} + \sum\limits_{h=1}^{p} U_{hj}\, z_{hij} + R_{ij};$$
(1a)

or, more succinctly, as

$$\mathbf{Y} = \mathbf{X}\,\beta + \mathbf{Z\,U} + \mathbf{R}.$$
(1b)

Here \(Y_{ij}\) is the dependent variable, defined for level-1 unit \(i\) within level-2 unit \(j\); the variables \(x_{hij}\) and \(z_{hij}\) are the explanatory variables. The \(R_{ij}\) are residual terms, or error terms, at level 1, while the \(U_{hj}\) for \(h = 0, \ldots, p\) are residual terms, or error terms, at level 2. In the case \(p = 0\) this is called a random intercept model; for \(p \geq 1\) it is called a random slope model. The usual assumption is that all \(R_{ij}\) and all vectors \(\mathbf{U}_{j} = (U_{0j}, \ldots, U_{pj})\) are independent, the \(R_{ij}\) having a normal \(\mathcal{N}(0,\sigma^{2})\) distribution and the \(\mathbf{U}_{j}\) having a multivariate normal \(\mathcal{N}_{p+1}(\mathbf{0},\mathbf{T})\) distribution. The parameters \(\beta_{h}\) are regression coefficients (fixed effects), while the \(U_{hj}\) are random effects. The presence of both of these makes (1) a mixed linear model. In most practical cases, the variables with random effects are a subset of the variables with fixed effects (\(x_{hij} = z_{hij}\) for \(h \leq p\); \(p \leq r\)), but this is not necessary.
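To make this concrete, the following minimal sketch shows how model (1) could be fitted in Python with the statsmodels package; the data file and the column names (score, ses, classroom) are hypothetical and serve only as an illustration of the random intercept and random slope specifications.

# A minimal sketch: fitting the two-level model (1) with statsmodels.
# The data file and column names (score, ses, classroom) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")        # hypothetical: one row per student (level-1 unit)

# Random intercept model (p = 0): only the intercept U_{0j} varies across classrooms.
ri = smf.mixedlm("score ~ ses", df, groups=df["classroom"]).fit()

# Random slope model (p = 1): the intercept and the coefficient of ses both vary
# across classrooms; their 2 x 2 covariance matrix corresponds to T in the text.
rs = smf.mixedlm("score ~ ses", df, groups=df["classroom"], re_formula="~ses").fit()

print(ri.summary())
print(rs.summary())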

More Than Two Levels

This model can be extended to three or more levels for data with three or more nested levels by including random effects at each of these levels. For example, for a three-level structure where level-3 units are denoted by \(k = 1, \ldots, M\), level-2 units by \(j = 1, \ldots, N_{k}\), and level-1 units by \(i = 1, \ldots, n_{jk}\), the model is

$$\begin{array}{rcl}
Y_{ijk} & = & \beta_{0} + \sum\limits_{h=1}^{r} \beta_{h}\, x_{hijk} + U_{0jk} + \sum\limits_{h=1}^{p} U_{hjk}\, z_{hijk} + V_{0k} \\
& & +\, \sum\limits_{h=1}^{q} V_{hk}\, w_{hijk} + R_{ijk},
\end{array}$$
(2)

where the \(U_{hjk}\) are the random effects at level 2, while the \(V_{hk}\) are the random effects at level 3. An example is research into outcome variables \(Y_{ijk}\) of students (\(i\)) nested in classrooms (\(j\)) nested in schools (\(k\)), and the presence of error terms at all three levels provides a basis for testing effects of pupil variables, classroom or teacher variables, as well as school variables.
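As an illustration of the nesting structure described by equation (2), the sketch below simulates data from a three-level random intercept model (the special case \(p = q = 0\)); all sample sizes and parameter values are arbitrary assumptions chosen only for illustration.

# Simulation sketch of the three-level model (2) with random intercepts only;
# all parameter values and sample sizes are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
M, N_k, n_jk = 20, 10, 25               # schools, classrooms per school, students per classroom
beta0, beta1 = 2.0, 0.5                 # fixed effects
tau_V, tau_U, sigma = 0.6, 0.8, 1.0     # sd of level-3, level-2, and level-1 residuals

rows = []
for k in range(M):                       # level-3 units (schools)
    V_0k = rng.normal(0.0, tau_V)
    for j in range(N_k):                 # level-2 units (classrooms)
        U_0jk = rng.normal(0.0, tau_U)
        for i in range(n_jk):            # level-1 units (students)
            x = rng.normal()
            y = beta0 + beta1 * x + U_0jk + V_0k + rng.normal(0.0, sigma)
            rows.append((k, j, i, x, y))

df3 = pd.DataFrame(rows, columns=["school", "classroom", "student", "x", "y"])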

The development both of inferential methods and of applications was oriented first toward this type of nested model, but much interest is now also given to the more general case in which the restriction to nested random effects is dropped. In this sense, multilevel analysis refers to the methodology of research questions and data structures that involve several sources of variation – each type of unit then refers to a specific source of variation, with or without nesting. In social science applications this can be fruitfully applied to research questions in which different types of actors and contexts are involved; e.g., patients, doctors, hospitals, and insurance companies in health-related research; or students, teachers, schools, and neighborhoods in educational research. The word “level” is then used for such a type of unit. Given the use of random effects, the most natural applications are those where each “level” is associated with some population of units.

Longitudinal Studies

A special area of application of multilevel models is longitudinal studies, in which the lowest level corresponds to repeated observations of the level-2 units. Often the level-2 units are individuals, but they may also be organizations, countries, etc. This application of mixed effects models was pioneered by Laird and Ware (1982). An important advantage of the hierarchical linear model over other statistical models for longitudinal data is the possibility of obtaining parameter estimates and tests also in highly unbalanced situations, where the number of observations per individual, and the time points at which they are measured, differ between individuals. Another advantage is the possibility of seamless integration with the nesting of individuals within higher-level units.
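The sketch below illustrates such an unbalanced longitudinal structure: simulated individuals are observed at different numbers of irregularly spaced time points, which poses no difficulty for a model with a random intercept and a random slope for time. All parameter values are assumptions made only for illustration.

# Sketch of an unbalanced longitudinal data structure: persons (level-2 units) have
# different numbers of occasions at irregular times; values are simulated under
# assumed parameters for illustration only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rows = []
for person in range(50):                             # level-2 units (individuals)
    n_obs = rng.integers(2, 8)                       # unequal numbers of occasions
    times = np.sort(rng.uniform(0.0, 5.0, n_obs))    # irregular measurement times
    u0, u1 = rng.multivariate_normal([0.0, 0.0],
                                     [[1.0, 0.2],
                                      [0.2, 0.1]])   # person-specific intercept and slope deviations
    for t in times:
        y = (10.0 + u0) + (0.7 + u1) * t + rng.normal(0.0, 1.0)
        rows.append((person, t, y))

long_df = pd.DataFrame(rows, columns=["person", "time", "y"])
# A random intercept and slope model for time can be fitted to long_df exactly as in
# the two-level sketch above, despite the varying numbers and spacings of occasions.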

Model Specification

The usual considerations for model specification in linear models apply here too, but additional considerations arise from the presence of random effects in the model and from the data structure being nested or having multiple types of unit in some other way. An important practical issue is to avoid the ecological fallacy mentioned above, i.e., to attribute fixed effects to the correct level. One of the examples in the original paper by Robinson (1950) concerned the correlation between literacy and ethnic background as measured in the USA in the 1930s, computed either as a correlation at the individual level or at the level of averages for large geographical regions. The correlation was .203 between individuals and .946 between regions, illustrating how widely correlations at different levels of aggregation may differ.

Consider a two-level model (1) in which variable \(X_{1}\) with values \(x_{1ij}\) is defined as a level-1 variable – literacy in Robinson’s example. For “level-2 units” we also use the term “groups.” To avoid the ecological fallacy, one has to include a relevant level-2 variable that reflects the composition of the level-2 units with respect to variable \(X_{1}\). The most commonly used composition variable is the group mean of \(X_{1}\),

$$\bar{x}_{1.j} = \frac{1}{n_{j}} \sum\limits_{i=1}^{n_{j}} x_{1ij}.$$

The usual procedure then is to include \(x_{1ij}\) as well as \(\bar{x}_{1.j}\) among the explanatory variables with fixed effects. This allows separate estimation of the within-group regression (the coefficient of \(x_{1ij}\)) and the between-group regression (the sum of the coefficients of \(x_{1ij}\) and \(\bar{x}_{1.j}\)).
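A minimal sketch of this procedure is given below, assuming a hypothetical data set with columns y, x1, and group; the group mean is added as an explicit level-2 composition variable, and the within- and between-group coefficients are then recovered from the estimated fixed effects.

# Sketch: adding the group mean of a level-1 variable so that within-group and
# between-group regressions can be separated. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("two_level_data.csv")                          # hypothetical data set
df["x1_groupmean"] = df.groupby("group")["x1"].transform("mean")

m = smf.mixedlm("y ~ x1 + x1_groupmean", df, groups=df["group"]).fit()

within_coef = m.fe_params["x1"]                                 # within-group regression coefficient
between_coef = m.fe_params["x1"] + m.fe_params["x1_groupmean"]  # between-group regression coefficient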

In some cases, notably in many economic studies (see Greene 2003), researchers are especially interested in the within-group regression coefficients and wish to control for possible unmeasured heterogeneity between the groups. If there is no interest in the between-group regression coefficients, one may use a model with fixed effects for all the groups; in the simplest case this is

$$Y_{ij} = \beta_{0} + \sum\limits_{h=1}^{r} \beta_{h}\, x_{hij} + \gamma_{j} + R_{ij}.$$
(3)

The parameters \(\gamma_{j}\) (which here have to be restricted, e.g., to have mean 0, in order to achieve identifiability) then represent all differences between the level-2 units, insofar as these differences apply as a constant additive term to all level-1 units within the group. For example, in the case of longitudinal studies where level-2 units are individuals and a linear model is used, this represents all time-constant differences between individuals. Note that (3) is a linear model with only one error term.
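A sketch of this fixed effects specification, again for the hypothetical data set used above, is shown below; the default treatment coding fixes one \(\gamma_{j}\) at zero, which is an identification equivalent to the zero-mean restriction just mentioned.

# Sketch of the fixed effects specification (3): group differences are absorbed by
# dummy variables rather than by a random intercept. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("two_level_data.csv")       # hypothetical data set, as above

# C(group) adds a dummy variable for each level-2 unit; only within-group variation
# in x1 is then used to estimate its coefficient.
fe = smf.ols("y ~ x1 + C(group)", data=df).fit()
print(fe.params["x1"])                       # within-group regression coefficient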

Model (1) implies the distribution

$$\mathbf{Y} \sim \mathcal{N}\left(\mathbf{X}\,\beta,\; \mathbf{Z}\,\mathbf{T}\,\mathbf{Z}' + \sigma^{2}\mathbf{I}\right).$$

Generalizations are possible where the level-1 residual terms \(R_{ij}\) are not i.i.d.; they can be heteroscedastic, have time-series dependence, etc. The specification of the variables \(\mathbf{Z}\) having random effects is crucial to obtain a well-fitting model. See Chap. 9 of Snijders and Bosker (1999), Chap. 9 of Raudenbush and Bryk (2002), and Chap. 3 of de Leeuw and Meijer (2008).
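As a small numerical illustration of the marginal covariance structure implied above, the sketch below computes \(\mathbf{Z}\,\mathbf{T}\,\mathbf{Z}' + \sigma^{2}\mathbf{I}\) for a single group with a random intercept and one random slope; the numerical values of \(\mathbf{T}\), \(\sigma^{2}\), and the design are arbitrary assumptions.

# Numerical sketch of the implied within-group covariance matrix Z T Z' + sigma^2 I
# for one group; all values are illustrative assumptions.
import numpy as np

z = np.array([0.0, 1.0, 2.0])               # values of the random-slope variable for 3 level-1 units
Z = np.column_stack([np.ones_like(z), z])   # columns: random intercept, random slope
T = np.array([[0.8, 0.1],
              [0.1, 0.3]])                  # assumed covariance matrix of (U_0j, U_1j)
sigma2 = 1.0                                # assumed level-1 residual variance

V = Z @ T @ Z.T + sigma2 * np.eye(len(z))   # implied covariance matrix of Y within the group
print(V)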

Inference

A major reason for the take-off of multilevel analysis in the 1980s was the development of algorithms for maximum likelihood estimation in unbalanced nested designs. The EM algorithm (Dempster et al. 1981), Iterative Generalized Least Squares (Goldstein 1986), and Fisher Scoring (Longford 1987) were applied to obtain ML estimates for hierarchical linear models. The MCMC implementation of Bayesian procedures has proved very useful for a large variety of more complex multilevel models, both for non-nested random effects and for generalized linear mixed models; see Browne and Draper (2000) and Chap. 2 of de Leeuw and Meijer (2008).

Hypothesis tests for the fixed coefficients \(\beta_{h}\) can be carried out by Wald or likelihood ratio tests in the usual way. For testing parameters of the random effects, some care must be taken because the estimates of the random effect variances \(\tau_{hh}^{2}\) (the diagonal elements of \(\mathbf{T}\)) are not approximately normally distributed if \(\tau_{hh}^{2} = 0\). Tests for these parameters can be based on estimated fixed effects, using least squares estimates for the \(U_{hj}\) in a specification where these are treated as fixed effects (Raudenbush and Bryk 2002, Chap. 3); based on the appropriate non-standard distribution of the log likelihood ratio, which arises because the null value lies on the boundary of the parameter space; or obtained as score tests (Berkhof and Snijders 2001).
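As an illustration of the boundary problem, the sketch below compares a random intercept model and a random slope model fitted by ML and computes the likelihood ratio statistic; the mixture reference distribution shown (a 50:50 mixture of chi-square distributions with 1 and 2 degrees of freedom, a commonly used approximation for this comparison) is an assumption of the sketch rather than a prescription from the text, and the data and column names are hypothetical.

# Sketch of a likelihood ratio comparison for a random slope variance; the halved
# degrees of freedom in the mixture approximation reflect the boundary problem.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("students.csv")   # hypothetical data set, as in the earlier sketch

# Both models are fitted by ML so that their log likelihoods are comparable.
m0 = smf.mixedlm("score ~ ses", df, groups=df["classroom"]).fit(reml=False)
m1 = smf.mixedlm("score ~ ses", df, groups=df["classroom"], re_formula="~ses").fit(reml=False)

lr = 2 * (m1.llf - m0.llf)                  # likelihood ratio statistic for the random slope
p_naive = stats.chi2.sf(lr, df=2)           # naive reference: chi-square with 2 df
p_mixed = 0.5 * stats.chi2.sf(lr, df=1) + 0.5 * stats.chi2.sf(lr, df=2)  # boundary-corrected mixture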

About the Author

Professor Snijders is Elected Member of the European Academy of Sociology (2006) and Elected Correspondent of the Royal Netherlands Academy of Arts and Sciences (2007). He was made Knight of the Order of the Netherlands Lion (2008). Professor Snijders was Chairman of the Department of Statistics, Measurement Theory, and Information Technology of the University of Groningen (1997–2000). He has supervised 52 Ph.D. students. He has been associate editor of various journals, and Editor of Statistica Neerlandica (1986–1990). Currently he is Co-editor of Social Networks, Associate Editor of the Annals of Applied Statistics, and Associate Editor of the Journal of Social Structure. Professor Snijders has (co-)authored about 100 refereed papers and several books, including Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling (with R.J. Bosker; London: Sage Publications, 1999). In 2005, he was awarded an honorary doctorate in the Social Sciences from the University of Stockholm.

Cross References

Bayesian Statistics

Cross Classified and Multiple Membership Multilevel Models

Mixed Membership Models

Moderating and Mediating Variables in Psychological Research

Nonlinear Mixed Effects Models

Research Designs

Statistical Analysis of Longitudinal and Correlated Data

Statistical Inference in Ecology