Prevost's original Bayesian three-level hierarchical model
The three-level Bayesian hierarchical model proposed by Prevost et al. [5] extends the standard two-level random-effects meta-analysis [13] with an extra level that allows for variability in effect sizes between different types of evidence (e.g., randomised versus non-randomised study designs). In addition to variability between study estimates within each study type, this model can allow for any added uncertainty due to study design [14]. The three levels permit inferences to be made at the study, study-type, and population levels. Although the model can accommodate more than two types of study design, the application presented by Prevost et al. [5] combined evidence from two study types, randomised and non-randomised.
This model can be written as follows:
(i = 1 or 2 for the 2 study types; j = 1,..., ki studies).
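Based on the definitions given in the text below, equations 1 to 3 can be reconstructed as:

```latex
\begin{align}
y_{ij} &\sim \mathrm{N}(\psi_{ij},\, s_{ij}^2) \tag{1}\\
\psi_{ij} &\sim \mathrm{N}(\theta_i,\, \sigma_i^2) \tag{2}\\
\theta_i &\sim \mathrm{N}(\mu,\, \tau^2) \tag{3}
\end{align}
```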
At the first level of the model (eq. 1), yij is the estimated log relative risk in the jth study of type i, which is normally distributed with mean ψij and variance sij². The ψij represent the underlying effect, on the log relative risk scale, in the jth study of type i. At the second level of the model (eq. 2), the ψij are distributed about an overall effect for the ith type of study, θi, with σi² representing the between-study variability for studies of type i. At the third level of the model (eq. 3), the study-type effects are distributed about an overall population effect, μ, with τ² representing the between-study-type variability.
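The three-level generative process described above can be sketched as a simulation. This is a minimal illustration in NumPy; all numeric values (effect sizes, variances, numbers of studies) are arbitrary and not taken from the paper, and the within-study standard errors are taken as known.

```python
import numpy as np

# Illustrative sketch of the three-level model (eqs. 1-3).
rng = np.random.default_rng(0)

mu, tau = -0.3, 0.2      # population effect and between-study-type SD (arbitrary)
sigma = [0.15, 0.25]     # between-study SDs for the two study types (arbitrary)
k = [5, 8]               # number of studies of each type (arbitrary)
s = 0.1                  # within-study standard error, taken as known

theta = rng.normal(mu, tau, size=2)                  # eq. 3: study-type effects
y = {}
for i in range(2):
    psi = rng.normal(theta[i], sigma[i], size=k[i])  # eq. 2: study-level effects
    y[i] = rng.normal(psi, s)                        # eq. 1: observed log RRs
```

Fitting the model inverts this process: given the observed yij and sij, the posterior distributions of θi, μ, σi, and τ are estimated.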
To try to explain between-study heterogeneity, Prevost et al. [5] extended their model to include a covariate for age at the study-type level. This is shown in the equation below.
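A plausible reconstruction of equation 4, in which the second level of the model is extended with the covariate; the regression coefficient, written here as β, is not named in the text:

```latex
\begin{equation}
\psi_{ij} \sim \mathrm{N}(\theta_i + \beta x_{ij},\ \sigma_i^2) \tag{4}
\end{equation}
```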
In equation 4, xij took the values of 0 and 1 for studies of women aged less than 50 years and studies of women aged 50 years and over, respectively. The same approach was used by Sampath et al. [10] to adjust for study covariates representing continuous variables, such as average age and the proportion of males in each study. Grines et al. [9] did not conduct covariate adjustment but rather used funnel plots to assess heterogeneity among individual study estimates.
Extension of Prevost's model to adjust for imbalances between study arms
While heterogeneity refers to unexplained variation, bias refers to systematic deviations from the true underlying effect due, for example, to imbalances between study arms [2]. One potential source of bias is confounding [15], where an extraneous factor is associated with both the exposure under study (e.g., treatment) and the outcome of interest, but is not affected by either [16]. Only when the compared groups are balanced in all factors other than treatment that are associated with exposure and affect the outcome, both those that can be measured and those that cannot, can any observed differences between the groups be attributed to treatment rather than to the confounding effects of extraneous variables. Randomisation increases the likelihood that the groups will be balanced not only in terms of the variables that we recognise and can measure but also in terms of variables that we may not recognise and may not be able to measure (i.e., unknowns) but that nevertheless may affect the outcome [3]. In contrast, the greater likelihood of imbalances within non-randomised studies has implications especially when combining both types of study design. To deal with this problem, we extended Prevost's three-level model to adjust for differences within studies, rather than adjusting for aggregate values at the study-type level as in equation 4. The proposed approach uses the variation in imbalances across studies to adjust for differences in patient characteristics between treatment arms within studies. As with RCTs, the resulting balance in patient characteristics within studies should avoid the influence of confounding.
The following presents an extension of Prevost's model based on odds ratios, although it could equally be formulated for relative risks. The analysis was undertaken using a binomial model in which the odds of the event (e.g., death) are calculated for each study, and study-arm-level information is incorporated in the model. The model can be written as follows:
(i = 1 or 2 for the 2 study types; j = 1, ..., ki studies; m = 1, ..., M confounders).
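Based on the definitions given in the text below, equations 5 to 7 can be reconstructed as follows, with the bias adjustment summed over the M potential confounders:

```latex
\begin{align}
r_{Cij} \sim \mathrm{Binomial}(p_{Cij},\, n_{Cij}), &\quad
r_{Tij} \sim \mathrm{Binomial}(p_{Tij},\, n_{Tij}) \tag{5}\\
\mathrm{logit}(p_{Cij}) = \gamma_{ij}, &\quad
\mathrm{logit}(p_{Tij}) = \gamma_{ij} + \psi_{ij} \tag{6}\\
\psi_{ij} &\sim \mathrm{N}\!\Bigl(\theta_i + \sum_{m=1}^{M} \alpha_m\,(x_{mTij} - x_{mCij}),\ \sigma_i^2\Bigr) \tag{7}
\end{align}
```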
It is assumed that the number of events in each arm of the jth study of type i (i.e., rCij and rTij for control (C) and treatment (T), respectively) follows a binomial distribution defined by the proportion of patients who experience the event in each arm (i.e., pCij and pTij) and the total number of patients in each arm (i.e., nCij and nTij), as shown in equation 5. Equation 6 describes the log odds for the event in the control (γij) and treatment (γij + ψij) arms of each of the ki studies.
This model assumes that the log odds ratio, ψij, follows a normal distribution whose mean is the sum of θi (i.e., the overall intervention effect in the ith type of study) and a study-specific bias adjustment, αm(xmTij − xmCij), that is proportional to the differences between the study arms in each of the studies (eq. 7). In this expression, xmTij and xmCij are the values of the mth potential confounder in the treatment and control arms of the jth study of type i, while αm represents the mean bias for the mth potential confounding variable across all the studies. The remaining variables are defined as before.
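The generative process of the bias-adjusted binomial model can be sketched as a simulation. This is an illustration only: study counts, effect sizes, baseline risks, and imbalance magnitudes are all invented, and the confounder imbalances are set to zero for the randomised studies to mimic the effect of randomisation.

```python
import numpy as np

# Illustrative simulation from the bias-adjusted binomial model (eqs. 5-7).
rng = np.random.default_rng(42)

k = [4, 6]             # studies per type (i = 0 randomised, i = 1 non-randomised)
M = 2                  # number of potential confounders
theta = [-0.4, -0.3]   # overall log odds ratio per study type (arbitrary)
sigma = [0.2, 0.3]     # between-study SDs (arbitrary)
alpha = np.array([0.5, -0.2])  # mean bias per confounder (arbitrary)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

data = []
for i in range(2):
    for j in range(k[i]):
        gamma = rng.normal(-1.0, 1.0)   # baseline log odds in the control arm
        # Arm imbalance in each confounder; zero under randomisation
        dx = np.zeros(M) if i == 0 else rng.normal(0.0, 0.5, M)
        psi = rng.normal(theta[i] + alpha @ dx, sigma[i])  # eq. 7
        nC = nT = 200
        rC = rng.binomial(nC, inv_logit(gamma))            # eqs. 5-6, control
        rT = rng.binomial(nT, inv_logit(gamma + psi))      # eqs. 5-6, treatment
        data.append((i, j, rC, nC, rT, nT))

print(len(data))  # 10 simulated studies
```

In the actual analysis this process is inverted: the arm-level counts are observed, and θi, μ, σi, τ, and the bias coefficients αm are estimated.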
Prior distributions for the unknown parameters were intended to be vague. Normal priors with mean zero and variance 0.26, truncated to be positive, were specified for both random-effects standard deviations (σi, τ). The priors for σi and τ corresponded to those used by Grines et al. [9], as they represent what may be considered reasonable priors in many situations [13]. These priors support equality between studies while discounting substantial heterogeneity. A Normal prior with mean zero and variance 10 was used for the overall population effect (μ). Vague Normal priors with mean zero and variance 1000 were assigned to the log odds (γij). These priors were applied to generate results both adjusted and unadjusted for potential confounders. In addition, the adjusted model required priors for the bias coefficients (αm) of each of the M potential confounders; these were also given vague Normal prior distributions with mean zero and variance 1000.
Alternative methods for potentially biased evidence
For comparison purposes, we also considered two approaches proposed to downweight the evidence from non-randomised studies, generally by increasing their variance. The first method was the prior constraint used by Prevost et al. [5] to assess the influence of the assumption that the randomised studies were less biased than the non-randomised studies, and hence that |μ − θ1| < |μ − θ2|. This approach increases the relative proportion of the between-study-type variance (τ²) associated with the non-randomised studies compared with the randomised studies. In so doing, the interpretation of μ is altered: since the constraint gives more weight to the randomised studies, μ no longer represents the total population studied. The overall effects in the randomised and non-randomised studies are represented by θ1 and θ2, respectively. The second approach was the informative prior distribution used by Sutton et al. [8], which included the evidence from the non-randomised studies via the prior for the treatment effect and combined this with a likelihood based only on the data from the randomised studies. Sutton et al. [8] centred their informative prior for the population mean on the non-randomised pooled estimate but used a variance four times larger than that of the randomised studies. The same approach was used for the current analysis; hence an informative Normal(−0.5619, 0.8179) prior distribution was specified for μ. The same prior distributions as previously specified were used for the other unknown parameters.
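The downweighting effect of such an informative prior can be illustrated with a standard normal-normal conjugate update, which combines prior and likelihood as a precision-weighted average. The prior below is the Normal(−0.5619, 0.8179) from the text; the likelihood summary for the randomised evidence is hypothetical, with its variance set to one quarter of the prior variance as the text describes.

```python
# Conjugate normal-normal update: precision-weighted combination of an
# informative prior with a normal likelihood summary.
prior_mean, prior_var = -0.5619, 0.8179   # from the text
lik_mean, lik_var = -0.35, 0.8179 / 4     # hypothetical randomised summary;
                                          # prior variance is 4x larger

post_prec = 1 / prior_var + 1 / lik_var
post_mean = (prior_mean / prior_var + lik_mean / lik_var) / post_prec
post_var = 1 / post_prec
# Because the prior carries one quarter of the likelihood's precision,
# the posterior mean sits much closer to the randomised estimate.
```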
Analyses
All of the analyses were conducted using MCMC simulation implemented in the WinBUGS 1.4.3 software [17]. A 'burn-in' of 100 000 iterations was followed by a further 100 000 iterations, during which the generated parameter values were monitored and summary statistics, such as the median and 95% credible interval, were obtained from the complete samples. History plots, autocorrelation plots, and various diagnostics available in the Bayesian Output Analysis package [18], applied to two chains, were used to assess convergence. See additional file 1: Appendix for the WinBUGS code. The data are available from the author upon request.