As illustrated in Fig. 1, the calibrated meta-analysis combined weighted RCT data. Patients in each RCT were weighted to equate the baseline characteristics between the RCT and the target population. As a result, the weighted RCT samples resemble the target population more closely than the unweighted samples.
More specifically, the calibrated meta-analysis involved three stages. First, trial participation weights were computed for all patients in each RCT. To calculate the weights, for each RCT we formed a new dataset that stacked the data from the target population and that RCT. For each stacked dataset, we defined a population membership indicator equal to 1 for patients in the target population and 0 for patients in the RCT. Next, we fit a logistic regression of the membership indicator on the baseline covariates to estimate, for each RCT participant, the probability of being in the target population [33, 34]. These participation scores were denoted \(\widehat{e}_{j}\), where \(j\) indexes individuals. The participation weights were then defined as the odds \(w_{j}=\widehat{e}_{j}/(1-\widehat{e}_{j})\) for the participants in each RCT [33–35]. Note that only participants in the trials were weighted to the target population. These weights were then used in subsequent analyses to make the RCT samples more similar to the target population on the baseline covariates.
To assess this similarity, we calculated absolute standardized mean differences (ASMDs) of each baseline covariate between each RCT and the target population [36]. We compared ASMDs calculated before and after weighting to assess how much the weighting improved similarity. In addition, we averaged the ASMDs of all baseline covariates for each RCT to quantify overall similarity for each RCT. An ASMD less than 0.1 indicates good covariate balance between an RCT and the target population [37, 38].
Second, we estimated the TATE from each trial by fitting weighted regressions of the outcome with the weights \(w_{j}\) using the survey package in R [39]. Third, we conducted a meta-analysis of the estimated TATEs. To account for between-study treatment effect heterogeneity, we fit a random-effects meta-analysis model with the DerSimonian and Laird inverse-variance method [40]. The standard deviation of the random effects, denoted by \(\tau\), is used to assess the between-study treatment effect heterogeneity.
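The weighting and balance check in the first stage can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes the participation scores have already been estimated by the logistic regression, and it scales the ASMD by the sample standard deviation of the target population, which is one common convention that the text does not specify. The function names (`odds_weights`, `weighted_mean`, `asmd`) and the toy data are hypothetical.

```python
import statistics


def odds_weights(scores):
    """Convert participation scores e_j into odds weights w_j = e_j / (1 - e_j)."""
    weights = []
    for e in scores:
        if not 0.0 < e < 1.0:
            raise ValueError("participation scores must lie strictly between 0 and 1")
        weights.append(e / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted mean of a covariate."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def asmd(trial_values, target_values, trial_weights=None):
    """Absolute standardized mean difference for one covariate.

    Assumption: the difference is scaled by the sample SD of the target
    population; other denominators (e.g. a pooled SD) are also in use.
    """
    if trial_weights is None:
        # Unweighted comparison: every trial participant counts equally.
        trial_weights = [1.0] * len(trial_values)
    trial_mean = weighted_mean(trial_values, trial_weights)
    target_mean = statistics.mean(target_values)
    target_sd = statistics.stdev(target_values)
    return abs(trial_mean - target_mean) / target_sd


# Hypothetical toy data: one covariate (age) and made-up participation scores.
trial_age = [50.0, 60.0, 70.0]
target_age = [60.0, 70.0, 80.0]
e_hat = [0.2, 0.5, 0.8]  # estimated P(target population | covariates) per trial patient

w = odds_weights(e_hat)
asmd_before = asmd(trial_age, target_age)
asmd_after = asmd(trial_age, target_age, w)
print(f"ASMD before weighting: {asmd_before:.3f}, after weighting: {asmd_after:.3f}")
```

In this toy example the older trial patients have higher participation scores and therefore larger weights, which pulls the weighted trial mean toward the target population mean and lowers the ASMD.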
To obtain accurate TATE estimates, two key assumptions are required [19]. First, the span of the target population characteristics should be (at least somewhat) represented in the RCTs, the so-called positivity assumption. This means that everyone in the population must have a positive probability of participating in each RCT. Otherwise, we can only extrapolate results from the RCT to the represented part of the population. Second, there should be no unmeasured effect moderators. The participation weights can only adjust for differences in the observed baseline covariates (i.e., potential effect moderators) between each RCT and the target population. Unmeasured effect moderators may lead to unreliable TATE estimates.
We also carried out a random-effects meta-analysis using unweighted outcomes and compared the results. In addition, we conducted a subgroup analysis including only RCT patients from North America as a sensitivity analysis. All analyses were executed using R version 3.6.3 [41].
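The random-effects pooling used for both the weighted and unweighted analyses can be sketched with the standard DerSimonian and Laird moment estimator. The Python snippet below is an illustration under stated assumptions, not the authors' R code: the function name `dersimonian_laird` and the input values are hypothetical, and each trial is assumed to supply a point estimate and its standard error.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird estimator of tau^2.

    Returns (pooled estimate, its standard error, tau), where tau is the
    estimated standard deviation of the random effects.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    fixed_mean = sum(w * y for w, y in zip(w_fixed, estimates)) / sum_w

    # Cochran's Q and the moment estimator of tau^2, truncated at zero.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(w_fixed, estimates))
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_random = [1.0 / (v + tau2) for v in variances]
    sum_wr = sum(w_random)
    pooled = sum(w * y for w, y in zip(w_random, estimates)) / sum_wr
    pooled_se = math.sqrt(1.0 / sum_wr)
    return pooled, pooled_se, math.sqrt(tau2)


# Hypothetical per-trial effect estimates and standard errors.
tate_hat = [0.0, 2.0]
tate_se = [1.0, 1.0]
pooled, pooled_se, tau = dersimonian_laird(tate_hat, tate_se)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, tau = {tau:.3f}")
```

When the trial estimates agree closely, the truncation at zero gives \(\tau = 0\) and the result coincides with a fixed-effect inverse-variance analysis.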