Context and data structure
We focus on the following setting, which is common in submissions to HTA agencies. Let S and T denote indicators for the assigned study and the assigned treatment, respectively. Two separate studies enrolled distinct sets of participants and have now been completed. The index study (S=1) compares active treatment A (T=1) versus C (T=0), e.g. standard of care or placebo. The competitor study (S=2) evaluates active treatment B (T=2) versus C (T=0). Covariate-adjusted indirect comparisons such as MAIC perform a treatment comparison in the S=2 sample, implicitly assumed to be of policy interest. The question of interest is: what would the marginal treatment effect for A versus B have been, had these treatments been compared in an RCT conducted in the S=2 sample?
The marginal treatment effect for A vs. B is estimated on the linear predictor (e.g. mean difference, log-odds ratio or log hazard ratio) scale as:
$$\hat{\Delta}_{12}^{(2)} = \hat{\Delta}_{10}^{(2)} - \hat{\Delta}_{20}^{(2)},$$
(1)
where \(\hat {\Delta }_{10}^{(2)}\) is an estimate of the hypothetical marginal treatment effect for A vs. C in the competitor study sample, and \(\hat {\Delta }_{20}^{(2)}\) is an estimate of the marginal treatment effect of B vs. C in the competitor study sample. MAIC uses weighting to transport inferences for the marginal A vs. C treatment effect from S=1 to S=2, producing the estimate \(\hat {\Delta }_{10}^{(2)}\), which is then input into Eq. 1. Because the within-trial relative effect estimates are assumed statistically independent, their variances are summed to estimate the variance of the marginal treatment effect for A vs. B.
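As a worked numerical illustration of Eq. 1 and the variance summation (all effect estimates and variances below are made up), the indirect comparison can be computed as follows:

```python
# Hypothetical effect estimates on the log-odds ratio scale (made-up numbers).
d_10, v_10 = -0.50, 0.04   # A vs. C transported to the S=2 sample, with its variance
d_20, v_20 = -0.30, 0.03   # B vs. C reported for the S=2 sample, with its variance

d_12 = d_10 - d_20         # Eq. 1: marginal A vs. B effect in S=2
v_12 = v_10 + v_20         # variances sum, assuming statistically independent trials
se_12 = v_12 ** 0.5        # standard error of the indirect comparison
```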
The manufacturer submitting evidence for reimbursement has access to individual-level data \(\mathcal {D}_{AC}=({\boldsymbol {x},\boldsymbol {t},\boldsymbol {y}})\) on covariates, treatment and outcomes for the participants in its trial. Here, x is a matrix of pre-treatment baseline covariates (e.g. comorbidities, age, gender), of size n×k, where n is the total number of subjects in the study sample and k is the number of covariates. A row vector xi=(xi,1,xi,2,…,xi,k) of k covariates is recorded for each participant i=1,…,n. We let y=(y1,y2,…,yn) denote a vector of the clinical outcome of interest and t=(t1,t2,…,tn) denote a binary treatment indicator vector. We shall assume that there is no loss to follow-up or missing data on covariates, treatment and outcome in \(\mathcal {D}_{AC}\).
We consider all baseline covariates to be prognostic of the clinical outcome and select a subset of these, z⊆x, as marginal effect modifiers for A with respect to C on the linear predictor scale, with a row vector zi recorded for each patient i. In the absence of randomization, the variables in x would induce confounding between the treatment arms in the index study (internal validity bias). On the other hand, cross-trial imbalances in the variables in z induce external validity bias with respect to the competitor study sample.
Neither the manufacturer submitting the evidence nor the HTA agency evaluating it has access to IPD for the competitor trial. We let \(\mathcal {D}_{BC}=[\boldsymbol {\theta }_{\boldsymbol {x}}, \hat {\Delta }_{20}^{(2)}, \hat {V}(\hat {\Delta }_{20}^{(2)})]\) represent the published ALD that is available for this study. No patient-level covariates, treatments or outcomes are available. Here, θx denotes a vector of means or proportions for the covariates, although higher-order moments such as variances may also be available. We assume that a sufficiently rich set of baseline covariates has been measured for the competitor study; namely, that summaries for the subset θz⊆θx of covariates that are marginal effect modifiers are described in the table of baseline characteristics in the study publication.
Also available is an internally valid estimate \(\hat {\Delta }_{20}^{(2)}\) of the marginal treatment effect for B vs. C in the competitor study sample, and an estimate \(\hat {V}(\hat {\Delta }_{20}^{(2)})\) of its variance. These are either directly reported in the publication or, assuming that the competitor study is a well-conducted RCT, derived from crude aggregate outcomes in the literature.
Matching-adjusted indirect comparison
In MAIC, IPD from the index study are weighted so that the moments of selected covariates are balanced with respect to the published moments of the competitor study. The weight wi for each participant i in the index trial is estimated using a logistic regression:
$$\ln(w_{i}) = \ln[w(\boldsymbol{z}_{i})] = \ln \left[ \frac{Pr(S=2 \mid \boldsymbol{z}_{i})}{1 - Pr(S=2 \mid \boldsymbol{z}_{i})} \right] = \alpha_{0} + \boldsymbol{z}_{i}\boldsymbol{\alpha}_{\boldsymbol{1}},$$
(2)
where α0 is the model intercept and α1 is a vector of model coefficients. While most applications of weighting, e.g. to control for confounding in observational studies, construct “inverse probability” weights for treatment assignment, MAIC uses “odds weighting” [39, 40] to model trial assignment. The weight wi represents the conditional odds that an individual i with covariates zi, selected as marginal effect modifiers, is enrolled in the competitor study. Equivalently, it is the inverse of the conditional odds that the individual is enrolled in the index study.
The logistic regression parameters in Eq. 2 cannot be derived using conventional methods such as maximum-likelihood estimation, because IPD are unavailable for the competitor trial. Signorovitch et al. propose using a method of moments instead to enforce covariate balance across studies [11]. Prior to balancing, the IPD covariates are centered on the means or proportions published for the competitor trial. The centered covariates for subject i in the IPD are defined as \(\boldsymbol {z}^{\boldsymbol {*}}_{i} = \boldsymbol {z}_{i} - \boldsymbol {\theta }_{\boldsymbol {z}}\).
Weight estimation involves minimizing the objective function:
$$Q(\boldsymbol{\alpha}_{\boldsymbol{1}}) = \sum\limits_{i=1}^{n} \exp \left(\boldsymbol{z}^{\boldsymbol{*}}_{i} \boldsymbol{\alpha}_{\boldsymbol{1}}\right).$$
(3)
The function Q(α1) is convex [11] and can be minimized using standard convex optimization algorithms [41]. Provided that there is adequate overlap, minimization yields the unique finite solution: \(\hat {\boldsymbol {\alpha }}_{\boldsymbol {1}}=\text {argmin}[Q(\boldsymbol {\alpha }_{\boldsymbol {1}})]\). Feasible solutions do not exist if all the values observed for a covariate in z are greater than, or all are less than, its corresponding element in θz [22].
After minimizing the objective function in Eq. 3, the weight estimated for the i-th participant in the IPD is:
$$\hat{w}_{i} = \exp(\boldsymbol{z}^{\boldsymbol{*}}_{i}\hat{\boldsymbol{\alpha}}_{\boldsymbol{1}}).$$
(4)
The estimated weights are relative, in the sense that any set of weights proportional to them is equally valid [22]. Weighting reduces the ESS of the index trial. The approximate ESS after weighting is typically estimated as \(\left (\sum _{i}^{n}\hat {w}_{i}\right)^{2}/\sum _{i}^{n}\hat {w}_{i}^{2}\) [5, 42]. Low values of the ESS suggest that a few influential participants with disproportionate weights dominate the reweighted sample.
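The weight estimation steps in Eqs. 3 and 4, together with the ESS diagnostic, can be sketched in Python. This is an illustrative implementation on simulated IPD with hypothetical published covariate means, not the authors' software:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))                    # simulated IPD effect modifiers (index trial)
theta_z = np.array([0.3, -0.2])                # hypothetical published S=2 means
z_star = z - theta_z                           # centre the covariates on the S=2 moments

def Q(alpha1):                                 # Eq. 3 objective
    return np.exp(z_star @ alpha1).sum()

def grad(alpha1):                              # analytic gradient of Q (aids convergence)
    return z_star.T @ np.exp(z_star @ alpha1)

alpha1_hat = minimize(Q, x0=np.zeros(2), jac=grad, method="BFGS").x
w_hat = np.exp(z_star @ alpha1_hat)            # Eq. 4: estimated odds weights

# Balance check: the weighted covariate means reproduce the published moments.
balanced = (w_hat @ z) / w_hat.sum()
ess = w_hat.sum() ** 2 / (w_hat ** 2).sum()    # approximate ESS after weighting
```

Setting the gradient of Q to zero is exactly the method-of-moments condition: the weighted means of the centered covariates vanish, so the weighted IPD moments match θz.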
Consequently, marginal mean outcomes for treatments A and C in the competitor study sample (S=2) are estimated as the weighted averages:
$$\hat{\mu}^{(2)}_{t} = \frac{\sum_{i=1}^{n_{t}} y_{i,t} \hat{w}_{i,t}}{\sum_{i=1}^{n_{t}} \hat{w}_{i,t}},$$
(5)
where nt denotes the number of participants assigned to treatment t∈{0,1} of the index trial, yi,t represents the observed clinical outcome for subject i in arm t, and \(\hat {w}_{i,t}\) is the weight assigned to patient i under treatment t. For binary outcomes, \(\hat {\mu }^{(2)}_{t}\) would estimate the expected marginal outcome probability under treatment t. Absolute outcome estimates may be desirable as inputs to health economic models [25] or in unanchored comparisons made in the absence of a common control group.
In anchored comparisons, the objective is to estimate a relative effect for A vs. C, as opposed to absolute outcomes. Indirect treatment comparisons are typically conducted on the linear predictor scale [3, 4, 6]. Consequently, this scale is also used to define effect modification, which is scale-specific [5].
One can convert the mean absolute outcome predictions produced by Eq. 5 from the natural scale to the linear predictor scale, and compute the marginal treatment effect for A vs. C in S=2 as the difference between the average linear predictions:
$$\hat{\Delta}_{10}^{(2)} = g \left(\hat{\mu}_{1}^{(2)} \right) - g \left(\hat{\mu}_{0}^{(2)} \right).$$
(6)
Here, g(·) is an appropriate link function: e.g. the identity link produces a mean difference for continuous-valued outcomes, and the logit link \(\text {logit} \left (\hat {\mu }^{(2)}_{t} \right) = \ln \left [\hat {\mu }^{(2)}_{t}/\left (1-\hat {\mu }^{(2)}_{t} \right)\right ]\) generates a log-odds ratio for binary outcomes. Different, potentially more interpretable, choices such as relative risks and risk differences are possible for the marginal contrast. One can map to these scales by manipulating \(\hat {\mu }_{1}^{(2)}\) and \(\hat {\mu }_{0}^{(2)}\) differently.
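A tiny worked example of Eqs. 5 and 6 for a binary outcome, with made-up outcomes and weights (the weights stand in for those produced by Eq. 4):

```python
import numpy as np

# Made-up outcomes and MAIC weights for the two arms of the index trial.
y_A = np.array([1, 0, 0, 1]); w_A = np.array([1.0, 2.0, 1.0, 2.0])   # A arm
y_C = np.array([1, 1, 0]);    w_C = np.array([2.0, 2.0, 1.0])        # C arm

def wmean(y, w):                        # Eq. 5: weighted marginal mean outcome
    return float((y * w).sum() / w.sum())

mu_1 = wmean(y_A, w_A)                  # (1 + 0 + 0 + 2) / 6 = 0.5
mu_0 = wmean(y_C, w_C)                  # (2 + 2 + 0) / 5 = 0.8

logit = lambda p: np.log(p / (1 - p))   # link g(.) for binary outcomes
delta_10 = logit(mu_1) - logit(mu_0)    # Eq. 6: marginal log-odds ratio, A vs. C
```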
Alternatively, the weights generated by Eq. 4 can be used to fit a simple regression of outcome on treatment to the IPD [43]. The model can be fitted using maximum-likelihood estimation, weighting the contribution of each individual i to the likelihood by \(\hat {w}_{i}\). In this approach, the treatment coefficient of the fitted weighted model is the estimated marginal treatment effect \(\hat {\Delta }_{10}^{(2)}\) for A vs. C in S=2.
The original approach to MAIC uses a robust sandwich-type variance estimator [44] to compute the standard error of \(\hat {\Delta }_{10}^{(2)}\). This relies on large-sample properties and has been found to understate variability with small ESSs, both in a previous simulation study investigating MAIC [7] and in other settings [45–48]. In addition, most implementations of the sandwich estimator, e.g. when fitting the weighted regression [49], ignore the estimation of the trial assignment model, treating the weights as fixed quantities. While analytic expressions that incorporate the estimation of the weights could be derived, a practical alternative is to resample via the ordinary non-parametric bootstrap [23, 50, 51], re-estimating the weights and the marginal treatment effect for A vs. C in each bootstrap iteration. Point estimates, standard errors and interval estimates can then be calculated directly from the bootstrap replicates.
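The bootstrap procedure can be sketched as follows. This is an illustrative Python implementation on simulated data; the weights are re-estimated inside each resample so that their estimation uncertainty propagates into the interval:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated index-trial IPD and a hypothetical published S=2 covariate mean.
rng = np.random.default_rng(2)
n = 200
z = rng.normal(size=(n, 1))                        # one effect modifier
t = rng.integers(0, 2, size=n)                     # 1 = A, 0 = C
y = 0.5 * t - 0.3 * z[:, 0] + rng.normal(size=n)   # continuous outcome, true effect 0.5
theta_z = np.array([0.2])

def maic_effect(idx):
    """Re-estimate the weights (Eqs. 3-4) and the A vs. C effect on one resample."""
    zs = z[idx] - theta_z
    a1 = minimize(lambda a: np.exp(zs @ a).sum(), np.zeros(1),
                  jac=lambda a: zs.T @ np.exp(zs @ a), method="BFGS").x
    w, tb, yb = np.exp(zs @ a1), t[idx], y[idx]
    mu1 = (yb[tb == 1] * w[tb == 1]).sum() / w[tb == 1].sum()
    mu0 = (yb[tb == 0] * w[tb == 0]).sum() / w[tb == 0].sum()
    return mu1 - mu0                               # identity link: marginal mean difference

boot = np.array([maic_effect(rng.integers(0, n, size=n)) for _ in range(200)])
se_boot = boot.std(ddof=1)                         # bootstrap standard error
ci_boot = np.percentile(boot, [2.5, 97.5])         # percentile interval
```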
We briefly describe the assumptions required by MAIC and their implications:
1. Internal validity of the effect estimates derived from the index and competitor studies. This is plausible where the studies are RCTs, because randomization ensures exchangeability over treatment assignment in expectation. While internal validity may hold in RCTs, it is a more stringent condition for observational studies. The absence of informative measurement error, missing data, non-adherence, etc. is also assumed.
2. Consistency under parallel studies [52]: there is only one well-defined version of each treatment [53], or any variations in the versions of treatment are irrelevant [54, 55]. This applies to the common comparator C in particular.
3. Conditional transportability (exchangeability) of the marginal treatment effect for A vs. C from the index to the competitor study [39]. Namely, trial assignment does not affect this measure, conditional on z. Prior research has referred to this assumption as the conditional constancy of relative effects [5, 6, 9]. It is plausible if z comprises all of the covariates that are considered to modify the marginal treatment effect for A vs. C, i.e., there are no unmeasured effect modifiers [56–58].
4. Sufficient overlap. The ranges of the selected covariates in S=1 should cover their respective moments in S=2. Overlap violations can be deterministic or random. The former arise structurally, due to non-overlapping trial target populations (eligibility criteria). The latter arise empirically due to chance, particularly where sample sizes are small [60]. Overlap therefore cannot be judged from absolute sample sizes alone; the ESS after weighting is a convenient one-number diagnostic.
5. Correct specification of the S=2 covariate distribution. Analysts can only approximate the joint distribution because IPD are unavailable for the competitor study. Covariate correlations are rarely published for S=2 and therefore cannot be balanced by MAIC. In that case, they are assumed equal to those in the pseudo-sample formed by weighting the IPD [5].
We make a brief remark on the specification of the parametric trial assignment model in Eq. 2. This model does not necessarily need to be correctly specified, as long as it balances all the covariates, and any transformations of these covariates (e.g. polynomial terms and product terms), that modify the marginal treatment effect for A vs. C [9, 23]. Squared terms are often included to balance the variances of continuous covariates [11], but initial simulation studies do not report performance benefits [14, 17], probably due to greater reductions in ESS and precision [25].
The identification of effect modifiers will likely require prior background knowledge and substantive domain expertise. Bias-variance trade-offs are also important. Failing to include an influential effect modifier in z, whether it is imbalanced across studies or not, leads to bias in S=2 [5, 40, 61]. On the other hand, including covariates that are not effect modifiers reduces overlap, thereby increasing the chance of extreme weights. This decreases precision without improving the potential for bias reduction [6, 62], even if the covariates are strongly imbalanced across studies, that is, even if they predict or are associated with trial assignment.
Put simply, as is the case for other weighting-based methods [63, 64], MAIC is potentially unbiased if either the trial assignment mechanism or the outcome-generating mechanism is known, with the latter leading to better performance due to reduced variance and increased efficiency.
Two-stage matching-adjusted indirect comparison
While the standard MAIC models the trial assignment mechanism, two-stage MAIC (2SMAIC) additionally models the treatment assignment mechanism in the index trial. The treatment assignment model is estimated to produce inverse probability of treatment weights. Then, these are combined with the odds weights generated by the standard MAIC. The resulting weights seek to balance covariate moments between the studies and the treatment arms of the index trial.
For the treatment assignment mechanism, a propensity score logistic regression of treatment on the covariates is fitted to the IPD:
$$\text{logit}[e_{i}] = \text{logit}[e(\boldsymbol{x}_{i})] = \text{logit}[Pr(T=1\mid \boldsymbol{x}_{i})] = \beta_{0} + \boldsymbol{x}_{i} \boldsymbol{\beta}_{\boldsymbol{1}},$$
(7)
where β0 and β1 parametrize the logistic regression. The propensity score ei is defined as the conditional probability that participant i is assigned treatment A versus treatment C, given measured covariates xi [65].
Having fitted the model in Eq. 7, e.g. using maximum-likelihood estimation, propensity scores for the subjects in the index trial are predicted using:
$$\hat{e}_{i} = \text{expit}[\hat{\beta}_{0} + \boldsymbol{x}_{i} \hat{\boldsymbol{\beta}}_{\boldsymbol{1}}],$$
(8)
where \(\text {expit}(\cdot)=\exp (\cdot)/[1+\exp (\cdot)]\), \(\hat {\beta }_{0}\) and \(\hat {\boldsymbol {\beta }}_{\boldsymbol {1}}\) are point estimates of the logistic regression parameters, and \(\hat {e}_{i}\) is an estimate of ei. Inverse probability of treatment weights are constructed by taking the reciprocal of the estimated conditional probability of the treatment assigned in the index study [37]: \(1/\hat {e}_{i}\) for units under treatment A and \(1/(1-\hat {e}_{i})\) for units under treatment C.
Consequently, the weights produced by the standard MAIC (Eq. 4) are rescaled by the estimated inverse probability of treatment weights. The contribution of each subject i in the IPD is weighted by:
$$\hat{\omega}_{i} = \frac{t_{i} \hat{w}_{i}}{\hat{e}_{i}} + \frac{(1-t_{i}) \hat{w}_{i}}{(1-\hat{e}_{i})}.$$
(9)
The weights \(\{ \hat {w}_{i},\, i=1,\dots,n \}\) estimated by the standard MAIC are odds, constrained to be positive. These balance the index and competitor studies in terms of the selected effect modifier moments. The estimated propensity scores \(\{ \hat {e}_{i},\, i=1,\dots,n \}\) are probabilities bounded away from zero and one. Therefore, the weights \(\{ \hat {\omega }_{i},\, i=1,\dots,n \}\) produced by 2SMAIC in Eq. 9 are also constrained to be positive. These weights achieve balance in effect modifier moments across studies, but also seek to balance covariate moments between the index trial’s treatment groups.
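The two stages can be sketched in Python. All data and the published mean below are simulated/hypothetical, and the propensity model is fitted by plain Newton-Raphson iterations rather than any particular package:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=(n, 2))                        # simulated baseline covariates
t = rng.integers(0, 2, size=n)                     # randomized 1:1 in the index trial
theta_z = np.array([0.25])                         # hypothetical published S=2 mean
z_star = x[:, :1] - theta_z                        # first covariate acts as effect modifier

# Stage 1: MAIC odds weights (Eqs. 3-4).
a1 = minimize(lambda a: np.exp(z_star @ a).sum(), np.zeros(1),
              jac=lambda a: z_star.T @ np.exp(z_star @ a), method="BFGS").x
w_hat = np.exp(z_star @ a1)

# Stage 2: propensity score logistic regression (Eqs. 7-8) by Newton-Raphson.
X = np.column_stack([np.ones(n), x])               # intercept plus covariates
beta = np.zeros(3)
for _ in range(25):
    e = 1 / (1 + np.exp(-X @ beta))                # expit of the linear predictor
    W = e * (1 - e)                                # logistic weights for the Hessian
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (t - e))
e_hat = 1 / (1 + np.exp(-X @ beta))

# Eq. 9: rescale the odds weights by the inverse probability of the assigned treatment.
omega_hat = np.where(t == 1, w_hat / e_hat, w_hat / (1 - e_hat))
```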
Marginal mean outcomes for treatments A and C in the competitor study sample are estimated as the weighted averages of observed outcomes:
$$\hat{\mu}^{(2)}_{t} = \frac{\sum_{i=1}^{n_{t}} y_{i,t} \hat{\omega}_{i,t}}{\sum_{i=1}^{n_{t}} \hat{\omega}_{i,t}},$$
(10)
where \(\hat {\omega }_{i,t}\) is the weight assigned to patient i under treatment t. One can convert the mean absolute outcome predictions generated by Eq. 10 to the linear predictor scale, and compute the marginal treatment effect for A vs. C in S=2 as the difference between the average linear predictions, as per Eq. 6. Alternatively, a weighted regression of outcome on treatment alone can be fitted to the IPD, in which case the treatment coefficient of the fitted model represents the estimated marginal treatment effect \(\hat {\Delta }_{10}^{(2)}\) for A vs. C in S=2.
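For a continuous outcome (identity link), the two routes coincide: the treatment coefficient of a weighted least-squares regression of outcome on treatment alone equals the difference in weighted arm means. A small illustrative check, with made-up data and weights standing in for those of Eq. 9:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150
t = rng.integers(0, 2, size=n)                 # 1 = A, 0 = C
y = 0.4 * t + rng.normal(size=n)               # continuous outcome (hypothetical)
omega = rng.uniform(0.1, 3.0, size=n)          # stand-in for the 2SMAIC weights

# Route 1: difference of weighted means (Eqs. 10 and 6, identity link).
mu1 = (y[t == 1] * omega[t == 1]).sum() / omega[t == 1].sum()
mu0 = (y[t == 0] * omega[t == 0]).sum() / omega[t == 0].sum()
delta_means = mu1 - mu0

# Route 2: weighted least-squares regression of outcome on treatment alone.
X = np.column_stack([np.ones(n), t])           # intercept + treatment indicator
beta = np.linalg.solve(X.T @ (X * omega[:, None]), X.T @ (y * omega))
delta_reg = beta[1]                            # treatment coefficient
```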
Inference can be based on a robust sandwich-type variance estimator or on resampling approaches such as the non-parametric bootstrap. As noted previously, the sandwich variance estimator is biased downwards when the ESS after weighting is small, leading to overprecision. In practice, the non-parametric bootstrap is a preferred option, re-estimating both the trial assignment model and the treatment assignment model in each iteration. This approach explicitly accounts for the estimation of the weights and is expected to perform better where the ESS is small.
It may seem counter-intuitive to estimate the treatment assignment mechanism when the index trial is an RCT. The randomized design implies that the true propensity scores {ei, i=1,…,n} are fixed and known. For instance, consider a marginally randomized two-arm trial with a 1:1 treatment allocation ratio. The trial investigators have determined in advance that the probability of being assigned to active treatment versus control is ei=0.5 for all i.
The rationale for estimating the propensity scores is the following. Randomization guarantees that there is no confounding in expectation [66]. Nevertheless, covariate balance is a large-sample property, and one may still observe residual covariate imbalances between treatment groups due to chance, especially when the trial sample size is small [67]. As formulated by Senn [66], “over all randomizations the groups are balanced; for a particular randomization they are unbalanced.” The use of estimated propensity scores allows one to correct for random finite-sample imbalances in prognostic baseline covariates. In the RCT literature, inverse probability of treatment weighting is an established approach for covariate adjustment [68], and has been shown to increase precision, efficiency and power with respect to unadjusted analyses in the estimation of marginal treatment effects [48, 69].
Thus far, the use of anchored MAIC has been limited to situations where the index trial is an RCT. 2SMAIC can be used when the index study is observational, provided that the baseline covariates in x offer sufficient control for confounding. In non-randomized studies, the true propensity score for each participant in the index study is unknown, and additional conditions are needed to produce internally valid estimates of the marginal treatment effect for A vs. C. These are: (1) conditional exchangeability over treatment assignment [70]; and (2) positivity of treatment assignment [60]. Randomized trials tend to meet these assumptions by design. The assumptions have conceptual parallels with the conditional transportability and overlap conditions previously described for MAIC.
The first assumption indicates that the potential outcomes of subjects in each treatment group are independent of the treatment assigned, after conditioning on the selected covariates. It relies on all confounders of the effect of treatment on outcome being measured and accounted for [71]. The second assumption indicates that, for every participant in the index study, the probability of being assigned to either treatment is positive, conditional on the covariates selected to ensure exchangeability [60]. This requires overlap between the joint covariate distributions of the subjects under treatment A and under treatment C. The assumption is threatened if there are few or no individuals from either treatment group in certain covariate subgroups or strata.