This section outlines our framework for assessing agreement between the mean incremental net (monetary) benefits estimated from two sets of data on treatment costs and effects for trial participants. Three commonly used statistics are adapted for this purpose: (1) the mean difference; (2) the probability estimate of miscoverage; and (3) the concordance correlation coefficient [
18] between two estimates of the incremental net benefit. We define the probability estimate of miscoverage as the proportion of simulated replicates of the trial data in which the confidence limits for the mean incremental net benefit from one data source, designated as the test data, do not contain the mean incremental net benefit estimated from the second data source, designated as the referent or gold standard data source. We outline a strategy for estimating the miscoverage probability and show how the concordance correlation coefficient can be adapted for assessing agreement between two estimates of the mean incremental net benefit evaluated at a specified cost-effectiveness threshold. A package to implement the routines described in the remainder of the paper in [
28] is available from
https://github.com/agaye/ceeComp.
Difference between two estimates of the incremental net benefit
Consider a trial in which paired data on treatment costs and effects, denoted as
D
1 and
D
2, are available for
N trial participants randomized to one of two interventions, denoted as
A and
B. Our illustrative example in the section “
Example application to the PiPS trial” highlights two potential data sources, namely trial case report forms and data obtained from a national patient electronic system. Denote
A as the control intervention and let
\(\beta_{i\lambda }\) be an estimate of the mean incremental net benefit of intervention
B relative to
A from the
ith dataset
D
i \((i = 1,\,2)\) at a specified cost-effectiveness threshold
\(\lambda\). Then a simple measure of discrepancy between the two estimates of cost-effectiveness (in the form of the incremental net benefit of intervention
B relative to
A) generated from two data sources is
\(\omega_{\lambda }\) where
$$\omega_{\lambda } = \beta_{2\lambda } - \beta_{1\lambda } .$$
(1)
The variance of
\(\omega_{\lambda }\) (after dropping the
\(\lambda\) subscripts to simplify the notation) is given by
$$\sigma_{\omega }^{2} = \sigma_{{\beta_{1se} }}^{2} + \sigma_{{\beta_{2se} }}^{2} - 2\rho_{{\beta_{1se} ,\beta_{2se} }} ,$$
(2)
where
\(\sigma_{{\beta_{1se} }}\),
\(\sigma_{{\beta_{2se} }}\) represent the standard errors of the incremental net benefit from datasets 1 and 2, respectively. Incremental net benefits generated this way are likely to be correlated because the two datasets contain information on the same patients; the parameter
\(\rho_{{\beta_{1se} ,\beta_{2se} }}\) denotes the covariance between the two estimates (a covariance, not a correlation coefficient, despite the symbol). The parameters
\(\omega\),
\(\beta_{1}\),
\(\beta_{2}\) and associated variance and covariance terms in Eqs. (
1) and (
2) are unobserved and are therefore replaced in practice with their sample counterparts
\(\hat{\omega }\),
\(\hat{\beta }_{1}\) and
\(\hat{\beta }_{2}\), respectively. We show in Appendices
A and
B that the variance and covariance terms on the right hand side of Eq. (
2) can be written in terms of the variances of costs and effects and the covariance between the two within the respective arms of a trial with a parallel group design (assuming none of the treatment switching or cross-over effects that are common in cancer trials). Under the large sample assumption, an approximate statistical test of the null hypothesis that there is no difference between the incremental net benefits generated from the two data sources (i.e.,
\(\omega = 0\)) can be constructed by referring an estimate
\(\hat{Z}\) of the
Z statistic to the standard normal distribution where
\(\hat{Z}\) is given by
$$\hat{Z} = \frac{{\hat{\omega }}}{{\hat{\sigma }_{\omega } }}.$$
(3)
Note that failure to reject the null hypothesis of no difference above does not imply evidence of agreement or that the two incremental net benefits are equivalent. If required, a statistical test of equivalence can be constructed by specifying an equivalence margin
δ and carrying out two one-sided tests of the null hypothesis that |ω| ≥ δ against the alternative that
|ω| < δ [
38].
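The calculations in Eqs. (1) to (3), together with the two one-sided tests, can be sketched as follows. This is an illustrative sketch only and is not the code of the package referred to above: the function names `compare_inb` and `tost_p_value` are hypothetical, the per-patient net benefit is taken to be \(\lambda E - C\), the variances use the usual independent-arms formula rather than the expressions in the Appendices, and the simulated data are invented for the example.

```python
import math
import random
from statistics import NormalDist, mean, variance


def _cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def compare_inb(d1, d2, arms, lam):
    """Compare incremental net benefit (INB) estimates from two paired sources.

    d1, d2: per-patient (cost, effect) tuples, same patient order in both.
    arms:   per-patient arm label, "A" (control) or "B".
    lam:    cost-effectiveness threshold.
    """
    nb1 = [lam * e - c for c, e in d1]  # per-patient net benefit, source 1
    nb2 = [lam * e - c for c, e in d2]  # per-patient net benefit, source 2
    beta1 = beta2 = var1 = var2 = cov12 = 0.0
    for arm, sign in (("A", -1.0), ("B", 1.0)):
        x1 = [v for v, t in zip(nb1, arms) if t == arm]
        x2 = [v for v, t in zip(nb2, arms) if t == arm]
        n = len(x1)
        beta1 += sign * mean(x1)
        beta2 += sign * mean(x2)
        var1 += variance(x1) / n   # arms assumed independent, so variances add
        var2 += variance(x2) / n
        cov12 += _cov(x1, x2) / n  # same patients appear in both sources
    omega = beta2 - beta1                  # Eq. (1)
    var_omega = var1 + var2 - 2.0 * cov12  # Eq. (2)
    z = omega / math.sqrt(var_omega)       # Eq. (3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"beta1": beta1, "beta2": beta2, "omega": omega,
            "var_omega": var_omega, "z": z, "p_two_sided": p}


def tost_p_value(omega, se, delta):
    """Two one-sided tests of H0: |omega| >= delta versus H1: |omega| < delta."""
    phi = NormalDist().cdf
    p_lower = 1.0 - phi((omega + delta) / se)  # H0: omega <= -delta
    p_upper = phi((omega - delta) / se)        # H0: omega >= +delta
    return max(p_lower, p_upper)


# Invented example: source 2 records the same effects but noisier costs.
rng = random.Random(1)
arms = ["A"] * 100 + ["B"] * 100
d1 = [(rng.gauss(5000, 1000), rng.gauss(0.70 + 0.05 * (t == "B"), 0.1)) for t in arms]
d2 = [(c + rng.gauss(0, 200), e) for c, e in d1]
res = compare_inb(d1, d2, arms, lam=20000)
print(res)
print(tost_p_value(res["omega"], math.sqrt(res["var_omega"]), delta=500))
```

Because the same patients contribute to both sources, the variance from Eq. (2) equals the variance obtained by working directly with each patient's difference in net benefit between the two sources, which offers a quick check on the covariance term.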
Probability of miscoverage
This section introduces the probability estimate of miscoverage as a statistic for assessing agreement between two cost-effectiveness estimates. Miscoverage probabilities have previously been used in the health economics literature [
27] to compare the performance of different methods for estimating confidence intervals for the ICER. However, unlike Polsky et al., we base our assessment on the incremental net benefit rather than the ICER for the reasons stated in the introduction. For any two data sources that are available for the economic evaluation, we first designate one data source as referent data and the other as test data. From the referent dataset, we calculate
\(\hat{\beta }_{ref,\lambda }\), the sample estimate of the underlying population mean incremental net benefit
\(\beta_{ref,\lambda }\) at cost-effectiveness threshold
\(\lambda\). Next, we sample with replacement several times to generate
S bootstrap replicates of the test data. For each replicate dataset, we calculate a bootstrap estimate of the incremental net benefit and the associated variance given by Eq. (
9) of
Appendix A. Finally, we obtain the probability of miscoverage by calculating the proportion of the
S bootstrap replicates in which the (95%) confidence interval for the incremental net benefit does not contain the corresponding estimate from the referent dataset.
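The bootstrap procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, resampling is done within each trial arm so that arm sizes are preserved, the confidence interval is the normal-approximation interval, and the variance is the usual independent-arms formula standing in for Eq. (9) of Appendix A.

```python
import math
import random
from statistics import NormalDist, mean, variance


def inb_with_variance(data, arms, lam):
    """Mean INB of B versus A and its variance, treating arms as independent.

    data: per-patient (cost, effect) tuples; arms: "A" or "B" per patient.
    """
    est, var = 0.0, 0.0
    for arm, sign in (("A", -1.0), ("B", 1.0)):
        nb = [lam * e - c for (c, e), t in zip(data, arms) if t == arm]
        est += sign * mean(nb)
        var += variance(nb) / len(nb)
    return est, var


def miscoverage_probability(test, arms, beta_ref, lam,
                            n_boot=1000, level=0.95, seed=0):
    """Proportion of bootstrap replicates of the test data whose confidence
    interval for the INB does not contain the referent estimate beta_ref."""
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    by_arm = {a: [j for j, t in enumerate(arms) if t == a] for a in ("A", "B")}
    misses = 0
    for _ in range(n_boot):
        # Resample patients with replacement within each arm (an assumption).
        idx = [j for a in ("A", "B")
               for j in rng.choices(by_arm[a], k=len(by_arm[a]))]
        est, var = inb_with_variance([test[j] for j in idx],
                                     [arms[j] for j in idx], lam)
        half_width = q * math.sqrt(var)
        if not (est - half_width <= beta_ref <= est + half_width):
            misses += 1
    return misses / n_boot
```

In use, `beta_ref` would be the point estimate from the referent dataset, for example `inb_with_variance(referent, arms, lam)[0]`, and `test` would be the per-patient costs and effects from the test data source.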
Concordance correlation
Lin [
18] introduced the concordance correlation coefficient,
\(\rho_{c}\), and used it to quantify the agreement or reproducibility of a clinical assay, test, or measuring instrument relative to the current measure or a gold standard. In doing so, Lin [
18‐
20] defined perfect agreement between two measurements as a 45° line passing through the origin of the Cartesian (
X,
Y) plane so that deviations from this line indicate evidence of disagreement. The concordance correlation coefficient quantifies this deviation in terms of the precision and accuracy of the new measure compared to the gold standard. As a correlation coefficient,
\(\rho_{\text{c}}\) satisfies the inequality
\(- 1 \le \rho_{\text{c}} \le 1\) where
\(\rho_{\text{c}} = 1\) indicates perfect agreement,
\(\rho_{\text{c}} = 0\) no agreement and
\(\rho_{\text{c}} = - 1\) perfect inverse agreement.
To adapt Lin’s method for our purpose, let
\(\left( {D_{j1} = \left\{ {C_{j1} ,E_{j1} ,t_{j} } \right\},D_{j2} = \left\{ {C_{j2} ,E_{j2} ,t_{j} } \right\}} \right)\) again denote our paired outcome information (comprising treatment costs
\(C_{jt}\) and effects
\(E_{jt}\)) for the
jth patient (
\(j = 1,2, \ldots ,N\)) in treatment group
\(t_{j} = A\ \text{or}\ B\) from a bivariate population with mean incremental net monetary benefit
\((\beta_{1} ,\beta_{2} )\) and variance
\((\sigma_{{\beta_{1} }}^{2} ,\sigma_{{\beta_{2} }}^{2} )\) at a specified cost-effectiveness threshold. Following Lin [
18], the degree of concordance between incremental net-benefits generated from the two data sources can be quantified by the expected value of the squared difference on the incremental net benefit scale:
$$E[(D_{2} - D_{1} )^{2} ] = (\beta_{2} - \beta_{1} )^{2} + \sigma_{{\beta_{1} }}^{2} + \sigma_{{\beta_{2} }}^{2} - 2\rho_{{\beta_{1} \beta_{2} }},$$
(4)
where
\(\sigma_{{\beta_{1} }}\) and \(\sigma_{{\beta_{2} }}\) represent the standard deviations of the incremental net benefit generated from the two datasets and
\(\rho_{{\beta_{1} \beta_{2} }}\) is the covariance between the two. Lin [
18] showed that Eq. (
4) can be written in terms of the Pearson correlation coefficient
\(\rho\), which he suggested provides a measure of precision (i.e., “how far each observation deviates from the best fitted line”), and a bias correction factor
\(C_{\text{b}}\) that measures accuracy (i.e., “how far the best fitted line deviates from the 45° line”):
$$\rho_{\text{c}} = \rho C_{\text{b}} \quad {\text{where}}\quad C_{\text{b}} = \frac{{2\sigma_{{\beta_{1} }} \sigma_{{\beta_{2} }} }}{{(\beta_{2} - \beta_{1} )^{2} + \sigma_{{\beta_{1} }}^{2} + \sigma_{{\beta_{2} }}^{2} }}.$$
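The step from Eq. (4) to this expression can be made explicit. Lin's coefficient scales the expected squared difference by the value it would take if the two measures were uncorrelated, that is, if the covariance term \(\rho_{{\beta_{1} \beta_{2} }}\) in Eq. (4) were zero:
$$\rho_{\text{c}} = 1 - \frac{{E[(D_{2} - D_{1} )^{2} ]}}{{(\beta_{2} - \beta_{1} )^{2} + \sigma_{{\beta_{1} }}^{2} + \sigma_{{\beta_{2} }}^{2} }} = \frac{{2\rho_{{\beta_{1} \beta_{2} }} }}{{(\beta_{2} - \beta_{1} )^{2} + \sigma_{{\beta_{1} }}^{2} + \sigma_{{\beta_{2} }}^{2} }} = \rho C_{\text{b}},$$
where the final equality follows because the covariance satisfies \(\rho_{{\beta_{1} \beta_{2} }} = \rho \,\sigma_{{\beta_{1} }} \sigma_{{\beta_{2} }}\), with \(\rho\) the Pearson correlation coefficient.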
When used to assess agreement between pairs of measurements, an estimate
\(\hat{\rho }_{\text{c}}\) of
\(\rho_{\text{c}}\) is obtained by replacing the parameters in Eq. (
4) with their sample estimates. Hence, in our adaptation of Lin’s method, we define
\(\hat{\rho }_{\text{c}}\) in terms of the incremental net benefit generated from two data sources:
$$\hat{\rho }_{\text{c}} = \frac{{2\hat{\rho }_{{\beta_{1} ,\beta_{2} }} }}{{(\hat{\beta }_{2} - \hat{\beta }_{1} )^{2} + \hat{\sigma }_{{\beta_{1} }}^{2} + \hat{\sigma }_{{\beta_{2} }}^{2} }},$$
(5)
where
\(\hat{\beta }_{2}\) and
\(\hat{\beta }_{1}\) represent sample estimates of the incremental net benefit from the respective datasets,
\(\hat{\sigma }_{{\beta_{1} }}\) and
\(\hat{\sigma }_{{\beta_{2} }}\) represent estimates of the corresponding standard deviations and
\(\hat{\rho }_{{\beta_{1} ,\beta_{2} }}\) is an estimate of the covariance between the two. Again, as shown in Appendices
A and
B, the parameters on the right-hand side of Eq. (
5) can be written in terms of the arm-specific estimates of the mean costs and effects given by Eq. (
6) and associated variance and covariance terms given by Eqs. (
9) and (
16), respectively. Finally, to estimate a confidence interval and carry out hypothesis tests, Lin [
18] suggested the Fisher
Z transformation, under which the transformed coefficient is approximately normally distributed with mean
$$Z_{{\rho_{\text{c}} }} = \frac{1}{2}\ln \left( {\frac{{1 + \rho_{\text{c}} }}{{1 - \rho_{\text{c}} }}} \right),$$
and variance
\(\sigma_{{Z_{{\rho_{\text{c}} }} }}^{2}.\) Estimates
\(Z_{{\hat{\rho }_{\text{c}} }}\) and
\(\sigma_{{Z_{{\hat{\rho }_{\text{c}} }} }}^{2}\) of
\(Z_{{\rho_{\text{c}} }}\) and
\(\sigma_{{Z_{{\rho_{\text{c}} }} }}^{2}\) can be obtained by bootstrapping, with the resulting confidence limits then back-transformed to the original scale.
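Equation (5) and the Fisher Z interval can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: all function names are hypothetical, the variance and covariance terms use standard independent-arms formulas in place of the Appendix expressions, patients are resampled within arm, and the interval is centred on the transformed point estimate with the bootstrap standard deviation of the transformed replicates as its standard error.

```python
import math
import random
from statistics import NormalDist, mean, stdev, variance


def _cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def inb_moments(d1, d2, arms, lam):
    """Return (beta1, beta2, var1, var2, cov12) for two paired data sources."""
    b1 = b2 = v1 = v2 = c12 = 0.0
    for arm, sign in (("A", -1.0), ("B", 1.0)):
        x1 = [lam * e - c for (c, e), t in zip(d1, arms) if t == arm]
        x2 = [lam * e - c for (c, e), t in zip(d2, arms) if t == arm]
        n = len(x1)
        b1 += sign * mean(x1)
        b2 += sign * mean(x2)
        v1 += variance(x1) / n
        v2 += variance(x2) / n
        c12 += _cov(x1, x2) / n
    return b1, b2, v1, v2, c12


def ccc(beta1, beta2, var1, var2, cov12):
    """Concordance correlation coefficient as written in Eq. (5)."""
    return 2.0 * cov12 / ((beta2 - beta1) ** 2 + var1 + var2)


def fisher_z(r, eps=1e-12):
    """Fisher Z transformation, clipped away from +/-1 to stay finite."""
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def ccc_bootstrap_ci(d1, d2, arms, lam, n_boot=1000, level=0.95, seed=0):
    """Point estimate, lower and upper limits, and bootstrap SE on the Z scale."""
    rng = random.Random(seed)
    by_arm = {a: [j for j, t in enumerate(arms) if t == a] for a in ("A", "B")}
    z_hat = fisher_z(ccc(*inb_moments(d1, d2, arms, lam)))
    z_boot = []
    for _ in range(n_boot):
        idx = [j for a in ("A", "B")
               for j in rng.choices(by_arm[a], k=len(by_arm[a]))]
        moments = inb_moments([d1[j] for j in idx], [d2[j] for j in idx],
                              [arms[j] for j in idx], lam)
        z_boot.append(fisher_z(ccc(*moments)))
    se = stdev(z_boot)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Back-transform to the original scale with tanh, the inverse of fisher_z.
    return math.tanh(z_hat), math.tanh(z_hat - q * se), math.tanh(z_hat + q * se), se
```

Working on the Z scale and back-transforming keeps the confidence limits inside the interval from -1 to 1, which a symmetric interval computed on the original scale would not guarantee.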
Statistical tests of the hypothesis that
\(\rho_{\text{c}}\) is greater than an arbitrarily defined threshold value,
\(\rho_{{{\text{c}}0}}\), can be constructed using the transformed parameters and one-sided
p-values generated for a specified level of significance. Concordance correlation coefficient thresholds often cited in the literature as indicating acceptable levels of agreement include
\(\rho_{\text{c0}} > 0.4\) [
4] and
\(\rho_{{{\text{c}}0}} > 0.65\) [
11] with coefficients greater than 0.8 generally taken as good evidence of agreement [
11,
22]. Rather than define an arbitrary threshold value, an alternative strategy suggested by Lin [
19] is to estimate
\(\rho_{{{\text{c}}0}}\) through the expression
\(\rho_{\text{c0}} = C_{\text{b}} \sqrt {\rho^{2} - x}\) where
\(x\) represents a pre-specified percentage loss in precision that is acceptable for the particular measure or clinical scenario under investigation and
\(\rho\) is again the Pearson correlation coefficient. For example,
\(x = 0.05\) for a 5% acceptable loss in precision. In our adaptation of this approach, if we designate one dataset as the referent data and another as the test dataset, then
\(x\) represents the percentage loss in precision in mean incremental net monetary benefit generated from the test data that can be considered acceptable compared with the corresponding estimate obtained using the referent dataset. Statistical tests of the hypothesis that
\(\rho_{\text{c}} > \rho_{\text{c0}}\) can then be constructed and one-sided
p-values estimated.
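The threshold expression and the one-sided test can be sketched as follows. The function names are hypothetical, the standard error on the Fisher Z scale is assumed to come from a bootstrap of the kind described above, and the threshold \(\rho_{\text{c0}}\) is treated as fixed rather than estimated, which is a simplification.

```python
import math
from statistics import NormalDist


def lin_threshold(rho, c_b, x):
    """Threshold rho_c0 = C_b * sqrt(rho^2 - x) for an acceptable loss x."""
    if rho * rho <= x:
        raise ValueError("rho squared must exceed the acceptable loss x")
    return c_b * math.sqrt(rho * rho - x)


def one_sided_p_value(rho_c_hat, se_z, rho_c0):
    """P-value for H0: rho_c <= rho_c0 versus H1: rho_c > rho_c0.

    The comparison is made on the Fisher Z scale (atanh), where se_z is the
    standard error of the transformed estimate.
    """
    z = (math.atanh(rho_c_hat) - math.atanh(rho_c0)) / se_z
    return 1.0 - NormalDist().cdf(z)


# Invented inputs: Pearson correlation 0.9, no bias (C_b = 1), 5% loss.
rho_c0 = lin_threshold(rho=0.9, c_b=1.0, x=0.05)
print(rho_c0, one_sided_p_value(rho_c_hat=0.95, se_z=0.1, rho_c0=rho_c0))
```

A small p-value here supports the claim that agreement exceeds the chosen threshold; a large one leaves that claim unsupported rather than establishing disagreement.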