Vaccine efficacy (VE) assessed in a randomized controlled clinical trial can be affected by demographic, clinical, and other subject-specific characteristics evaluated as baseline covariates. Understanding the effect of covariates on efficacy is key to decisions by vaccine developers and public health authorities.
Methods
This work evaluates the impact of including correlate of protection (CoP) data in logistic regression on its performance in identifying statistically and clinically significant covariates in settings typical for a vaccine phase 3 trial. The proposed approach uses CoP data and covariate data as predictors of clinical outcome (diseased versus non-diseased) and is compared to logistic regression (without CoP data) to relate vaccination status and covariate data to clinical outcome.
Results
Clinical trial simulations, in which the true relationship between CoP data and clinical outcome probability is a sigmoid function, show that use of CoP data increases the positive predictive value for detection of a covariate effect. If the true relationship is characterized by a decreasing convex function, use of CoP data does not substantially change positive or negative predictive value. In either scenario, vaccine efficacy is estimated more precisely (i.e., confidence intervals are narrower) in covariate-defined subgroups if CoP data are used, implying that using CoP data increases the ability to determine clinical significance of baseline covariate effects on efficacy.
Conclusions
This study proposes and evaluates a novel approach for assessing baseline demographic covariates potentially affecting VE. Results show that the proposed approach can sensitively and specifically identify potentially important covariates and provides a method for evaluating their likely clinical significance in terms of predicted impact on vaccine efficacy. It shows further that inclusion of CoP data can enable more precise VE estimation, thus enhancing study power and/or efficiency and providing even better information to support health policy and development decisions.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abkürzungen
AIC
Akaike Information Criterion
AUC
Area under the ROC curve
CI
Confidence interval
CoP
Correlate of protection
CoR
Correlate of risk
PoD
Probability of disease
PPV
Positive predictive value
ROC
Receiver operating characteristic
RR
Relative risk
VE
Vaccine efficacy
Background
This work introduces a novel use of immune response biomarkers to help identify baseline covariates affecting vaccine efficacy (VE). VE is defined as a proportional reduction in risk of disease occurrence for vaccinated subjects, compared to placebo control subjects, and is often assessed by counting disease cases and non-cases in randomized controlled clinical trials [1]. Baseline covariates refer to demographic and clinical characteristics (e.g., age, gender, race and ethnicity, or pre-vaccination serostatus) or other information (e.g., time and site of enrollment) collected from subjects before the time of randomization (i.e., random assignment to active vaccination versus placebo arm). A randomized controlled trial can be used to estimate VE even if the primary analysis does not consider baseline covariates because, due to randomization, measured and unmeasured covariates will, on average, be balanced between the vaccinated and control groups. However, VE may be affected by baseline covariates (for example, it can vary with age) and understanding the impact of covariates on VE is key to making informed decisions, not only in the development of safe and effective vaccines, but also in public health considerations post-licensure.
Statistical significance of covariate effects on binary clinical outcome (e.g., diseased versus non-diseased) is typically [2‐5] evaluated using multiple (also referred to as “multivariable”) logistic regression, which (as used here) involves incorporating multiple explanatory variables (predictors) in the regression analysis for prediction of a single binary outcome. Clinical significance of a covariate effect on VE, a notion different from that of statistical significance, can be assessed by comparing estimated relative risks (RR) between the vaccinated and control subjects in covariate-defined subgroups. Estimated effect can be large in magnitude but not prove statistically significant (e.g., due to variability, trial size, etc.), or can prove highly significant statistically, yet not clinically (e.g., when the effect size is too small to have a measurable impact on public health).
Anzeige
Efficacy trials often measure subjects’ immune response post-vaccination (immunogenicity) as an exploratory endpoint in addition to assessing the trial’s primary clinical endpoint(s). An immunogenicity biomarker which reliably predicts protection is termed correlate of protection (CoP) [6]. The first formal method to validate an immunogenicity biomarker as a CoP using data from a phase 3 trial was proposed by Prentice [7], who introduced the following criteria:
Criterion of vaccine efficacy: demonstrating vaccine effect on the clinical endpoint (e.g., occurrence of disease or infection) evaluated using case-counting
Criterion of vaccine immunogenicity: demonstrating vaccine effect on the immunogenicity biomarker
Criterion of a correlate of risk (CoR): demonstrating that the immunogenicity biomarker correlates with the clinical endpoint
Criterion of a CoP: demonstrating that the probability of the clinical endpoint is conditionally independent of vaccination status when conditioned on the immunogenicity biomarker (indicating that the full vaccine effect is mediated by immunogenicity)
The Prentice framework has gained widespread adoption [8‐11] due to its simplicity. In this work, the term “CoP” is used for a biomarker that meets all four Prentice criteria [7] and fully mediates the vaccine effect. It has been shown that CoP-based VE prediction is more precise than case-count-based VE estimation [12].
The work presented here is motivated by the need to assess a natural extension of that result: inclusion of CoP data could increase efficiency in finding covariate effects and comparing VE between subgroups (by reducing the width of CoP-based confidence interval compared to case-counting). VE can be affected in three ways (given immunogenicity is a CoP): either (i) immunogenicity measurements are distributed significantly differently across sub-populations, or (ii) the sub-populations differ in the relationship between immunogenicity and clinical outcome, or (iii) they differ in both the immunogenicity distributions and the relationship. Because the assessment of (i) is typically done using existing methods [13], the work here focuses on the harder problem of assessing (ii) and (iii).
The aim of this work is to compare two kinds of logistic regression models in terms of their ability to identify and estimate covariate effects on VE. Specifically, we compare (1) the “typical” approach, which evaluates the effects of baseline covariates and vaccination status on clinical outcome (disease status), to (2) the proposed approach (referred to as “CoP-based”) which assesses the effects of baseline covariates and CoP on clinical outcome. Both approaches enable the estimation of RR (vaccinated versus control subjects) and VE in subgroups of interest.
Anzeige
This paper is organized as follows. Sect. "Methods" describes the modeling assumptions and the typical and CoP-based logistic regression approaches used in our analysis. Sect. "Simulation Study" evaluates the properties of these approaches (their relative ability to identify baseline covariates that impact VE) using many (5000) simulated vaccine clinical trials. Sect. "Example Analysis of a Single Hypothetical Vaccine Clinical Trial Dataset" illustrates their application to one kind of typical vaccine clinical trial (using simulated data from one representative trial). Sects. "Discussion" and "Conclusions" summarize key findings and (respectively) the implications for efficacy-based decisions.
Methods
Data collection and assumptions
Because they are often collected in, e.g., a randomized controlled phase 3 vaccine clinical trial, assume that the following data are available for each subject: disease status (diseased or non-diseased), vaccination status (vaccinated or control), immunogenicity biomarker value (assumed to be a CoP), and a set of baseline covariates. Disease status is a binary variable, with value 1 in diseased and 0 in non-diseased subjects, an indicator of clinical outcome set to 1 if the disease is diagnosed (by formal trial endpoint criteria) at any time during the fixed duration of the trial’s observation period. Vaccination status is a binary variable indicating treatment, with value 1 in vaccinated and 0 in control (assumed here to be placebo) subjects. The immunogenicity biomarker is a continuous variable, typically lognormally distributed (within properly defined subgroups) and typically increased by an efficacious prophylactic treatment (vaccination). Baseline covariates can be binary, categorical (ordinal or without ordering), or continuous; they are determined at baseline, prior to randomization (and vaccination).
Let \({T}_{i}^{\text{vaccinated}}\), \({T}_{j}^{\text{control}}\) be the log immunogenicity biomarker measurement (also referred to as log-titer, since neutralizing antibody titer is often used) for \(i\)-th vaccinated subject (\(i=1,\dots , N\)) and \(j\)-th control subject (\(j=1,\dots , M\)), respectively. \(N\) and \(M\) are the number of subjects in the vaccinated and control groups, respectively. Let \({VS}_{i}^{\text{vaccinated}}\), \({VS}_{j}^{\text{control}}\) be vaccination status, and \({DS}_{i}^{\text{vaccinated}}\), \({DS}_{j}^{\text{control}}\) be disease status for \(i\)-th vaccinated subject and \(j\)-th control subject. Let \({C}_{i,k}^{\text{vaccinated}}\), \({C}_{j,k}^{\text{control}}\) be the covariate value for \(i\)-th vaccinated subject, \(j\)-th control subject and \(k\)-th baseline covariate variable (\(k=1,\dots , K\)). \(K\) represents the total number of collected baseline covariates. For a given set of \(L\) independent variables \({x}_{1}, {x}_{2}, \dots , {x}_{L}\), the log-odds of disease (\(y\)) can be estimated by logistic regression, using a linear predictor (\(lp\)) as:
where \(lp={\beta }_{0}+\sum_{l=1}^{L}{{\beta }_{l}x}_{l}\).
Alternatively, a logistic model involving an interaction term (denoted \({\beta }_{\mathrm{1,2}}\)) between independent variables \({x}_{1}\) and \({x}_{2}\) may be described as:
$$y=lp+{\beta }_{\mathrm{1,2}}{x}_{1}{x}_{2}.$$
(2)
The probability of disease (PoD), \(p\), is
$$p=\frac{1}{1+{e}^{-y}}.$$
(3)
If one of the independent variables in the logistic model is log-titer, the probability of disease will be referred to as \(PoD(T)\) or a PoD curve, a function of log-titer (as well as, potentially, other independent variables).
Both approaches described below, i.e., typical (not involving log-titer) and CoP-based (involving log-titer), can be used to evaluate statistical and clinical significance of covariate effects on clinical outcome as follows:
To assess statistical significance of covariate effects, the test for the presence of an effect is deemed positive if either the covariate or interaction effect is proved significant (e.g., at statistical significance level \(\alpha =0.05\)).
Clinical significance of any covariate effect depends on the application; it might be ascertained by comparing the relative health impact between subpopulations, defined with respect to the covariate of interest, using the VE difference associated with subpopulations in question. Thus, to assess clinical significance of a covariate effect associated with specific subpopulations, VE is estimated and compared across covariate-defined subgroups.
The models below are assumed to include all potentially clinically meaningful covariates, following the concept of a full covariate modeling approach [14].
Anzeige
Typical approach
Independent variables used to predict disease status in the typical approach are vaccination status and baseline covariate(s) of interest. Log-odds of disease is given by Eqs. 1 and 2, with \({x}_{1}=VS\), and \({x}_{2},{x}_{3},\dots ,{x}_{L}={C}_{1},{C}_{2},\dots ,{C}_{K}\).
For a given set of data, the 95% confidence interval (CI) associated with estimated VE needs to account for the uncertainty regarding the \({\beta }_{0}, {\beta }_{1}, {\beta }_{2}, {\beta }_{\mathrm{1,2}}, \dots , {\beta }_{L}\) parameters and variability in the observed data. This can be done via parametric resampling of the posterior distribution for parameters and bootstrapping the observed data in the vaccinated and control groups. The bootstrap resampling of observed data is performed on subjects: each time a subject is selected, all his/her characteristics (covariate values) are used in the estimation of VE.
CoP-based approach
Several approaches have been proposed to model the relationship between the CoP and probability of disease [12, 15, 16]. In this paper, a logistic model is used for the PoD curve estimation (see Fig. 1 for comparison between logistic model and other models [12, 15, 16]).
×
Independent variables used to predict disease status in the CoP-based approach include (as above) the baseline covariate(s) of interest, and now also include titer (usually in the form of log-titer). Log-odds of disease is given by Eqs. 1 and 2, with \({x}_{1}=T\), and \({x}_{2},{x}_{3},\dots ,{x}_{L}={C}_{1},{C}_{2},\dots ,{C}_{K}\).
Several models can be considered, when one baseline covariate, \({C}_{1}\), is evaluated (here, again as above, for illustration):
Anzeige
a model with linear term for titer, not involving interaction between the independent variables,
If the assumption of log-titer being a CoP is met (according to the Prentice framework), the effect of log-titer (linear or quadratic) is significant (among other conditions). Here, “significant” means that the coefficient involving log-titer is different from 0 at pre-specified level of statistical significance (here we adopt \(\alpha =0.05\)). The use of a quadratic term is here used as an illustration for a more general, non-linear relationship: in general, an unrealistically large amount of data is likely to be required to distinguish the curvature of different models in this context. The importance of logistic models with the non-linear effect of log-titer was highlighted by Callegaro and Tibaldi, 2019 [17], who demonstrated that lack of fit of a model (e.g., when using a linear effect of log-titer in the context of high VE) leads to substantial loss in power to meet Prentice criteria. Although these learnings are applicable primarily to the CoP assessment (an objective different from ours), the CoP-based approach to evaluate baseline covariate effects (proposed here) is analogous to evaluating Prentice criterion four (i.e., evaluating the effect of vaccination status when controlling for log-titer).
CoP-based VE can be determined for each of the models above using Eq. 6. The 95% CI is calculated as described above; log-titer is treated as any other covariate. Accuracy and precision of CoP-based VE, as well as the coverage probability of the respective confidence intervals, are evaluated in Sect. "Simulation Study".
A similar approach for predicting VE (without covariates) was described by Coudeville et al., 2010 [16], who used a different functional form in representing the PoD curve (Fig. 1C), and pre-vaccination and post-vaccination immune marker measurements in the vaccinated subjects (instead of immune marker data post-vaccination in the vaccinated and control groups used here).
Simulation Study
Overview of the simulation process
To test relative performance of the typical and the CoP-based approach in identifying impactful baseline covariates, four steps were performed:
Step 1: Assumed true values were assigned (i) to PoD curve parameters, and (ii) to log-titer distribution parameters for all covariate-defined subgroups of the vaccinated and control group.
Step 2: Log-titer data and baseline covariate data were generated for all vaccinated and control in silico subjects using random sampling from true distributions. Disease status was assigned to each subject randomly using the probability of disease defined by the true PoD curve.
Step 3: To evaluate statistical significance of a baseline covariate, p-values associated with estimated regression coefficients in Eqs. 4, 5 (typical approach), and 9, 10 (CoP-based approach, with linear term for log-titer) were used as described in Sect. "Data collection and assumptions".
Step 4: To evaluate clinical significance, the best-fitting model for each of the two approaches was selected, using the Akaike Information Criterion (AIC), from Eqs. 4 or 5 (covariate model for the typical approach), and from Eqs. 9, 10, 11, or 12 (structural and covariate model for the CoP-based approach). The selected model was used to estimate VE as described in Sects. "Typical approach" and "CoP-based approach". (By “structural model” we mean the linear, quadratic, or other form of dependence of probability of disease on log-titer.)
Steps 2 to 4 were repeated 5000 times to yield 5000 sets of data and corresponding results of covariate analysis (i.e., p-values of fitted coefficients in Step 3 and VE estimates in covariate-defined subgroups in Step 4), which were compared to the “truth” (values implied by the assigned model and parameter values).
Alternatively, the assessment of statistical significance of baseline covariate(s) may be performed using the best-fitting model, selected in Step 4. Even when numerous covariates and non-linear log-titer dependence are considered, it can be advantageous (in terms of finding the most parsimonious model) to take the stepwise-approach proposed above. This approach would be to use a linear term for log-titer in Step 3 for statistical significance assessment, and proceed to Step 4 to find the best-fitting structural model (e.g., potentially quadratic dependence on log-titer) for CoP-based VE estimation only if any covariate(s) are significant.
Data generation and parameter estimation
Five thousand datasets representing typical phase 3 vaccine efficacy trials were simulated for each of the four scenarios, numbered i through iv, defined by parameter values shown in Table 1, and illustrated in Figs. 1, 2 and 3. For ease of understanding and computational efficiency, only one binary baseline covariate (e.g., age group, for illustration) was considered. Difference in VE across subgroups (80% in younger, 50% in older), in simulated scenarios ii and iv, was driven by the difference in the PoD curves (Fig. 2). Simulated immunogenicity distributions of the vaccinated and control subjects were the same in younger and older subgroups in all scenarios (Fig. 3).
Table 1
Parameter values used as a “truth” for simulated data
All scenarios: 75% of the trial population subjects are in the “younger” group and 25% in the “older” (i.e., 3:1 ratio of younger to older), with a 2:1 ratio of vaccinated to control. Parameters were chosen to obtain the indicated VE in the respective populations for each scenario while minimizing the number of parameter value differences between scenarios
Scenarios iii and iv: The Hill function was chosen for simplicity as the representative example of a sigmoid function; similar results can be expected if the true PoD curve followed a scaled logistic model (Fig. 1C), given its high similarity to a Hill function [18]
×
×
The true PoD curve was represented by a logistic function in scenarios i and ii:
where \(A\) represents the age group, with \(A=1\) for younger participants and \(A=0\) for older participants. In these forms, the \({\beta }_{\mathrm{1,2}}\) and \(k\) parameters, respectively, represent a shift in log-titer required to provide a given protection for older subjects (versus that for younger).
Each simulated dataset was fitted with six logistic models (derived from Eqs. 4, 5, 9, 10, 11, 12, respectively):
Age group, indicated by marker \(A\) (with values of \({A}_{i}^{\text{vaccinated}}\), \({A}_{j}^{\text{control}}\) for \(i\)-th vaccinated subject, and \(j\)-th control subject), was termed a significant covariate in the typical approach if p-value associated with \({\beta }_{2}\) (being different from 0) in Eq. 15 or \({\beta }_{\mathrm{1,2}}\) (similarly) in Eq. 16 was less than \(0.05\). In the CoP-based approach, the effect of age group was deemed significant if p-value associated with \({\beta }_{2}\) in Eq. 17 or \({\beta }_{\mathrm{1,2}}\) in Eq. 18 was less than \(0.05\). (Issues of p-value correction for multiple testing are not addressed here.)
Table 2 summarizes true positive rates (percentage of simulated trials in which the effect of age group was correctly identified as significant in scenarios ii, iv), true negative rates (percentage of simulated trials in which the effect of age group was correctly identified as not significant in scenarios i, iii), positive predictive values (probability that age group impacts the outcome, when its effect was found significant), negative predictive values (probability that age group does not impact the outcome, when its effect was not found significant), and areas under the ROC curve (AUC).
Table 2
Results of age group effect assessment in 5000 clinical trial simulations
Metric
Logistic regression approach
Simulation scenario
i
ii
iii
iv
True positive (%)
Typical
-
100
-
99.30
CoP-based
-
100
-
99.62
True negative (%)
Typical
90.62
-
91.12
-
CoP-based
90.66
-
94.08
-
Positive predictive value (%)
Typical
91.42
91.79
CoP-based
91.46
94.39
Negative predictive value (%)
Typical
100
99.24
CoP-based
100
99.60
Areas under the ROC curves (AUC) are > 99.6% for all simulation scenarios
The typical approach provided true positive rates, false positive rates, negative predictive values, and AUCs that appear to be very similar to those from the CoP-based approach. When the true PoD curve was a Hill function, inclusion of the CoP predictor in the logistic regression increased positive predictive value of covariate effect detection by 2.6% from 91.8% to 94.4% (5000 simulated trials with 15,000 subjects, ~ 200 disease cases, Table 2), an effect size with potentially substantial impact (cf. Sect. "Discussion").
The CoP-based logistic regression with linear term for log-titer showed (Table 2) good performance of covariate effects assessment based on the p-value, even when the true relationship was a Hill function (i.e., when log-odds is a non-linear function of log-titer). Thus, for the scenarios investigated here, the model selection step (to determine the best fitting structural model, as done in Sect. "Accuracy and precision of VE estimation") is not necessary for detection of covariate effects using the criterion of statistical significance.
Accuracy and precision of VE estimation
VE and its CI were estimated using the selected best fitting model (Eq. 15 or 16 for the typical approach; Eqs. 17, 18, 19, or 20 for the CoP-based approach; Table 3) and compared to case-counting-based estimation of VE and its CI. The simulated trials for which the selected model does not match the simulated model are generally consistent between the CoP-based and the typical approach (Table 3) and can be understood in terms of data from those trials (Supplementary Material, Figure S1). Distributions of the VE point estimates for the 5000 simulation scenarios are summarized in Fig. 4.
Table 3
Results of model selection for vaccine efficacy estimation
i True PoD model: logistic function with linear term for log-titer, no age effect
ii True PoD model: logistic function with linear term for log-titer, age effect
Number of simulated trials, (% of simulated trials)
Number of simulated trials, (% of simulated trials)
Model without interaction
Model with interaction
Model without interaction
Model with interaction
Typical
4197 (84%)
803 (16%)
Typical
28 (1%)
4972 (99%)
Model without interaction
Model with interaction
Model without interaction
Model with interaction
CoP-based
Linear
3476 (70%)
634 (13%)
CoP-based
Linear
1 (< 1%)
4140 (83%)
Quadratic
705 (14%)
185 (4%)
Quadratic
1 (< 1%)
858 (17%)
iii True PoD model: Hill function, no age effect
iv True PoD model: Hill function, age effect
Number of simulated trials, (% of simulated trials)
Number of simulated trials, (% of simulated trials)
Model without interaction
Model with interaction
Model without interaction
Model with interaction
Typical
4181 (84%)
819 (16%)
Typical
216 (4%)
4784 (96%)
Model without interaction
Model with interaction
Model without interaction
Model with interaction
CoP-based
Linear
0 (0%)
0 (0%)
CoP-based
Linear
0 (0%)
0 (0%)
Quadratic
4198 (84%)
802 (16%)
Quadratic
18 (< 1%)
4982 (> 99%)
Counts of simulated trials when the respective models were selected as best fitting (the option closest to the simulated “truth” is in bold) and to be used for vaccine efficacy estimation. Results are listed for all simulation scenarios. The most frequently selected model (bold) is generally consistent with the true model. The percentage of times each of the methods find a trend of interaction or lack thereof (age effect or not, resp.) is consistent between the two approaches
×
In the CoP-based approach, the logistic PoD curve fit (based on the selected model), combined with immunogenicity data, produced accurate point estimates of VE (Fig. 4) and well-calibrated CI (details in Supplementary Material, Tables S2-S4) for all four scenarios.
In both age-defined subgroups, the VE estimation by the CoP-based approach was more precise when compared to VE estimated by the typical approach (which was more precise than case-counting in the absence of age effect, i.e., scenarios i and iii; Fig. 4 and additional details in Supplementary Material, Figure S2). The VE estimate by the CoP-based method was closer to the (simulated) “truth” than that of the typical approach for most simulated trials for these scenarios (Fig. 4). Further, Fig. 5 shows that over 90% of the time the CoP-based CI on that estimate was narrower than that obtained by the typical approach.
×
Implications of skipping the model selection step for accuracy and precision of VE estimation are discussed in Supplementary Material (Figures S3–S6). Even if model selection is not performed (e.g., due to computation time), accurate CoP-based estimates (one for each scenario i-iv) of VE are obtained (i.e., using only the Eq. 20 model to fit the simulated trials, Supplementary Material, Figure S6). However, precision of such estimation can be lower than that obtained using models resulting from the model selection step. If the logistic regression uses an incorrect representation of covariate effects (and, in the case of the CoP-based approach, of the structural model), then the result can be biased; if the model is not misspecified then VE predicted by typical logistic regression can be nearly identical to that obtained from case-counting.
Example analysis of a single hypothetical vaccine clinical trial dataset
To illustrate the proposed data analysis, all the methods used in the simulation study were applied to a single simulated dataset of a vaccine clinical trial (randomly selected from 5000 simulated datasets of scenario iv), which used a Hill function as the underlying true PoD curve, and the age group effect on the PoD curve leading to true VE of 50% in older and 80% in younger subjects. The simulation was of 15,000 subjects: 10,000 vaccinated, 5000 control; 3760 older, 11,240 younger; 191 diseased, 14,809 non-diseased. As shown in Fig. 6, respective immunogenicity distributions for the older and younger subjects were very similar, as there was no age effect on the simulated immunogenicity in the model used for simulation.
×
Table 4 shows results of age group effect assessment based on the statistical criteria. Age group was a significant predictor in all six fitted models. Table 5 reports VE estimated using the selected model for typical logistic regression (Eq. 16), and that for CoP-based logistic regression (Eq. 20), enabling assessment of clinical significance of the age group effect. CoP-based VEs were closer to true VEs and had narrower CIs than those obtained by the typical logistic regression or by case-counting. PoD curve estimates (Eq. 20) and immunogenicity data used for CoP-based VE estimation in older and younger subgroups are visualized in Fig. 6.
Table 4
Both logistic regression approaches (CoP-based and typical) show significant effect of age group
Eqs. 16 and 20 were appropriately selected for vaccine efficacy estimation.
Age group effect assessment using statistical significance: All models with linear log-titer (Eqs. 15, 16, 17, 18) show significant effect of age group (p-values of the age group coefficient and/or of the interaction term between linear log-titer and age group are less than \(0.05\)).
Model selection (from the six considered) for VE estimation: The model with interaction was selected for the typical approach (Eq. 16; AIC = 1953.8) and the model with quadratic term for log-titer and with interaction was selected for the CoP-based approach (Eq. 20; AIC = 1878.8).
Table 5
CoP-based logistic regression is more precise than case-counting and typical logistic regression in estimating VE
Estimates of vaccine efficacy in older and younger subjects (with 2000 bootstrap samples used for CI estimation). True VE is 80% in younger and 50% in older subjects. The immunogenicity-based estimate allows (in general, and in this example) the more appropriate assessment of the impact of age group on vaccine efficacy.
Even though both methods correctly identify age group as a significant factor affecting VE, the CoP-based approach correctly indicates that the vaccine is efficacious in the older group, whereas the typical approach incorrectly suggests potentially negative vaccine efficacy (Tables 4 and 5). Thus, use of the typical approach could result in health authority hesitancy to license or recommend the use of this vaccine in older subjects or in a requirement for additional clinical evidence (e.g., results of an additional phase 3 study).
In contrast, if the CoP-based method is applied, a significant vaccine-induced protection in older subjects, 60% (95% CI, 44% to 70%), is estimated (appropriately).
Discussion
Logistic regression can be reliably used to detect the effect of a binary covariate on VE. In the simulated trials of 15,000 subjects with a Hill curve as the true PoD (~ 200 disease cases), inclusion of the CoP predictor in the logistic regression increased positive predictive value of covariate effect detection by 2.6%, compared to the typical approach (PPV, i.e., probability that the true model involves the covariate when our test indicates the covariate should be included: 94.4% versus 91.8%). Thus, with a Hill curve as the true PoD, the typical approach was 46% more likely (than CoP-based) to falsely indicate that there is a difference in subgroups (100% minus PPV, i.e., probability of the positive test to be a false positive is 8.2% for the typical approach versus 5.6% for the CoP-based method). In other words, when the test for a covariate effect (based on statistical significance) was positive, it was more likely to be correct when using the CoP-based approach. The difference in this performance was still present (although smaller) when the true PoD curve was logistic (this was tested using a linear term for log-titer).
And, even when the typical approach detects the presence of a covariate effect, the ability of the CoP-based approach to reduce the width of VE confidence interval and to detect clinically significant effects represents a strong advantage in understanding the degree to which baseline covariates impact VE. Simulation results show that using (case-counting or) the typical logistic regression approach in the presence of a covariate effect could result in VE being underestimated enough to stop development of an efficacious vaccine. Simulations also showed that vaccine efficacy in covariate-defined subgroups was estimated accurately and more precisely if CoP data were used in the logistic regression. Phase 3 studies to evaluate vaccine efficacy are typically powered for overall case-count VE as a primary endpoint, and our analysis shows that the resulting confidence intervals of case-count VE in covariate-defined subgroups are wider than those from CoP-based methods, potentially too wide to demonstrate that VE in subgroups is significantly different from zero. Even when the covariate effect is detected, the wider CIs can result in standard methods failing to identify clinically (and even statistically) significant VE differences between subgroups (e.g., due to overlapping CIs in subgroups of interest). The loss in information (leading to the loss in precision) of the standard methods compared to the CoP-based methods is due to not incorporating the biomarker data (when a predictive biomarker exists): even use of dichotomized ("absolute CoP”) information would be likely to improve predictions (i.e., it is not just the use of a continuous versus binary predictor).
The work presented assumes immunogenicity is measured in all trial participants. In the frequent case that only a subset of non-diseased subjects is assayed for immunogenicity values, weighted logistic regression [19, 20] accounting for the case-cohort design can be used. Further research should examine how the case-cohort trial design interacts with approaches described here.
The proposed method further assumes immunogenicity data are correlated with protection, meeting Prentice criteria [7], and fully mediating the vaccine effect. In case of lack of full mediation of vaccine-induced protection through the immune response biomarker, the CoP-based approach can still be applied, and vaccination status should be included in the CoP-based logistic model (vaccination status can be added as a predictor, in addition to immune biomarker and covariate data) to account for the residual effect of the vaccine. To the extent that the biomarker is sufficiently predictive in populations of interest (despite not being fully mediating), the conclusions of work presented here could be expected to hold even without adding vaccination status as an additional predictor. While it may be reasonable in some cases to expect many of the conclusions of the work presented here to hold even with that additional predictor, future work should evaluate the implications of this modification on accuracy and precision of VE estimation and on covariate effect assessment.
Conclusions
Inclusion of CoP data in logistic regression models provides a new method to identify baseline covariates affecting VE, offering a way to determine, sensitively and specifically, the impact of demographic, clinical, and other subject-specific characteristics on protective efficacy of a vaccine. This approach has potential to increase the precision of efficacy estimation, thus enabling increased precision and/or power in clinical trials, with concomitant enhancement of the decisions they inform.
Acknowledgements
The authors gratefully acknowledge the support, review, and input from Ferdous Gheyas, Larissa Wenning, Julie A. Stone, Beth-Ann G. Coller, and Alexander D. Becker.
Zdeněk Valenta: Dept. of Statistical Modelling, Institute of Computer Science of the Czech Academy of Sciences, Pod Vodárenskou věží 271/2 182 00 Prague 8, Czech Republic.
Jeffrey R. Sachs: Merck & Co., Inc., 126 East Lincoln Avenue, P.O. Box 2000, Rahway, NJ 07065, USA.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
J.D. is an employee of MSD Czech Republic s.r.o., Prague, Czech Republic, and J.R.S. is an employee of Merck Sharp & Dohme LLC, subsidiaries of Merck & Co., Inc., Rahway, NJ, USA. J.R.S. owns stock in Merck & Co., Inc., Rahway, NJ, USA. Z.V. has no conflict of interest to declare.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.