Methods
Generalized linear models (GLM) represent a significant extension of traditional linear regression models [14]. They consist of a random component that specifies the conditional distribution of the response variable \( Y \), given the values of the explanatory variables \( X_1, X_2, \dots, X_k \), as a member of an exponential family; a linear predictor (or systematic) component that is a linear function of the predictors, \( \eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \), where \( \beta = {\left(\beta_0, \beta_1, \dots, \beta_k\right)}^{T} \) is the vector of parameters; and a smooth, invertible link function that relates the expectation of the response variable, \( \mu \equiv E(Y) \), to the linear predictor: \( g\left(\mu\right) = \eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \). For example, the most common link for binary outcomes is the logit, \( \log\left(\mu/\left(1-\mu\right)\right) \), used in logistic models; the log link, \( \log\left(\mu\right) \), is used in Poisson models and log-binomial models.
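The choice of link function determines the measure of association represented by the regression coefficients. As a brief illustration (for a binary exposure \( X_1 \), holding any other covariates fixed), the exponentiated coefficient under the log link is the risk ratio, whereas under the logit link it is the odds ratio:
$$ \exp\left({\beta}_1\right)=\frac{P\left(Y=1\mid {X}_1=1\right)}{P\left(Y=1\mid {X}_1=0\right)}\ \ \text{(log link)},\qquad \exp\left({\beta}_1\right)=\frac{P\left(Y=1\mid {X}_1=1\right)/\left[1-P\left(Y=1\mid {X}_1=1\right)\right]}{P\left(Y=1\mid {X}_1=0\right)/\left[1-P\left(Y=1\mid {X}_1=0\right)\right]}\ \ \text{(logit link)}. $$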
In the descriptions below, \( Y_i \) and \( {x}_i^T=\left(1,{x}_{i1},{x}_{i2},\dots, {x}_{ik}\right) \) denote the binary outcome and the row vector containing the k predictors (together with a leading 1 for the intercept) for the ith individual (i = 1, 2, …, n), respectively. The observations from the n individuals are independent.
Log-binomial regression
In the GLM framework, the conditional distribution of \( Y_i \) given the predictor variables is binomial, with the mean response related to the predictors through the link function \( \log\left(\mu_i\right) \). In log-binomial regression, \( \mu_i \) is often denoted \( p_i \), because \( E\left(Y_i\right) \) is a probability with a value between zero and one. Although there are other methods to obtain efficient estimators, the maximum likelihood approach is used to generate asymptotically efficient estimators (maximum likelihood estimates, MLE) in log-binomial regression [5, 8].
The MLE of log-binomial models are derived from an iteratively reweighted least squares (IRLS) approach [15]. In a log-binomial regression, \( \log\left({p}_i\left(\beta\right)\right)={x}_i^T\beta \), where \( {p}_i\left(\beta\right)=\Pr\left({y}_i=1\mid {x}_i\right) \), \( 0\le {p}_i\le 1 \), and \( {x}_i^T\beta \le 0 \) (the constraint on the parameter space). The log-likelihood is given by
$$ \ell\left(\beta \right)=\sum \limits_{i=1}^n{y}_i\log \left({p}_i\left(\beta \right)\right)+\sum \limits_{i=1}^n\left(1-{y}_i\right)\log \left(1-{p}_i\left(\beta \right)\right). $$
(1)
It can be proven that the MLE for β can be found by the following iteration (Additional file 4):
$$ {\beta}^{\left(t+1\right)}={\left({X}^{T}WX\right)}^{-1}\left({X}^{T}Wz\right) $$
(2)
$$ \mathrm{where}\quad z=X{\beta}^{(t)}+\frac{Y-P\left({\beta}^{(t)}\right)}{P\left({\beta}^{(t)}\right)},\quad X=\left({x}_{ij}\right)\in {\mathbb{R}}^{n\times k},\quad \frac{Y-P\left(\beta \right)}{P\left(\beta \right)}={\left(\frac{y_1-{p}_1\left(\beta \right)}{p_1\left(\beta \right)},\frac{y_2-{p}_2\left(\beta \right)}{p_2\left(\beta \right)},\dots, \frac{y_n-{p}_n\left(\beta \right)}{p_n\left(\beta \right)}\right)}^{T},\quad \mathrm{and\ the\ weight\ matrix}\ W=\mathrm{Diag}\left(\frac{p_i\left(\beta \right)}{1-{p}_i\left(\beta \right)}\right),\ i=1,2,\dots,n;\ j=1,2,\dots,k. $$
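A brief sketch of how this update arises (the full derivation is given in Additional file 4): differentiating the log-likelihood (1) gives the score and the expected information
$$ \frac{\partial \ell}{\partial \beta}=\sum \limits_{i=1}^{n}\frac{y_i-{p}_i\left(\beta \right)}{1-{p}_i\left(\beta \right)}{x}_i={X}^{T}Wr,\qquad E\left[-\frac{\partial^{2}\ell}{\partial \beta\,\partial {\beta}^{T}}\right]=\sum \limits_{i=1}^{n}\frac{p_i\left(\beta \right)}{1-{p}_i\left(\beta \right)}{x}_i{x}_i^{T}={X}^{T}WX, $$
where \( {r}_i=\left({y}_i-{p}_i\left(\beta \right)\right)/{p}_i\left(\beta \right) \). The Fisher scoring step \( {\beta}^{\left(t+1\right)}={\beta}^{(t)}+{\left({X}^{T}WX\right)}^{-1}{X}^{T}Wr \) then rearranges to (2) with \( z=X{\beta}^{(t)}+r \).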
The iteration process continues until β stabilizes. The weights W used in the iterative process contain \( p_i\left(\beta\right) \) in the numerator and \( 1-p_i\left(\beta\right) \) in the denominator, where \( p_i\left(\beta\right)=\exp\left({x}_i^T\beta\right) \), which ranges from 0 to 1. When \( p_i\left(\beta\right) \) is very small, the weight is approximately \( p_i\left(\beta\right) \). When \( p_i\left(\beta\right) \) approaches one, the weight approaches infinity. This suggests that the IRLS approach is highly influenced by observations with large \( p_i\left(\beta\right) \). Moreover, the impact also depends on the average \( p_i\left(\beta\right) \), or the average weight (a lower average \( p_i\left(\beta\right) \) is associated with a lower average weight). For illustration, we constructed two hypothetical samples, each with five observations, having the following probabilities: sample 1 = {0.1, 0.3, 0.4, 0.5, 0.95} and sample 2 = {0.02, 0.03, 0.08, 0.15, 0.95}. The corresponding weights for the two samples were {0.11, 0.43, 0.67, 1, 19} and {0.02, 0.03, 0.09, 0.18, 19}, respectively. In sample 2, the observation with weight 19 will influence the point estimate more than the observation in sample 1 with the same weight.
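A minimal sketch (hypothetical data step) of the weight calculation w = p/(1 − p) for the two illustrative samples above:

```sas
data weights;
  input sample p @@;
  w = p / (1 - p);     * IRLS weight for an observation with probability p;
  datalines;
1 0.1  1 0.3  1 0.4  1 0.5  1 0.95
2 0.02 2 0.03 2 0.08 2 0.15 2 0.95
;
run;

proc print data=weights noobs;
run;
```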
Robust Poisson regression
In robust Poisson regression, a quasi-likelihood (QL) model can be applied to fit data with a binary outcome [14–18]. Quasi-likelihood was first introduced by Wedderburn (1974) as a function with properties analogous to those of log-likelihood functions [18]. Similar to the ML method, the maximum QL method can be used to obtain the QL estimates. In a maximum QL model, only the relationship between the mean and the variance (i.e., the variance as a function of the mean) needs to be specified, rather than the full underlying distribution of the data [15–19]. It can be shown that when \( Y_i \) comes from an exponential family, the quasi-score function is identical to the score function associated with the maximum likelihood of the GLM.
When the Poisson distribution is chosen, the quasi-score function simplifies to \( {S}_j\left(\beta \right)=\frac{1}{\phi}\sum \limits_{i=1}^n\left({y}_i-{\mu}_i\right){x}_{ij} \), resulting in the quasi-score estimating equations \( {S}_j\left(\beta \right)=\sum \limits_{i=1}^n\left({y}_i-{\mu}_i\right){x}_{ij}=0 \), which are the same as the estimating equations of the Poisson regression model. In the two expressions above, ϕ is the dispersion parameter and j = 1, 2, …, k. The final estimate from the quasi-scoring procedure satisfies \( {S}_j\left(\widehat{\beta}\right)=0 \), and \( \widehat{\beta} \) is a consistent and asymptotically unbiased estimator of β [20]; \( \widehat{\beta} \) does not depend on ϕ.
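As a brief check of this simplification: the general quasi-score function is \( {S}_j\left(\beta \right)=\sum_{i=1}^{n}\frac{\partial {\mu}_i}{\partial {\beta}_j}\,\frac{y_i-{\mu}_i}{\phi\,V\left({\mu}_i\right)} \); with the Poisson variance function \( V\left(\mu\right)=\mu \) and the log link, \( \partial {\mu}_i/\partial {\beta}_j={\mu}_i{x}_{ij} \), so that
$$ {S}_j\left(\beta \right)=\sum \limits_{i=1}^{n}{\mu}_i{x}_{ij}\,\frac{y_i-{\mu}_i}{\phi\,{\mu}_i}=\frac{1}{\phi}\sum \limits_{i=1}^{n}\left({y}_i-{\mu}_i\right){x}_{ij}. $$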
The quasi-likelihood estimators are, in general, not asymptotically efficient [14]. The robust Poisson regression model uses the classical sandwich estimator under the generalized estimating equation (GEE) framework to provide accurate standard errors for the elements of \( \widehat{\beta} \) [19–21]. The variance-covariance matrix is
$$ {\left[\sum \limits_{i=1}^nE\left[{I}_i\left(\beta \right)\right]\right]}^{-1}\left[\sum \limits_{i=1}^nE\left[{S}_i\left(\beta \right){S}_i{\left(\beta \right)}^T\right]\right]{\left[\sum \limits_{i=1}^nE\left[{I}_i\left(\beta \right)\right]\right]}^{-1} $$
(3)
where \( {I}_i\left(\beta \right)=-\frac{\partial {S}_i\left(\beta \right)}{\partial \beta } \) is the information matrix [22]. A consistent estimate of the variance can be obtained by evaluating the variance-covariance matrix at \( \widehat{\beta} \).
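For the Poisson working model with the log link, these building blocks take a simple form (a sketch, with the dispersion factor omitted because it cancels): \( {S}_i\left(\beta \right)=\left({y}_i-{\mu}_i\right){x}_i \) and \( {I}_i\left(\beta \right)={\mu}_i{x}_i{x}_i^{T} \), so that, with the expectations replaced by their empirical counterparts, the sandwich estimator evaluated at \( \widehat{\beta} \) is
$$ \widehat{\mathrm{Var}}\left(\widehat{\beta}\right)={\left(\sum \limits_{i=1}^{n}{\widehat{\mu}}_i{x}_i{x}_i^{T}\right)}^{-1}\left(\sum \limits_{i=1}^{n}{\left({y}_i-{\widehat{\mu}}_i\right)}^{2}{x}_i{x}_i^{T}\right){\left(\sum \limits_{i=1}^{n}{\widehat{\mu}}_i{x}_i{x}_i^{T}\right)}^{-1}. $$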
Implementation
Both regression models were implemented in SAS [23] (SAS Software Version 9.3 of the SAS System for Unix; Cary, NC: SAS Institute Inc., 2011). The SAS code can be found in Additional file 5. For the log-binomial model, −4 was set as the initial value of the intercept. For both models, the weighted least squares estimates (default) were used as initial values of the parameters. The convergence criterion was \( 10^{-4} \) (default). A well-known issue with log-binomial models is failure to converge when the MLE is located on the boundary of the parameter space (i.e., when a predicted probability of the outcome equals 1). To minimize this convergence issue, the COPY method was applied [24, 25], with the number of virtual copies set to 10,000. To ensure a fair comparison between the log-binomial and robust Poisson models, the evaluation was conducted using only results based on exactly the same simulated data: if the COPY method did not converge for a dataset, that dataset was removed before the performance of the robust Poisson models was evaluated. The exclusion of datasets was very rare in this study; details on the number of excluded datasets can be found in the "Discussion" section.
For each simulated scenario, the simulation process was repeated 1000 times. In each of the 1000 simulated datasets, the log risk ratio was estimated from the log-binomial model and from the robust Poisson model. For each scenario and each regression model, the relative bias, standard error (SE), and mean square error (MSE), all on the log scale, were calculated by summarizing the results from the 1000 datasets. Relative bias was defined as the average of the 1000 estimated log RRs minus the log of the true RR, divided by the log of the true RR. With \( {\widehat{\theta}}_m \) denoting the estimated log RR from the mth dataset using either the log-binomial model or the robust Poisson model, the relative bias was \( \left(\frac{1}{1000}\sum \limits_{m=1}^{1000}\frac{{\widehat{\theta}}_m-\log \left(\mathrm{true\ RR}\right)}{\log \left(\mathrm{true\ RR}\right)}\right)\times 100\% \). The SE was defined as the empirical SE of the estimated log RR over all 1000 simulations. The MSE was calculated as the sum of the squared bias (on the log scale) and the variance, where the bias was \( \frac{1}{1000}\sum \limits_{m=1}^{1000}{\widehat{\theta}}_m-\log \left(\mathrm{true\ RR}\right) \).
Because both the SE and the MSE depend on the sample size, the process described above was repeated with a sample size of 500 for all scenarios with RR = 3.
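As a sketch of how these summaries could be computed (hypothetical dataset and variable names; est is assumed to hold one estimated log RR, theta_hat, per simulated dataset for a given scenario and regression model):

```sas
* Summarize the 1000 estimated log RRs for one scenario and one model;
proc means data=est noprint;
  var theta_hat;
  output out=summ mean=mean_theta var=var_theta;
run;

data metrics;
  set summ;
  true_logrr = log(3);                                       * true RR = 3 in this sketch;
  rel_bias = (mean_theta - true_logrr) / true_logrr * 100;   * relative bias, in percent;
  se       = sqrt(var_theta);                                * empirical SE of the log RR;
  mse      = (mean_theta - true_logrr)**2 + var_theta;       * squared bias plus variance;
run;
```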
Simulated datasets
Let Y be a common binary outcome (Y = 1 for disease and Y = 0 for non-disease) and X be a binary exposure variable (X = 1 for exposed and X = 0 for unexposed). First, uncorrelated random variables Z1 and Z2, following the Bernoulli(0.5) and the Uniform[0, 1] distributions, respectively, were generated for 1000 subjects. These distributions were chosen for their simplicity in the design. Then, the exposure variable X was created for each subject from the subject-specific probability of exposure, defined by the equation logit(P(X = 1| Z1, Z2)) = −1.0 + Z1 + Z2, with E(P(X = 1| Z1, Z2)) = 0.5. All of the outcome variables defined below were conditional on the exposure status and the covariates. For exposed subjects, P(Y = 1| X = 1, Z1, Z2) = 3 × P(Y = 1| X = 0, Z1, Z2); that is, the adjusted RR, P(Y = 1| X = 1, Z1, Z2) / P(Y = 1| X = 0, Z1, Z2), was fixed at 3.0, chosen to reflect an effect size commonly seen in real-world settings and strong enough to yield observable differences in performance between the two regression models.
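A minimal sketch of the covariate and exposure generation for one simulated dataset (dataset and variable names are illustrative; the full program is in Additional file 5):

```sas
data covariates;
  call streaminit(20180101);          * arbitrary seed for reproducibility;
  do id = 1 to 1000;
    z1  = rand("Bernoulli", 0.5);     * Z1 ~ Bernoulli(0.5);
    z2  = rand("Uniform");            * Z2 ~ Uniform[0, 1];
    p_x = logistic(-1.0 + z1 + z2);   * logit(P(X = 1 | Z1, Z2)) = -1.0 + Z1 + Z2;
    x   = rand("Bernoulli", p_x);     * exposure indicator;
    output;
  end;
run;
```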
Scenarios to study the impact of truncation
The equations to generate Y took four different forms (Y1, Y2, Y3, Y4) to enable the examination of the impact of truncation. Unlike Y1, which always had a perfect linear association with its predictors (i.e., not truncated), Y2, Y3, and Y4 were truncated such that the values of P (Yk = 1| X = 0, Z1, Z2) depended on whether or not "Z1 + (beta of Z2) × Z2" reached a threshold (k = 2, 3, and 4) (Table 1). For example, in scenario I-2, the threshold was set at 0.15, requiring "Z1 + 3 × Z2" to be greater than 0.15 in order to affect P (Y2 = 1| X = 0, Z1, Z2). The threshold varied by scenario and was chosen such that the percentage of exposed subjects at the maximum P (Y = 1| X, Z1, Z2) could be controlled within the range of 1.4–5.8%. A truncation yielded a spike of observations at the maximum P (Y = 1| X, Z1, Z2) for both exposed and unexposed subjects. With the other parameters fixed, the spike grows higher as the threshold increases. This allowed us to study how the volume of observations with large P (Y = 1| X, Z1, Z2) affected model performance.
Table 1
Design of the simulation data
Scenario | Maximum P (Y = 1| X = 1, Z1, Z2) | Beta coefficient of Z2 | Link function used to generate the data | % of exposed subjects at the maximum P (Y = 1) | Equation for P (Yk = 1| X = 0, Z1, Z2) | Log-binomial model misspecified
I-1 | 0.75 | 3 | log | 0 | log (P (Y1 = 1| X = 0, Z1, Z2)) = −1.38 – Z1–3 * Z2 | No |
I-2 | 0.75 | 3 | log | 1.4 | log (P (Y2 = 1| X = 0, Z1, Z2)) = − 1.23 – max (Z1 + 3 * Z2, 0.15) | Yes |
I-3 | 0.75 | 3 | log | 2.8 | log (P (Y3 = 1| X = 0, Z1, Z2)) = − 1.08 – max (Z1 + 3 * Z2, 0.30) | Yes |
I-4 | 0.75 | 3 | log | 5.8 | log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.78 – max (Z1 + 3 * Z2, 0.60) | Yes |
II-1 | 0.85 | 3 | log | 0 | log (P (Y1 = 1| X = 0, Z1, Z2)) = − 1.26 – Z1–3 * Z2 | No |
II-2 | 0.85 | 3 | log | 1.4 | log (P (Y2 = 1| X = 0, Z1, Z2)) = − 1.11 – max (Z1 + 3 * Z2, 0.15) | Yes |
II-3 | 0.85 | 3 | log | 2.8 | log (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.96 – max (Z1 + 3 * Z2, 0.30) | Yes |
II-4 | 0.85 | 3 | log | 5.8 | log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.66 – max (Z1 + 3 * Z2, 0.60) | Yes |
III-1 | 0.95 | 3 | log | 0 | log (P (Y1 = 1| X = 0, Z1, Z2)) = − 1.15 – Z1–3 * Z2 | No |
III-2 | 0.95 | 3 | log | 1.4 | log (P (Y2 = 1| X = 0, Z1, Z2)) = − 1.00 – max (Z1 + 3 * Z2, 0.15) | Yes |
III-3 | 0.95 | 3 | log | 2.8 | log (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.85 – max (Z1 + 3 * Z2, 0.30) | Yes |
III-4 | 0.95 | 3 | log | 5.8 | log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.55 – max (Z1 + 3 * Z2, 0.60) | Yes |
IV-1 | 0.95 | 2 | log | 0 | log (P (Y1 = 1| X = 0, Z1, Z2)) = − 1.15 – Z1–2 * Z2 | No |
IV-2 | 0.95 | 2 | log | 1.4 | log (P (Y2 = 1| X = 0, Z1, Z2)) = − 1.05 – max (Z1 + 2 * Z2, 0.10) | Yes |
IV-3 | 0.95 | 2 | log | 2.8 | log (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.95 – max (Z1 + 2 * Z2, 0.20) | Yes |
IV-4 | 0.95 | 2 | log | 5.8 | log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.75 – max (Z1 + 2 * Z2, 0.40) | Yes |
V-1 | 0.95 | 4 | log | 0 | log (P (Y1 = 1| X = 0, Z1, Z2)) = − 1.15 – Z1–4 * Z2 | No |
V-2 | 0.95 | 4 | log | 1.4 | log (P (Y2 = 1| X = 0, Z1, Z2)) = − 0.95 – max (Z1 + 4 * Z2, 0.20) | Yes |
V-3 | 0.95 | 4 | log | 2.8 | log (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.75 – max (Z1 + 4 * Z2, 0.40) | Yes |
V-4 | 0.95 | 4 | log | 5.8 | log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.35 – max (Z1 + 4 * Z2, 0.80) | Yes |
VI-1 | 0.95 | 3 | logit | 0 | logit (P (Y1 = 1| X = 0, Z1, Z2)) = − 0.76 – Z1–3 * Z2 | Yes |
VI-2 | 0.95 | 3 | logit | 1.4 | logit (P (Y2 = 1| X = 0, Z1, Z2)) = − 0.61 – max (Z1 + 3 * Z2, 0.15) | Yes |
VI-3 | 0.95 | 3 | logit | 2.8 | logit (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.46 – max (Z1 + 3 * Z2, 0.30) | Yes |
VI-4 | 0.95 | 3 | logit | 5.8 | logit (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.16 – max (Z1 + 3 * Z2, 0.60) | Yes |
VII-1 | 0.95 | 3 | probit | 0 | probit (P (Y1 = 1| X = 0, Z1, Z2)) = − 0.48 – Z1–3 * Z2 | Yes |
VII-2 | 0.95 | 3 | probit | 1.4 | probit (P (Y2 = 1| X = 0, Z1, Z2)) = − 0.33 – max (Z1 + 3 * Z2, 0.15) | Yes |
VII-3 | 0.95 | 3 | probit | 2.8 | probit (P (Y3 = 1| X = 0, Z1, Z2)) = − 0.18 – max (Z1 + 3 * Z2, 0.30) | Yes |
VII-4 | 0.95 | 3 | probit | 5.8 | probit (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.12 – max (Z1 + 3 * Z2, 0.60) | Yes |
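As a concrete illustration of how a truncated outcome in Table 1 is produced, scenario I-2 (log link, threshold 0.15, true RR = 3, maximum P (Y = 1| X = 1, Z1, Z2) = 0.75) could be generated as follows, building on the hypothetical covariates data set sketched above:

```sas
data scenario_i2;
  set covariates;
  log_p0 = -1.23 - max(z1 + 3*z2, 0.15);   * log P(Y2 = 1 | X = 0, Z1, Z2), truncated at the threshold;
  if x = 1 then p = 3 * exp(log_p0);       * exposed subjects: RR = 3 relative to unexposed;
  else p = exp(log_p0);
  y = rand("Bernoulli", p);                * binary outcome;
run;
```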
Scenarios to study the impact of maximum P(Y = 1)
First, scenarios I-1, II-1, and III-1 were created using the log link function. The maximum values of P (Y1 = 1| X = 1, Z1, Z2) were set to 0.75, 0.85, and 0.95, respectively, to study the impact of the maximum P (Y1 = 1| X = 1, Z1, Z2) (Table 1). The selected thresholds were 0.15, 0.30, and 0.60 for Y2, Y3, and Y4, respectively. With these thresholds, the percentages of exposed subjects at the scenario-specific maximum value set above (0.75, 0.85, or 0.95) were 1.4, 2.8, and 5.8% for Y2, Y3, and Y4, respectively. These percentages were derived by simulation and represent the various levels of alteration of the linear predictor.
The intercepts were calculated manually to satisfy P (Yk = 1| X = 0, Z1, Z2) ≤ 0.75/3, 0.85/3, and 0.95/3, respectively. For example, for the equation log (P (Y1 = 1| X = 0, Z1, Z2)) = α − Z1 − 3 * Z2 in scenario I-1, α = log(0.75/3) = −1.38, since the logarithm is an increasing function and hence the maximum of P (Y1 = 1| X = 0, Z1, Z2) is achieved when Z1 = 0 and Z2 = 0. For the same reason, for the equation log (P (Y2 = 1| X = 0, Z1, Z2)) = α − max (Z1 + 3 * Z2, 0.15) in scenario I-2, α = log(0.75/3) + 0.15 = −1.23. The 12 scenarios designed to study the impact of large P (Y = 1) are listed in the first 12 rows of Table 1.
Scenarios to study the impact of the coefficient of Z2
To study the impact of the entire distribution of P(Y = 1) when large P(Y = 1) existed, eight more scenarios were produced, with the beta coefficient of Z2 set to 2 and to 4 (shown in the middle section of Table 1), in addition to the four scenarios in which the beta coefficient of Z2 was set to 3 (i.e., III-1, III-2, III-3, and III-4). The distribution of P(Y = 1) shifted towards zero as the beta coefficient of Z2 increased. Thus, these scenarios allowed us to study the impact of the outcome distribution, or the average P(Y = 1).
The intercepts and the thresholds were generated using the same approach as described in the previous section. Because Z2 follows a uniform distribution, the thresholds increase proportionally with the beta coefficient of Z2. For example, the threshold that made 1.4% of exposed subjects reach the maximum P (Y2 = 1| X = 1, Z1, Z2) was 0.10 when the beta of Z2 was 2 and increased to 0.20 when the beta of Z2 was 4.
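To see why this proportionality holds, note (as a brief check) that an observation is affected by the truncation when \( {Z}_1+{\beta}_{Z_2}{Z}_2\le t \); because Z1 and Z2 are independent, Z1 equals 0 with probability 0.5, and Z2 is uniform on [0, 1], for a threshold 0 < t < 1 this can occur only when Z1 = 0, so
$$ P\left({Z}_1+{\beta}_{Z_2}{Z}_2\le t\right)=P\left({Z}_1=0\right)P\left({Z}_2\le t/{\beta}_{Z_2}\right)=\frac{1}{2}\cdot \frac{t}{\beta_{Z_2}}. $$
Keeping \( t/{\beta}_{Z_2} \) constant therefore keeps the proportion of truncated observations, and correspondingly the percentage of exposed subjects at the maximum, unchanged.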
Scenarios to study the impact of misspecified link functions
The link function was altered from log to logit and to probit in scenarios VI and VII to assess model performance when the link function was misspecified (see the last section of Table 1). For scenarios VI-2, VI-3, VI-4, VII-2, VII-3, and VII-4, not only was the link function misspecified, but the response probabilities were also truncated.
Scenarios with a weaker association between exposure and outcome (RR = 2)
To understand the impact of misspecified link functions and truncation when RR is different from 3.0, we also generated scenarios with parameters identical to those in III-1, III-2, III-3, III-4, VII-1, VII-2, VII-3 and VII-4, except that this time RR = 2.0 instead of 3.0.
Discussion
In this study, the statistical performance of the two most popular model-based approaches used to estimate the RR for common binary outcomes was examined when the link function was misspecified or when the probability of the response variable was truncated at the right tail. Our findings suggest that point estimates from log-binomial models were biased when the link function was misspecified or when the probability distribution of the response variable was truncated, even for a small proportion of observations. For log-binomial models, the percentage of truncated observations was positively associated with the magnitude of the bias. The bias was more pronounced when these observations came from a population in which the response rate was lower, with the other parameters under examination held fixed.
For MLE-based methods, misspecification can cause inconsistent parameter estimators [20]. Lumley et al. (2006) pointed out that, compared to robust Poisson and other non-MLE-based models, log-binomial models (MLE based) assign very large weights when p (referred to by the authors as μ) is large ([26], Fig. 1). The same authors also noted that for log-binomial models, "a single point with μ close to 1 can have arbitrarily large influence despite having bounded covariate values". Our observations were consistent with those of Lumley et al.: we demonstrated that when the percentage of observations with large P increased, the magnitude of the bias also increased. This may explain the poorer performance of the model when applied to data generated using a probit link compared to data generated using a log or logit link, with the other parameters fixed (Panel C of Fig. 2). It is well known that log-binomial models may fail to converge or may generate incorrect estimates when the predicted probabilities are not bounded by 1 [3, 5, 8]. However, we believe this is a different issue from the one we have focused on, namely the impact of large Ps. In scenario VII-1, truncation was not applied and none of the observations had predicted probabilities > 1, yet the point estimate was still biased.
On the other hand, the point estimates from the robust Poisson models were nearly unbiased in all the scenarios examined, including when the models were applied to data generated using a probit link, which yielded quite different probability distributions from those generated with a log link, and/or when the distribution of 5.8% of the exposed subjects was altered. In Chen et al. [27], both the MLEs generated by log-binomial models and the quasi-likelihood estimators produced by robust Poisson models deteriorated when outliers were introduced. However, in the current study, the biases in the point estimates from the robust Poisson models were negligible, even when both the link function and the predictors were incorrectly specified. This interesting contrast can be explained by a major difference in the design of the two studies. In the previous study [27], the association between the exposure and the outcome was weakened when the "outliers" were introduced, and thus negative biases were observed for the robust Poisson models. In the current study, by contrast, the true RR was maintained at 3.0 (or 2.0 for some scenarios) even when the link function was misspecified and/or the probabilities were truncated. Our simulations demonstrated that, for robust Poisson regression, misspecification of the link function did not hinder its ability to recover the true RR. This is likely because the quasi-likelihood method enables regression coefficient estimation without fully specifying the distribution of the observed data. We examined exposure-outcome associations with RR = 3.0 and RR = 2.0. The magnitude of the observed bias in our simulation results did not change much when the association was reduced from 3.0 to 2.0; however, it is conceivable that the bias could be smaller in scenarios where the association is weaker than 2.0.
Model misspecification does not always yield differences in point estimates between the two models. In fact, in a previous examination (Additional file 3), we found that when an important explanatory variable was omitted, when a higher-order term of a non-linear explanatory variable was ignored, or when an interaction term was overlooked, the two models produced comparable results regardless of the outcome rate, the risk ratio, or the strength of association between the exposure and the confounder or between the outcome and the confounder. Only in the scenario in which an interaction term was ignored did the models yield large biases. This highlights the relative importance of observations with large weights, since in that previous examination the number of observations with large probabilities of having the response was small.
Although we did not evaluate data generated with other link functions that are also suitable for modeling binary outcomes (e.g., the complementary log-log or log-log link), we expect the results to show similar patterns. A truncated distribution appears in many real-life datasets in which data collection is limited to values above or below a threshold. For example, a typical scale used in clinics or hospitals can measure height up to 200 cm and weight up to 250 kg; subjects exceeding these values would be recorded at these limits. In the simulated datasets, the distributions of approximately 1.4, 2.8, and 5.8% of the exposed subjects were truncated in that they no longer followed the distribution specified by the link function through the linear predictor. These truncation rates for the exposed subjects are plausible values that can be related to real-life applications.
In contrast to Chen et al. [27], in which no differences were found at the second decimal place when the data were not contaminated with outliers, we found small differences at the second decimal place in the variances between the log-binomial and robust Poisson models under some of the scenarios for both sample sizes (n = 1000 and n = 500) when the models were correctly specified. This finding is consistent with that of Petersen and Deddens [11], which was based on a sample of 100 observations and a single independent variable with a uniform distribution.
Kauermann and Carroll [28] showed that sandwich variance estimators are generally less efficient than variance estimates derived from parametric models. This weakness affects the coverage probability, i.e., the probability that a confidence interval covers the true RR, and thus the ability to reject the null hypothesis when the alternative is true. Hence, log-binomial models are preferred over robust Poisson models when the log-binomial models are correctly specified.
The COPY method has been reported to have convergence issues when there are continuous covariates in the model [11]. However, convergence was barely an issue in this study: the COPY method converged in all 1000 simulations in 23 of the 28 scenarios when the sample size was 1000 and in 21 of the 28 scenarios when the sample size was 500. In the 12 scenarios (five with a sample size of 1000 and seven with a sample size of 500) in which the COPY method did not converge in all 1000 simulations, only one of the 1000 simulations failed to converge in each scenario. The number of virtual copies used in this study, 10,000, has been reported to give results accurate to three decimal places [25].
Misspecification tests have been developed [29, 30] and shown, in simulations, to maintain reasonable size across various settings when applied to logistic and beta-binomial regression models [30]. However, their power to detect the types of alternatives commonly observed in practice (e.g., alternative link functions) was low [30]. Blizzard and Hosmer [10] assessed the fit of log-binomial models by applying the Hosmer-Lemeshow test (a commonly used goodness-of-fit test for logistic regression models), the Pearson chi-square test, and the unweighted sum of squares test, and found that all three tests exhibited acceptable Type I error rates yet low-to-moderate power. Given the lack of powerful diagnostic tools to detect the various forms of model misspecification, the robust Poisson model may be considered a good choice because of its ability to produce unbiased risk ratio estimates. Efforts to establish efficient and robust parameter estimators are ongoing. A recent publication summarized issues with the current GLM-based approaches to estimating relative risks and risk differences, and proposed a possible non-GLM alternative for estimating these quantities [31]. The authors proposed modeling relative risks as functions of baseline covariates. Validation of this approach is needed to determine its applicability to studies such as those presented here.