Assessing the magnitude of error using normal approximations
The central limit theorem guarantees that, for a sufficiently large sample size, the distribution of the sample mean is arbitrarily close to normal (Gaussian). To evaluate the adequacy of the normal approximation under specific circumstances, in terms of cumulative distribution functions, we used (a) the Berry-Esséen theorem and (b) direct computation of the specific distributions. All computing was done using R, version 2.15 or higher.
Berry-Esséen theorem
Let R1, R2, …, Rn be independent and identically distributed (iid) zero-mean random variables with positive variance σ². Defining
\( {S}_n={\displaystyle {\sum}_{k=1}^n{R}_k/\sigma \sqrt{n}} \) as the standardised mean of the random variables, Fn(y) as the cumulative distribution function (CDF) of Sn, and Φ as the CDF of the standard normal distribution, the Berry-Esséen theorem [20] states
$$ \left|{F}_n(y)-\Phi (y)\right|\le \frac{C\rho }{\sigma^3\sqrt{n}} $$
(1)
where C is a distribution-independent positive constant, and ρ < ∞ is the absolute third central moment, Ε(|R − Ε(R)|³), which equals Ε(|R|³) thanks to the specification of zero mean. Values of C have decreased markedly from Esséen’s original bound of 7.59 [20] to 0.4690, obtained by Shevtsova in 2013 [21]. For Poisson sums, including the Poisson itself and the negative binomial as a mixture of Poissons, this can be replaced by 0.3051 [22]. More precise values are also available for the special cases of the binomial distribution with parameter 0.5 [23] or with denominator 1 [24], although the latter is applicable only to sample sizes of at least 200.
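As a worked illustration of equation (1), not taken from the paper, the bound can be evaluated for iid centred Bernoulli(p) variables, for which σ² = p(1 − p) and ρ = p(1 − p)(p² + (1 − p)²). The sketch below uses Python rather than the paper's R, and Shevtsova's general-case constant C = 0.4690:

```python
from math import sqrt

C = 0.4690  # Shevtsova's 2013 constant for the general iid case


def berry_esseen_bound(p, n):
    """Berry-Esseen bound on |F_n(y) - Phi(y)| for the standardised
    mean of n iid centred Bernoulli(p) variables."""
    sigma2 = p * (1 - p)                      # variance sigma^2
    rho = p * (1 - p) * (p**2 + (1 - p)**2)   # absolute third central moment
    return C * rho / (sigma2**1.5 * sqrt(n))
```

For p = 0.5 the bound reduces to C/√n, so with n = 100 the CDF of the standardised mean is guaranteed to be within 0.0469 of the standard normal CDF everywhere.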
The Berry-Esséen approach can be used even when direct calculation from the distribution is not feasible. The bound can be expressed in terms of the third non-absolute central moment and a finite sum (see Additional file
2). Such bounds are one way to assess the adequacy of the normal distribution assumptions implicit in common sample size methods. In the following section we describe a potentially more robust sample size approach.
Sample sizes from generalized linear model theory
Generalized linear models are for vectors of independent responses, Yi (i = 1,…,N), arising from an exponential family distribution. Such distributions include the Poisson, binomial and gamma, as well as the negative binomial if its k parameter is assumed fixed [25, 26]. Covariates xij enter the model as linear combinations with unknown regression coefficients βj and can be written as
$$ {\eta}_i={\displaystyle \sum_{j=1}^p}{\beta}_j{x}_{ij} $$
where ηi is related to μi, the mean of Yi, via the link function g: ηi = g(μi).
The sample size for a hypothesis related to the mean of such a distribution can be calculated from the variance of its maximum likelihood estimate (MLE), on the scale of the link function. The covariance matrix of the parameter estimates for GLMs is approximately
$$ {\left({X}^TWX\right)}^{-1} $$
(2)
where X is the design matrix and W is the diagonal matrix of weights [27]. We need to know how the sample size affects the variance of the parameter estimate. When comparing the means of two groups of size N0 and N1 (with N0 + N1 = N), X has two columns and N rows. The first column, corresponding to the intercept, is a column of ones, and the second column is N0 zeros followed by N1 ones. W is defined by
$$ W=\frac{{\left(\frac{d\mu }{d\eta}\right)}^2}{V\left(\mu \right)} $$
(3)
where V(μ) is the variance function relating the mean and variance of Y [27]. The diagonal of W is composed of N0 copies of w0 and N1 copies of w1, in an obvious notation. To compare the two means, we are interested in the second diagonal element of the 2 × 2 matrix given by equation (2). Some basic matrix algebra shows that this element is (N0w0)⁻¹ + (N1w1)⁻¹.
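This matrix-algebra result is easy to check numerically. The Python sketch below (function name ours) builds the two-column design matrix and diagonal weight matrix described above and extracts the second diagonal element of (XᵀWX)⁻¹:

```python
import numpy as np


def second_diag_element(N0, N1, w0, w1):
    """Second diagonal element of (X^T W X)^{-1} for the two-group design."""
    N = N0 + N1
    # Intercept column of ones; group indicator: N0 zeros then N1 ones.
    X = np.column_stack([np.ones(N), np.r_[np.zeros(N0), np.ones(N1)]])
    W = np.diag([w0] * N0 + [w1] * N1)
    return np.linalg.inv(X.T @ W @ X)[1, 1]
```

For any choice of group sizes and weights this agrees with the closed form 1/(N0·w0) + 1/(N1·w1).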
For the sample size of this comparison, we apply principles outlined by Lachin [1]. His notation uses subscripts 0 and 1 for the null and alternative hypotheses, which here we change to O and A, using 0 and 1 instead to refer to the two groups being compared: 0 for reference or control, and 1 for intervention. We also use λ rather than μ as a generic parameter, reserving the latter for the mean, and a different subscript notation for standard normal deviates, so that zp means the standard normal deviate for lower tail area p. Our statistic (X in Lachin’s notation) is the estimate of the difference in transformed means obtained by GLM. The transformation is typically the log, or the logit for the binomial. The mean of this statistic is λO under the null hypothesis and λA under the alternative hypothesis, with standard deviations ΣO and ΣA respectively. Lachin’s equation 1 then becomes
$$ \left|{\lambda}_A-{\lambda}_O\right|={z}_{1-\frac{\alpha }{2}}{\Sigma}_O-{z}_{1-\beta }{\Sigma}_A $$
(4)
Following Lachin again, we denote the proportions in the groups by Q0 = N0/N and Q1 = N1/N. Our approach is to apply a normal approximation on the scale of the link function. This is often the log, although, with the identity link, more familiar equations are obtained. We consider two approaches for estimating the variance under the null hypothesis. One is to use the reference value in both groups: following Zhu and Lakkis [17], we call this method 1. Using the above matrix algebra, ΣO equals
$$ \begin{array}{l}\sqrt{\frac{1}{Q_1N}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}+\frac{1}{Q_0N}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}}\\ {}=\sqrt{\frac{1}{N}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}\left(\frac{1}{Q_1}+\frac{1}{Q_0}\right)}\end{array} $$
and ΣA equals
$$ \sqrt{\frac{1}{Q_1N}\frac{V\left({\mu}_1\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_1}\right)}^2}+\frac{1}{Q_0N}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}} $$
Hence, for method 1, we obtain
$$ \sqrt{N}=\frac{Z_{1-\frac{\alpha }{2}}\sqrt{\left(\frac{1}{Q_1}+\frac{1}{Q_0}\right)\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}}+{Z}_{1-\beta}\sqrt{\frac{1}{Q_1}\frac{V\left({\mu}_1\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_1}\right)}^2}+\frac{1}{Q_0}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}}}{g\left({\mu}_0\right)-g\left({\mu}_1\right)} $$
(5)
Zhu and Lakkis [17] find that the test characteristics are generally better if, instead, μ1 is used for the intervention arm under the null hypothesis (‘method 2’), so that ΣO equals ΣA, and
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\frac{1}{Q_1}\frac{V\left({\mu}_1\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_1}\right)}^2}+\frac{1}{Q_0}\frac{V\left({\mu}_0\right)}{{\left(d\mu /d\eta \Big|{}_{\mu ={\mu}_0}\right)}^2}}}{g\left({\mu}_0\right)-g\left({\mu}_1\right)} $$
(6)
Equations (5) and (6) are general, with special distributional cases being easily determined. We will use equation (6) except when referring to previous work based on method 1.
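Equation (6) can be written as a single generic routine taking the variance function V, the derivative dμ/dη, and the link g as inputs. The Python sketch below (rather than the paper's R; function name ours) returns the total sample size N:

```python
from math import log, sqrt
from statistics import NormalDist


def glm_sample_size(mu0, mu1, V, dmu_deta, g,
                    Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (6), method 2, for user-supplied
    variance function V, link derivative dmu/deta, and link g."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = V(mu1) / dmu_deta(mu1)**2 / Q1 + V(mu0) / dmu_deta(mu0)**2 / Q0
    sqrt_N = z * sqrt(var) / abs(g(mu0) - g(mu1))
    return sqrt_N**2


# Poisson special case: log link, V(mu) = mu, dmu/deta = mu
n_poisson = glm_sample_size(2.0, 1.0, V=lambda m: m,
                            dmu_deta=lambda m: m, g=log)
```

The distribution-specific formulas in the following sections are instances of this routine with particular choices of V, dμ/dη and g.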
Negative binomial distribution
The negative binomial distribution is a generalization of the Poisson for count data, with an additional parameter (k) which can describe over-dispersion [28]. Small k implies a large variance, and as k → ∞ the distribution tends to the Poisson. We derive results first for the negative binomial distribution, then for the Poisson as a limiting case. Let Y be a random variable which follows the negative binomial distribution with population mean μ and dispersion parameter k, with the variance function V(μ) = μ + μ²/k and density as shown in Additional file 3. Analysis by GLM usually employs a natural logarithm link function [25], for which dμ/dη = μ. Substituting into equation (5) gives
$$ \sqrt{N}=\frac{Z_{1-\frac{\alpha }{2}}\sqrt{\left(\frac{1}{\mu_0}+\frac{1}{k_0}\right)\left(\frac{1}{Q_1}+\frac{1}{Q_0}\right)}+{Z}_{1-\beta}\sqrt{\frac{1}{Q_1}\left(\frac{1}{\mu_1}+\frac{1}{k_1}\right)+\frac{1}{Q_0}\left(\frac{1}{\mu_0}+\frac{1}{k_0}\right)}}{ \log \left({\mu}_0\right)- \log \left({\mu}_1\right)} $$
(7)
For the special case of equal sample sizes (Q0 = Q1 = 0.5) and equal k parameters (k0 = k1), this reduces to the equation of Brooker et al. [29]. Using equation (6) instead gives:
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\frac{1}{Q_1}\left(\frac{1}{\mu_1}+\frac{1}{k_1}\right)+\frac{1}{Q_0}\left(\frac{1}{\mu_0}+\frac{1}{k_0}\right)}}{ \log \left({\mu}_0\right)- \log \left({\mu}_1\right)} $$
(8)
A normal approximation can be obtained by applying equation (6) on the identity scale, with variances equal to
\( {\mu}_i+{\mu}_i^2/{k}_i \) (i = 0, 1):
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\frac{1}{Q_1}\left({\mu}_1+\frac{\mu_1^2}{k_1}\right)+\frac{1}{Q_0}\left({\mu}_0+\frac{\mu_0^2}{k_0}\right)}}{\mu_0-{\mu}_1} $$
(9)
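Equations (8) and (9) can be sketched directly (in Python rather than the paper's R; function names ours):

```python
from math import log, sqrt
from statistics import NormalDist


def nb_n_log(mu0, mu1, k0, k1, Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (8): negative binomial, log link."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = (1 / mu1 + 1 / k1) / Q1 + (1 / mu0 + 1 / k0) / Q0
    return (z * sqrt(var) / (log(mu0) - log(mu1)))**2


def nb_n_identity(mu0, mu1, k0, k1, Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (9): negative binomial, identity scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = (mu1 + mu1**2 / k1) / Q1 + (mu0 + mu0**2 / k0) / Q0
    return (z * sqrt(var) / (mu0 - mu1))**2
```

For example, with μ0 = 2, μ1 = 1, k0 = k1 = 1 and equal arms at the default α = 0.05 and 90% power, equation (8) gives a total N of roughly 154 while equation (9) gives roughly 168.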
We used simulation to estimate the actual power of the sample sizes obtained from equations (8) and (9), by generating repeated datasets of the calculated sizes and analysing them by GLM with Wald tests. We also used likelihood ratio tests, with similar results, except where commented. For this we used the rnegbin and glm.nb functions of the MASS package in R.
Poisson distribution
Let Y be a random variable denoting the number of events per unit time (for example, per study duration); then Y follows the Poisson distribution with mean μ. By letting k tend to infinity in equation (8), or, equivalently, from equation (6) with log link and V(μ) = μ, we obtain:
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\frac{1}{Q_1}\frac{1}{\mu_1}+\frac{1}{Q_0}\frac{1}{\mu_0}}}{ \log \left({\mu}_0\right)- \log \left({\mu}_1\right)} $$
(10)
This is compared by simulation, for the case Q0 = Q1 = 0.5 (equal-size arms), with the following normal approximation on the scale of the identity link, obtained from equation (9) by again letting k tend to infinity:
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{2\left({\mu}_1+{\mu}_0\right)}}{\mu_0-{\mu}_1} $$
(11)
This is also used, for example, by Kirkwood & Sterne [
7], except that here we include a factor of 2 inside the square root to obtain the total study size.
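The Poisson case, i.e. the k → ∞ limit in which the 1/k terms vanish, can be sketched for both scales (Python; function names ours):

```python
from math import log, sqrt
from statistics import NormalDist


def poisson_n_log(mu0, mu1, Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (10): Poisson, log link."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = 1 / (Q1 * mu1) + 1 / (Q0 * mu0)
    return (z * sqrt(var) / (log(mu0) - log(mu1)))**2


def poisson_n_identity(mu0, mu1, alpha=0.05, power=0.9):
    """Total N from equation (11): Poisson, identity link, equal arms."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return (z * sqrt(2 * (mu1 + mu0)) / (mu0 - mu1))**2
```

For μ0 = 2 and μ1 = 1 at the default α = 0.05 and 90% power, the two formulas give totals of roughly 66 and 63 respectively, showing how close the two scales are for moderate means.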
Binomial distribution
Let Y be a binomial random variable denoting the number of successes in d independent Bernoulli trials, each with probability μ. The most common situation is d = 1, with each unit (person) having a response of 1 or 0 (e.g. positive or negative). An assumption of d = 1 may explain why the literature does not always show d in the variance function: we follow Fox [30] in using V(μ) = μ(1 − μ)/d. For the canonical logit link, dμ/dη = μ(1 − μ), so, from equation (6), we obtain
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\frac{1}{Q_1}\frac{1}{\mu_1\left(1-{\mu}_1\right)}+\frac{1}{Q_0}\frac{1}{\mu_0\left(1-{\mu}_0\right)}}}{\sqrt{d}\left(\mathrm{logit}\left({\mu}_0\right)-\mathrm{logit}\left({\mu}_1\right)\right)} $$
(12)
On the scale of difference in proportions (identity link), the corresponding equation is:
$$ \sqrt{N}=\frac{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)\sqrt{\mu_1\left(1-{\mu}_1\right)\frac{1}{Q_1}+{\mu}_0\left(1-{\mu}_0\right)\frac{1}{Q_0}}}{\sqrt{d}\left({\mu}_0-{\mu}_1\right)} $$
(13)
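Equations (12) and (13) can similarly be sketched (Python; function names ours):

```python
from math import log, sqrt
from statistics import NormalDist


def logit(p):
    return log(p / (1 - p))


def binom_n_logit(mu0, mu1, d=1, Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (12): binomial, logit scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = 1 / (Q1 * mu1 * (1 - mu1)) + 1 / (Q0 * mu0 * (1 - mu0))
    return (z * sqrt(var) / (sqrt(d) * (logit(mu0) - logit(mu1))))**2


def binom_n_identity(mu0, mu1, d=1, Q0=0.5, Q1=0.5, alpha=0.05, power=0.9):
    """Total N from equation (13): binomial, identity scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = mu1 * (1 - mu1) / Q1 + mu0 * (1 - mu0) / Q0
    return (z * sqrt(var) / (sqrt(d) * (mu0 - mu1)))**2
```

For example, comparing proportions of 0.5 and 0.3 with d = 1 and equal arms at the defaults, the logit-scale formula gives a total of roughly 257 and the identity-scale formula roughly 242.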
This differs from Lachin’s equation (12), and that of Kirkwood and Sterne, both of which have Zα multiplied by a function of \( \overline{\pi}\left(1-\overline{\pi}\right) \), where \( \overline{\pi} \) is an average of μ0 and μ1. Some outcomes, in particular the occurrence of a given condition, could be quantified either as a Poisson rate (events per unit time, with rate μ) or as a binomial proportion (the fraction of people experiencing the condition in a given period T). These options can be linked mathematically, with the latter probability equalling \( 1-{e}^{-\mu T} \). This relation can, in turn, be used to compare the power or sample size obtained when quantifying a given scenario as either a rate or a proportion; in this case the rate is the more powerful option [31]. This is to be expected, since the proportion loses information by treating all those with one or more events as a single category.
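This comparison can be illustrated numerically. The Python sketch below (an illustration of ours, not a calculation from the paper) converts a pair of rates into proportions via 1 − e^(−μT) and compares the Poisson sample size from equation (10) with the binomial identity-scale size from equation (13), for equal arms and d = 1:

```python
from math import exp, log, sqrt
from statistics import NormalDist


def z_total(alpha=0.05, power=0.9):
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def n_rate(mu0, mu1):
    """Equation (10), equal arms: outcome analysed as a Poisson rate."""
    var = 2 / mu1 + 2 / mu0
    return (z_total() * sqrt(var) / (log(mu0) - log(mu1)))**2


def n_proportion(mu0, mu1, T=1.0):
    """Equation (13) with d = 1, equal arms, applied to p = 1 - exp(-mu*T)."""
    p0, p1 = 1 - exp(-mu0 * T), 1 - exp(-mu1 * T)
    var = 2 * p1 * (1 - p1) + 2 * p0 * (1 - p0)
    return (z_total() * sqrt(var) / (p0 - p1))**2
```

For rates of 0.2 and 0.1 per unit time over T = 1, the rate-based analysis requires the smaller total sample size, consistent with the information argument above.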