Models for estimating the heterogeneity of effect in the cross-sectional SW-CRT
We consider the marginal model via the generalized estimating equation (GEE) approach for the cross-sectional SW-CRT design, where we focus on the interaction effect between the intervention and a binary individual covariate. Let
\({Y}_{ijk}\) be a binary outcome of the
\(k\) th individual (
\(k\in \left\{1,...,{m}_{i}\right\}\)) in the
\(i\) th cluster (
\(i\in \left\{1,...,I\right\}\)) at the
\(j\) th time interval (
\(j\in \left\{1,...,J\right\}\)), where
\({m}_{i}\) denotes the size of cluster
\(i\),
\(I\) denotes the total number of clusters,
\(J\) denotes the total number of time steps, and
\({Y}_{ijk}=1\) denotes the event of interest and
\({Y}_{ijk}=0\) otherwise. A GEE model is formulated for the cross-sectional SW-CRT as follows:
$${\text{logit}}\left({\mu }_{ijk}\right)={\theta }_{0}+{\gamma }_{j}+{\theta }_{1}{W}_{ij}+{\theta }_{2}{X}_{ijk}+{\theta }_{3}{W}_{ij}{X}_{ijk}$$
(1)
where
\({\mu }_{ijk}={\mathbb{E}}\left[{Y}_{ijk}\right]\) denotes the marginal mean response of
\({Y}_{ijk}\),
\({\theta }_{0}\) is the baseline log-odds of the outcome in the control group corresponding to the reference group for the binary covariate (
\({X}_{ijk}\in \left\{\mathrm{0,1}\right\}\),
\({X}_{ijk}=0\) represents the reference group and
\({X}_{ijk}=1\) represents the other), and
\({\gamma }_{j}\) is the period fixed effect for the
\(j\) th time interval.
\({W}_{ij}\) is the design indicator;
\({W}_{ij}=1\) means that all individuals in cluster
\(i\) at time interval
\(j\) receive the intervention and
\({W}_{ij}=0\) otherwise.
\({\theta }_{1}\) is the overall treatment effect (OTE),
\({\theta }_{2}\) is the main effect of the binary covariate, and
\({\theta }_{3}\) captures the interaction between treatment and the binary covariate (i.e., the heterogeneity of treatment effect, denoted HTE). For identification, we set
\({\gamma }_{1}=0\). We use
\(\Theta ={({\theta }_{0},{\gamma }_{2},\cdots ,{\gamma }_{J},{\theta }_{1},{\theta }_{2},{\theta }_{3})}^{\intercal }\in {\mathbb{R}}^{J+3}\) for the parameter vector, ordered to match the entries of the design vector defined below, where
\({\mathbb{R}}^{d}\) represents the space of all
\(d\)-dimensional real vectors. The variance of
\({Y}_{ijk}\) is denoted by
\({\upsilon }_{ijk}\), with
\({\upsilon }_{ijk}={\mu }_{ijk}(1-{\mu }_{ijk})\) for a binary outcome. Hence model (1) can be written as
$${\text{logit}}\left({\mu }_{ijk}\right)={\mathbf{M}}_{ijk}^{\intercal }\Theta$$
where
\({\mathbf{M}}_{ijk}:={\left[\begin{array}{ccccc}1& {\mathbf{e}}_{j}^{\intercal }& {W}_{ij}& {X}_{ijk}& {W}_{ij}{X}_{ijk}\end{array}\right]}^{\intercal }\in {\mathbb{R}}^{J+3}\) with
\({\mathbf{e}}_{j}\) as the vector of length
\(J-1\) with all elements equal to 0 except the
\(j-1\) th element, which is equal to 1 (so \({\mathbf{e}}_{1}\) is the zero vector, consistent with \({\gamma }_{1}=0\)). Now by stacking the
\({\mathbf{M}}_{ijk}\)’s into a matrix of
\(J{m}_{i}\) rows and
\(J+3\) columns, namely,
$${\mathbf{M}}_{i}={\left[\begin{array}{ccccc}{\mathbf{M}}_{i11}& \cdots & {\mathbf{M}}_{i1{m}_{i}}& \cdots & {\mathbf{M}}_{iJ{m}_{i}}\end{array}\right]}^{\intercal }\in {\mathbb{R}}^{J{m}_{i}\times \left(J+3\right)}$$
we obtain
$${\text{logit}}\left({\mathbf{u}}_{i}\right)={\mathbf{M}}_{i}\Theta \in {\mathbb{R}}^{J{m}_{i}}$$
where
\({\mathbf{u}}_{i}={\left[\begin{array}{cccccc}{\mu }_{i11}& \cdots & {\mu }_{i1{m}_{i}}& {\mu }_{i21}& \cdots & {\mu }_{iJ{m}_{i}}\end{array}\right]}^{\intercal }\) and the logit function is applied to the vector
\({\mathbf{u}}_{i}\) elementwise.
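To make the construction of the design vector concrete, the following is a minimal sketch in Python. The function names `design_row` and `linear_predictor` are hypothetical, and the sketch assumes that the parameter vector is ordered to match the entries of \({\mathbf{M}}_{ijk}\), that is, intercept first, then the period effects \({\gamma }_{2},\dots ,{\gamma }_{J}\), then \({\theta }_{1},{\theta }_{2},{\theta }_{3}\).

```python
def design_row(j, J, w, x):
    """Return M_ijk = [1, e_j, W_ij, X_ijk, W_ij * X_ijk] as a list of length J + 3."""
    if not 1 <= j <= J:
        raise ValueError("period j must lie in 1..J")
    # e_j has length J - 1 with a 1 in position j - 1; it is all zeros when j = 1
    e_j = [1 if pos == j - 1 else 0 for pos in range(1, J)]
    return [1] + e_j + [w, x, w * x]


def linear_predictor(row, theta):
    """Inner product M_ijk^T Theta, i.e. logit(mu_ijk)."""
    if len(row) != len(theta):
        raise ValueError("row and theta must have the same length")
    return sum(m_entry * t_entry for m_entry, t_entry in zip(row, theta))
```

For example, `design_row(3, 5, 1, 1)` returns `[1, 0, 1, 0, 0, 1, 1, 1]`: the intercept, the indicator for period 3, and the treatment, covariate, and interaction entries.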
If the HTE is to be tested with respect to
\({X}_{ijk}\), the interaction effect parameter
\({\theta }_{3}\), instead of the OTE
\({\theta }_{1}\), should be considered for the sample size calculation. In this case, the null hypothesis
\({H}_{0}: {\theta }_{3}=0\) is to be tested against an alternative hypothesis
\({H}_{{\text{a}}}: {\theta }_{3}=\delta\) for some prespecified HTE level
\(\delta \ne 0\). For the purpose of simplification, we assume the SW-CRT has equal cluster sizes, i.e.
\({m}_{1}=\cdots ={m}_{I}=m\). If the HTE level
\({\theta }_{3}\) is estimated by a consistent, asymptotically normally distributed estimator
\({\widehat{\theta }}_{3,m}\), then the power is approximately calculated using the two-tailed Wald test,
$${\text{power}}=\Phi \left(\frac{\delta }{\sqrt{{\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)}}-{z}_{1-\alpha /2}\right)$$
(2)
where
\(\Phi\) is the standard normal distribution function and
\({z}_{1-\alpha /2}\) is the
\({(1-\alpha /2)}^{{\text{th}}}\) standard normal quantile. When estimating the OTE, the t distribution is also recommended, particularly for SW-CRT designs with a small number of clusters:
$${\text{power}}= {\Phi }_{t,{\text{df}}}\left(\frac{\delta }{\sqrt{{\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)}}-{t}_{1-\alpha /2, {\text{df}}}\right)$$
(3)
where
\({\Phi }_{t,{\text{df}}}\) is the cumulative t distribution function with df degrees of freedom. Although the proposed method can be used to determine either the number of clusters or the cluster size, in this article we focus on determining the cluster size; we provide R code for both. The use of formula (
2) or (
3) requires an approximation of
\({\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)\), which is the \(\left(J+3,J+3\right)\)
th element of the model-based variance–covariance matrix,
\({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) where
\({\widehat{\Theta }}_{m}\) is the GEE estimator of
\(\Theta\). In this article, we provide the sandwich form of approximation for
\({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\), which arises from the GEE method. Because of the individual-level binary covariate, no closed-form sample size formula is available. We consider two correlation structures, a simple exchangeable structure and a nested exchangeable structure, both of which are determined by the values of the within-period and between-period correlations. The within-period correlation is the correlation between the outcomes of different individuals in the same cluster within a given time period; it is the same as the intraclass correlation (ICC). The cluster autocorrelation (CAC) is the correlation between the population means from the same cluster at two different time periods and is defined as the ratio of the between-period correlation to the within-period correlation. Hence, a simple exchangeable correlation structure has the same value for the within-period and between-period correlations (i.e., CAC = 1), whereas a nested exchangeable correlation structure has different values for the two. In general, the between-period correlation is smaller than the within-period correlation (0 < CAC < 1).
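As an illustration of how formula (2) is evaluated once a variance value is available, the following is a minimal sketch in Python using only the standard library. The function name `wald_power` and the argument name `sig_level` are hypothetical; the significance level is named `sig_level` to keep it distinct from the ICC. Taking the absolute value of \(\delta\) is an assumption of this sketch, made so that a negative HTE is handled symmetrically.

```python
from math import log, sqrt
from statistics import NormalDist


def wald_power(delta, var_theta3, sig_level=0.05):
    """Approximate power of the two-sided Wald test, following formula (2)."""
    if var_theta3 <= 0:
        raise ValueError("variance must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - sig_level / 2)
    # abs() is an assumption of this sketch so that negative delta is handled
    return std_normal.cdf(abs(delta) / sqrt(var_theta3) - z_crit)


# Illustrative call: HTE of log(2) with a made-up variance of 0.06
example_power = wald_power(log(2), 0.06)
```

The variance value 0.06 is invented for illustration only; in practice it would come from the variance approximation for \({\widehat{\theta }}_{3,m}\) under the assumed design.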
Now denote the ICC as
\(\alpha \in \left(-1,1\right)\) (not to be confused with the significance level in \({z}_{1-\alpha /2}\)) such that
\({\mathbb{C}}ov\left({Y}_{ijk},{Y}_{ijk{\prime}}\right)=\alpha \sqrt{{\upsilon }_{ijk}{\upsilon }_{ijk{\prime}}}\) for
\(k{\prime}\ne k\) and the CAC as
\(\rho \in \left[-1,1\right]\) such that
\({\mathbb{C}}ov\left({Y}_{ijk},{Y}_{ij{\prime}k{\prime}}\right)=\alpha \rho \sqrt{{\upsilon }_{ijk}{\upsilon }_{ij{\prime}k{\prime}}}\) for
\(j{\prime}\ne j\). Then the variance–covariance matrix of the binary outcomes is
\({\mathbf{V}}_{i}={\mathbb{V}}ar\left({\mathbf{Y}}_{i}\right)={\mathbf{A}}_{i}^\frac{1}{2}{\mathbf{R}}_{i}\left(\alpha ,\rho \right){\mathbf{A}}_{i}^\frac{1}{2}\) for cluster
\(i\), where
$${\mathbf{R}}_{i}\left(\alpha ,\rho \right)=\alpha \rho {\mathbf{J}}_{J{m}_{i}}+\alpha \left(1-\rho \right){\mathbf{I}}_{J}\otimes {\mathbf{J}}_{{m}_{i}}+\left(1-\alpha \right){\mathbf{I}}_{J{m}_{i}}$$
where \(\otimes\) denotes the Kronecker product, \({\mathbf{I}}_{n}\) is the
\(n\times n\) identity matrix,
\({\mathbf{J}}_{n}\) is an
\(n\times n\) matrix with all elements equal to 1, and
\({\mathbf{A}}_{i}={\text{diag}}\left({\upsilon }_{i11},\cdots ,{\upsilon }_{i1{m}_{i}},\cdots {\upsilon }_{iJ1},\cdots ,{\upsilon }_{iJ{m}_{i}}\right)\). Hence the GEE is
$${\text{U}}\left(\Theta \right)=\sum_{i=1}^{I}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }{\mathbf{V}}_{i}^{-1}\left({\mathbf{Y}}_{i}-{\mathbf{u}}_{i}\right)=0$$
(4)
By solving the GEE (4), the resulting
\({\widehat{\Theta }}_{m}\) satisfies
$${\widehat{\Theta }}_{m}-\Theta ={\left(\sum_{i=1}^{I}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }{\mathbf{V}}_{i}^{-1}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }\right)}^{-1}\sum_{i=1}^{I}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }{\mathbf{V}}_{i}^{-1}\left({\mathbf{Y}}_{i}-{\mathbf{u}}_{i}\right)+{O}_{\mathbb{P}}\left({I}^{-1}\right)$$
Ignoring the
\({O}_{\mathbb{P}}\left({I}^{-1}\right)\) term, which is negligible when
\(I\) is sufficiently large, we have
$${\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)={\mathbb{V}}ar\left({\widehat{\Theta }}_{m}-\Theta \right)\approx {\left(\sum_{i=1}^{I}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }{\mathbf{V}}_{i}^{-1}\frac{\partial {\mathbf{u}}_{i}}{\partial\Theta }\right)}^{-1}={\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{\mathbf{A}}_{i}{\mathbf{V}}_{i}^{-1}{\mathbf{A}}_{i}{\mathbf{M}}_{i}\right)}^{-1}$$
(5)
As a result,
\({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) can be approximately calculated as
$${\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\approx {\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{M}}_{i}\right)}^{-1}$$
(6)
where
\({{\varvec{\Sigma}}}_{i}={\mathbf{A}}_{i}^{-\frac{1}{2}}{\mathbf{R}}_{i}\left(\alpha ,\rho \right){\mathbf{A}}_{i}^{-\frac{1}{2}}\) for cluster
\(i\). Equation (
6) leads to a model-based (naïve) variance estimator.
$$\widehat{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)={\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{\widehat{{\varvec{\Sigma}}}}_{i}^{-1}{\mathbf{M}}_{i}\right)}^{-1}$$
(7)
where
\({\widehat{{\varvec{\Sigma}}}}_{i}={\mathbf{A}}_{i}^{-\frac{1}{2}}{\mathbf{R}}_{i}\left(\widehat{\alpha },\widehat{\rho }\right){\mathbf{A}}_{i}^{-\frac{1}{2}}\) with
\(\widehat{\alpha }\) and
\(\widehat{\rho }\) also obtained by solving the GEE (4). At the design stage, however, no data are available for power calculation; instead, we have assumed values of
\(\alpha\) and
\(\rho\). Hence we evaluate the right-hand side of (6) at these assumed values and denote the result by
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\), which requires no data analysis, i.e.
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) is a deterministic quantity rather than a random variable. For simplicity, we refer to the power calculation method (2) or (3) based on
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) with specific values of
\(\alpha\) and
\(\rho\) as “GEE”.
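The correlation matrix \({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\) defined above can be built entry by entry, which may help when checking an implementation. The following is a minimal sketch in plain Python; the function name `nested_exchangeable` is hypothetical, and the sketch assumes the observations are ordered period by period, as in \({\mathbf{u}}_{i}\).

```python
def nested_exchangeable(J, m, icc, cac):
    """Correlation matrix R_i(alpha, rho) of size (J*m) x (J*m).

    Observations are ordered period by period, so the observation for
    individual k in period j sits at index (j - 1) * m + (k - 1).
    """
    n = J * m
    R = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                R[a][b] = 1.0            # alpha*rho + alpha*(1 - rho) + (1 - alpha)
            elif a // m == b // m:
                R[a][b] = icc            # same period, different individuals
            else:
                R[a][b] = icc * cac      # different periods
    return R
```

Setting `cac=1.0` reproduces the simple exchangeable structure, in which every off-diagonal entry equals the ICC.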
Bias-correction sandwich variance approaches
It is well known that
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\), the
\(\left(J+3,J+3\right)\) th element of
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\), tends to underestimate the true value of
\({\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)\) because the
\({O}_{\mathbb{P}}\left({I}^{-1}\right)\) term is ignored, particularly when
\(I\) is small. From the perspective of data analysis, if the number of clusters
\(I\) is small, bias-correction techniques are recommended for obtaining
\(\widehat{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) to mitigate the increased risk of type I errors [
24]. In this article, we adapt two bias-correction techniques from data analysis to adjust
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) for a small number of clusters: 1) the Kauermann and Carroll (KC) correction [
25]; and 2) the Mancl and DeRouen (MD) correction [
26]. Both bias-correction techniques lead to the following modified form of approximation for
\({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\):
$${\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\approx {\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{M}}_{i}\right)}^{-1}\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Omega}}}_{i}^{-1}{\mathbf{M}}_{i}\right){\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{M}}_{i}\right)}^{-1}$$
(8)
where
\({{\varvec{\Omega}}}_{i}\) is the corresponding residual-modified version of
\({{\varvec{\Sigma}}}_{i}\) with the form
$${{\varvec{\Omega}}}_{i}={\left({{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{A}}_{i}^{-1}{\mathbf{F}}_{i}{\mathbf{A}}_{i}{{\varvec{\Sigma}}}_{i}{\mathbf{A}}_{i}{\mathbf{F}}_{i}^{\intercal }{\mathbf{A}}_{i}^{-1}{{\varvec{\Sigma}}}_{i}^{-1}\right)}^{-1}$$
where the residual modification matrix
\({\mathbf{F}}_{i}\) is defined as one of
$${\mathbf{F}}_{i}^{{\text{KC}}}={\left({\mathbf{I}}_{J{m}_{i}}-{\mathbf{H}}_{i}\right)}^{-\frac{1}{2}}\in {\mathbb{R}}^{J{m}_{i}\times J{m}_{i}}$$
and
$${\mathbf{F}}_{i}^{{\text{MD}}}={\left({\mathbf{I}}_{J{m}_{i}}-{\mathbf{H}}_{i}\right)}^{-1}\in {\mathbb{R}}^{J{m}_{i}\times J{m}_{i}}$$
depending on the selection of bias-correction technique (KC or MD). The matrix
\({\mathbf{H}}_{i}\) is defined as
$${\mathbf{H}}_{i}={\mathbf{A}}_{i}{\mathbf{M}}_{i}{\left(\sum_{l=1}^{I}{\mathbf{M}}_{l}^{\intercal }{{\varvec{\Sigma}}}_{l}^{-1}{\mathbf{M}}_{l}\right)}^{-1}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{A}}_{i}^{-1}\in {\mathbb{R}}^{J{m}_{i}\times J{m}_{i}}.$$
Note that by writing
$${\mathbf{F}}_{i}={\left({\mathbf{I}}_{J{m}_{i}}-{\mathbf{H}}_{i}\right)}^{0}={\mathbf{I}}_{J{m}_{i}}$$
the right-hand side of (8) collapses to
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\), which means
\(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) is also a special case of the right-hand side of (8), like the KC and MD cases. For this reason, we distinguish our power calculation methods by the choice of
\({\mathbf{F}}_{i}\) on the right-hand side of (8): “GEE” refers to the method with
\({\mathbf{F}}_{i}={\mathbf{I}}_{J{m}_{i}}\); “GEE-KC” refers to the method with
\({\mathbf{F}}_{i}={\mathbf{F}}_{i}^{{\text{KC}}}\); and “GEE-MD” refers to the method with
\({\mathbf{F}}_{i}={\mathbf{F}}_{i}^{{\text{MD}}}\). Again, we do not estimate
\({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) because
\({\mathbf{I}}_{J{m}_{i}}\),
\({\mathbf{F}}_{i}^{{\text{KC}}}\), and
\({\mathbf{F}}_{i}^{{\text{MD}}}\) are known at the design stage given the assumed
\({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\)’s and the parameters therein. In other words, GEE-KC and GEE-MD are proposed to improve on the GEE-based predicted power when the number of clusters is small.
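The collapse of (8) under \({\mathbf{F}}_{i}={\mathbf{I}}_{J{m}_{i}}\) can be verified in two lines. The shorthand \(\mathbf{B}\) below is introduced only for this check and is not part of the notation above. Substituting \({\mathbf{F}}_{i}={\mathbf{I}}_{J{m}_{i}}\) into the definition of \({{\varvec{\Omega}}}_{i}\) gives

$${{\varvec{\Omega}}}_{i}^{-1}={{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{A}}_{i}^{-1}{\mathbf{A}}_{i}{{\varvec{\Sigma}}}_{i}{\mathbf{A}}_{i}{\mathbf{A}}_{i}^{-1}{{\varvec{\Sigma}}}_{i}^{-1}={{\varvec{\Sigma}}}_{i}^{-1}{{\varvec{\Sigma}}}_{i}{{\varvec{\Sigma}}}_{i}^{-1}={{\varvec{\Sigma}}}_{i}^{-1}$$

so that, writing \(\mathbf{B}=\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{M}}_{i}\), the right-hand side of (8) becomes

$${\mathbf{B}}^{-1}\mathbf{B}{\mathbf{B}}^{-1}={\mathbf{B}}^{-1}={\left(\sum_{i=1}^{I}{\mathbf{M}}_{i}^{\intercal }{{\varvec{\Sigma}}}_{i}^{-1}{\mathbf{M}}_{i}\right)}^{-1}$$

which is the right-hand side of (6).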
Scheme for generating simulation datasets
We simulate binary outcomes
\({Y}_{ijk}\)’s based on marginal means
\({\mathbf{u}}_{i}\) and correlation matrix
\({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\). Note that we have
$${\mathbb{E}}\left[{\mathbf{Y}}_{i}{\mathbf{Y}}_{i}^{\intercal }\right]={\mathbf{A}}_{i}^\frac{1}{2}{\mathbf{R}}_{i}\left(\alpha ,\rho \right){\mathbf{A}}_{i}^\frac{1}{2}+{\mathbf{u}}_{i}{\mathbf{u}}_{i}^{\intercal }$$
where
\({\mathbf{Y}}_{i}={\left[\begin{array}{cccccc}{Y}_{i11}& \cdots & {Y}_{i1m}& {Y}_{i21}& \cdots & {Y}_{iJm}\end{array}\right]}^{\intercal }.\)
To simulate
\({\mathbf{Y}}_{i}\), we consider the copula-based method for multivariate binary outcomes [
27,
28] which assumes
\({Y}_{ijk}=1\left\{{Z}_{ijk}\le {\Phi }^{-1}\left({\mu }_{ijk}\right)\right\}\), for
\(j=1,\dots ,J\) and
\(k=1,\dots ,m\), where
\(1\left\{\cdot \right\}\) is the indicator function,
\(\Phi\) is the standard normal CDF and the random vector
\({\mathbf{Z}}_{i}={\left[\begin{array}{cccccc}{Z}_{i11}& \cdots & {Z}_{i1m}& {Z}_{i21}& \cdots & {Z}_{iJm}\end{array}\right]}^{\intercal }\sim {N}_{Jm}\left(0,{{\varvec{\Xi}}}_{i}\right)\) for some correlation matrix
\({{\varvec{\Xi}}}_{i}\in {\mathbb{R}}^{Jm\times Jm}\) (hence
\({Z}_{ijk}\sim N\left(0,1\right)\) for all
\(j=1,\dots ,J\) and
\(k=1,\dots ,m\) marginally). The elements of
\({{\varvec{\Xi}}}_{i}\) corresponding to the location of
\({Z}_{ijk}\) and
\({Z}_{ij'k'}\) within
\({\mathbf{Z}}_{i}\) are defined as
$${\xi }_{ijk,ij'k'}=Corr\left({Z}_{ijk},{Z}_{ij'k'}\right)={\mathbb{E}}\left[{Z}_{ijk}{Z}_{ij'k'}\right]$$
for all
\(j,j'=1,\dots ,J\) and
\(k,k'=1,\dots ,m\). To ensure that
\({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\) is the correlation matrix of
\({\mathbf{Y}}_{i}\),
\({\xi }_{ijk,ij'k'}\) is determined by solving the equation
$${C}_{{\xi }_{ijk,ij'k'}}\left({\mu }_{ijk},{\mu }_{ij'k'}\right)={\mathbb{E}}\left[{Y}_{ijk}{Y}_{ij'k'}\right]$$
where \({C}_{{\xi }_{ijk,ij'k'}}\) is a bivariate Gaussian copula with correlation parameter \({\xi }_{ijk,ij'k'}\). We compute the bivariate copula components using the “pbinormcop” function of the R package “VGAM”.
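To illustrate what solving this equation involves for a single pair of outcomes, the following is a minimal sketch in Python using only the standard library. It is a stand-in for illustration and not the R implementation used in the study: the bivariate normal probability is computed by Simpson's rule instead of “pbinormcop”, the root is found by bisection, and the function names `joint_prob` and `solve_xi` are hypothetical. The target \({\mathbb{E}}\left[{Y}_{1}{Y}_{2}\right]\) is obtained from the desired correlation \(r\) as \(r\sqrt{{\upsilon }_{1}{\upsilon }_{2}}+{\mu }_{1}{\mu }_{2}\).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def joint_prob(mu1, mu2, xi, steps=1000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation xi,
    where a = Phi^{-1}(mu1) and b = Phi^{-1}(mu2).

    Uses P = integral over z <= a of phi(z) * Phi((b - xi*z) / sqrt(1 - xi^2)),
    evaluated by Simpson's rule on [-8, a]; steps must be even.
    """
    a, b = STD.inv_cdf(mu1), STD.inv_cdf(mu2)
    lower = -8.0
    h = (a - lower) / steps
    scale = sqrt(1.0 - xi * xi)

    def integrand(z):
        return STD.pdf(z) * STD.cdf((b - xi * z) / scale)

    total = integrand(lower) + integrand(a)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * integrand(lower + s * h)
    return total * h / 3.0


def solve_xi(mu1, mu2, corr_y, tol=1e-8):
    """Bisection for the latent correlation xi that yields Corr(Y1, Y2) = corr_y."""
    v1, v2 = mu1 * (1 - mu1), mu2 * (1 - mu2)
    target = corr_y * sqrt(v1 * v2) + mu1 * mu2      # required E[Y1 * Y2]
    lo, hi = -0.999, 0.999                           # stay away from |xi| = 1
    if not joint_prob(mu1, mu2, lo) <= target <= joint_prob(mu1, mu2, hi):
        raise ValueError("target correlation is not attainable for these means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # the joint probability is increasing in xi, so bisection applies
        if joint_prob(mu1, mu2, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, `solve_xi(0.15, 0.3, 0.1)` returns the latent correlation needed for two outcomes with marginal means 0.15 and 0.30 to have correlation 0.1; these input values are chosen only for illustration.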
Note that the resulting
\({{\varvec{\Xi}}}_{i}\) is not always positive definite [
29]. When it is not, we modify the matrix to make it non-negative definite so that we may simulate the random vector
\({\mathbf{Z}}_{i}\) using the modified
\({{\varvec{\Xi}}}_{i}\). An eigenvalue modification trick [
30] is employed for this purpose. Values for parameters for the simulation study setups are shown in Table
1. We set
\({\theta }_{0}={\text{log}}(0.15/0.85)\), representing a 15% prevalence of the outcome in the control group within the reference group of the binary covariate. This prevalence came from our motivating example. We included an increasing secular time trend by specifying
\({\gamma }_{2}=0.1,{\gamma }_{3}=0.2,{\gamma }_{4}=0.3,\) and
\({\gamma }_{5}=0.4\). In power calculation, the secular time trend fixed effects should be determined either from existing literature or by a preliminary analysis. In the Supplementary materials of [16], a demonstration of the ‘power.ap’ function in the R package CRTpowerdist considers a categorical time trend using coefficients 1, 2, 3, and 4 for Gaussian outcomes; for binary outcomes with a logit link, these values are too large for computation. As a result, we use 0.1, 0.2, 0.3, and 0.4 instead, for demonstration purposes only. We simulated the binary covariate
\({X}_{ijk}\) using cluster prevalence levels of 30% and 50%. These two values also came from our motivating example, in which 30% of patients are Black and 48% of patients are female. They allow us to evaluate the statistical operating characteristics of the HTE under imbalanced and balanced binary covariates. For the OTE, we set
\({\theta }_{1}={\text{log}}\left(1.35\right)\) or
\({\theta }_{1}={\text{log}}\left(1.68\right)\), representing small and intermediate effect sizes for the intervention, and
\({\theta }_{2}={\text{log}}\left(1.5\right)\) for the binary covariate. The main quantity of interest is the HTE,
\({\theta }_{3}\). We considered two values,
\({\text{log}}\left(1.5\right)\) and
\({\text{log}}\left(2\right)\), representing an intermediate and a large effect size, respectively.
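To see what these parameter values imply on the probability scale, the marginal means from model (1) can be evaluated directly. The following is a minimal sketch in Python; the function names `expit` and `marginal_mean` are hypothetical, and the particular combination of \({\theta }_{1}\) and \({\theta }_{3}\) used in the example is one of the scenarios described above, picked arbitrarily.

```python
from math import exp, log


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-eta))


def marginal_mean(j, w, x, theta0, gammas, theta1, theta2, theta3):
    """mu_ijk from model (1); gammas[j - 1] holds gamma_j, with gammas[0] = 0."""
    eta = theta0 + gammas[j - 1] + theta1 * w + theta2 * x + theta3 * w * x
    return expit(eta)


# One scenario from the simulation setup
gammas = [0.0, 0.1, 0.2, 0.3, 0.4]
theta0, theta1, theta2, theta3 = log(0.15 / 0.85), log(1.35), log(1.5), log(2)

# Control condition, reference covariate group, first period
mu_control = marginal_mean(1, 0, 0, theta0, gammas, theta1, theta2, theta3)
# Intervention condition, covariate group X = 1, first period
mu_treated = marginal_mean(1, 1, 1, theta0, gammas, theta1, theta2, theta3)
```

Here `mu_control` recovers the assumed 15% baseline prevalence, and the odds implied by `mu_treated` equal the baseline odds multiplied by 1.35 × 1.5 × 2.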
Table 1
Setup for simulation study for parameter values except interaction along with the number of clusters, time points, and cluster size
| Parameter | Description |
| --- | --- |
| \({\theta }_{0}={\text{log}}\left(0.15/0.85\right)\) | True log odds of the outcome in the control group (\({W}_{ij}=0\)) with the reference group for the binary covariate (\({X}_{ijk}=0\)) for the kth individual in the ith cluster at the jth period. We assume that the prevalence of the outcome is 15% |
| \({\gamma }_{1}=0, {\gamma }_{2}=0.1,{\gamma }_{3}=0.2,{\gamma }_{4}=0.3,{\gamma }_{5}=0.4\) | Increasing secular trend |
| \({\theta }_{1}={\text{log}}\left(1.35\right); {\theta }_{1}={\text{log}}\left(1.68\right)\) | Chosen arbitrarily for small and intermediate effect sizes for the intervention effect |
| 30%, 50% | Prevalence rate for the binary covariate |
| \({\theta }_{2}={\text{log}}\left(1.5\right)\) | Chosen arbitrarily for an intermediate effect size for the binary covariate |
| \({\theta }_{3}={\text{log}}\left(1.5\right); {\theta }_{3}={\text{log}}\left(2\right)\) | Chosen arbitrarily for intermediate and large effect sizes for the interaction |
| \(I=8, 20, 40\) | Number of clusters |
| \(J=5\) | Number of time steps |
| \(m=20, 40, 60, 80, 120\) | Cluster size |
| \(ICC=0.1\) | For a simple exchangeable correlation structure |
| \(ICC=0.1\); \(CAC=0.8\) | For a nested exchangeable correlation structure |
Cross-sectional SW-CRTs often use a relatively small number of clusters. A literature review found that 50% of 56 cross-sectional SW-CRTs reviewed had fewer than 10 clusters [
31]. Therefore, we simulated eight clusters (I = 8), five time periods (J = 5), and several fixed cluster sizes (20, 40, 60, 80, and 120). Hence, the total numbers of subjects (I × J × m) in this simulation study were 800, 1,600, 2,400, 3,200, and 4,800 individuals, respectively.
Since we used GEE models, we needed to specify a working correlation structure. Three correlation structures are typically used in cross-sectional SW-CRTs: simple exchangeable, nested exchangeable, and exponential decay. We simulated the first two because they are the most commonly used in simulation studies [
17,
32,
33]. For both correlation structures used, we set ICC to be 0.1. We considered a CAC value of 0.8 for the nested exchangeable correlation structure.
Comparison between simulated power and predicted power for sample size determination
We conduct a simulation study to examine sample size determination, provided in terms of cluster size, from each approach given the numbers of clusters and periods. We consider two distinct values for the prevalence of binary covariate (30% and 50%), the OTE effect size (\({\text{log}}\left(1.35\right)\) and \({\text{log}}\left(1.68\right)\)), and the HTE effect size (\({\text{log}}\left(1.5\right)\) and \({\text{log}}\left(2\right)\)).
The predicted powers/Type I errors are obtained by our proposed methods (GEE, GEE-KC and GEE-MD) using Eq. (
2) given the design parameters. The simulated powers/empirical Type I errors are obtained by simulation based on the same set of design parameters. While the simulated powers are usually regarded as the true powers for the given design by the law of large numbers, obtaining them can be very time-consuming, depending on the complexity of the simulation scheme and the true value of the OTE/HTE. For this reason, it is desirable to have a priori information about a suitable range of cluster sizes for which the power may achieve the target (80%, for example) without conducting simulation studies. To this end, mathematical approximation methods are employed to compute the predicted powers/Type I errors, which provide fast feedback on candidate cluster sizes and help determine suitable ones.
Sensitivity analysis
In the sensitivity analysis of how the number of clusters and the cluster size affect the simulated power under the three proposed approaches, we consider two additional numbers of clusters (I = 20 and 40) and five additional cluster sizes (m = 20, 40, 60, 80, and 100), given a prevalence rate of 50% for the binary covariate and an intermediate OTE effect size (\({\theta }_{1}={\text{log}}\left(1.68\right)\)). Once again, we consider intermediate and large effect sizes for the HTE, the quantity of interest. From the different combinations of the number of clusters and cluster size, we indirectly examine whether increasing the number of clusters improves statistical power more than increasing the cluster size. We also examine the tradeoff between the number of clusters and the cluster size in simulated power. We consider two total numbers of observations per step (160 and 800) and, for each total, two combinations of the number of clusters and cluster size: one with a small number of clusters and a relatively large cluster size, and another with a larger number of clusters and a small cluster size. Hence, we use I = 8 & m = 20 and I = 40 & m = 4 for 160 observations per step, and I = 8 & m = 100 and I = 40 & m = 20 for 800 observations per step. We also examine the impact of the ICC and CAC on simulated power for the above setup. We consider three setups for the ICC and CAC: ICC = 0.1 & CAC = 1; ICC = 0.1 & CAC = 0.8; and ICC = 0.05 & CAC = 0.4.