A. Sample size and the detectable difference
Because patient function (or dysfunction) is generally considered the more consequential of our two primary outcome measures [27] (the other being bothersomeness of back pain), sample size calculations are based on the Roland-Morris Disability Questionnaire (RDQ). Our sample size is designed to ensure that we have good power to detect a clinically significant difference of 2.5 points on the RDQ for pairwise comparisons (yoga to self care and yoga to exercise; Aims 1 and 2) and adequate power to detect a difference of 1.7 points between the yoga and exercise groups that would be of interest when exploring mechanisms of action (Aim 3). We think that a smaller detectable difference between yoga and exercise can be justified when examining mechanisms of action because, in the Aim 3 comparison of yoga and exercise, we focus on the additional benefits of yoga compared with exercise, anticipating that a portion of yoga's clinical effects would actually result from movement.
We have accommodated these dual power needs by proposing a 2:2:1 randomization ratio (yoga: exercise: self care), with a total of 210 participants. Assuming 10% loss to follow-up (which is slightly higher than in our previous study [19]), there would be outcomes data for 75:75:38 participants in the yoga, exercise, and self care groups, respectively. To protect against multiple comparisons, we will use Fisher's protected least significant difference approach, which has been shown to have desirable properties when there are three groups [51]. This approach makes pairwise comparisons between the three treatment groups only if the overall F-test is significant. The power of this omnibus F-test depends on how the means from the three treatment groups differ. We therefore assumed that the yoga group would be 1.7 RDQ points superior to the exercise group, which would, in turn, be 0.8 RDQ points better than the self care group (giving a difference of 2.5 points between the yoga and self care groups). We chose a 1.7-point difference between yoga and exercise because it is slightly more conservative (i.e., smaller) than that found in our previous study [19].
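The gatekeeping logic of Fisher's protected least significant difference approach can be sketched in a few lines. This is an illustration only: the function name and the p-values are hypothetical and are not study results.

```python
def protected_lsd(omnibus_p, pairwise_p, alpha=0.05):
    """Return the pairwise comparisons declared significant.

    Pairwise tests are examined only when the omnibus test is significant;
    each is then judged at the same unadjusted alpha.
    """
    if omnibus_p >= alpha:
        return []  # gate closed: no pairwise comparison is interpreted
    return [pair for pair, p in pairwise_p.items() if p < alpha]


# Hypothetical p-values, for illustration only (not study results).
example = {
    ("yoga", "self care"): 0.004,
    ("yoga", "exercise"): 0.030,
    ("exercise", "self care"): 0.210,
}
print(protected_lsd(0.010, example))  # omnibus significant: pairs are examined
print(protected_lsd(0.080, example))  # omnibus not significant: empty list
```

With three groups, the omnibus gate is what controls the overall error rate, which is why the pairwise tests themselves need no further adjustment.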
Our estimates of the standard deviations of our primary outcome measures adjusted for pre-randomization baseline values were derived from analyses of covariance of 12-week follow-up data estimated from the 101 study participants in our 3-arm pilot study: RDQ standard deviation (SD) = 3.68 and bothersomeness SD = 2.38. With our proposed sample size, the omnibus F-test for the RDQ score will have 92% power for detecting a statistically significant difference among the three treatment means with the distribution assumed above. If this omnibus test is statistically significant, we will address Aims 1 and 2 of the study by comparing the appropriate pairs of means, as discussed below. To detect a pairwise difference of 2.5 RDQ points, we will have 92% power for the yoga (or exercise) to self care comparison and 98% power for the yoga to exercise comparison. For Aim 3, we will have 92% power to detect a pairwise difference of 2.5 RDQ points between yoga (or exercise) and self care and 80% power to detect a pairwise difference of 1.7 points between yoga and exercise.
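The pairwise power figures can be approximately reproduced with a two-sided normal-approximation test. The protocol does not state the software or exact method used, so the formula below is an assumption, and the function name is hypothetical; the inputs (SD = 3.68, group sizes 75:75:38, two-sided alpha of 0.05) are taken from the text.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def pairwise_power(diff, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    se = sd * sqrt(1 / n1 + 1 / n2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # The negligible chance of rejecting in the wrong direction is ignored.
    return Z.cdf(diff / se - z_crit)


SD_RDQ = 3.68                          # baseline-adjusted SD from the pilot data
N_YOGA, N_EXERCISE, N_SELF = 75, 75, 38

power_vs_self = pairwise_power(2.5, SD_RDQ, N_YOGA, N_SELF)
power_vs_exercise = pairwise_power(2.5, SD_RDQ, N_YOGA, N_EXERCISE)
power_aim3 = pairwise_power(1.7, SD_RDQ, N_YOGA, N_EXERCISE)
print(power_vs_self, power_vs_exercise, power_aim3)
```

Under this approximation the three values come out close to the 92%, 98%, and 80% quoted above; small discrepancies are expected because a t-based calculation would be slightly more conservative.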
Our sample size will also provide adequate power to detect a clinically important difference of 1.5 on our 0 to 10 bothersomeness measure. For the omnibus F-test, we will have 89% power for detecting a significant difference of 1.5 points among the three groups (if we assume a difference of 1.1 points between the yoga and exercise groups and of 0.4 points between exercise and self care). For a difference of 1.5 points on the bothersomeness measure, we will have 88% power for the yoga (or exercise) to self care comparison and 97% power for the yoga to exercise comparison. For Aim 3, we will have 88% power to detect a pairwise difference of 1.5 bothersomeness points between yoga (or exercise) and self care and 80% power to detect a pairwise difference of 1.1 points between yoga and exercise.
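The omnibus power depends on the assumed pattern of the three group means, and a rough check is possible with a large-sample chi-square approximation to the F-test. This is a sketch under stated assumptions, not the authors' calculation: it replaces the F distribution with a noncentral chi-square on 2 degrees of freedom (written as a Poisson mixture of central chi-squares), and all function names are hypothetical. Means are expressed as improvement relative to self care.

```python
from math import exp, log


def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu), built up iteratively."""
    p = exp(-mu)
    for i in range(1, k + 1):
        p *= mu / i
    return p


def omnibus_power(means, ns, sd, alpha=0.05, terms=200):
    """Large-sample power of the three-group omnibus test (2 df)."""
    assert len(means) == 3, "the closed form below is specific to 2 df"
    grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    # Noncentrality: weighted squared deviations of group means, in SD units.
    ncp = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / sd ** 2
    half_crit = -log(alpha)  # chi-square (2 df) critical value, divided by 2
    power, tail = 0.0, 0.0
    for j in range(terms):
        tail += poisson_pmf(j, half_crit)       # P(chi-square with 2+2j df > crit)
        power += poisson_pmf(j, ncp / 2) * tail  # Poisson mixture weight
    return power


NS = (75, 75, 38)  # yoga, exercise, self care
power_rdq = omnibus_power((2.5, 0.8, 0.0), NS, sd=3.68)
power_bother = omnibus_power((1.5, 0.4, 0.0), NS, sd=2.38)
print(power_rdq, power_bother)
```

Both values should land near 0.9, in line with the 92% and 89% quoted for the RDQ and bothersomeness measures; the chi-square approximation is expected to run slightly higher than an exact F-based calculation.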
At each time point, both primary outcomes (function and symptoms) will be tested at the 0.05 level because they address separate scientific questions. Analyses of both outcomes at all follow-up times will be reported, imposing a more stringent requirement than simply reporting a sole significant outcome. Arguments against adjusting for multiple comparisons in this situation have been made by Rothman [52, 53] and others [54].
The power calculations are based on simple comparisons of the follow-up scores at a single point in time with adjustment for baseline values using analysis of covariance. We also plan to adjust for other baseline characteristics (e.g., age, gender, and baseline covariates found predictive of 10-week outcomes). Inclusion of such baseline covariates can improve precision of the variance estimate and therefore increase power.
Since assessment of Aim 3, the mediator analysis, is dependent on the results from Aim 1 and Aim 2, the study was not directly powered for Aim 3. We ran a simple power analysis for the primary mediator of interest, body awareness, assuming the expected sample sizes of 75:75:38, a single time point, and the RDQ score as the outcome. We found a median power of 0.86 to detect body awareness as a significant mediator for yoga compared to self care and a median power of 0.83 to detect body awareness as a significant mediator for yoga compared to exercise.
In summary, we have excellent power to detect a clinically meaningful difference on the omnibus test and the pairwise comparisons with self care for Aims 1 and 2 as well as adequate power to evaluate the yoga - exercise comparison for the mediator analysis for Aim 3. Although powered to detect a clinically significant difference on the RDQ, the resultant sample sizes will provide ample power for both of our primary outcome measures.
B. Statistical analysis
We will analyze outcomes from all follow-up time points in a single model, adjusting for possible correlation within individuals using generalized estimating equations (GEE) [55]. Because we cannot reasonably make an assumption of constant or linear group differences over time, we will include an interaction between treatment group and time point. In this case, the multivariate model that includes all time points provides very similar results to fitting separate analysis of covariance models over time and should not substantially influence statistical power.
We also plan to adjust for other baseline characteristics, specifically gender, age, pain traveling below the knee but not meeting the criteria for sciatica, job-related activity, and body mass index. In a randomized trial of this size, most baseline values and other covariates are unlikely to differ between randomized groups. However, inclusion of baseline covariates can improve the precision of the estimate and therefore increase power to detect differences.
We will use an intent-to-treat approach in all analyses, i.e., individuals will be analyzed by randomized group regardless of participation in any classes. This minimizes biases that often occur when participants not receiving assigned treatments are excluded from the analyses. The linear regression model (analysis of covariance) we will use is of the form:
Y(t) = α0 + β × Baseline + α1 × Trt + α2 × Time + α3 × (Trt × Time) + α4 × z + ε(t)
where Y(t) is the response at follow-up time t, α0 is the intercept, Baseline is the pre-randomization value of the outcome measure (with scalar coefficient β), Trt includes dummy variables for the yoga and exercise groups, Time is a series of dummy variables indicating the follow-up times, Trt × Time is the treatment-by-time interaction, z is a vector of covariates representing other variables being adjusted for, and ε(t) is the error term. (Note that α1, α2, α3, and α4 are vectors.) The referent group in this model is the self care group at the first follow-up time. The models will be fitted using GEE to take into account possible correlation within individuals over time. For each follow-up time point at which the omnibus F-test is statistically significant, we will go on to test whether there is a difference between yoga and self care to address Aim 1 and a difference between yoga and exercise to address Aim 2.
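The dummy-variable coding described above can be made concrete by building one row of the design matrix for a single person-visit. This is a minimal sketch: the time labels, the function name, and the single example covariate are hypothetical, and the number of follow-up times in the sketch is an assumption. Self care at the first follow-up is the referent, as in the text.

```python
ARMS = ["self care", "yoga", "exercise"]  # first entry is the referent group
TIMES = ["t1", "t2", "t3"]                # hypothetical follow-up labels


def design_row(baseline, arm, time, covariates):
    """Build one row of the regression design matrix for a person-visit."""
    trt = [1.0 if arm == a else 0.0 for a in ARMS[1:]]    # yoga, exercise
    tim = [1.0 if time == t else 0.0 for t in TIMES[1:]]  # later follow-ups
    # Treatment-by-time interaction: every product of a Trt and a Time dummy.
    inter = [a * b for a in trt for b in tim]
    return [1.0, baseline] + trt + tim + inter + list(covariates)


# A yoga participant at the second follow-up, baseline score 11, age 45.
print(design_row(11.0, "yoga", "t2", [45.0]))
# The referent cell: every treatment, time, and interaction dummy is zero.
print(design_row(11.0, "self care", "t1", [45.0]))
```

Because the interaction columns are all zero at the first follow-up, the treatment coefficients alone describe the group differences at that time point, and the interaction terms describe how those differences change later.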
Based on similar studies in this study population, we expect to have at least a 90% follow-up rate, which reduces the potential for bias due to loss to follow-up. Therefore, our primary analysis will be a complete case analysis including all observed follow-up outcomes, but we will adjust for all baseline covariates that are predictive of outcome, probability of being missing, or differences between treatment groups. We will also conduct sensitivity analyses using an imputation method for non-ignorable non-response to evaluate whether the results of the complete case analysis are robust [56]. We will report both results in our manuscript.
To help us further understand the benefits of yoga as a treatment for back pain, we will explore possible interactions between treatment groups and covariates. For example, we will consider interactions of the treatment group and the baseline value that would indicate the effect of treatment depends on status at baseline. We will also test for significant interactions of treatment group with other variables (e.g., gender) to determine if treatment differences are modified by these variables. We will use similar methods to analyze secondary outcomes including disability days and satisfaction with care.
If we find that yoga is more effective than self care or exercise, we will move to Aim 3 with the goal of exploring whether the beneficial effects of yoga on our primary outcomes are mediated through certain measured variables. As shown in Figure 1 and Table 4, we are interested in four major classes of measures that could mediate the effects of yoga on back pain outcomes: (1) physical function, (2) cognitive appraisal, (3) affect and stress, and (4) neuroendocrine function. In the interest of parsimony, we will narrow down the number of prospective variables within the four major classes of proposed mediators by using a modification of the framework described by Baron and Kenny [57].
We will first individually regress each of the potential mediators within a major class on the treatment group. If the potential mediator is associated with the treatment group (α-level = 0.10), we will then evaluate the magnitude of the effect of the individual potential mediator by using an inverse probability weighted (IPW) modeling approach on each of the primary outcomes (i.e., RDQ score or bothersomeness score) [58]. This approach allows us to estimate the direct effect of treatment after rebalancing the treatment groups with respect to the mediator. Specifically, we will first model the probability of the treatment given the mediator using logistic regression. From this model we will obtain the estimated probabilities that each person received their observed treatment given their observed mediator value. We will then use an inverse probability of treatment weighted regression to model the primary outcomes on treatment status while adjusting for the baseline level of the outcome. Comparing the weighted to the unweighted model will allow us to estimate how much of the direct effect of treatment on the outcome can be explained by a potential mediator.
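The IPW steps can be illustrated on simulated data for a single two-arm comparison. This is a simplified sketch, not the planned analysis: the data are simulated, the comparison is binary (1 = yoga, 0 = self care), the baseline adjustment is omitted so that the weighted regression reduces to a difference in weighted means, and all names are hypothetical. By construction the simulated total effect is -2.1 points and the direct effect is -0.5 points.

```python
import math
import random


def fit_logistic(x, t, iters=25):
    """Fit P(t = 1 | x) = expit(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            v = p * (1.0 - p)
            g0 += ti - p            # score for the intercept
            g1 += (ti - p) * xi     # score for the slope
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def group_difference(y, t, w):
    """Weighted mean of y in arm 1 minus arm 0 (the weighted least squares slope on t)."""
    def wmean(arm):
        num = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == arm)
        den = sum(wi for ti, wi in zip(t, w) if ti == arm)
        return num / den
    return wmean(1) - wmean(0)


random.seed(0)
n = 4000
t = [random.randint(0, 1) for _ in range(n)]        # randomized treatment
m = [0.8 * ti + random.gauss(0, 1) for ti in t]     # mediator shifted by treatment
y = [8 - 0.5 * ti - 2.0 * mi + random.gauss(0, 1) for ti, mi in zip(t, m)]

# Step 1: probability of treatment given the mediator.
b0, b1 = fit_logistic(m, t)
p1 = [1 / (1 + math.exp(-(b0 + b1 * mi))) for mi in m]
# Step 2: weight each person by 1 / P(observed treatment | mediator).
w = [1 / (pi if ti == 1 else 1 - pi) for pi, ti in zip(p1, t)]
# Step 3: compare the unweighted and weighted treatment effects.
total = group_difference(y, t, [1.0] * n)
direct = group_difference(y, t, w)
explained = 1 - direct / total
print(total, direct, explained)
```

After weighting, the mediator is balanced across the two arms, so the remaining treatment difference reflects pathways that do not run through the mediator. The share of the unweighted effect that disappears is the quantity used to rank mediators.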
We will do this for all potential mediators in a class and rank them based on how much of the direct effect of treatment on the outcome each explains. The potential mediators within a class that explain at least 10% of the direct effect will then be put in a multiple mediator weighted regression model to assess whether the effect can be mostly explained by a single mediator in the class or if it requires multiple mediators within the class. This stepwise approach will be used to reduce the number of mediators that will be included from a given class. It will not provide estimates of the final strength of the class of mediators since this would require the assumption that classes of potential mediators are independent of one another (i.e., physical function and stress are unrelated). Assessment of the strength of mediation from each class requires that we conduct a multiple mediator analysis evaluating all classes of mediators in a single model [59].
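The two-stage screening within a class (association with treatment at the 0.10 level, then at least 10% of the effect explained) amounts to a filter followed by a ranking. The sketch below uses placeholder mediator names and made-up numbers purely for illustration; none of them are study results.

```python
def screen_mediators(results, assoc_alpha=0.10, min_explained=0.10):
    """Keep mediators that pass both screens, strongest first.

    results maps a mediator name to a pair:
    (p-value for association with treatment, proportion of effect explained).
    """
    kept = [
        (name, share)
        for name, (p_value, share) in results.items()
        if p_value < assoc_alpha and share >= min_explained
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical numbers for one class of mediators (not study results).
one_class = {
    "mediator_a": (0.02, 0.35),
    "mediator_b": (0.30, 0.40),  # fails the association screen
    "mediator_c": (0.04, 0.06),  # explains less than 10%
    "mediator_d": (0.08, 0.15),
}
selected = screen_mediators(one_class)
print(selected)
```

Only the mediators returned here would be carried forward into the multiple mediator weighted regression for that class.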
After determining the subset of mediators to be included from each class, we will run a final multiple mediator IPW model. The application of the IPW approach, as compared to the traditional approach of adjusting for multiple mediators, allows us to more appropriately account for confounding between a mediator and the outcome both by additional mediators and by other measured variables [60]. Further, we are better able to estimate the indirect effects of each mediator in a causal framework through decomposition of the total effect of treatment into indirect effects through each mediator and the direct effect after accounting for all mediators.
All time points for which there is a significant difference between yoga and self care (or exercise) will be included in the models, and we will use GEE to account for possible correlation within subjects over time.
The results of our previous study suggest that there are several possible scenarios of treatment differences we might expect to find in our proposed trial [19]. Each of these scenarios would result in a different approach to exploring mediating factors.
Scenario 1: yoga is more effective than self care but not significantly better than exercise, which is not significantly better than self care. In this scenario, we will look for mediators that explain the mechanisms through which yoga works compared to self care, but we cannot explore whether there are different mediators for exercise and yoga if exercise is not significantly better than self care. Thus, we will not be able to determine whether there are mechanisms of healing unique to yoga compared to a "body-focused" treatment like exercise.
Scenario 2: yoga is significantly better than both self care and exercise, and exercise may or may not be significantly better than self care. In this scenario, we will look for mediators that may explain how yoga works compared to self care and to exercise. It is possible that these mediators are different. By focusing on the yoga vs. exercise difference, we can determine which mediators are most responsible for the unique effects of yoga, which we believe will be mostly those related to increased awareness.
Scenario 3: both yoga and exercise are better than self care but yoga is not significantly different from exercise. In this scenario, we might conclude that the beneficial effects of yoga are solely or largely due to physical exercise, and that the awareness component has no materially important effect on back pain. However, it is possible that completely different pathways mediate these beneficial effects, and we will be able to determine this through the mediator analysis. For example, the treatment effect of yoga compared to self care might be significantly reduced when a specific mediator variable (e.g., body awareness) is included in the model, with no significant change on the treatment effect of exercise compared to self care.
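The three scenarios can be read as a small decision table over the pairwise test results. The sketch below is one hedged reading of that table: the function and argument names are hypothetical, each flag means "significantly better than", and the case where exercise outperforms yoga is not distinguished from "no significant difference".

```python
def mediation_scenario(yoga_beats_self, yoga_beats_exercise, exercise_beats_self):
    """Map three significance flags to the scenario number, or None."""
    if not yoga_beats_self:
        return None  # falls outside the three scenarios described above
    if yoga_beats_exercise:
        return 2     # exercise versus self care may go either way
    return 3 if exercise_beats_self else 1


print(mediation_scenario(True, False, False))  # Scenario 1
print(mediation_scenario(True, True, False))   # Scenario 2
print(mediation_scenario(True, False, True))   # Scenario 3
```

Writing the scenarios this way makes explicit that Scenario 2 does not depend on the exercise versus self care result, whereas Scenarios 1 and 3 are separated by it.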