01.12.2018 | Technical advance | Issue 1/2018 | Open Access

# Bayesian alternatives for common null-hypothesis significance tests in psychiatry: a non-technical guide using JASP

- Journal: BMC Psychiatry > Issue 1/2018

## Background

… ($H_0$) compared to the alternative hypothesis ($H_1$), given a prior probability. A Bayes factor, which is a popular implementation of Bayesian hypothesis testing, quantifies the degree to which the data favor one of two hypotheses: it is the factor by which the prior odds of the two hypotheses are updated into their posterior odds. It is important to note that the Bayesian framework also includes parameter estimation, which can address the size of an effect [for an excellent treatment of Bayesian parameter estimation, see [10]]. While Bayesian parameter estimation is a valuable tool, hypothesis testing via Bayesian model comparison can facilitate theory prediction by providing a measure of relative evidence between two models [13], typically a null and an alternative model.
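The update described above can be written as $\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \mathrm{BF}_{10} \times \frac{p(H_1)}{p(H_0)}$, where $\mathrm{BF}_{10} = p(D \mid H_1)/p(D \mid H_0)$ is the ratio of the marginal likelihoods of the data under each hypothesis. The minimal Python sketch below illustrates this with a hypothetical toy example that is not taken from the study. $H_0$ fixes a success probability at exactly 0.5, while $H_1$ gives it a uniform prior. The function names are illustrative only.

```python
import math


def posterior_odds(bf10: float, prior_odds: float = 1.0) -> float:
    """Posterior odds of H1 over H0: the Bayes factor rescales the prior odds."""
    return bf10 * prior_odds


def binomial_bf10(k: int, n: int) -> float:
    """BF10 for k successes in n trials (hypothetical toy example).

    H0: theta = 0.5 exactly.
    H1: theta ~ Uniform(0, 1).
    The binomial coefficient appears in both marginal likelihoods and cancels.
    """
    log_m0 = n * math.log(0.5)  # log p(D | H0) without the coefficient
    # log p(D | H1) without the coefficient: log Beta(k + 1, n - k + 1)
    log_m1 = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    return math.exp(log_m1 - log_m0)


if __name__ == "__main__":
    bf10 = binomial_bf10(9, 10)  # 9 successes out of 10 trials
    print(round(bf10, 2))  # 9.31
    print(round(1 / bf10, 3))  # BF01 is simply the inverse of BF10
    # a sceptical prior (odds of 1:4 against H1) is updated by the same factor
    print(round(posterior_odds(bf10, prior_odds=0.25), 2))
```

The Bayes factor itself does not depend on the prior odds. Two readers with different prior odds would therefore multiply them by the same $\mathrm{BF}_{10}$.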

## Methods

## Results

### Correlations

… ($H_0$) that the data are distributed according to a bivariate normal distribution with zero covariance, and therefore that there is no correlation between spirituality and age (i.e., $\rho = 0$). The alternative hypothesis ($H_1$) is that age and spirituality, distributed according to a bivariate normal distribution with non-zero covariance, are related. The default prior distribution for $\rho$ spans the whole parameter space from −1 to 1, but values around $\rho = 0$ are far more plausible in practice. We can assign more mass to values around $\rho = 0$ by choosing a smaller stretched beta prior width. Here, we assigned a stretched beta prior [31] with a width of 0.5 in the JASP interface. The dashed line in Fig. 2b illustrates the prior distribution for our example.

We now examine how the observed data update this prior distribution into the posterior distribution. Assuming that there is a relationship between age and spirituality, the estimate of the correlation coefficient ($\rho$) was 0.03. The 95% central credible interval ranged from −0.18 to 0.25. This means that, given the model and the prior, there is a 95% probability that the true value of $\rho$ lies within these bounds.

A confidence interval was also calculated for the NHST analysis described above. However, its 95% refers to the average long-run performance of the interval-generating procedure over hypothetical replications. Therefore, it is inaccurate to conclude from a single NHST confidence interval that there is a 95% probability that the true effect size lies within its bounds [2]. Because the Bayesian credible interval is derived from the posterior distribution, which is conditioned on the present data, such a conclusion is valid.

As $\mathrm{BF}_{01} = 4.55$, the null model is 4.55 times more favored than the alternative model, given the data. This provides evidence for $H_0$ relative to $H_1$, something not possible with p-values, and the Bayes factor also conveys the magnitude of this evidence. JASP can report either $\mathrm{BF}_{10}$ or $\mathrm{BF}_{01}$ (Fig. 2b). The two are equivalent, as $\mathrm{BF}_{01} = 1/\mathrm{BF}_{10}$. Here, it makes more sense to report $\mathrm{BF}_{01}$, because we are more interested in how much more favored the null model (the first subscript number) is than the alternative model (the second subscript number). Fig. 2c illustrates the effects of assigning a range of different prior distributions (i.e., a Bayes factor robustness check). If the data are not bivariate normal, the Bayesian equivalent of Kendall's tau [32] is also available as an analysis option in JASP.
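The correlation analysis above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not JASP's implementation.

- It uses Jeffreys' approximate likelihood for $\rho$, $L(\rho) \propto (1-\rho^2)^{(n-1)/2}(1-\rho r)^{-(2n-3)/2}$, where $r$ is the sample correlation and $n$ the sample size.
- It assumes that a stretched beta prior of width $\kappa$ is a beta$(1/\kappa, 1/\kappa)$ distribution rescaled to $(-1, 1)$.

The Bayes factor and the central credible interval are then obtained by numerical integration over a grid of $\rho$ values. JASP's own likelihood may differ, so numbers need not match exactly. The function names are illustrative, and the inputs in the usage example are hypothetical because the study data are not reproduced here.

```python
import math


def stretched_beta_logpdf(rho: float, width: float) -> float:
    """Log density of a beta(1/width, 1/width) distribution stretched to (-1, 1).

    ASSUMPTION: the stretched beta 'width' kappa maps to beta(1/kappa, 1/kappa);
    width = 1 then gives a uniform prior on (-1, 1).
    """
    a = 1.0 / width
    log_beta_fn = 2 * math.lgamma(a) - math.lgamma(2 * a)
    # rho = 2 * b - 1 with b ~ beta(a, a); the Jacobian contributes -log(2)
    return ((a - 1) * (math.log1p(rho) + math.log1p(-rho) - 2 * math.log(2))
            - math.log(2) - log_beta_fn)


def jeffreys_loglik(rho: float, r: float, n: int) -> float:
    """Jeffreys' approximate log likelihood of rho given sample r (equals 0 at rho = 0)."""
    return 0.5 * (n - 1) * math.log1p(-rho * rho) - (n - 1.5) * math.log1p(-rho * r)


def bayesian_correlation(r: float, n: int, width: float = 1.0, grid: int = 20000):
    """Return (BF10, posterior median, 2.5% quantile, 97.5% quantile) for rho."""
    h = 2.0 / grid
    rhos = [-1.0 + (i + 0.5) * h for i in range(grid)]  # midpoints avoid rho = +-1
    log_w = [jeffreys_loglik(x, r, n) + stretched_beta_logpdf(x, width) for x in rhos]
    top = max(log_w)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    # BF10 = integral of L(rho) * prior(rho) divided by L(0), and L(0) = 1 here
    log_bf10 = top + math.log(total * h)
    bf10 = math.exp(log_bf10) if log_bf10 < 700 else math.inf

    # cumulative posterior probability on the grid
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)

    def quantile(q: float) -> float:
        return next(x for x, c in zip(rhos, cdf) if c >= q)

    return bf10, quantile(0.5), quantile(0.025), quantile(0.975)


if __name__ == "__main__":
    # hypothetical inputs, not the study's data
    bf10, median, lower, upper = bayesian_correlation(r=0.03, n=90, width=0.5)
    print(f"BF01 = {1 / bf10:.2f}, median = {median:.2f}, 95% CrI = [{lower:.2f}, {upper:.2f}]")
```

Calling the function with `width=1.0` and `width=0.5` on the same inputs gives a simple version of the robustness check described above.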

### Frequency distributions

… $\chi^2$ test suggests that these groups are not distributed differently [$\chi^2(1) = 1.55$, $p = 0.21$]. The log odds ratio for this analysis was −0.92 [95% CI (−2.4, 0.54)]. As with the previous analysis of correlational data, this result neither provides evidence for the null hypothesis nor permits the conclusion that the true log odds ratio lies within the CI bounds with 95% probability. Bayesian frequency distribution analysis was performed using independent multinomial sampling. This sampling scheme was chosen because the crucial test was a comparison of two proportions and the number of people assigned to receive each treatment was presumably fixed [33, 34]. The median log odds ratio was −0.86, with a 95% credible interval from −2.31 to 0.51. The null model was only slightly favored over the alternative model ($\mathrm{BF}_{01} = 1.16$). A Bayes factor this close to 1 suggests that there were too few data to discriminate between the two hypotheses [4].
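The Python sketch below shows the two Bayesian quantities reported above for a comparison of two proportions with fixed group sizes. It is an assumption-laden illustration, not JASP's code. JASP's contingency-table Bayes factors follow Gunel and Dickey (1974), and we assume, without having verified the numbers, that the simpler formulation here is close in spirit.

- **Bayes factor:** it compares one shared success probability ($H_0$) against two independent probabilities ($H_1$), each with a uniform Beta(1, 1) prior.
- **Posterior log odds ratio:** its median and 95% credible interval are obtained by Monte Carlo from the two independent Beta posteriors.
- **NHST comparison:** the Wald interval is included to show the corresponding confidence interval.

The counts in the usage example are hypothetical.

```python
import math
import random


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def two_proportion_bf01(y1: int, n1: int, y2: int, n2: int) -> float:
    """BF01 for H0: p1 = p2 vs H1: p1 != p2, with uniform Beta(1, 1) priors.

    Group sizes n1 and n2 are treated as fixed (independent binomial sampling).
    Binomial coefficients cancel, and Beta(1, 1) has normalising constant 1.
    """
    log_m0 = log_beta(y1 + y2 + 1, (n1 - y1) + (n2 - y2) + 1)  # one shared p
    log_m1 = log_beta(y1 + 1, n1 - y1 + 1) + log_beta(y2 + 1, n2 - y2 + 1)
    return math.exp(log_m0 - log_m1)


def wald_log_odds_ratio(y1: int, n1: int, y2: int, n2: int):
    """Sample log odds ratio with a 95% Wald confidence interval (no zero cells)."""
    a, b, c, d = y1, n1 - y1, y2, n2 - y2
    est = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, (est - 1.96 * se, est + 1.96 * se)


def posterior_log_odds_ratio(y1, n1, y2, n2, draws=20000, seed=1):
    """Monte Carlo posterior median and 95% credible interval of the log odds ratio."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        p1 = rng.betavariate(y1 + 1, n1 - y1 + 1)  # posterior of p1 under Beta(1, 1)
        p2 = rng.betavariate(y2 + 1, n2 - y2 + 1)  # posterior of p2 under Beta(1, 1)
        samples.append(math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2)))
    samples.sort()
    return samples[draws // 2], (samples[int(0.025 * draws)], samples[int(0.975 * draws)])


if __name__ == "__main__":
    counts = (4, 20, 8, 20)  # hypothetical: 4/20 vs 8/20 responders
    print("BF01:", round(two_proportion_bf01(*counts), 2))
    print("Wald log OR:", wald_log_odds_ratio(*counts))
    print("Posterior log OR:", posterior_log_odds_ratio(*counts))
```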

### T-tests

… $H_1$ model includes more realistic effect sizes. The corresponding Bayes factor provides anecdotal evidence for the null hypothesis relative to the alternative hypothesis ($\mathrm{BF}_{01} = 1.93$; Fig. 3a), with a posterior median of 0.2 and a 95% credible interval from −0.2 to 0.61. As this $\mathrm{BF}_{01}$ value is close to 1, the data appear to be insensitive [4]. In other words, more data need to be collected.

A robustness check was also performed to assess sensitivity to the prior (Fig. 3b), with a wide prior yielding $\mathrm{BF}_{01} = 3.2$. Some would consider this value moderate support for the null hypothesis. However, a wide Cauchy prior with a scale of 1 assigns 50% prior probability to the true effect lying between $d = -1$ and $d = 1$, and therefore the other 50% to effects with $|d| > 1$. That assumption would be unrealistic for most areas of psychiatry.

Directional hypothesis testing, similar to a classical one-sided t-test, is also possible within a Bayesian framework. Prior distributions can incorporate prior knowledge and be constrained to specific intervals. Consider a pre-registered hypothesis that intranasal oxytocin increases ratings of spirituality ($H_+$). For this hypothesis, the prior distribution can be set with more mass around zero (as in our non-directional test) but no mass below zero (Fig. 3c). The directional test provided only very modest support in favor of the $H_0$ model compared to the $H_+$ model ($\mathrm{BF}_{0+} = 1.2$).
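For an independent-samples design, the default two-sided Bayes factor with a Cauchy prior on the standardized effect size $\delta$ can be computed from the t statistic alone. It uses the single-integral JZS formula of Rouder et al. (2009). The Python sketch below is a minimal version of that formula, with the integral over the auxiliary variance $g$ done by the trapezoidal rule. The scale $r = \sqrt{2}/2 \approx 0.707$ is the default in JASP and in the BayesFactor R package, and $r = 1$ corresponds to the wide prior discussed above. The function name is illustrative, only the two-sided test is covered, and the t value and group sizes in the usage example are hypothetical rather than the study's data.

```python
import math


def jzs_bf10(t: float, n1: int, n2: int, r: float = math.sqrt(2) / 2,
             steps: int = 4000) -> float:
    """Two-sided JZS Bayes factor BF10 for an independent-samples t statistic.

    Prior under H1: delta ~ Cauchy(0, r), written as delta | g ~ Normal(0, g * r**2)
    with g ~ InverseGamma(1/2, 1/2); g is integrated out numerically.
    """
    nu = n1 + n2 - 2  # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size
    # log marginal likelihood under H0, dropping constants shared with H1
    log_m0 = -(nu + 1) / 2 * math.log1p(t * t / nu)

    def integrand(x: float) -> float:
        g = math.exp(x)  # integrate over x = log(g) for numerical stability
        a = 1 + n_eff * g * r * r
        log_lik = -0.5 * math.log(a) - (nu + 1) / 2 * math.log1p(t * t / (a * nu))
        # InverseGamma(1/2, 1/2) log density plus the Jacobian log(dg/dx) = x
        log_prior = -0.5 * math.log(2 * math.pi) - 1.5 * x - 1 / (2 * g) + x
        return math.exp(log_lik + log_prior - log_m0)

    lo, hi = -20.0, 30.0
    h = (hi - lo) / steps
    # trapezoidal rule on an evenly spaced grid of x
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    # hypothetical t statistic and group sizes, not the study's data
    t_obs, n1, n2 = 1.1, 40, 40
    # a simple prior robustness check: default, wide and ultrawide scales
    for r in (math.sqrt(2) / 2, 1.0, math.sqrt(2)):
        print(f"r = {r:.3f}: BF01 = {1 / jzs_bf10(t_obs, n1, n2, r):.2f}")
```

When the observed effect is small, wider priors spread their mass over large effects that the data do not support. They therefore yield larger $\mathrm{BF}_{01}$ values, which is the pattern seen in the robustness check.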

… $\mathrm{BF}_{10} = 1.38$; Fig. 4a), with a posterior median of 0.28 and a 95% credible interval from 0.01 to 0.53. With an informed prior, the data now provide evidence for the alternative hypothesis relative to the null hypothesis, whereas the default prior yielded evidence for the null model. Nevertheless, this evidence is still weak. Without explicit prior information, the "Oosterwijk prior" can be used as an informed prior. It is a t-distribution centered at 0.35, with a scale of 0.102 and 3 degrees of freedom, and it is representative of the small-to-medium effects commonly observed in the biobehavioral sciences [37]. The informed Oosterwijk prior yielded a $\mathrm{BF}_{10}$ of 1.53 (Fig. 4b; posterior median of 0.33; 95% credible interval from 0.09 to 0.54), a result similar to that obtained with the first informed prior.
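Informed and directional priors change only the prior density on $\delta$. A general way to compute the Bayes factor is therefore to integrate the likelihood of the observed t statistic, a noncentral t density with noncentrality $\delta\sqrt{n_1 n_2/(n_1+n_2)}$, against that prior. The Python sketch below does this by plain numerical quadrature.

- The noncentral t density is itself evaluated by integrating over the chi-square variable.
- The Oosterwijk prior uses the location, scale and degrees of freedom given above.
- A half-Cauchy prior illustrates a directional $H_+$ test.

This is our own illustrative approach, not JASP's implementation. The function names, integration ranges and step counts are assumptions, and the t values and group sizes in the usage example are hypothetical.

```python
import math


def t_logpdf(x: float, df: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Log density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def noncentral_t_pdf(t: float, df: int, nc: float, steps: int = 300) -> float:
    """Noncentral t density via T = (Z + nc) / sqrt(V / df), V ~ chi-square(df).

    p(t) = integral over v of f_chi2(v) * s * phi(t * s - nc), with s = sqrt(v / df),
    evaluated with the midpoint rule on (0, upper).
    """
    upper = df + 12 * math.sqrt(2 * df) + 40  # far into the chi-square upper tail
    h = upper / steps
    log_chi2_norm = (df / 2) * math.log(2) + math.lgamma(df / 2)
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        s = math.sqrt(v / df)
        z = t * s - nc
        log_chi2 = (df / 2 - 1) * math.log(v) - v / 2 - log_chi2_norm
        total += s * math.exp(log_chi2 - 0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)


def informed_bf10(t, n1, n2, prior_logpdf, lo=-3.0, hi=3.0, steps=400):
    """BF10 for an independent-samples t statistic under an arbitrary prior on delta.

    Prior mass outside (lo, hi) is ignored, so the range should cover every delta
    with non-negligible likelihood.
    """
    df = n1 + n2 - 2
    root_n = math.sqrt(n1 * n2 / (n1 + n2))  # noncentrality = delta * root_n
    h = (hi - lo) / steps
    marginal_h1 = 0.0
    for i in range(steps):
        delta = lo + (i + 0.5) * h
        prior = math.exp(prior_logpdf(delta))
        if prior > 0.0:  # skip regions excluded by a directional prior
            marginal_h1 += noncentral_t_pdf(t, df, delta * root_n) * prior
    return marginal_h1 * h / noncentral_t_pdf(t, df, 0.0)


def oosterwijk_logpdf(delta: float) -> float:
    """Oosterwijk prior: t distribution with location 0.35, scale 0.102 and 3 df."""
    return t_logpdf(delta, df=3, loc=0.35, scale=0.102)


def half_cauchy_logpdf(delta: float, r: float = math.sqrt(2) / 2) -> float:
    """Cauchy(0, r) truncated to delta > 0 (directional H+), renormalised by 2."""
    if delta <= 0:
        return -math.inf
    return math.log(2) - math.log(math.pi * r * (1 + (delta / r) ** 2))


if __name__ == "__main__":
    # hypothetical t statistic and group sizes, not the study's data
    print("Oosterwijk BF10:", informed_bf10(2.0, 40, 40, oosterwijk_logpdf))
    print("Directional BF+0:", informed_bf10(2.0, 40, 40, half_cauchy_logpdf))
```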

### ANCOVA

… $\eta_p^2 = 0.061$). For the Bayesian ANCOVA [38], a model including intervention group and religious affiliation will be compared against a model that contains only religious affiliation (see Table 1 for the included models). We will use the default JASP multivariate Cauchy priors, although these parameters can be adjusted:

- Cauchy prior scale for fixed effects = 0.5
- Cauchy prior scale for covariates = 0.354

Bayes factors are transitive [39]. The model with intervention group and religious affiliation ($\mathrm{BF}_{10} = 398{,}231$) can therefore be compared to the religious affiliation model ($\mathrm{BF}_{10} = 230{,}440$) by division (398,231/230,440 ≈ 1.73). Thus, after accounting for the variance attributable to religious affiliation, the data favor the model that includes an effect of oxytocin on spirituality. However, the oxytocin condition + religious affiliation model was preferred to the religious affiliation model by a factor of only 1.73, which could be considered only very modest evidence. A Bayes factor of this magnitude does not suggest that there was no effect. Rather, it suggests that the observed data were too insensitive to detect an effect (i.e., more participants might be required). This is consistent with recent concerns surrounding statistically underpowered oxytocin studies [36].

Model type | Model contents | $\mathrm{BF}_{10}$ | $\mathrm{BF}_{01}$ |
---|---|---|---|
Null model | Only participants have effects | 1 | 1 |
Condition model | Null model + main effect of condition | 0.41 | 2.44 |
Religious affiliation model | Null model + main effect of religious affiliation | 230,440 | < 0.001 |
Condition + religious affiliation model | Null model + condition + religious affiliation | 398,231 | < 0.001 |
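Because every model in Table 1 is compared against the same null model, any two models can be compared by dividing their $\mathrm{BF}_{10}$ values. The null model's marginal likelihood cancels in the ratio. The Python sketch below illustrates this with the Table 1 values. It also includes a helper that attaches a verbal label using the commonly cited heuristic cut-offs of 3, 10, 30 and 100 (after Jeffreys). Those cut-offs are conventions rather than strict thresholds, and the helper names are hypothetical.

```python
# BF10 of each model against the null model (values from Table 1)
BF10_VS_NULL = {
    "condition": 0.41,
    "religious affiliation": 230_440,
    "condition + religious affiliation": 398_231,
}


def bf_between(model_a: str, model_b: str, bf10_vs_null: dict) -> float:
    """Bayes factor of model_a over model_b, using BF_ab = BF_a0 / BF_b0."""
    return bf10_vs_null[model_a] / bf10_vs_null[model_b]


def evidence_label(bf: float) -> str:
    """Heuristic verbal label for the strength of a Bayes factor, in either direction."""
    strength = bf if bf >= 1 else 1 / bf
    for cutoff, label in ((3, "anecdotal"), (10, "moderate"), (30, "strong"), (100, "very strong")):
        if strength < cutoff:
            return label
    return "extreme"


if __name__ == "__main__":
    bf = bf_between("condition + religious affiliation", "religious affiliation", BF10_VS_NULL)
    print(f"BF = {bf:.2f} ({evidence_label(bf)})")  # BF = 1.73 (anecdotal)
```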

### ANOVA

… $\eta_p^2 < 0.01$), treatment ($F(1, 74) = 1.25$, $p = 0.27$, $\eta_p^2 = 0.02$), or time × treatment interaction ($F(1, 74) = 0.08$, $p = 0.78$, $\eta_p^2 < 0.01$). A Bayesian repeated measures ANOVA compares a series of different models against a null model [40]. We will compare 4 models against the null model (Table 2). Of note, the interaction model also includes both main effects, as interactions without the corresponding main effects are considered implausible [41]. The default JASP prior for fixed effects will be used (r scale prior width = 0.5).

The null model was 7.85 times more favored than the main effects model and 32.21 times more favored than the interaction model (Table 2). There was moderate evidence that the null model was more favored than the time model ($\mathrm{BF}_{01} = 5.34$). However, there was only very little evidence that it was more favored than the condition model ($\mathrm{BF}_{01} = 1.54$), which is suggestive of insensitive data. Comparing the main effects model with the interaction model (32.21/7.85 ≈ 4.10) reveals that the main effects model was preferred to the interaction model by a Bayes factor of 4.10.

Model type | Model contents | $\mathrm{BF}_{10}$ | $\mathrm{BF}_{01}$ |
---|---|---|---|
Null model | Only participants have effects | 1 | 1 |
Time model | Null model + main effect of time | 0.19 | 5.34 |
Condition model | Null model + main effect of condition | 0.65 | 1.54 |
Main effects model | Null model + time model + condition model | 0.13 | 7.85 |
Interaction model | Main effects model + interaction effects | 0.03 | 32.21 |
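The same division applies to the $\mathrm{BF}_{01}$ column of Table 2. Since $\mathrm{BF}_{0a} = p(D \mid M_0)/p(D \mid M_a)$, the ratio $\mathrm{BF}_{0b}/\mathrm{BF}_{0a}$ equals $p(D \mid M_a)/p(D \mid M_b)$, the Bayes factor of model $a$ over model $b$. The Python sketch below computes every pairwise comparison this way. The model keys are illustrative labels. Because the table values are rounded to two decimals, the ratios inherit that rounding.

```python
# BF01 of the null model over each model (values from Table 2)
BF01 = {
    "null": 1.0,
    "time": 5.34,
    "condition": 1.54,
    "main effects": 7.85,
    "interaction": 32.21,
}


def pairwise_bayes_factors(bf01: dict) -> dict:
    """Return {(a, b): BF of model a over model b} for every ordered pair of models.

    BF_ab = BF_0b / BF_0a, because the null model's marginal likelihood cancels.
    """
    return {(a, b): bf01[b] / bf01[a] for a in bf01 for b in bf01 if a != b}


if __name__ == "__main__":
    pairs = pairwise_bayes_factors(BF01)
    print(round(pairs[("main effects", "interaction")], 2))  # 4.1
    # list each comparison once, oriented so that the favored model comes first
    for (a, b), bf in sorted(pairs.items(), key=lambda item: -item[1]):
        if bf >= 1:
            print(f"{a} over {b}: {bf:.2f}")
```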

## Conclusions

… $H_1$ and $H_0$, the magnitude of this evidence is also presented as an easy-to-interpret odds ratio. For demonstration, we have provided worked examples of Bayesian analysis for common statistical tests in psychiatry using JASP. Some readers may want to perform types of Bayesian analysis not currently available in JASP, or may require greater flexibility in setting prior distributions. Such readers can use the ‘BayesFactor’ R package [42].

Test | NHST | Bayes |
---|---|---|
Correlation | No significant relationship (p = 0.75) | Null model 4.55 times more favored than the alternative model |
Chi-squared test | No significant difference (p = 0.21) | Null model 1.16 times more favored than the alternative model |
T-test | No significant difference (p = 0.26) | Null model 1.93 times more favored than the alternative model |
ANCOVA | Significant difference (p = 0.03) | Condition + religious affiliation model 1.73 times more favored than religious affiliation model |
ANOVA - time effect | No main effect (p = 0.65) | Null model 5.34 times more favored than time model |
ANOVA - condition effect | No main effect (p = 0.27) | Null model 1.54 times more favored than condition model |
ANOVA - time × condition | No interaction effect (p = 0.78) | Main effects model 4.10 times more favored than interaction model |