Background
Research suggests that a majority of randomized clinical trials (RCTs) on medical interventions may not be justified by established evidence and thus constitute unjustified research. Justified clinical trials may be defined as trials designed around a clear hypothesis about which uncertainty exists, where that uncertainty has been established through systematic reviews or network meta-analyses (NMA) of existing evidence [
1]. This is of relevance because the estimated cost of each piece of evidence in a series of RCTs increases across decades [
2,
3]. Optimizing the number of clinical trials to scientifically justifiable amounts is therefore recommended to save resources, reduce exposure of patients to less effective treatments, and allow for earlier uptake of treatment recommendations in practice [
1].
Conditional power of NMA has been introduced as a concept to optimize trial designs, thereby contributing to the reduction of unjustified research [
4‐
6]. Conditional power is the probability that updating existing inconclusive evidence in NMA with additional trial(s) will result in conclusive evidence, given assumptions regarding trial design, anticipated effect sizes, or event probabilities [
7,
8]. A key issue when designing an RCT is to determine how large the sample size needs to be in order to achieve a desirable level of power given a predefined significance level
α [
7]. Further, some interventions may not achieve high levels of power when considered within a single trial in isolation. In such situations, two or more RCTs in combination may be appropriate to form a cumulative synthesis of findings from RCTs addressing the same question [
5,
6]. This situation may also arise if a direct treatment comparison of interest includes treatments that are known to be poorly tolerated by patients (e.g., due to known adverse events); in that case, adding indirect evidence from future trials that include only better-tolerated treatments may be more appropriate for the evidence to become conclusive. If conditional power analysis suggests, for example, at least 80% conditional power, which conventionally implies that trial(s) investigating a true effect will correctly reject the null hypothesis [
9], together with a reasonable required sample size, further research may be promising. Otherwise, if such an analysis suggests, for example, less than 20% conditional power, which conventionally may be regarded as a futility boundary, with values below it indicating that a trial is likely to be futile under the null hypothesis [
10], then it may be recommended to refrain from further RCTs on a given intervention to save resources.
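As a rough illustration of the definition above, conditional power can be sketched by simulation for the simplest possible case: a single treatment comparison whose pooled estimate is updated with one new two-arm trial under a fixed-effect model. This is a simplification for illustration only and not the NMA-based procedure referred to in this work. The function names, the large-sample variance approximation 4/N for a standardized mean difference with equal allocation, and the example inputs are all assumptions; only the 80% and 20% thresholds are taken from the text.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_conditional_power(theta_old, se_old, delta, n_total,
                               alpha=0.05, n_sim=20_000, seed=0):
    """Monte Carlo estimate of conditional power for one comparison.

    theta_old, se_old: existing pooled estimate (SMD scale) and its standard error.
    delta: anticipated true effect in the new trial (chosen by the analyst).
    n_total: total sample size of the new two-arm trial (equal allocation assumed).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    w_old = 1.0 / se_old ** 2      # inverse-variance weight of the existing evidence
    var_new = 4.0 / n_total        # assumed large-sample SMD variance, equal arms
    w_new = 1.0 / var_new
    se_updated = 1.0 / sqrt(w_old + w_new)
    conclusive = 0
    for _ in range(n_sim):
        y_new = rng.gauss(delta, sqrt(var_new))   # simulated estimate of the new trial
        theta_updated = (w_old * theta_old + w_new * y_new) / (w_old + w_new)
        if abs(theta_updated) / se_updated > z_crit:   # updated CI excludes 0
            conclusive += 1
    return conclusive / n_sim


def classify(cp, promising=0.80, futility=0.20):
    """Map conditional power to the conventional thresholds named in the text."""
    if cp >= promising:
        return "promising"
    if cp < futility:
        return "likely futile"
    return "intermediate"


# Hypothetical inputs, not taken from the dataset analysed in this work.
cp = simulate_conditional_power(theta_old=-0.10, se_old=0.08, delta=-0.10, n_total=1000)
print(f"conditional power ~ {cp:.2f} ({classify(cp)})")
```

The simulation mirrors the verbal definition step by step: draw a plausible result for the new trial, update the pooled estimate, and count how often the updated interval excludes the null value.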
The present work aimed to estimate conditional power for NMA on antidepressant treatments. The analysis was based on a published network known as the GRISELDA dataset [
11], contributing 502 RCTs for the acute treatment of adult major depressive disorder (MDD) conducted between 1979 and 2018 [
12]. Together the network compares 21 antidepressants, considering outcomes such as efficacy in terms of the symptom change on the Hamilton Depression Scale (HAMD) [
13] and tolerability in terms of dropout rate due to adverse events (
Supplement 1 Fig. S1).
At the time of writing (as of October 2020), four ongoing RCTs can be found on
clinicaltrials.gov that cover one or more of the aforementioned antidepressants and fit the inclusion criteria of the present data set (NCT04364997, intervention: bupropion (BUP), escitalopram (ESC), mirtazapine (MIR), sertraline (SER), venlafaxine (VEN), planned sample size N = 400, estimated start and completion dates Jun-18 to Dec-22, Beijing Anding Hospital, China [
14]; NCT03538691, intervention: citalopram (CIT), duloxetine (DUL), escitalopram (ESC), fluoxetine (FLO), paroxetine (PAR), sertraline (SER), venlafaxine (VEN) versus placebo (PLA), planned sample size N = 1450, estimated start and completion dates Jul-18 to Sep-22, Otsuka Pharmaceutical Development & Commercialization, Inc. [
15]; NCT04345471, intervention: desvenlafaxine (DES) versus placebo (PLA), planned sample size N = 594, estimated start and completion dates May-20 to Dec-22, Mochida Investigational sites, Japan [
16]; NCT04422652, intervention: desvenlafaxine (DES) versus vortioxetine (VOR), planned sample size N = 600, estimated start and completion dates Aug-20 to Apr-26, H. Lundbeck A/S [
17]).
For example, one of the most recent antidepressants is vortioxetine (VOR), approved in 2013 by the US Food and Drug Administration (FDA). The existing evidence on VOR comprises 17 RCTs (16 placebo-controlled RCTs, 1 head-to-head RCT) completed between 2007 and 2017 and published between 2012 and 2018 [
18‐
34]. Based on this current evidence, VOR has been shown to be more effective (standardized mean difference (SMD) -0.29 [95% CI -0.38 to -0.20]) but less tolerable (odds ratio (OR) 1.48 [95% CI 1.15 to 1.89]) compared to placebo, with the evidence becoming conclusive in 2009 (efficacy) and 2011 (tolerability), respectively. An ongoing phase IV, double-blind RCT (NCT04448431 [
35]) started in August 2020 with an estimated completion date in April 2026. This RCT aims to compare the efficacy of VOR versus desvenlafaxine (DES) in 600 MDD patients who have tried one available treatment without getting the full benefit, with the primary outcome being the change in the Montgomery and Åsberg Depression Rating Scale (MADRS) from baseline to week 8. Based on current evidence, the comparison DES:VOR is inconclusive in terms of efficacy (SMD -0.06 [95% CI -0.19 to 0.08]) and tolerability (OR 0.80 [95% CI 0.54 to 1.18]), suggesting a slight yet inconclusive advantage for VOR compared to DES with respect to both outcomes. To estimate whether the advantage for VOR may turn into conclusive evidence, conditional power analysis may support the decision whether the ongoing research on that comparison is promising or otherwise futile. This example shows how the present work may inform decision-makers and researchers regarding the expected clinical relevance of ongoing and future antidepressant RCTs that aim to challenge antidepressant treatment recommendations.
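The conclusiveness statements in this example can be checked directly from the reported intervals. The sketch below assumes that "conclusive" means the 95% CI excludes the null value (0 for an SMD, 1 for an OR), which is how the intervals above read, and it back-calculates an approximate standard error from the interval width under a normal approximation. The helper names are hypothetical; the numbers are those reported in the text.

```python
from math import log
from statistics import NormalDist


def is_conclusive(ci_lower, ci_upper, null_value):
    """True if the confidence interval excludes the null value."""
    return not (ci_lower <= null_value <= ci_upper)


def se_from_ci(ci_lower, ci_upper, level=0.95, ratio_measure=False):
    """Approximate standard error implied by a symmetric normal-theory CI.

    For ratio measures such as an OR, the interval is symmetric on the log scale,
    so the bounds are log-transformed first and the result is the SE of log(OR).
    """
    if ratio_measure:
        ci_lower, ci_upper = log(ci_lower), log(ci_upper)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (ci_upper - ci_lower) / (2 * z)


# (lower bound, upper bound, null value, is ratio measure) as reported in the text
reported = {
    "VOR vs PLA, efficacy (SMD)": (-0.38, -0.20, 0.0, False),
    "VOR vs PLA, tolerability (OR)": (1.15, 1.89, 1.0, True),
    "DES:VOR, efficacy (SMD)": (-0.19, 0.08, 0.0, False),
    "DES:VOR, tolerability (OR)": (0.54, 1.18, 1.0, True),
}

for label, (lower, upper, null, ratio) in reported.items():
    verdict = "conclusive" if is_conclusive(lower, upper, null) else "inconclusive"
    se = se_from_ci(lower, upper, ratio_measure=ratio)
    print(f"{label}: {verdict}, implied SE ~ {se:.3f}")
```

The implied standard errors are approximations, since the published bounds are rounded to two decimals.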
Discussion
The recent NMA by Cipriani et al. [
12] provided evidence regarding the ongoing debate on the effectiveness of antidepressant treatment. Today, two years after the publication of the NMA, the question arises whether additional RCTs updating the evidence would pay off. Current ongoing RCTs [
14‐
17] may contribute to answering the question, but final results may only be expected after the estimated completion of the RCTs (completion dates 2022 to 2026). It may therefore be of clinical interest to estimate the probability that the current research leads to updates in treatment recommendations, or whether it may be considered unjustified.
Overall, the present findings suggest that the probability of achieving new conclusive evidence in antidepressant treatment recommendations that goes beyond current evidence is low. Although sufficient conditional power may be obtained for a majority of evaluated treatment comparisons (Fig.
4), there are substantial limitations in terms of both required sample sizes and expected effect sizes.
Considering median sample sizes in the four ongoing RCTs (range N = 400 - 1450) [
14‐
17], required sample sizes obtained by the present analysis to achieve conventionally recommended power of at least 80% [
9] were estimated to be more than double (tolerability) or even three times (efficacy) that size, and the sample sizes of the ongoing RCTs may not even exceed the estimated futility boundaries (Table
1). Although sample sizes may be reduced using optimized trial designs that include additional indirect evidence, the associated research costs of conducting multiple trials may not pay off.
It should be noted that the present work is limited in its evaluation of optimal trial designs with respect to the relation between direct and indirect evidence. Nikolakopoulou et al. [
54] demonstrated how decisions in future trials may be supported by conditional power analyses considering not only ‘different ratios of the number of trials’ contributing direct versus indirect evidence, as done in the current work, but also ‘different ratios of the sample size between trials’ assessing direct versus indirect information. An extensive analysis assessing these ratios is feasible in small networks or may be applied to selected treatment comparisons of interest based on a priori hypotheses. The large treatment space in the present network, however, did not allow for such extensive sensitivity analyses for practical reasons, namely processing time and the exponentially growing dimension of the results. Future research should therefore consider the present findings as an approximation for a more detailed breakdown of the evidence.
Compared to the impact of trial designs on reducing sample sizes, the impact of varying effect sizes or event probabilities may be assumed to be of less practical importance; this is because trial designs can be experimentally modified, whereas effect sizes and event probabilities are inherently limited by the existing evidence of the various treatments. In particular, considering the well-known overall small effect sizes for efficacy in antidepressants in the conclusive treatment comparisons (i.e., drug-placebo differences with a median d = 0.3 in terms of Cohen’s d [
57]) and the even smaller effect sizes in so far inconclusive relative treatment comparisons (median d <0.1 in terms of Cohen’s d [
57]) (
Supplement 1,
Tab. S2), the clinical relevance of additional trials aiming to challenge current antidepressant treatment recommendations may be low. In other words, it may be questioned whether any additional RCTs on antidepressant treatment can challenge the current treatment recommendations.
Referring to the example in the introduction, the present results may be applied to judge the conditional power of the ongoing RCT (NCT04448431 [
35]) aiming to compare the efficacy of VOR versus DES. Although current evidence may suggest a trend towards an advantage of VOR compared to DES in terms of both efficacy and tolerability (
Supplement 1,
Fig. S1), the probability of achieving conclusive evidence at reasonable sample sizes is low. The present analysis suggested required sample sizes to achieve at least 80% conditional power (
NCP=80%) of N = 1670 and N = 733 in terms of efficacy and tolerability, respectively (Fig.
3). These estimated sample sizes are considerably larger than the planned sample size of N = 600 [
35]. Indeed, the planned sample size of N = 600 corresponds to approximately 56% (efficacy) and 74% (tolerability) conditional power (
Supplement 2), and may thus be considered too low to reach new conclusive evidence in an updated NMA.
The above-mentioned example demonstrates the importance of a priori conditional power analyses if the aim of an RCT is to challenge current treatment recommendations. Based on the information available on the ongoing RCTs, it is unclear whether a priori conditional power analysis has been performed. The results expected after the completion of the ongoing RCTs will show whether a priori conditional power analysis could have contributed to improved trial designs, and thus would have saved resources in terms of clinical trial costs.
It should, however, be made clear that the ongoing RCTs may focus on primary aims other than challenging current antidepressant treatment recommendations. In other words, they may not have been intended to be conditionally powered for possible future updating of NMAs, but may indeed be sufficiently powered as stand-alone trials. As discussed by Salanti and Nikolakopoulou [
58], when NMA is deemed inconclusive and future trials should be planned, specific recommendations about what sort of trials should be planned are required. Trials can be planned to reduce risk of bias in particular comparisons, to explain heterogeneity, or to inform outcomes for which evidence is imprecise. When the aim is to include the planned trial in an updated NMA later on, trials may not be considered as stand-alone trials but may be seen as sequential additions to the existing evidence. The power and findings of individual trials are thus not of interest; rather, the conditional power of the NMA when the new trial is added and the resulting summary effect are of importance. Consequently, when NMA is deemed inconclusive because of imprecision, sample size calculations should be based on the conditional power of an updated NMA.
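To make the idea of a sample size calculation based on conditional power concrete, the following sketch uses a closed-form normal approximation for a single treatment comparison updated with one new two-arm trial under a fixed-effect model, and scans for the smallest total sample size that reaches a target conditional power. It is a pairwise simplification for illustration, not the NMA-based calculation discussed here; the function names, the variance approximation 4/N for a standardized mean difference, the scan step, and the example inputs are assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(theta_old, se_old, delta, n_total, alpha=0.05):
    """Probability that the updated pooled estimate is significant (two-sided).

    theta_old, se_old: existing pooled estimate (SMD scale) and its standard error.
    delta: anticipated true effect in the new trial.
    n_total: total size of the new two-arm trial, equal allocation assumed.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    w_old = 1.0 / se_old ** 2          # weight of the existing evidence
    w_new = n_total / 4.0              # weight of the new trial (variance ~ 4/N, assumed)
    w_sum = w_old + w_new
    # Distribution of the updated z-statistic given the existing estimate.
    mean_z = (w_old * theta_old + w_new * delta) / sqrt(w_sum)
    sd_z = sqrt(w_new / w_sum)
    upper_tail = 1 - _STD_NORMAL.cdf((z_crit - mean_z) / sd_z)
    lower_tail = _STD_NORMAL.cdf((-z_crit - mean_z) / sd_z)
    return upper_tail + lower_tail


def required_sample_size(theta_old, se_old, delta, target=0.80,
                         alpha=0.05, step=10, n_max=100_000):
    """Smallest scanned n_total reaching the target conditional power, else None."""
    for n_total in range(step, n_max + 1, step):
        if conditional_power(theta_old, se_old, delta, n_total, alpha) >= target:
            return n_total
    return None


# Hypothetical inputs, not taken from the network analysed in this work.
n_required = required_sample_size(theta_old=-0.10, se_old=0.08, delta=-0.10)
print("required total sample size:", n_required)
```

A linear scan is used instead of a root finder because conditional power is not guaranteed to increase monotonically with sample size when the anticipated effect and the existing estimate point in different directions. A return value of None signals that the target is not reached within the scanned range, which corresponds to the situation in which further trials may be considered futile.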
With this in mind, the present work should not be misunderstood or lead to possible misuse of conditional power analyses. Weber et al. [
59] raised the fundamental question regarding the use of conditional power analyses by asking whether “it is appropriate to gain power for an updated NMA by in- or decreasing the number of planned future trials while manipulating the power of each of the individual planned future trials?” The authors argued that traditional methods of power analysis are still favorable because drug licensing is based on stand-alone RCTs. Regardless of whether one or multiple trials are planned, trials planned using conditional power may require different sample sizes (smaller or larger) than those planned using traditional power analysis aimed at achieving stand-alone conclusiveness. In other words, “individual RCTs should always be designed to satisfy their objectives and stand-alone studies (should not be) substituted by a meta-analysis of trials of inadequate size” [
60].