Statement of principal findings
VoE is a standardized method that can be applied to any meta-analysis to systematically evaluate how broadly results diverge depending on the criteria used to select studies and on the analytical model chosen. As a case study, we showed extensive VoE in an indirect comparison of nalmefene with naltrexone, leading to contradictory results. Although most combinations yielded no evidence of a difference, some meta-analyses showed superiority of nalmefene, whereas others showed superiority of naltrexone. These two compounds have many similarities, and a genuine difference between them is unlikely [19]. When we considered direct comparisons against placebo, we observed less VoE for nalmefene than for naltrexone. Nalmefene is the more recent treatment option and has been the subject of two distinct but fairly homogeneous development programmes [90], resulting in several studies with a similar design. In contrast, naltrexone is an older option with a myriad of pre- and post-approval RCTs conducted in very different settings.
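For concreteness, the adjusted indirect comparison underlying such an analysis can be sketched as follows. This is a minimal, illustrative implementation of the standard Bucher approach (difference of the two treatment-versus-placebo effects, with standard errors combined in quadrature); the function name and all numbers are hypothetical and are not taken from the case study.

```python
import math

def bucher_indirect(d_a_placebo, se_a, d_b_placebo, se_b):
    """Indirect effect of treatment A vs treatment B via a common
    placebo comparator (Bucher method).

    d_* are pooled treatment-vs-placebo effects on a log scale
    (e.g. log odds ratios); se_* are their standard errors.
    """
    d_ab = d_a_placebo - d_b_placebo
    se_ab = math.sqrt(se_a**2 + se_b**2)          # errors add in quadrature
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)
    return d_ab, se_ab, ci

# Hypothetical pooled log odds ratios vs placebo for two treatments
d_ab, se_ab, ci = bucher_indirect(-0.30, 0.10, -0.25, 0.12)
print(f"indirect log OR = {d_ab:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the standard error of the indirect estimate is larger than either direct standard error, indirect comparisons are inherently less precise, which is one reason small analytical perturbations can flip their conclusions.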
Strengths and weaknesses of this study
We recommend that a list of all possible major options be drawn up first when examining VoE in a meta-analysis, although we acknowledge that even the construction of such a list may itself be subject to unavoidable subjectivity. All the methodological choices we made to assess VoE in our case study corresponded to criteria we considered easily "gameable". Several may be clinically relevant, such as the exclusion of studies on abstinent patients or on patients with a somatic or psychiatric comorbidity. Others relate to the literature search, such as the retrieval of unpublished studies. Most meta-analyses have difficulty unearthing unpublished studies, and publication bias [91] may affect treatment ranking in indirect meta-analyses [92]. The relevance of other combinations may be debatable. For example, the use of fixed-effect models in the presence of between-study heterogeneity is not statistically valid and would be unlikely to be considered for publication. However, our sensitivity analysis, which excluded meta-analyses judged inappropriate, still found some VoE. Judging the appropriateness of different combinations is difficult and subjective; the two illustrative examples (Additional file 1: Tables S2-S3) demonstrate that contradictory meta-analyses are not necessarily inappropriate per se. It is also likely that some of the datasets we combined violated the similarity assumption required for indirect comparisons. Dissimilar study results due to treatment effect modifiers may have produced some of the VoE we observed. In theory, positive and negative results from multiple meta-analyses are not necessarily contradictory if the inclusion criteria differ so much that the results apply to different research questions. In practice, however, identifying treatment effect modifiers is very challenging [93], and it is often difficult and subjective to judge how far different methodological choices truly define different research questions. Here, we tried to pre-emptively retain the choices that would not have altered the research question, and we minimized the possibility of making mutually exclusive methodological choices. Moreover, our study was based on a relatively limited number of methodological choices, which may underestimate the full set of alternative scenarios; the VoE could have been even greater. For example, we could have studied VoE depending on whether the indirect comparisons were made with a Bayesian or a frequentist approach. Another potential source of VoE was not investigated, namely the choice of the source from which study data are extracted (e.g. published articles, study reports, ClinicalTrials.gov). In other contexts, such as meta-analyses exploring drug safety, non-randomized studies may be included and add even more VoE, especially because of possible bias (e.g. indication bias) in the primary studies.
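To illustrate how a handful of binary eligibility choices multiplies into many candidate meta-analyses, the following minimal sketch enumerates every combination of two hypothetical exclusion criteria and pools each resulting study subset with an inverse-variance fixed-effect model. All trial effects, standard errors, and flag names are invented for illustration; they do not correspond to the trials in the case study.

```python
import itertools

# Hypothetical per-trial log odds ratios, standard errors, and
# study-level flags mirroring the kinds of eligibility choices
# discussed above (all values are illustrative).
trials = [
    {"yi": -0.40, "se": 0.15, "abstinent": False, "unpublished": False},
    {"yi": -0.10, "se": 0.20, "abstinent": True,  "unpublished": False},
    {"yi":  0.05, "se": 0.25, "abstinent": False, "unpublished": True},
    {"yi": -0.30, "se": 0.18, "abstinent": True,  "unpublished": True},
    {"yi": -0.20, "se": 0.12, "abstinent": False, "unpublished": False},
]

def fixed_effect(subset):
    """Inverse-variance fixed-effect pooled estimate."""
    w = [1 / t["se"] ** 2 for t in subset]
    return sum(wi * t["yi"] for wi, t in zip(w, subset)) / sum(w)

results = {}
# Each binary eligibility choice doubles the number of candidate analyses.
for excl_abstinent, excl_unpublished in itertools.product([False, True], repeat=2):
    subset = [t for t in trials
              if not (excl_abstinent and t["abstinent"])
              and not (excl_unpublished and t["unpublished"])]
    results[(excl_abstinent, excl_unpublished)] = fixed_effect(subset)

spread = max(results.values()) - min(results.values())
print(f"{len(results)} meta-analyses; pooled estimates span {spread:.3f}")
```

With k such binary choices there are 2^k candidate meta-analyses, and the spread of their pooled estimates is a direct, if crude, measure of VoE.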
Indirect comparison meta-analyses may yield less VoE in other, less controversial fields in which study results are more homogeneous. Conversely, VoE may be more prominent in complex, heterogeneous meta-analyses, such as those of large networks with prominent inconsistency [94]. For example, it has been shown in network meta-analyses that even the choice of which nodes are eligible, or whether a placebo should be included, can yield very different results [95]. In addition, some of the contradictory meta-analyses generated in a VoE exercise might not pass peer review, might receive harsh criticism for their choices, or might even be retracted after publication, as happened for a meta-analysis of acupuncture [96]. We therefore recommend that the factors considered in VoE analyses be realistic.
Perspectives
VoE has already been described in the field of observational epidemiology [16, 97] but has been less explored in meta-analyses. Nevertheless, a previous study showed that it is possible to manipulate effect sizes by exploiting discrepancies among multiple data sources (papers, clinical study reports, individual patient data) [98]. In that study, depending on the data source, the overall result of the meta-analyses assessing the effect size of gabapentin for pain intensity switched from effective to ineffective, and the overall effect of quetiapine for depression shifted from medium to small. In our study, cherry-picking results from each included RCT may have introduced VoE without changing either the trial inclusion criteria or the methods of meta-analysis.
There is a large body of literature on discordant meta-analyses. Indeed, the first widely known meta-analyses in medicine were probably those performed by opposing teams in the 1970s, which reached opposite results on the risk of gastrointestinal bleeding from steroids. Over the years, debate has often arisen within specific topics in which two or more meta-analyses on seemingly the same question reached different conclusions [6-13]. Discussing the main reasons put forward for the discrepancy in each case, with careful clinical reasoning, is likely to remain useful. VoE analysis offers a complementary, systematic approach to evaluate the potential for discrepancy in any meta-analysis, including large-scale meta-analyses. VoE offers a more generalized view of sensitivity analyses. Some level of sensitivity analysis is commonly performed; for example, many meta-analyses present fixed- and random-effects models side by side, as in our case study. Additional methodological choices may generate further sensitivity analyses related to the choice of outcome measure, handling of missing data, correction for potential bias, interdependence, etc. [99-102]. Sensitivity analyses based on clinical characteristics are also common, but usually only a few such analyses are reported, if any.
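The side-by-side fixed- and random-effects comparison mentioned above can be sketched as follows, using inverse-variance pooling and the DerSimonian-Laird estimator of the between-study variance tau^2. The trial effects are hypothetical; this is an illustrative sketch, not the analysis code of the case study.

```python
import math

# Hypothetical trial effects (log odds ratios) and standard errors.
yi = [-0.50, -0.10, 0.10, -0.35]
se = [0.20, 0.15, 0.25, 0.18]

def fixed_effect(yi, se):
    """Inverse-variance fixed-effect pooled estimate and its SE."""
    w = [1 / s**2 for s in se]
    est = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def dersimonian_laird(yi, se):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    w = [1 / s**2 for s in se]
    est_fe, _ = fixed_effect(yi, se)
    q = sum(wi * (y - est_fe) ** 2 for wi, y in zip(w, yi))   # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)                  # truncated at 0
    w_re = [1 / (s**2 + tau2) for s in se]
    est = sum(wi * y for wi, y in zip(w_re, yi)) / sum(w_re)
    return est, math.sqrt(1 / sum(w_re))

fe, fe_se = fixed_effect(yi, se)
re, re_se = dersimonian_laird(yi, se)
print(f"fixed: {fe:.3f} (SE {fe_se:.3f}); random: {re:.3f} (SE {re_se:.3f})")
```

When tau^2 > 0, the random-effects standard error exceeds the fixed-effect one, so the two models can disagree on statistical significance even when the point estimates are close; this is exactly the kind of analytical choice that contributes to VoE.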
Although the evaluation of VoE is generally systematic and involves more extensive analysis than the sporadic sensitivity analyses typically performed in past meta-analyses, it still requires a priori determination of the factors considered most relevant for making choices in the conduct of a meta-analysis. In this respect, it is not as clinically agnostic as the all-subset method, in which all possible meta-analyses of all possible subsets of studies are explored for a given set of studies to be meta-analysed [103]. This method runs into computational difficulties with large meta-analyses. For example, applying the all-subset method for VoE in the current case study, with 51 and 9 trials, would result in 2^51 and 2^9 possible subsets, respectively, giving 2^(51+9) = 2^60, i.e. 1,152,921,504,606,846,976 different indirect comparison meta-analyses to be performed, a number that is computationally absurd to explore and not clinically relevant.
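The combinatorial count above follows from pairing every subset of one arm's trials with every subset of the other's, and can be verified directly:

```python
# Every subset of the 51 nalmefene trials can be paired with every
# subset of the 9 naltrexone trials, so the counts multiply:
# 2**51 * 2**9 = 2**(51 + 9) = 2**60 candidate indirect meta-analyses.
n_nalmefene, n_naltrexone = 51, 9
n_combinations = 2**n_nalmefene * 2**n_naltrexone
print(n_combinations)  # 1152921504606846976
```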
Systematic reviews and meta-analyses (including indirect comparisons and more complex networks) are often considered to offer the highest level of evidence [104]. These studies have become so influential that they can shape guidelines and change clinical practice. However, their use has reached epidemic proportions, and published meta-analyses [104] and network meta-analyses [5] show extensive overlap and potential redundancy. It has been argued that such studies can serve as key marketing tools when there are strong conflicts over which results to highlight. For example, numerous meta-analyses of antidepressants authored by or linked to the industry have been described previously [105]; industry-linked studies almost never report any caveats about antidepressants in their abstracts. Moreover, it is common practice in the industry to commission network meta-analyses from professional contracting companies, and most of these are neither registered a priori nor published; a veto from industry is the most commonly stated reason for the lack of a publication plan for network meta-analyses [106]. VoE is a method that could be used to highlight selective reporting and controversial results.
The extent of redundant meta-analyses and wasted effort may be reduced by protocol pre-registration, for example in the PROSPERO database [107]. Nevertheless, registration does not provide the same guarantees for meta-analyses as for RCTs. For RCTs, registration is prospective, occurring before enrolment of the first patient. Meta-analyses, in contrast, are almost always retrospective (i.e. planned after the individual studies are completed), and even registration cannot prevent "cherry picking" of some methodological choices based on preliminary analyses of the existing data. The development of prospective meta-analyses could avoid these pitfalls, as in such meta-analyses the studies are identified, evaluated, and determined to be eligible before the results of any of the studies become known [27].
Even if meta-analysis protocols are thoroughly and thoughtfully designed, a number of analytical and eligibility choices still need to be made, and many may be subjective. One applicable safeguard could be a priori review of protocols by independent experts and committees, which might prevent meta-analyses from being gameable. In addition, VoE allows a systematic exploration of the influence of analytical and eligibility choices on the treatment effect. It appears to be a tool worth developing in different contexts, such as head-to-head, network, and individual patient data meta-analyses. Systematically exploring VoE in a large set of meta-analyses may provide a better sense of its relevance.