Background
The ubiquity of participant losses (also known as missing participant outcome data, MOD) in systematic reviews in healthcare is well acknowledged in the literature [1–3]. The inclusion of studies with MOD further complicates their quantitative synthesis [4, 5]. As the term indicates, MOD refer to unavailable information about the outcome of participants for a variety of reasons [6]. Addressing MOD, therefore, rests entirely on untestable assumptions about the possible outcomes of the missing participants [4, 7]. Kahale et al. [8] reported that imposing clinically implausible assumptions on the outcome of missing participants led to great variation in the summary effect estimates and to contradictory conclusions, whilst clinically plausible assumptions mitigated the variability in the summary effect estimates.
Handling MOD in systematic reviews requires an attentive plan to ensure credible results. The Cochrane Handbook promotes sensitivity analyses as a necessary means to safeguard against spurious inferences [9]. The authors of systematic reviews are advised to explore how sensitive the results are to different yet reasonable assumptions about MOD in the compared interventions [9]. However, recent evidence on the planning and conduct of sensitivity analyses related to MOD in systematic reviews is underwhelming. Spineli et al. [1] reported that two in five reviews that made their protocol available provided a plan to address MOD in the analysis. Ultimately, only 6% of the reviews with MOD in the included studies performed a sensitivity analysis [1]. According to Kahale et al. [2], only 9% of the reviews reported having performed sensitivity analyses related to MOD, with approximately half of them reporting the actual sensitivity analysis results.
We recently proposed a novel framework in the context of sensitivity analyses to objectively infer the robustness of the primary analysis results to different plausible assumptions about the MOD mechanisms [10]. This framework introduces the robustness index (RI) to quantify how similar the summary effect estimates from a series of sensitivity analyses are to those of the primary analysis. When the RI does not exceed a pre-specified threshold (the smallest deviation from the primary analysis results that would be deemed important), we can deem the primary analysis results robust to a possible risk of bias associated with MOD. Contrary to current sensitivity analysis standards, the RI incorporates a formal definition of ‘similar’ results and does not unduly rely on the statistical significance of the summary effect estimates.
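As a rough illustration of the idea, and not the exact formula from [10], a divergence-based index can be built by approximating the posterior of the summary effect in each analysis by a normal distribution and averaging the Kullback–Leibler divergences of the re-analyses from the primary analysis. The function names, effect estimates, and threshold value below are all hypothetical:

```python
import numpy as np

def kld_normal(mu_p, sd_p, mu_q, sd_q):
    """Kullback-Leibler divergence KL(P || Q) between two normal distributions."""
    return np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5

def robustness_index(primary, reanalyses):
    """Illustrative RI: mean divergence of each re-analysis posterior from the
    primary-analysis posterior, both approximated as normal distributions."""
    mu0, sd0 = primary
    klds = [kld_normal(mu, sd, mu0, sd0) for mu, sd in reanalyses]
    return float(np.mean(klds))

# Hypothetical summary log odds ratios (mean, posterior SD) under increasingly
# stringent assumptions about the missingness mechanisms
primary = (0.50, 0.15)
reanalyses = [(0.48, 0.16), (0.42, 0.18), (0.35, 0.20)]

ri = robustness_index(primary, reanalyses)
threshold = 0.28  # hypothetical robustness threshold
print(ri <= threshold)  # robust if the RI does not exceed the threshold -> True
```

Deviations in both the location and the dispersion of the re-analysis posteriors inflate such an index, which is why it can flag frailty even when statistical significance is preserved across all re-analyses.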
We aim to demonstrate the ease of applying the RI using a collection of published systematic reviews with two or more interventions across several healthcare fields. By calculating the RI, we uncover the prevalence of primary analyses with frail conclusions, which translate to a high risk of biased results due to MOD. We also investigate the agreement between the RI and the current sensitivity analysis standards, which rely on statistical significance. With this empirical study, we aspire to initiate a paradigm shift in the analysis of aggregate MOD, where sensitivity analysis and an objective judgement of robustness become the state of the art in systematic reviews.
Discussion
The primary analysis results can be sensitive to different assumptions about the missingness mechanisms in the compared interventions of the synthesised studies. The ratio of studies with a low to a substantial amount of MOD can also affect the robustness of the primary analysis results. Using the proposed RI revealed almost double the number of frail conclusions compared with relying on the statistical significance of the summary effect estimate in the re-analyses. Comparing the RI with the current sensitivity analysis standards revealed that two in five analyses yielded contradictory conclusions regarding the robustness of the primary analysis results. Reliance on statistical significance resulted in frail conclusions in analyses where the posterior distribution of the summary effect estimate was materially unaffected and included the threshold for the null effect in both the primary and subsequent analyses. At the 5% significance level, the statistical significance of these analyses changed when more stringent assumptions were made.
Furthermore, the current sensitivity analysis standards yielded robust conclusions in analyses where the posterior distribution varied substantially under stringent assumptions. The statistical significance (at the 5% level) was maintained in all re-analyses of these pairwise meta-analyses (PMAs) and network meta-analyses (NMAs). The RI naturally accounted for the deviations in the location and dispersion of the posterior distribution in the re-analyses; therefore, it demonstrated the sensitivity of the primary analysis results to the different assumptions.
This is the first empirical study to investigate the sensitivity of the summary effect estimates of PMAs and NMAs to different assumptions about MOD. We considered a wide range of clinically plausible assumptions about the missingness mechanisms in the compared interventions and were therefore able to investigate thoroughly the sensitivity of the results to assumptions of varying stringency. However, these assumptions were not tailored to the interventions and conditions under investigation. Ideally, expert opinion should be sought at the protocol stage to determine the assumptions for the sensitivity analysis.
Furthermore, we used an objective framework to develop the robustness thresholds. These thresholds reflect the smallest deviation deemed important in a general healthcare setting. Preferably, clinically specific robustness thresholds should be considered in addition to our proposed threshold.
This is also the first empirical study on systematic reviews to rely on objective criteria other than statistical significance to determine the presence or lack of robustness of the primary analysis results. Kahale et al. [8] is the most recent empirical study on the impact of MOD on the summary effect estimates from PMAs. The authors reported that only a quarter of 100 PMAs failed to demonstrate robustness based on statistical significance. Our study revealed that mere reliance on statistical significance was sensitive to the selected significance level: it declared conclusions robust in cases where the posterior distribution of the summary effect estimate differed across the re-analyses, and frail in cases where it was materially unchanged. By employing the RI in the database of Kahale et al. [8], one may expect a higher percentage of PMAs with frail conclusions, given the substantial percentage of participants with definite or potential MOD in these PMAs (median 11.7%, interquartile range 5.6 to 23.7%).
The present study focused on the impact of two factors on the sensitivity of the primary analysis results: (1) the amount of MOD in the collated studies and (2) the different assumptions about the missingness mechanisms in the compared interventions. Potential unobserved confounding (stemming from analysing aggregate outcome data), the size and number of the studies, and the distribution of the outcome across the studies also constitute important factors that may affect the summary effect size and, by extension, the conclusions from a sensitivity analysis. Variability in the sample size and the distribution of the outcome should be expected and properly accounted for. In the present study, we preferred modelling the exact distribution of the binary outcome data (one-stage approach) rather than approximating it with the normal distribution (two-stage approach), the latter being difficult to defend when the included studies are small and the investigated outcome is rare [45]. Following Dias et al. [15], we assumed approximately normally distributed sample means for the continuous outcome by convention, which may have implications for the summary SMD when the studies are small [45].
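The contrast between the two approaches can be sketched for a single hypothetical study (the event counts and the baseline parameterisation below are invented for illustration): with a zero event cell, the two-stage normal approximation of the log odds ratio is undefined without an arbitrary continuity correction, whereas the exact binomial likelihood underlying a one-stage logit model remains well defined.

```python
import math

def two_stage_log_or(e1, n1, e2, n2, cc=0.5):
    """Two-stage approach: per-study log odds ratio with its normal-approximation
    standard error; a zero cell forces an arbitrary continuity correction cc."""
    if 0 in (e1, n1 - e1, e2, n2 - e2):
        e1, e2 = e1 + cc, e2 + cc
        n1, n2 = n1 + 2 * cc, n2 + 2 * cc
    log_or = math.log(e1 * (n2 - e2) / (e2 * (n1 - e1)))
    se = math.sqrt(1 / e1 + 1 / (n1 - e1) + 1 / e2 + 1 / (n2 - e2))
    return log_or, se

def binomial_loglik(log_or, baseline_logit, e1, n1, e2, n2):
    """One-stage approach: exact binomial log-likelihood of the same 2x2 table
    under a logit model -- defined even when a cell is zero."""
    p2 = 1 / (1 + math.exp(-baseline_logit))
    p1 = 1 / (1 + math.exp(-(baseline_logit + log_or)))
    return (e1 * math.log(p1) + (n1 - e1) * math.log(1 - p1)
            + e2 * math.log(p2) + (n2 - e2) * math.log(1 - p2))

# A small study with a rare outcome: 0/25 events vs 3/25 events
print(two_stage_log_or(0, 25, 3, 25))                        # needs the 0.5 correction
print(binomial_loglik(-2.0, math.log(3 / 22), 0, 25, 3, 25))  # no correction needed
```

The resulting two-stage estimate (and its standard error) shifts with the choice of `cc`, whereas the one-stage likelihood involves no such tuning constant, which is one reason the normal approximation is hard to defend for small studies with rare events.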
Despite the cautionary tales on the misuse of statistical significance in interpreting study results, dichotomising results based on a 5% significance level remains the status quo in the published literature. This study showed the merits of objectively developed decision criteria, as opposed to reliance on statistical significance in isolation, for interpreting sensitivity analysis results. Therefore, we aspire for this framework to be integrated into the GRADE guidance for assessing the risk of bias due to MOD, which, coupled with plausible clinical assumptions, may uncover the comparisons and outcomes with frail conclusions [46]. In addition, the relevance and utility of our sensitivity analysis framework extend beyond the analysis of MOD. For instance, the sensitivity of the results to different prior distributions for the between-study heterogeneity parameter, to different effect measures, or to excluding outlying studies can easily be inferred with our proposed framework. Finally, it can be applied straightforwardly regardless of the analysis framework (frequentist or Bayesian).
An index that evaluates the consistency assumption would further help the analyst infer the degree of inconsistency in the network and whether the NMA results are valid. There are currently no recommendations on interpreting the estimated inconsistency parameter as an indication of low or considerable inconsistency. Therefore, analysts unduly rely on the statistical significance of the inconsistency parameter to infer the presence or lack of consistency.
Clinically relevant robustness thresholds would allow for contextualised conclusions regarding the robustness of the primary analysis results. For instance, the minimum clinically important difference (MCID) in the sensitivity analysis context could serve as the robustness threshold; an RI below this threshold would then signify robust primary analysis results. Preferably, the threshold would be elicited from several experts with different levels of experience on the subject under investigation [47]. The average of the MCIDs across the experts, weighted by their years of experience, would then comprise the robustness threshold.
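Such an experience-weighted elicitation reduces to a one-line computation; the expert MCIDs and years of experience below are hypothetical:

```python
def robustness_threshold(mcids, years):
    """Experience-weighted average of expert-elicited MCIDs, with years of
    experience as the weights (a simple illustrative weighting scheme)."""
    return sum(m * y for m, y in zip(mcids, years)) / sum(years)

# Hypothetical elicitation from four experts
mcids = [0.20, 0.25, 0.15, 0.30]  # each expert's MCID, e.g. on the SMD scale
years = [12, 5, 20, 8]            # years of experience (weights)
print(robustness_threshold(mcids, years))  # -> approximately 0.201
```

Note that this simple scheme assumes experience in years is a reasonable proxy for elicitation reliability; other weighting choices (or a formal elicitation protocol) may be preferable in practice.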