Our cumulative meta-analysis comparing laparoscopic and open appendectomy for acute appendicitis demonstrated that the evidence provided by meta-analyses of surgical RCTs can change over time. Intra-abdominal abscesses were significantly more frequent in the laparoscopic appendectomy group from 2001 to 2009, but this significance disappeared as more trials accumulated. Our findings visually demonstrate how evidence changes over time in the surgical field. Although the other outcome measures did not show the same transition as intra-abdominal abscess, all of them showed similar trends in favor of laparoscopic appendectomy.
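The cumulative approach can be sketched as re-pooling all trials available at each publication date and checking whether the 95 % CI of the pooled odds ratio excludes 1. This is a minimal illustration only. It assumes inverse-variance fixed-effect pooling of log odds ratios with a 0.5 correction for zero cells, which may differ from the model used in this study, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def log_or(a, b, c, d, cc=0.5):
    """Log odds ratio and its variance for one 2x2 table.

    a/b: events/non-events in the laparoscopic arm; c/d: the same for the open arm.
    The continuity correction cc is added to every cell only if a cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def cumulative_meta(trials, level=0.95, cc=0.5):
    """Re-pool all trials published up to each step (fixed effect, inverse variance).

    trials: iterable of (year, a, b, c, d). Returns one row per trial, in
    chronological order: (year, pooled OR, lower CI, upper CI, CI excludes 1).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for a 95 % CI
    sum_w = sum_wy = 0.0
    rows = []
    for year, a, b, c, d in sorted(trials, key=lambda t: t[0]):
        y, v = log_or(a, b, c, d, cc)
        sum_w += 1 / v            # accumulate inverse-variance weights
        sum_wy += y / v
        est, se = sum_wy / sum_w, math.sqrt(1 / sum_w)
        lo, hi = math.exp(est - z * se), math.exp(est + z * se)
        rows.append((year, math.exp(est), lo, hi, lo > 1 or hi < 1))
    return rows
```

Each returned row corresponds to one line of a cumulative forest plot; a significance flag that flips from True to False as rows accumulate is the pattern described above for intra-abdominal abscess.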
Fluctuation of evidence
When conclusive evidence exists concerning the effectiveness of a medical intervention, one can reasonably conclude that no further research is needed on the topic. However, previous studies have shown that the results of meta-analyses are underused, and many RCTs are conducted even after a meta-analysis has demonstrated significant evidence [2, 17]. Some researchers have contended that randomizing participants in unnecessary trials is unethical and wastes resources, and they have emphasized the importance of avoiding redundant trials. In the meta-analysis from a Cochrane review published in 2010 [10], intra-abdominal abscess was significantly more frequent after laparoscopic appendectomy than after open appendectomy (albeit with moderate heterogeneity), but the present study shows that this significant result later became nonsignificant. Our findings thus provide an example in which large intervention effects are not always conclusive and can fluctuate over time. No such fluctuation was found in the wound infection data, where the result consistently favored laparoscopic appendectomy. Penninga et al. used a trial sequential analysis to show strong evidence favoring laparoscopic appendectomy for wound infection [18], and our result is consistent with this.
The nature of surgical trials
Evidential instability can be explained by the nature of surgical interventions, which are highly complex and difficult to evaluate [6]. First, surgical interventions involve many factors, including the surgeons’ skill and judgment, the skills of the treating team, the development of surgical devices, and pre- and postoperative management, all of which can change from day to day. Second, the learning curve influences outcomes [19]: surgeons’ performance improves as they gain training and experience, until they acquire expertise. The shift toward favoring laparoscopic appendectomy that we observed in later trials might be explained by these factors. Because of this phenomenon, surgical trials may differ from pharmaceutical trials, in which efficacy theoretically does not change over time. To minimize the influence of these factors, trial designs that account for the learning curve or perioperative management should be used [20, 21].
The shift favoring open appendectomy in the early to middle period
Although our analysis showed a shift favoring laparoscopic appendectomy in later trials, an apparent shift in the opposite direction occurred in the early to middle period, after very early trials had reported good results for laparoscopic appendectomy. Early trials tend to overestimate treatment effects for a variety of reasons, such as the under-reporting of disappointing results or the selection of favorable subgroups [22–24]. As new interventions are disseminated and the inclusion criteria for study participants are broadened, positive results become less extreme; relevant examples can be found elsewhere [25, 26]. We assume that the results favoring open appendectomy in the early to middle period are another example of this phenomenon.
Limitations
This study has several limitations. First, we identified the change from statistically significant to nonsignificant findings using the 95 % CI of the odds ratios. Other methods can analyze the results of meta-analyses chronologically, such as trial sequential analysis (TSA), which accounts for the random errors that arise from repeatedly updating a meta-analysis [27, 28]. We performed a TSA for the intra-abdominal abscess outcome, and it did not show significance at any point; that is, the required information size was not reached and the Z-curve did not cross the trial sequential monitoring boundaries. Therefore, we cannot dismiss the possibility that random error produced the statistical significance in the conventional analysis. This underscores the importance of not relying solely on the 95 % CI of the effect size and of performing a TSA before drawing conclusions about the comparison.
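A simplified sketch of the two TSA ingredients mentioned above, the required information size and the monitoring boundary, is shown below. It assumes the conventional two-arm sample-size formula for a binary outcome with a heterogeneity (diversity) inflation, and it approximates the Lan-DeMets O'Brien-Fleming boundary by z₁₋α/₂ / √t, where t is the accrued fraction of the required information size. Dedicated TSA software computes exact boundaries numerically and may add futility boundaries; all function names and default parameters here are illustrative assumptions, not the settings used in this study.

```python
import math
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # standard normal quantile function

def required_information_size(p_control, rrr, alpha=0.05, beta=0.20, diversity=0.0):
    """Total participants needed to detect a relative risk reduction `rrr`.

    N = 4 (z_{1-alpha/2} + z_{1-beta})^2 * p(1 - p) / delta^2, where p is the
    average event proportion and delta the absolute risk difference, then
    inflated by 1 / (1 - diversity) to allow for between-trial heterogeneity.
    """
    p_exp = p_control * (1 - rrr)
    p_bar = (p_control + p_exp) / 2          # average event proportion
    delta = p_control - p_exp                # absolute risk difference
    z_sum = _Z(1 - alpha / 2) + _Z(1 - beta)
    n = 4 * z_sum ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return n / (1 - diversity)

def obf_boundary(info_fraction, alpha=0.05):
    """Approximate O'Brien-Fleming-type boundary at a given information fraction."""
    t = min(max(info_fraction, 1e-9), 1.0)   # clamp to (0, 1]
    return _Z(1 - alpha / 2) / math.sqrt(t)

def crosses_boundary(z_cum, n_cum, ris, alpha=0.05):
    """True if the cumulative Z-statistic lies on or beyond the monitoring boundary."""
    return abs(z_cum) >= obf_boundary(n_cum / ris, alpha)
```

Under this approximation, a cumulative Z of 2.5 reached at half the required information size faces a boundary of about 2.77 and does not cross it, even though it exceeds the conventional 1.96 threshold. This is how a TSA can decline to confirm a result that looks significant on the 95 % CI alone.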
Second, this analysis may be prone to publication bias. Trials with contradictory results are likely to be published more often than those confirming the existing evidence. Small-study effects could also have been present in our analysis: visual inspection of the funnel plot showed slight asymmetry among small studies (Fig. 6).
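Visual inspection of a funnel plot can be complemented by a formal check. Below is a minimal sketch of Egger's regression test, one commonly used test for funnel-plot asymmetry. The authors do not report using it here, and the function name and inputs (per-trial log odds ratios and their standard errors) are assumptions for illustration. With few small trials the test has low power, so a nonsignificant intercept would not rule out small-study effects.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. An intercept far from zero suggests
    small-study effects. Returns (intercept, SE of intercept, t, df);
    compare t against a t distribution with df degrees of freedom.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1 / s for s in std_errors]                     # precision
    y = [e / s for e, s in zip(effects, std_errors)]    # standardized effect
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_int = math.sqrt(rss / df * (1 / n + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t, df
```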
Third, we did not analyze operative time or length of hospital stay because of considerable heterogeneity, to which small-study effects may have contributed substantially.
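The section does not state how heterogeneity was quantified. A common choice is I², derived from Cochran's Q. The sketch below computes Q, I², and the DerSimonian-Laird between-trial variance τ² from per-trial log odds ratios and their variances. The function name is hypothetical, and labelling an I² value as "moderate" or "considerable" depends on convention; the Cochrane Handbook gives overlapping rough bands rather than fixed cut-offs.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 and the DerSimonian-Laird tau^2 for k studies."""
    w = [1 / v for v in variances]                      # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total variability attributed to between-trial differences
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-trial variance
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2
```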
Fourth, studies with a high risk of bias could skew the results. We conducted subgroup analyses for trials with low and high risks of bias. Intra-abdominal abscess after laparoscopic appendectomy was considerably more frequent among studies with a low risk of bias than among those with a high risk of bias, although the cumulative meta-analysis showed a similar trend toward favoring laparoscopic appendectomy [see Additional file 1].
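Whether the low- and high-risk-of-bias subgroups genuinely differ can be checked with a test for subgroup differences. The sketch assumes each subgroup has already been pooled into a log odds ratio with a standard error and that the two subgroups are independent; it is an illustration, not the analysis in Additional file 1, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def subgroup_difference(log_or_1, se_1, log_or_2, se_2):
    """Z-test for a difference between two independent pooled log odds ratios.

    Returns (ratio of odds ratios, z, two-sided p). With inverse-variance
    weights and two subgroups, z**2 equals the between-subgroup Q statistic.
    """
    diff = log_or_1 - log_or_2
    se = math.sqrt(se_1 ** 2 + se_2 ** 2)   # variances add for independent groups
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p
```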
Fifth, the rarity of intra-abdominal abscesses may have complicated the analysis, because a small number of events increases uncertainty. Since many trials had zero intra-abdominal abscess events, we replaced the continuity correction factor of 0.5 with 0.01 as a sensitivity analysis to test robustness [29]. The cumulative meta-analysis showed that statistical significance first appeared with the trial published in 2001 and disappeared as more results accumulated, with an overall OR of 1.24 (95 % CI 0.84–1.81). The results were similar with both correction factors.
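The effect of the correction factor on a single zero-event trial can be sketched as follows. This assumes a constant correction added to all four cells of any table containing a zero; other schemes, such as treatment-arm corrections, exist, and the function names are hypothetical.

```python
import math

def corrected_log_or(events_1, n_1, events_2, n_2, cc=0.5):
    """Log odds ratio and variance for one trial.

    A constant continuity correction `cc` is added to every cell, but only
    when the 2x2 table contains a zero cell.
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    if 0 in (a, b, c, d):
        a, b, c, d = (x + cc for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def sensitivity(events_1, n_1, events_2, n_2, factors=(0.5, 0.01)):
    """OR and inverse-variance weight for one trial under each correction factor."""
    out = {}
    for cc in factors:
        y, v = corrected_log_or(events_1, n_1, events_2, n_2, cc)
        out[cc] = (math.exp(y), 1 / v)
    return out
```

For a hypothetical trial with 0/50 versus 2/50 events, the 0.01 correction gives a far more extreme odds ratio than 0.5, but also a much larger variance and therefore a much smaller inverse-variance weight. This helps explain why inverse-variance pooled estimates need not move much when the factor changes.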
Sixth, the number of studies published each year was unbalanced. We included more trials from 1996 to 2001, which might have biased the results. Nevertheless, given that the shift from results favoring open appendectomy to results favoring laparoscopic appendectomy occurred after 2002, this shift might have been even more evident had more trials been published after 2001.
Finally, although we report an example of evidential instability in the surgical field, we cannot generalize this observation to all surgical interventions or to other fields. Although the number of RCTs addressing other surgical topics is generally low [30], more evidence should be accumulated on other topics to understand the stability of evidence, using not only cumulative meta-analyses but also TSAs.