Strengths and limitations of the findings
High levels of statistical heterogeneity dominate the meta-analysis, making interpretation of its results challenging. The addition of concomitant medications may explain some of this heterogeneity across intervention subgroups, with the largest effect sizes found in trials where hypertonic saline was given alone. Studies were conducted across a number of different healthcare settings with diverse local services, usual care, guidelines (e.g. the definition of “fit to discharge”) and disease severity at entry, all of which must contribute to the extensive inter-trial variation observed. That many of the largest trials contributed so little weight to the analysis undermines our confidence in the results. Furthermore, the absence of re-admission rates in the majority of trials means the intervention may not be as economically beneficial as the results suggest.
A key strength is the inclusion of 15 of the 18 trials in the meta-analysis of the primary outcome. This facilitated investigation of publication bias and allowed subgroup analyses and aggregate-data meta-regression, although data were incomplete for secondary outcomes and trial design features.
The uncharacteristic funnel plot shape raises the possibility of publication bias. Applying language restrictions to meta-analyses is not thought to introduce systematic error consistently, although it may reduce the precision of pooled estimates [77]. Nonetheless, restricting inclusion to English-language articles may have altered the precision, effect size, heterogeneity and overall risk of bias.
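Funnel-plot asymmetry of this kind is often assessed quantitatively with Egger's regression test, which regresses the standardised effect on precision; an intercept far from zero suggests small-study effects. A minimal sketch in Python, using invented effect sizes and standard errors rather than the review's data:

```python
import numpy as np

def egger_test(effects, ses):
    """Egger's regression: regress standardised effect (effect/SE)
    on precision (1/SE); an intercept far from zero suggests
    funnel-plot asymmetry (possible publication bias)."""
    z = np.asarray(effects) / np.asarray(ses)   # standardised effects
    precision = 1.0 / np.asarray(ses)
    X = np.column_stack([np.ones_like(precision), precision])
    (intercept, slope), *_ = np.linalg.lstsq(X, z, rcond=None)
    return intercept, slope

# Hypothetical data: the small (high-SE) trials report larger effects
effects = [-1.2, -1.0, -0.9, -0.4, -0.3, -0.1]
ses     = [ 0.6,  0.5,  0.4,  0.2,  0.15, 0.1]
intercept, slope = egger_test(effects, slope_ses := ses)
```

Here the intercept comes out clearly negative because the hypothetical small trials show the largest effects; in practice the intercept's standard error and p-value would also be examined.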
Summary of heterogeneity
In undertaking this systematic review it became apparent that there are a number of semantic, methodological and cultural differences across the studies, all of which impact on the results obtained and on the generalisability of an individual trial’s findings. We outline some of these factors below and offer an explanation of how they may affect the interpretation of the review’s findings.
i)
The definition of ‘acute bronchiolitis’ differs between countries, and indeed between clinicians in the same institution. Inevitably this diversity was reflected in the descriptions of the infants included, which variously specified wheeze and/or crackles (
n = 4) [
20,
68,
70,
73]; a first episode of wheezing (
n = 5) [
62‐
66]; “bronchiolitis” (
n = 4) [
19,
67,
71,
72]; or bronchiolitis with a temperature >38 °C (
n = 1) [
18]; while information was absent in four others [
25‐
27,
69]. The term “wheeze” is itself open to interpretation (and sometimes misinterpretation) within the medical profession [
78‐
82], and may be taken to include children presenting with their first exacerbation of asthma manifesting as bronchospasm. This presentation is less common among younger patients, so we might have expected the effect size to vary with the mean age of the study population; nevertheless, our meta-regression investigating this was equivocal.
A more immediate explanation is that the impact of HS varied with severity. All patients included in this review met the definition of acute bronchiolitis as used in the UK, Australia and parts of Europe: in summary, apparent viral infection with signs of lower respiratory tract disease and airflow obstruction, manifest by increased work of breathing, hyperinflation of the chest and widespread crackles, with or without intermittent wheeze. Clearly there are considerable differences in setting and in the types of patients included across studies.
ii)
Variation among discharge criteria
The outcomes—specifically ‘length of stay’ and ‘fit for discharge’—were defined and assessed in very different ways across the studies. Moustgaard et al. suggest that inconsistent definition of outcomes in trials is a widespread problem [
83]. The studies set (sometimes arbitrary) criteria regarding when the patient’s stay started, including “from study entry, which was within 12 h of admission” (
n = 2) [
20,
62]; from hospital admission (
n = 3) [
65,
68,
73]; or from first dose of study medication (
n = 2)[
70,
71]; information was absent for the remaining 11 studies [
18,
19,
25‐
27,
63,
64,
66,
67,
69,
72]. The reported time to entry into the study varied from 3 to 24 h, and studies generally did not specify whether “entry” corresponded to consent or to first treatment. The latter in particular can represent a large proportion of an admission in units with mean stays of 72 h or less. Similarly, discharge was defined and assessed differently across studies. In one study discharge was assessed against continuous criteria [
73], but in at least five others the decision to discharge was made only once a day [
18,
19,
63,
65,
66], meaning the time of discharge is effectively a discrete outcome occurring at 24-h intervals. Although this inevitably overestimates the real time taken to become fit for discharge, it does so equally for both groups and would therefore be expected to underestimate rather than overestimate the difference between them. With this in mind, we have no explanation for why the positive studies are those based on a once-daily clinical assessment.
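The consequence of once-daily assessment can be illustrated with a small simulation, using entirely hypothetical lengths of stay rather than data from the included trials: each infant's true time to fitness for discharge is rounded up to the next daily ward round, inflating the observed stay in both arms.

```python
import random

def observed_los(true_hours, round_interval=24.0):
    """Round true time-to-fitness up to the next daily ward round,
    given a random offset to the first round after admission."""
    offset = random.uniform(0, round_interval)  # time until first round
    if true_hours <= offset:
        return offset                           # discharged at first round
    extra = (true_hours - offset) % round_interval
    return true_hours if extra == 0 else true_hours + (round_interval - extra)

random.seed(1)
# Hypothetical true times to fitness for discharge (hours)
control   = [random.expovariate(1 / 80.0) for _ in range(5000)]  # mean ~80 h
treatment = [random.expovariate(1 / 70.0) for _ in range(5000)]  # mean ~70 h

obs_c = [observed_los(t) for t in control]
obs_t = [observed_los(t) for t in treatment]
mean = lambda xs: sum(xs) / len(xs)
```

Because both arms are inflated by a similar amount on average, the group means remain comparable, but each individual measurement is coarsened by up to 24 h.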
In the remaining studies the frequency of assessment for discharge was unclear. We present a summary of the discharge criteria in Additional file
10.
The criteria for discharge range from an oxygen saturation of 92 % or greater in air and oral feeding >75 % of usual intake [
73] to no respiratory signs or symptoms for the previous 12 h [
63,
64]. As may be expected, stricter criteria lead to longer LoS. The criterion that patients should be free of any signs or symptoms is curious, as it is well documented that the symptoms associated with acute bronchiolitis persist for many days or even weeks [
73,
Behrendt et al. previously noted marked variation in the length of stay of patients admitted with RSV bronchiolitis, with very short admissions (median approx. 72 h) in the USA, UK and Northern Europe compared with significantly longer admissions in Germany and Southern Europe [
85], a finding corroborated by more recent trials included herein. These longer admissions were associated with increased co-morbidities, such as diarrhoea, which may result from nosocomial infection acquired during the longer stay. This cultural difference is again apparent in the study of Giudice, in which none of the Italian subjects was discharged before 72 h, a period beyond the mean ‘length of stay’ in the Dutch, UK and USA studies [
70,
73,
85]. Finally, the subjects in the Luo studies with mild to moderate [
63] bronchiolitis remained in hospital longer than those with more severe disease [
64], a finding which is somewhat difficult to explain.
iii)
Publication, generalisability and other biases
This difference in practice may also, in large part, explain the differences in observed treatment effects: the large UK, Dutch and USA studies found no benefit, as compared with the apparently large effects observed in other studies [
70,
71,
73]. While early indications of a potential benefit may have been attributable to publication bias [
86,
87], the positive effects of later studies may be attributable to study design and cultural effects. It is of note that all the recent large studies of hypertonic saline have failed to demonstrate any benefit, yet the meta-analysis still appears to favour the treatment. This effect is largely driven by the studies of Luo et al., and is likely explicable when considering their discharge criteria in more detail (see above).
In summary, therefore, considerable heterogeneity remains that is not readily captured or quantified by standard meta-regression tools. Clearly, a large amount of the heterogeneity is driven by two trials from the same team, led by Luo [
63,
64], with outlying results, relatively small sample sizes but narrow confidence intervals (around a day, compared with a day-and-a-half in SABRE [
73] and the other large northern European study—Teunissen 2014 [
70]). Removing these two studies from the main analysis considerably reduces the effect sizes and statistical significance, leaving a more modest (and minimal) impact; nevertheless, it does not eliminate heterogeneity completely.
Finally, the choice between a fixed- and a random-effects analysis remains open to debate, with strong and apparently compelling arguments on both sides [
32,
88‐
91]. The presence of unexplained heterogeneity argues against the assumption of a single underlying (fixed) effect, and this is commonly taken to justify the random-effects model. When heterogeneity is excessive, however, the random-effects model has the unfortunate operational characteristic of allocating similar weights to all trials, irrespective of their size and precision. Our decision to pre-specify a fixed-effect model as the primary analysis was taken to counter this limitation. That said, we are unable to offer a clinically sensible reason why the largest trial should be allocated only 4 % of the weight in this analysis. Given this, together with the large and unexplained heterogeneity in general, our recommendation is that no single overall summary measure—fixed, random or otherwise—is an adequate reflection of the identified trials. Although we investigated response in relation to dose (3, 5 or 6 %), the studies did not provide data on the frequency or duration of HS administration, which may also have varied across studies.
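The weighting behaviour described above follows from the standard inverse-variance formulae: a fixed-effect model weights trial i by 1/v_i, whereas a random-effects model weights by 1/(v_i + tau^2), so as the between-trial variance grows the weights converge regardless of precision. A minimal sketch with invented variances, not the review's data:

```python
def inv_variance_weights(variances, tau2=0.0):
    """Normalised inverse-variance weights; tau2 is the
    between-trial heterogeneity variance (0 => fixed effect)."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical within-trial variances: one large, precise trial and four small ones
variances = [0.01, 0.25, 0.25, 0.25, 0.25]

fixed     = inv_variance_weights(variances)             # tau2 = 0
random_fx = inv_variance_weights(variances, tau2=5.0)   # large heterogeneity
```

With tau2 = 0 the single precise trial takes roughly 86 % of the weight; with large tau2 all five trials receive close to a fifth each, mirroring the behaviour criticised in the text.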
Strengths and limitations compared to other reviews
Building on the review conducted by Zhang and colleagues, which contained 11 RCTs (
n = 1090), our review included 15 trials (
n = 1922), including three much larger trials that all showed null results [
65,
70,
73]. We limited our inclusion criteria to trials of inpatient infants, whereas Zhang et al. also included outpatient and emergency department trials. Al-Ansari was included in our review, despite being classified in the emergency department group by Zhang and colleagues, as the reported length of stay implies that the patients were admitted [
68]. Despite this, our meta-analysis included a further 8 trials [
25,
65‐
67,
70‐
73], which together revealed substantially higher levels of heterogeneity than reported in the previous Cochrane review. A potential explanation is that we applied no restrictions on the dose or mode of administration of the intervention, and in addition we included data from one unpublished study; Zhang et al. made no statement regarding these.
Duplication is not without merit—it enables the replicability of methods to be demonstrated, as well as adding weight to or disputing the current evidence base [
92‐
94]. Even when faced with identical data, approaches taken and interpretations made can differ between researchers [
A well-defined rationale for any such duplicate review, as recommended (though not explicitly required) by the PRISMA checklist [
94], provides transparency regarding overlaps and subsequently, allows for informed debate about its value to the evidence base [
95].