Background
The relative scale has been used for decades for estimating effects on binary outcomes, for example in calculating that heavy alcohol consumption increases the occurrence of liver cirrhosis by a rate ratio (RR) of over 10 [1]. It is also standard for survival analysis, using hazard ratios, and for comparisons of incidence rates, using incidence rate ratios. Meta-analyses have shown that the relative scale leads to less heterogeneity in the analysis of binary outcomes compared with the absolute scale (i.e. rate differences), which indicates that the relative scale better captures the biological effects [2]. In contrast, the relative scale has rarely been used in the meta-analysis of continuous outcomes, and it is not available as an option in popular meta-analysis software such as the RevMan program of the Cochrane Collaboration [3]. Instead, meta-analyses of continuous outcomes typically use the absolute scale, i.e., the original measurement units (mean difference, MD), or the standardized mean difference (SMD) scale, in which the mean difference is expressed in pooled standard deviation units. Both of these approaches (MD and SMD) are available as options in popular meta-analysis software [3].
The selection of scale for continuous outcomes is relevant in the analysis of a single trial and in the meta-analysis of several trials. In a single trial, the scale influences the interpretation of the findings and the communication between researchers, clinicians and patients [4]. In a meta-analysis, the scale additionally influences the comparability of the trials: the relative scale adjusts for the baseline variability in continuous outcomes in the same sense as the pooled RR adjusts for the baseline variability in risk between different studies in the analysis of binary outcomes. In meta-analyses of continuous outcomes covering diverse research topics, heterogeneity was lower on the relative scale than on the absolute scale [5‐7]. This suggests that the relative scale may also better capture many biological effects that are measured using continuous outcomes. As one illustration, the relative scale was demonstrated to be more informative than the MD scale in the analysis of disease duration [8‐10].
The current study was motivated by the Cochrane review by Bonini et al., which examined the effects of β2-agonists on exercise-induced bronchoconstriction (EIB) [11]. The usual limit for classifying a person as having EIB is a ≥ 10% decline in forced expiratory volume in 1 s (FEV1) in a standardized exercise test [12]. Based on 72 comparisons from 44 studies, Bonini et al. calculated that β2-agonists reduced the exercise-induced FEV1 decline by 17.67 percentage points (pp) (95% CI: 15.84 to 19.51 pp) [11]. However, one person may suffer from an 11% decline in FEV1 by exercise and another person may suffer from an 80% decline in FEV1, yet both of them are similarly classified as cases of EIB. The Cochrane review implies that the expected effect of a 17.67 pp reduction in exercise-induced FEV1 decline applies to both persons. However, it seems likely that the former person has a much smaller β2-agonist effect, whereas the latter person might have a much greater effect than the overall mean reduction of 17.67 pp in FEV1 decline.
The β2-agonists were invented in the middle of the 1900s and their efficacy against EIB was demonstrated in numerous clinical trials starting from the 1970s [12‐16]. Thus, it is not relevant to ask the null-hypothesis type of question of whether β2-agonists differ from placebo in their influence on EIB. Instead, the important task is to estimate the average size of the effect and the variation in effect size between individuals.
The goal of this study was to compare the usefulness of the relative and the absolute scales in the estimation of the effects of β2-agonists on exercise-induced FEV1 decline. If the relative scale better captures the effects of interventions on FEV1 changes, then the meta-analyses that have used an absolute scale such as MD for analyzing the effects on FEV1 changes [11] may have led to sub-optimal estimates.
Discussion
The goal of this study was to compare whether the absolute or the relative scale yields more consistent estimates of effect, using the example of β2-agonist treatment to prevent FEV1 declines associated with EIB, the severity of which can range widely between patients. The absolute scale is routinely used in the analysis of continuous data and therefore the comparison of these two scales is relevant more widely than just for the analysis of FEV1 changes.
In people with EIB, Bonini et al. calculated that the β2-agonists decreased exercise-induced FEV1 decline by 17.67 pp (95% CI: 15.84 to 19.51 pp) [11]. If EIB were a homogeneous medical condition, such a uniform effect might be meaningful. Instead, EIB is highly heterogeneous, since it is usually defined by a post-exercise FEV1 decline of 10% or more, though other arbitrary cut-off limits have been used. Thus, in this dichotomization two persons with 11% and 80% FEV1 declines after exercise are both classified as having EIB, whereas a person with a 9% FEV1 decline is not. However, the person who has the 11% decline is probably biologically much closer to the person who has the 9% decline than to the person who has the 80% FEV1 decline after exercise. It does not seem reasonable to assume that Bonini’s estimate of a 17.67 pp effect would apply equally to people with low and high levels of exercise-induced FEV1 decline. Furthermore, dichotomization of continuous variables decreases statistical power [38‐41].
One approach to achieving more personalized estimates of the effects of β2-agonists is to categorize people into groups by their untreated exercise-induced FEV1 decline levels (Table 3). In people who had untreated exercise-induced FEV1 declines in the range from 10% to 19%, β2-agonists reduced the FEV1 decline by 15 pp (95% CI: 10 to 20 pp), whereas in people who had untreated FEV1 declines in the range from 30% to 39%, the reduction of the decline was 33 pp (95% CI: 25 to 41 pp), and in people who had untreated FEV1 declines of 40% and greater the percentage point improvement was even greater (Table 3). The confidence intervals of the three groups with FEV1 declines of 30% and greater are all inconsistent with the 17.67 pp effect calculated by Bonini [11]. These three groups contain 61% (97 of 159) of the participants in Table 3. This illustrates that Bonini’s estimate of effect does not apply to a large proportion of people classified as having EIB.
The relative scale is most informative in the analysis of the β2-agonist effects on exercise-induced FEV1 declines, since on the relative scale a single estimate of effect, expressed as a percentage improvement of the baseline exercise-induced FEV1 decline (rather than a uniform percentage point improvement), applies to all study participants independent of their initial FEV1 decline levels (Fig. 2, Tables 2 and 3). In our analysis, half of the participants with IPD had an observed β2-agonist effect 5.8 pp or more distant from the mean 90% effect, which also shows that the relative scale better captured the observed β2-agonist effect compared with the use of a single uniform 28 percentage point improvement, which had a median residual of 10.8 pp.
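The comparison of median residuals described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical (not the IPD of this study), the absolute-scale model is taken to be a single uniform pp effect equal to the mean, and the relative-scale model is taken to be a least-squares line through the origin, which may differ from the mixed-effects regression used in the actual analysis. The names `baseline`, `effect` and `median_abs_residual` are introduced here for illustration.

```python
import statistics

def median_abs_residual(observed, predicted):
    """Median of the absolute distances between observed and predicted effects."""
    return statistics.median(abs(o - p) for o, p in zip(observed, predicted))

# Hypothetical individual-level data (NOT from the study):
# untreated exercise-induced FEV1 decline (%) and observed reduction (pp).
baseline = [12, 18, 25, 33, 41, 55, 70]
effect = [10, 17, 22, 31, 35, 50, 64]

# Absolute scale: one uniform percentage point effect for every participant.
uniform_pp = statistics.mean(effect)
abs_pred = [uniform_pp] * len(effect)

# Relative scale: effect = slope * baseline (least squares through the origin).
slope = sum(b * e for b, e in zip(baseline, effect)) / sum(b * b for b in baseline)
rel_pred = [slope * b for b in baseline]

abs_mar = median_abs_residual(effect, abs_pred)
rel_mar = median_abs_residual(effect, rel_pred)

print(f"uniform effect: {uniform_pp:.1f} pp, median residual {abs_mar:.1f} pp")
print(f"relative effect: {100 * slope:.0f}%, median residual {rel_mar:.1f} pp")
```

With data in which the effect grows with the untreated decline, the relative-scale model leaves the smaller median residual; with data in which the effect is roughly constant, the ordering could reverse.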
In our study, the primary comparison of the absolute and the relative scales was based on IPD, since the wide distribution of FEV1 declines in the IPD analysis results in greater statistical power to compare intercepts and slopes. We also compared the absolute and relative scales on the basis of study-level data of 44 trials, but no superiority of the relative scale was seen in that comparison; indeed, the absolute scale seemed to be slightly better (Table 4). In addition, no superiority of the relative scale over the absolute scale was seen in standard meta-analyses (Fig. 5a and b). These discrepancies between the analyses based on IPD (Fig. 2) and on the study-level data are examples of the “ecological fallacy”. To avoid the potential for the ecological fallacy introduced by study-level analyses, examination of IPD has been recommended whenever feasible [42‐44]. Thus, analysis of the study-level data alone (Table 4) or the comparison of standard meta-analyses (Fig. 5a and b) would have led to the false conclusion that the absolute scale is better or at least not worse than the relative scale.
Nevertheless, even though the analyses of the study-level data did not yield a valid comparison of the absolute and relative scales, the study-level estimate of the relative effect calculated from 44 trials was quite similar to the estimate from the IPD analysis of 14 trials: 77% vs. 90% improvement in the exercise-induced FEV1 decline, respectively. This divergence in estimates can be partly explained by the different sets of studies that were compared. The standard study-level meta-analyses of the 14 studies which had IPD available reached relative effect estimates of 83% and 90% reduction in FEV1 decline, depending on the calculation of the SE (Fig. 5), very similar to the overall IPD mixed-effects regression analysis. This latter comparison was based on the same set of studies.
Most popular statistical software, such as RevMan of the Cochrane Collaboration, does not have an option to pool continuous outcomes on the relative scale. However, it is available in the metacont function of the R package meta [33, 45, 46]. When this option is not available in a statistical program, a simple approach to pooling study-level results on the relative scale is to normalize the results of the studies by dividing the absolute mean effects and their SD values by the placebo group mean outcome value (Table S3). Such a transformation can easily be done with a spreadsheet program and the transformed data can be entered in a standard statistical program for meta-analysis. This approach of calculating the relative effect is illustrated in Fig. 5b. Alternatively, if IPD is available, one can calculate and pool the slopes of linear regression lines for each study, which usually leads to narrower SE estimates and more accurate pooled estimates, as shown in Fig. 5c. However, IPD is rarely available and therefore calculation of the slope is not often feasible. Furthermore, for many cross-over trials that reported the study-level data (Fig. 4), the paired SE was not published and would need to be imputed, but this problem applies to both the absolute and the relative scales.
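The normalization described above can be sketched as follows. This is a minimal illustration, not the calculation behind Table S3: the three studies are hypothetical, the placebo group mean is treated as a fixed constant (its own sampling uncertainty is ignored), and the pooling step is a simple inverse-variance fixed-effect average chosen here only for brevity. The function names `normalize` and `pool_fixed` are introduced for this example.

```python
import math

# Hypothetical study-level summaries (illustrative only, NOT from Table S3):
# (mean difference in pp, SE of the difference in pp, placebo-group mean decline in %)
studies = [
    (20.0, 3.0, 25.0),
    (30.0, 5.0, 40.0),
    (12.0, 2.0, 15.0),
]

def normalize(md, se, placebo_mean):
    """Express an absolute effect and its SE as fractions of the placebo mean.

    Dividing by a constant rescales an SD or an SE in the same way, so the
    same step applies whichever of the two a study reports.
    """
    return md / placebo_mean, se / placebo_mean

def pool_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its SE (an assumption
    of this sketch; a random-effects model could be used instead)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

rel = [normalize(md, se, pm) for md, se, pm in studies]
rel_effects = [r[0] for r in rel]
rel_ses = [r[1] for r in rel]

pooled, pooled_se = pool_fixed(rel_effects, rel_ses)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled relative effect: {100 * pooled:.0f}% "
      f"(95% CI: {100 * low:.0f} to {100 * high:.0f}%)")
```

Because the interval is computed directly on the normalized scale, it is symmetric around the pooled estimate.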
In meta-analysis of binary outcomes, relative scale analysis using effect measures such as risk ratios or odds ratios leads to asymmetric confidence intervals, because the studies are pooled on the logarithmic scale with symmetric confidence intervals and then transformed back. Similarly, in meta-analysis of continuous outcomes, the findings can be pooled on the logarithmic scale using ratio effect measures, leading to asymmetric CIs [5]. However, relative scale effects for continuous outcomes can also be derived from slopes (Fig. 2), or by the normalization of the results of the studies by dividing the absolute mean effects and their SD values by the placebo group mean outcome value (Table S3) [10], both of which lead to symmetric CIs on the relative scale. Therefore, CIs of the continuous outcomes are not necessarily asymmetric.
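The asymmetry that arises from pooling on the logarithmic scale can be illustrated with a ratio of means for a single hypothetical comparison. The sketch assumes two independent groups with positive means and uses the usual delta-method approximation for the SE of the log ratio; a cross-over trial would need a paired SE instead. The numbers and the function name `log_ratio_of_means` are hypothetical.

```python
import math

def log_ratio_of_means(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Log ratio of means and its delta-method SE for two independent groups."""
    log_rom = math.log(mean_t / mean_c)
    se = math.sqrt(sd_t ** 2 / (n_t * mean_t ** 2) + sd_c ** 2 / (n_c * mean_c ** 2))
    return log_rom, se

# Hypothetical FEV1 declines (%): treated mean 5 (SD 4), placebo mean 25 (SD 10),
# 20 participants per group. These values are NOT from the study.
log_rom, se = log_ratio_of_means(5.0, 4.0, 20, 25.0, 10.0, 20)

# The CI is symmetric on the log scale...
log_low, log_high = log_rom - 1.96 * se, log_rom + 1.96 * se

# ...but asymmetric after back-transformation to the ratio scale.
rom, low, high = math.exp(log_rom), math.exp(log_low), math.exp(log_high)
print(f"ratio of means: {rom:.2f} (95% CI: {low:.2f} to {high:.2f})")
print(f"distance to lower limit: {rom - low:.3f}, to upper limit: {high - rom:.3f}")
```

The upper limit lies farther from the point estimate than the lower limit, in contrast to the symmetric intervals obtained from slopes or from normalization by the placebo mean.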
The distribution of the relative effects at the individual level is skewed (Fig. 3). Therefore, the median relative effect might appear a more useful descriptive estimate than the mean relative effect. Study-level meta-analyses cannot find the median effect nor can they describe the distribution of the individual-level effects such as the interquartile range. Thus, the IPD analysis can give important information additional to the study-level analyses. In our case, the difference between the mean effect of 90% and the median effect of 88% prevention of EIB is minor. Nevertheless, the great variation in the individual-level effects indicates that the efficacy of a particular β2-agonist in protecting against EIB needs to be assessed at the individual level (Fig. 3).
This study was motivated by Bonini’s meta-analysis on β2-agonists for exercise-induced FEV1 declines and their use of the absolute scale in the analysis of study results [11]. However, the absolute scale, either as percentage point differences or as volume differences (measured in liters), has been used in the analysis of FEV1 changes in several other meta-analyses of the Cochrane Library [47‐53]. Thus, the superiority of the relative scale is not an issue relevant only to Bonini’s meta-analysis. For example, one of the Cochrane reviews [53] estimated the effect of vitamin C on EIB on the absolute scale and described the effect of vitamin C five minutes after exercise in the Schachter (1982) trial [54] as follows: “No significant difference between vitamin C and placebo: Vitamin C mean: –0.24 (SE ± 0.06) L/s, Placebo mean: –0.44 (SE ± 0.14) L/s, t = 2.13 (P = 0.057)” [53]: Table 2. However, the slope of a linear regression analysis of the Schachter study [54], which had reported the IPD, indicated that vitamin C’s relative decrease in FEV1 decline was highly significant: 55% (95% CI: 32 to 78%; P = 0.0003) [55]. This difference in P-values also illustrates that the calculation of the absolute effect, which is customary in Cochrane reviews, can lead to false negative conclusions.
Our study did not intend to reproduce Bonini’s main meta-analysis, which was labeled Analysis 1.1 in their paper [11]. That analysis contained several errors and data extraction inconsistencies, some of which were severe; see Additional file 1: Table S4. We used Bonini’s review as an example to demonstrate that the calculation of absolute effects can lead to suboptimal effect estimates. Similar to Bonini’s analysis, we combined different β2-agonists to calculate one single estimate of effect. We took this approach because our primary goal was to compare two different methods in the analysis of FEV1 changes rather than to estimate the effectiveness of a particular β2-agonist, or a particular experimental protocol for conducting an exercise test. If one β2-agonist or protocol is less effective than another, the lower effectiveness would be analyzed in both ways and, thereby, would contribute equally to both the relative and absolute scale analyses. We tried to reduce the heterogeneity of comparisons by selecting salbutamol (or if not tested, salmeterol) when several β2-agonists were investigated in the same report, the shortest delay between β2-agonist administration and exercise test when exercise tests were repeated several times after the administration of a β2-agonist, and pre-drug FEV1 as baseline when possible. Furthermore, we took into account the variations in β2-agonists and the conduct of exercise tests among different trials by using the β2-agonist and the trial as clustering variables in the analyses.
Friedrich et al. compared the relative and absolute scales for diverse continuous outcomes and showed that, on average, the relative scale led to lower heterogeneity compared with the absolute scale, indicating that the former is more informative [5‐7]. In addition, previous analyses demonstrated that the analysis of effects on the duration of diseases and comparable outcomes is more informative on the relative scale than on the absolute scale [8‐10]. However, there are many different kinds of contexts where continuous outcomes are generated and, therefore, the relative scale is not always applicable. Apparently, one requirement for using the relative scale is that there is a relevant 0% to 100% scale for the measurement. Such requirements are not always satisfied. For example, there are no reasonable 0% target levels for body weight, body temperature or blood pressure. In such cases, the relative scale may not be ideal.
Since in many contexts the relative scale is more informative in the analysis of continuous outcomes, the option to use the relative scale should be made widely available in meta-analysis software so that researchers can compare and decide themselves which scale is most suitable for their particular outcome.