Introduction
Network meta-analysis (NMA) is a widely used tool in health technology assessment (HTA) for synthesising direct and indirect evidence to provide an overview of treatment effects [1]. Traditionally, NMAs have been carried out using data from randomised controlled trials (RCTs), which have been considered the "gold standard" for assessing the effectiveness of interventions owing to randomisation and strict inclusion/exclusion criteria [2-4]. Recently, however, there has been an increase in the number of non-randomised observational and real-world studies, especially those utilising large electronic health care databases. This has in turn generated interest in including data from such studies in evidence synthesis, such as NMA, because of the epidemiological benefits they could provide [5]. However, such non-randomised data are considered inherently biased owing to the lack of randomisation and to unmeasured confounding [1, 5]. If not accounted for, biased estimates from observational studies could in turn bias the estimates from the NMA, leading to inappropriate conclusions. There is therefore a growing need for methodological development and evaluation of methods for the appropriate inclusion of non-randomised data in NMAs of RCTs, and guidelines on synthesising data from randomised and non-randomised studies have begun to emerge [6].
Inclusion of non-randomised studies in evidence synthesis of RCT data has been considered for a number of reasons: typically to extend the evidence base when RCT data are sparse, either improving the precision of the results or bridging disconnected networks of RCT evidence, or to generalise the results to a broader population. A number of methods have been suggested to allow for the inclusion of non-randomised data in NMAs of RCTs [5, 7-12]. Schmitz et al. developed and compared several approaches, including naïve pooling, informative prior distributions and hierarchical models, by applying them to data in rheumatoid arthritis [9]. They found that inclusion of observational evidence in NMA increased the uncertainty of the pooled effectiveness estimates. Jenkins et al., who applied naïve pooling, a hierarchical model and a power prior analysis to data in relapsing-remitting multiple sclerosis, also obtained results with increased uncertainty compared to the analysis of RCT data alone, owing to the increased between-study heterogeneity when incorporating data from non-randomised studies.
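The behaviour of naïve pooling can be illustrated with a small inverse-variance (fixed-effect) sketch; all numbers below are invented for illustration and do not come from any of the cited studies:

```python
import math

def pooled_fixed_effect(estimates, ses):
    """Inverse-variance (fixed-effect) pooling: naive pooling treats
    randomised and non-randomised estimates identically."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors from three RCTs
rct_est, rct_se = [-0.10, -0.20, -0.15], [0.10, 0.12, 0.11]
pooled_rct, se_rct = pooled_fixed_effect(rct_est, rct_se)

# Adding one large, possibly over-precise observational estimate pulls
# the pooled effect towards it and shrinks the pooled standard error
pooled_all, se_all = pooled_fixed_effect(rct_est + [-0.35], rct_se + [0.04])
```

Because the hypothetical observational estimate carries the largest weight (smallest standard error), the pooled effect moves towards it while the pooled standard error shrinks: this over-precision is exactly what the hierarchical and bias-adjustment models discussed below are designed to temper.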
Bias, inherent in observational data owing to the lack of randomisation, has received considerable attention in the literature on methods for the analysis of individual participant data from observational studies [13]. The issue of bias in the meta-analysis of aggregate-level data, including non-randomised comparative studies, has also been investigated, but not explored extensively in the context of real-world evidence. Begg and Pilote proposed a model for adjusting for bias when including non-randomised evidence in meta-analysis; however, the non-randomised data considered in this method were limited to single-arm studies [14]. In the context of NMA, a bias-adjustment model for meta-analysis of comparative data was introduced by Dias et al. [15], in this case considering the risk of bias within RCTs. Schmitz et al. included bias adjustment in their hierarchical model, adjusting for overestimation (or underestimation) in the observational studies using an additive random bias term applied to the mean, at the basic parameter level in NMA, or for over-precision using a multiplicative factor applied to the variance [9]. Efthimiou et al. proposed a design-adjusted evidence synthesis method which combines data from randomised and non-randomised studies after adjusting the treatment effect estimates from the non-randomised evidence [8]. Both of these methods, by Schmitz et al. and Efthimiou et al., assume that only data from non-randomised sources are biased. Verde proposed a Bayesian mixture model for pairwise meta-analysis, allowing the true treatment effects in the meta-analysis to be a mixture of biased and unbiased effects [10].
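The additive and multiplicative adjustments of Schmitz et al. can be written schematically; the notation here is ours (a generic relative effect $\delta_i$ and observed estimate $y_i$ with standard error $s_i$ for study $i$) and simplifies the full NMA specification:

```latex
% Additive adjustment: a random bias term shifts the mean of the
% non-randomised estimates (beta_i = 0 for randomised studies)
y_i \sim \mathrm{N}\!\left(\delta_i + \beta_i,\; s_i^2\right), \qquad
\beta_i \sim \mathrm{N}\!\left(b,\; \sigma_\beta^2\right)

% Multiplicative adjustment: a variance inflation factor w > 1
% down-weights over-precise non-randomised estimates
y_i \sim \mathrm{N}\!\left(\delta_i,\; w\, s_i^2\right)
```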
Whilst this paper does not set out to provide a full review of the literature on combining RCT and non-RCT data, the aim of this study was to evaluate and extend a number of methods for the inclusion of non-randomised data in an NMA of RCTs. The existing methods that we focussed on included naïve pooling, hierarchical models and bias-adjustment models, discussed by Schmitz et al. [9]. We first explore models which account for the hierarchy of the data in terms of the grouping of treatments within classes, as well as the different designs of the included studies (i.e., randomised and non-randomised). We then extend these hierarchical models to incorporate a class effect within the hierarchical model of study design. We also explore the hierarchical model with bias adjustment introduced by Schmitz et al., allowing the bias for the non-randomised studies to enter at the individual study level as a random effect, and extend it to allow the average bias to vary across treatment classes.
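Schematically, and in our own simplified notation rather than the exact model specification, these extensions amount to adding a design level to the hierarchy and indexing the mean bias by treatment class:

```latex
% Design level: study-specific effects delta_i are drawn around a
% design-specific mean (d(i) = randomised or non-randomised), which is
% itself drawn around the overall treatment effect
\delta_i \sim \mathrm{N}\!\left(\theta_{d(i)},\; \tau^2\right), \qquad
\theta_d \sim \mathrm{N}\!\left(\theta,\; \sigma_d^2\right)

% Class-varying bias: the mean bias b is indexed by the treatment
% class c(i), relaxing the assumption of a common average bias across
% all non-randomised studies
\beta_i \sim \mathrm{N}\!\left(b_{c(i)},\; \sigma_\beta^2\right)
```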
We applied the methods to an illustrative example in type 2 diabetes assessing the impact of treatments within two classes of glucose-lowering medications: sodium-glucose co-transporter 2 inhibitors (SGLT-2is) and glucagon-like peptide-1 receptor agonists (GLP-1RAs) [16]. We illustrate in more detail how the methods can be utilised to model data from studies of different designs in NMA, and explore the impact the modelling assumptions have on effect estimates and uncertainty.
Discussion
The methods used in this study provide a basis for the inclusion of aggregate data from comparative non-randomised studies in a systematic review and NMA of RCTs. A number of methods were explored and further developed, including naïve pooling, hierarchical models accounting for study design, and bias adjustment for observational studies. All methods were applied to an illustrative example of type 2 diabetes medications.
In this systematic review and NMA of RCTs and non-randomised studies, a total of 64 RCTs and 10 observational studies were analysed. In most cases, the direction of effect was similar in the RCT and non-randomised data, which is supported by current research [20]. However, in contrast to the RCTs, the observational studies favoured two SGLT-2i therapies compared to the reference treatment. Naïve pooling averaged the effect estimates between those observed in the RCTs and the non-randomised studies, with most effect estimates having similar or smaller credible intervals compared to the results of the NMA of RCT data alone.
To account for the limitations of non-randomised studies, hierarchical models and bias-adjusted models were explored. In this study, the hierarchical models fitted accounted for the design of the study, and were further extended to consider the classification of treatments within the SGLT-2i and GLP-1RA classes. As in previous studies [9, 21], effect estimates were similar to those from the naïve pooling method, but credible intervals were often wider. In particular, allowing for additional heterogeneity across studies of different designs widened the credible intervals. By allowing for additional levels of heterogeneity, the impact of the over-precision of the estimates from the observational studies on the pooled effects may be reduced.
Bias-adjusted models, applied to data from our example in type 2 diabetes, resulted in effect estimates similar to those from the naïve pooling model, albeit slightly shifted in the direction of the bias. Similar to Dias et al. [15], between-trial heterogeneity decreased when adjusting for bias, suggesting that some of this heterogeneity was explained by the bias in the observational studies. Interestingly, when bias was allowed to vary by class, relaxing the assumption that bias acts in the same direction regardless of treatment, the models provided a better fit to the data according to the DIC. This suggests that the magnitude and direction of bias could differ by class and that it may not be appropriate to assume the same bias for all observational studies.
Limitations
There are a number of limitations to consider in this study. Firstly, the study considered a single dataset and illustrative example. While this is a relatively large NMA, considering a number of treatments and studies, it included a relatively small number of non-randomised studies, which may have contributed excessively to the increased level of uncertainty. It is important to consider the effect of these models in alternative datasets, which may depend on a number of factors. Previously published studies showed similar effects to this study when utilising the naïve pooling model and the hierarchical model accounting for study design [9, 21]. The results from this study are promising but would need further investigation to understand the implications in other datasets. Future studies should also consider using simulation to assess the performance of these methods under a range of scenarios. Secondly, the non-randomised studies included in the NMA on average contributed a larger proportion of individuals compared to the RCTs. This could increase the impact of the non-randomised studies on the pooled effectiveness estimates, which is a limitation particularly in the presence of unmeasured confounding. Thirdly, the issue of double-counting of individuals in NMAs including observational studies was not considered here. As the number of real-world and observational studies using large electronic health care databases increases, it is likely that individuals could be included multiple times in evidence synthesis, either because the same database is used repeatedly or because individuals in the databases also take part in RCTs, thus artificially inflating precision [22]. However, allowing for further heterogeneity across study designs and introducing a bias factor may mitigate the impact of this issue by allowing for increased uncertainty. Fourthly, bias within RCTs was not considered in this NMA. Risk of bias assessment was completed in the original systematic review and NMA for RCTs. Most RCTs showed a low risk of bias, so adjusting for bias in RCTs in this case may have had minimal impact, but further work could consider adjusting for bias within RCTs as well as observational studies by, for example, adapting the Bayesian mixture hierarchical model proposed by Verde [10] to NMA and allowing the bias to vary by treatment class. In fact, an extension of the model by Verde to NMA was recently proposed by Hamza et al., who also make software available to analysts [23]. Finally, this systematic review and NMA considered only aggregate-level data for both RCTs and observational studies. It would be important to consider extending these methods to include individual participant data (IPD) for both RCTs and observational studies, as recently proposed by Hamza et al. [23]. Moreover, further work could consider the impact of the quality of the effectiveness estimates from the non-randomised studies when modelling bias. For example, there is likely heterogeneity in the way treatment effects are estimated and reported: some studies may use appropriate methods of adjustment for confounding whereas others may not. Such information could also be used when deciding how to share the bias parameters, for example across studies providing better-quality estimates versus those of poorer quality.
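The suggestion to assess these methods by simulation can be illustrated with a minimal Monte Carlo sketch, in which naïve inverse-variance pooling is applied to hypothetical unbiased RCT estimates and biased but over-precise observational estimates; all settings are invented for illustration:

```python
import random
import statistics

TRUE_EFFECT = -0.15   # hypothetical true log odds ratio
OBS_BIAS = -0.20      # assumed systematic bias in observational estimates

def mean_error_of_naive_pooling(n_rct=5, n_obs=2, reps=2000):
    """Monte Carlo check of naive inverse-variance pooling when
    observational studies are biased but over-precise."""
    errors = []
    for _ in range(reps):
        # Unbiased RCT estimates with moderate standard errors
        ses = [0.10] * n_rct
        ests = [random.gauss(TRUE_EFFECT, 0.10) for _ in range(n_rct)]
        # Biased observational estimates with small standard errors
        ses += [0.04] * n_obs
        ests += [random.gauss(TRUE_EFFECT + OBS_BIAS, 0.04) for _ in range(n_obs)]
        weights = [1.0 / s ** 2 for s in ses]
        pooled = sum(w * y for w, y in zip(weights, ests)) / sum(weights)
        errors.append(pooled - TRUE_EFFECT)
    return statistics.mean(errors)

random.seed(1)
bias_of_pooled = mean_error_of_naive_pooling()
```

Under these invented settings the naive pooled estimate inherits roughly the weight-share of the biased studies (the observational weights are 1250 of 1750 in total, so the average error is around -0.14); a fuller simulation study would vary the number, size and bias of the non-randomised studies and compare the hierarchical and bias-adjusted models against this benchmark.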
Conclusions
The inclusion of observational data in NMAs of RCTs is gaining considerable traction in HTA owing to benefits such as increasing the evidence base, potentially connecting disconnected networks and allowing for more generalisable inferences. Methods such as hierarchical NMA and bias adjustment allow for more detailed modelling of the heterogeneity between study designs and can be extended to allow for differences between treatment classes or to account for differences in treatment doses. Both hierarchical and bias-adjustment models can provide a better fit to the data than naïve pooling and should be explored when conducting evidence synthesis. While the methods developed may ameliorate the effects of overestimation in observational studies, further analysis, such as simulation studies, would be needed to investigate the capabilities of these models.
Acknowledgements
The authors would like to acknowledge authors of the original systematic review and NMA used as an illustrative example in this study; Professor Kamlesh Khunti, Dr Francesco Zaccardi, Professor Melanie J. Davies, Emily Patsko, Dr Nafeesa Dhalwani, David Kloecker and Ekaterini Ioannidou.