Background
With the growing interest in “real-world evidence” obtained from analyzing administrative health data and the development of sophisticated quasi-experimental study designs [1], regulatory agencies [2], and others who systematically review health interventions are increasingly incorporating non-randomized studies (NRS) into their evidence syntheses [3]. As such, methods to appraise risk of bias (the risk of systematic error in results or inferences [4]) in these complex evidence sources are now coming under closer scrutiny. The choice of risk of bias tools (RoB tools) is not straightforward for reviews of NRS, although methodological tools for assessing the risk of bias in randomized controlled trials (RCT) are better established, with the Cochrane Collaboration’s RoB Tool [5] now considered the standard [6]. The last two decades have seen a proliferation of tools developed to evaluate the risk of bias in NRS; a 2012 systematic review identified 74 tools developed for quality appraisal of non-experimental studies, of which risk of bias is a component [7]. However, none of these existing NRS quality appraisal tools is currently accepted as the gold standard [1, 8], and it is unclear which tools are the most rigorous and practical.
Quality appraisal for NRS is complicated by the heterogeneity of this category of study design. Under this umbrella term are a multitude of designs, including experimental studies (e.g., non-randomized controlled clinical trials), quasi-experimental studies (e.g., controlled before-after studies, interrupted time series), and traditional observational studies (e.g., cohort, case-control, cross-sectional studies). NRS may be at higher risk of bias due to confounding compared to RCT [9]; however, a single checklist may not adequately assess the risks particular to the various types of NRS. For example, past studies have found that existing tools are insufficient for the evaluation of the risk of bias in pharmacoepidemiological safety studies [10], natural experimental studies [11], and other quasi-experimental designs [12]. Moreover, if multiple checklists are used in systematic reviews that incorporate multiple study designs, review authors need to consider whether these tools are comparable, particularly in terms of rating evidence within a grading system or when using a cut-off to determine which studies to include in a systematic review or meta-analysis.
Two studies published in 2018 found a wide variation in the use of RoB tools for NRS in published systematic reviews [13, 14]. While the Newcastle-Ottawa Scale was the most frequently used tool for NRS in both studies, it was also not uncommon for systematic reviews to use no RoB tools at all or to inappropriately use tools intended for RCT. Further, Quigley et al. reviewed methodological recommendations from health technology assessment bodies and concluded that there is no consensus on which tool(s) should be the standard of practice for appraising bias in NRS [13].
To our knowledge, no previous study has assessed the use of RoB tools by examining pre-published systematic review protocols, which may provide more detailed methodological information than published systematic reviews. Evaluating protocols registered in PROSPERO, an “international prospective register of systematic reviews,” enables us both to anticipate emerging trends in RoB tool use and to examine historical trends over time. Given the ongoing development of new RoB tools, certain tools may have fallen out of favor or gained currency over time.
In the present study, we conducted a cross-sectional analysis of systematic review protocols on health interventions registered in PROSPERO to identify which tools were the most commonly cited in 2018 to evaluate the risk of bias of RCT and NRS in systematic reviews. We also conducted a retrospective analysis of trends in the use of these commonly cited RoB tools in protocols of health interventions registered in PROSPERO since database inception (2011). In the absence of a gold standard, identifying the most common tools cited for use would help researchers position their RoB tool selection in the context of their peers. Knowing how RoB tools are applied in practice could also inform future tool development or identify areas where educational interventions on RoB tool use are needed.
Methods
Review of 2018 PROSPERO records
Data source and sample selection
The search for eligible protocols was conducted using PROSPERO’s database filters for type and method of the review, source of the review, and date of addition to the database [search strategy: (Intervention):RT NOT Cochrane:DB WHERE CD FROM 01/01/2018 TO 12/10/2018]. To be included in this analysis, PROSPERO protocols had to be for systematic reviews of health interventions. We excluded Cochrane review protocols because they were assumed to use Cochrane methodology and RoB tools. Protocols for rapid reviews were excluded as their approach to quality appraisal may be different compared to full systematic reviews. Protocols for overviews of reviews (or “umbrella” reviews), reviews of guidelines, qualitative studies, preclinical studies, and economic evaluations were excluded as the risk of bias assessment for these study designs was outside the scope of this study. Further, we selected protocols from only the most recent year available (2018) in order to determine contemporary practices in the use of RoB tools. Retrieved records were screened by one reviewer (K.F.) for inclusion.
All PROSPERO records that met the date and database review type limits were downloaded on October 12, 2018. There were 4215 eligible protocols registered in PROSPERO from January 1 to October 12, 2018. Of these protocols, 500 (approximately 12% of registered protocols) were randomly selected for practicality, as the aim of this analysis was to identify which RoB tools were the most commonly cited in systematic review protocols in 2018, the year in which this analysis was conducted. A simple random sample was created using the random number generator from RANDOM.org.
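The sampling step can be illustrated with a minimal sketch. It is not the procedure used in this study, which drew random numbers from RANDOM.org; it substitutes Python's standard pseudo-random generator, and the record identifiers, function name, and seed are hypothetical placeholders.

```python
import random


def draw_simple_random_sample(record_ids, sample_size, seed=None):
    """Select `sample_size` distinct records without replacement."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return sorted(rng.sample(record_ids, sample_size))


# Hypothetical identifiers standing in for the 4215 eligible protocols.
eligible_ids = list(range(1, 4216))

# Assumption: a fixed seed replaces the RANDOM.org draw described in the text.
sample_ids = draw_simple_random_sample(eligible_ids, 500, seed=2018)

sampling_fraction = len(sample_ids) / len(eligible_ids)
print(f"{len(sample_ids)} of {len(eligible_ids)} protocols ({sampling_fraction:.1%})")
```

Because selection is without replacement, every eligible protocol has the same chance of inclusion and no protocol can be drawn twice.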
Data were extracted on the types of studies to be included from each of the selected systematic review protocols. Protocols were then coded as including RCT (including quasi-RCT), NRS (including non-randomized experimental, quasi-experimental, or observational study designs), or both.
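The coding rule above can be expressed as a short sketch. The design labels, the function name, and the "unclear" fallback category are illustrative assumptions rather than part of the extraction form used in this study.

```python
# Hypothetical design labels; an actual extraction form may use other wording.
RANDOMIZED = {"rct", "quasi-rct"}
NON_RANDOMIZED = {
    "non-randomized experimental",
    "quasi-experimental",
    "observational",
}


def code_protocol(designs):
    """Code a protocol as 'RCT', 'NRS', or 'both' from its eligible designs."""
    labels = {d.strip().lower() for d in designs}
    has_rct = bool(labels & RANDOMIZED)
    has_nrs = bool(labels & NON_RANDOMIZED)
    if has_rct and has_nrs:
        return "both"
    if has_rct:
        return "RCT"
    if has_nrs:
        return "NRS"
    # Assumption: protocols with no recognizable design are flagged for review.
    return "unclear"


print(code_protocol(["RCT", "Observational"]))
```

Quasi-RCT are grouped with RCT and all other designs with NRS, mirroring the grouping stated in the text.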
Data were also extracted on all of the tools the protocol authors planned to use for risk of bias assessment and, if specified, the study designs that the tools would be used to assess. Since we wanted to understand what RoB tools authors were choosing to use for quality appraisal, we recorded tools according to author intentions, regardless of whether the tools were specifically designed for this purpose. We recorded the systematic review as using “suites of tools” in cases where the RoB tool comprised separate checklists for different study designs produced by the same organization, but the exact number of checklists to be used was not specified. For example, the Joanna Briggs Institute (JBI) produces a number of tools for appraising various study designs [15]. If authors only refer to JBI tools generally, it is unclear how many tools are being employed. We recorded the review as using “multi-design tools” in cases where the RoB tool was designed to assess both RCT and NRS, for example, the Downs and Black checklist [16]. If both RCT and NRS were to be included in the systematic review, we recorded whether the authors planned to use different tools for these designs or a single tool for both types. If the authors stated that they were following Cochrane guidelines and only included RCT, we assumed they were using the Cochrane RoB Tool. Data were extracted by one reviewer (K.F.).
Longitudinal analysis of PROSPERO records
To determine usage trends over time of RoB tools that are in common contemporary use, we searched PROSPERO records for the names of the most frequently cited RoB tools identified in the above cross-sectional analysis to determine how often each tool was mentioned on an annual basis. We assessed annual trends for tools that were named in five or more of the protocols included in the random sample of 2018 PROSPERO records. Tools that were not developed for risk of bias assessment, e.g., reporting guidelines, were excluded. Using keywords and name variants for each tool, we searched PROSPERO records by year since the inception of the database (2011) to December 7, 2018, restricting the keyword search to the “Risk of bias (quality) assessment” field. Searches were limited to protocols for reviews of interventions. Cochrane review protocols were excluded, as it was assumed that they followed the risk of bias procedures outlined in the Cochrane Handbook. The number of records retrieved for each tool per year was recorded. We did not further verify the text of the protocol records. Tools were classified by the types of designs they were intended to assess: RCT only, NRS only, multi-design tools, and suites of tools.
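The keyword-based counting described above can be sketched as follows. The searches in this study were run through the PROSPERO search interface, so the in-memory records, the variant lists, and the function name below are hypothetical stand-ins used only to show the logic.

```python
import re
from collections import defaultdict

# Hypothetical name variants; the keywords actually searched may differ.
TOOL_VARIANTS = {
    "Newcastle-Ottawa Scale": ["newcastle-ottawa", "newcastle ottawa"],
    "ROBINS-I": ["robins-i", "acrobat-nrsi"],
}


def count_mentions_by_year(records, tool_variants):
    """Count records per year whose risk of bias field names each tool.

    `records` is an iterable of (year, risk_of_bias_text) pairs.
    """
    patterns = {
        tool: re.compile("|".join(re.escape(v) for v in variants), re.IGNORECASE)
        for tool, variants in tool_variants.items()
    }
    counts = defaultdict(lambda: defaultdict(int))
    for year, rob_text in records:
        for tool, pattern in patterns.items():
            # A record counts once per tool, however often the tool is named.
            if pattern.search(rob_text):
                counts[tool][year] += 1
    return {tool: dict(by_year) for tool, by_year in counts.items()}


# Hypothetical records for illustration only.
example_records = [
    (2016, "Quality will be assessed with the Newcastle Ottawa scale."),
    (2016, "We will use ACROBAT-NRSI for non-randomised studies."),
    (2018, "ROBINS-I and the Newcastle-Ottawa Scale will be applied."),
    (2018, "No formal tool is planned."),
]
yearly_counts = count_mentions_by_year(example_records, TOOL_VARIANTS)
print(yearly_counts)
```

As in the analysis itself, a match indicates only that a tool was named in the risk of bias field, not that its use was verified in the protocol text.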
Statistical analysis
Descriptive statistics were used to summarize the frequency and proportion of the RoB tools in the random sample and year-by-year analysis of PROSPERO records.
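A minimal sketch of such a frequency summary is given below. It assumes that each protocol contributes at most one count per tool and that proportions use the number of protocols as the denominator; the tool lists shown are hypothetical and are not study data.

```python
from collections import Counter


def tool_frequencies(protocol_tools):
    """Return (tool, n, percent of protocols) rows, most frequent first.

    `protocol_tools` holds one list of planned tools per protocol.
    """
    n_protocols = len(protocol_tools)
    if n_protocols == 0:
        return []
    # set() ensures a tool named twice in one protocol is counted once.
    counts = Counter(tool for tools in protocol_tools for tool in set(tools))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(tool, n, round(100 * n / n_protocols, 1)) for tool, n in ordered]


# Hypothetical extraction results for four protocols.
example_protocols = [
    ["Cochrane RoB Tool"],
    ["Cochrane RoB Tool", "Newcastle-Ottawa Scale"],
    ["ROBINS-I"],
    [],  # protocol that named no tool
]
for tool, n, pct in tool_frequencies(example_protocols):
    print(f"{tool}: {n} ({pct}%)")
```

Because a protocol may list several tools, percentages computed this way can sum to more than 100.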
Discussion
In this study, two-thirds of PROSPERO protocols on health interventions in the 2018 sample intended to include evidence from NRS in addition to RCT, while the remaining protocols were restricted to RCT only. When protocols were restricted to RCT, the choice of RoB tool was highly consistent, with 85.2% planning to use the Cochrane RoB Tool. A few additional protocols (1.9%) planned to use the Cochrane RoB 2 Tool, which was first introduced in 2016 as an update to the original Cochrane RoB Tool [20]; however, the uptake of the Cochrane RoB 2 Tool may be underestimated, as authors may not have specified the version number in their protocol.
In protocols that intended to include both RCT and NRS, the choice of tools was more heterogeneous, consistent with current opinion that there is no consensus on the preferred tools for evaluating bias in NRS [3, 8, 12, 13]. This finding is also consistent with previous research from Seehra et al., which described quality appraisal tool use in systematic reviews as “varied and inconsistent” [14]. Just over half of protocols including both RCT and NRS listed only one tool for risk of bias assessment, most frequently the Cochrane RoB Tool, which was designed to assess risk of bias in RCT [36]. In a review of 686 systematic reviews, Quigley et al. found that RoB tools designed for RCT were often misapplied to NRS [13]. The choice to use a RoB tool for a study design that it was not intended for might be made for several reasons, such as the convenience of using one tool for multiple study designs, misinformation on appropriate RoB tools, or the lack of a gold standard RoB tool for NRS. It is also possible that authors had not planned on assessing the quality of NRS. For example, Briere et al. observed that many meta-analyses and health technology assessments using real-world evidence from NRS did not critically appraise these studies [3], and in Deeks et al.’s review of 511 systematic reviews that included NRS, only a third performed quality assessment for NRS [37]. We also found that some protocol authors were not specific in identifying RoB tools a priori or were inappropriately applying tools to assess risk of bias for NRS. To compound the challenges in appraising the quality of NRS, most of the commonly cited RoB tools for NRS, such as the Newcastle-Ottawa Scale, ROBINS-I, and MINORS, have not been sufficiently validated [13].
When systematic reviews that intended to include NRS planned to use multiple tools to assess risk of bias, the Newcastle-Ottawa Scale was the most commonly listed RoB tool to assess NRS (39%), followed by ROBINS-I (33%). Although some have pointed out that the Newcastle-Ottawa Scale has several weaknesses, including low inter-rater reliability [38] and “uncertain validity” of some items [39], this scale appears to be the most popular choice of all the NRS tools and is considered easy to use [40]. Both Quigley et al. and Seehra et al. also found that the Newcastle-Ottawa Scale was the most frequently used tool to assess risk of bias in NRS. In the trend analysis of commonly listed tools, the Newcastle-Ottawa Scale was the dominant NRS appraisal tool each year, from 2011 to 2018. However, the ROBINS-I tool (previously ACROBAT-NRSI) appears to be gaining in popularity in recent years.
Limitations
Because this study was conducted using systematic review protocols, we do not know whether the final systematic reviews actually used the tools listed in these protocols. The analysis of PROSPERO protocols for the trend analysis relied on keywords and counts from the search results without further verification in the text of protocols, which may have overestimated the use of certain tools, particularly for Cochrane tools and suites of tools. However, keywords were restricted to the risk of bias section of the registered protocol. As not all systematic reviews are registered prospectively in PROSPERO, results of this study may not be generalizable to the wider body of systematic reviews on health interventions. Authors who are motivated to register systematic reviews in PROSPERO or publish their protocols in peer-reviewed journals, both of which are recommended by the AMSTAR systematic review quality appraisal tool [41], may be more likely to use RoB tools recommended in institutional guidelines, such as the Cochrane Handbook. An additional limitation is that the trend analysis was conducted for only the most commonly cited tools planned for use in systematic reviews in 2018. Therefore, this analysis does not capture complete trends for the planned use of RoB tools over the last 8 years in PROSPERO.
Conclusions and implications for practice
Results of this analysis emphasize that the Cochrane RoB Tool has become the standard for systematic reviews of RCT. Despite the existence of dozens of tools for assessing NRS, relatively few are commonly used in practice, with the Newcastle-Ottawa Scale and ROBINS-I being the most frequently used. There is also evidence that the Cochrane RoB Tool for RCT may be used inappropriately to assess NRS, indicating a need for more education and awareness on the appropriate use of tools for the quality assessment of non-randomized designs.
Given the lack of a gold standard for assessing risk of bias in NRS, some have called for the development of an improved tool that could effectively evaluate different kinds of quasi-experimental studies [12]. Others have suggested using different tools based on the types of study designs that are identified by the review [3, 13]. The development of a “meta” quality appraisal tool, such as the one created by Public Health Ontario [42], which recommends particular tools by study design, may be a coherent way to address the lack of guidance on risk of bias assessment for systematic reviews incorporating NRS evidence. Future research should focus on the development and validation of tools for specific NRS designs.