Background
Systematic reviews are seen as having an important role in decision-making processes, especially in medicine and health, and the numbers of published reviews are increasing [
1]. Under such circumstances, the issue of the quality of these reviews is also becoming increasingly important. A recent analysis of the epidemiology and reporting characteristics of systematic reviews by Page et al. [
1] found that the quality of the conduct and reporting of many reviews was often poor. This finding was based on an assessment of the reporting of 300 systematic reviews published in a single month in 2014 and, in part, by comparison with data on the reporting of 300 reviews published in the equivalent month in 2004 [
2]. The research concluded that reporting had improved since 2004, in large part due to developments such as the publication of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) statement [
3] as the relevant standard for the reporting of systematic reviews. However, it noted that poor conduct and reporting were still frequent (indeed, considered ‘suboptimal’ for many characteristics) and that strategies are needed to improve this if systematic reviews are to offer genuine rather than merely hypothetical value to ‘patients, health care practitioners and policy-makers’ [
1]. This is important because the results of poorly conducted, poorly reported or otherwise flawed systematic reviews include ‘misleading conclusions’, which can have major implications for decision-making [
1]. Consequently, Page et al. made a series of highly appropriate and manageable recommendations to improve the conduct and reporting of systematic reviews generally, and thus to re-establish the method as trustworthy and robust as far as possible [
1]. These included using certain ‘writing tools’ to enforce reporting guidelines; editorial application of guidelines; involvement of relevant stakeholders from beyond the review team; and assessing the reporting of bias within the systematic review itself [
1].
The work also concluded that Cochrane reviews were ‘superior’ to ‘non-Cochrane’ reviews in terms of the ‘completeness’ of their reporting. This difference in relative quality has been found in other work too, though with a rather more critical assessment of the Cochrane reviews themselves [
4]. However, it is questionable whether ‘non-Cochrane’ reviews represent a homogeneous group: they are published in many different journals and formats. Some ‘non-Cochrane’ systematic reviews are commissioned by policy-makers, or by programmes related to policy-makers, and as such already involve many safeguards against the reporting of potentially ‘misleading conclusions’. Health Technology Assessment (HTA) represents a specific field in which the systematic review of clinical effectiveness evidence plays a fundamental role in healthcare decision-making [
5,
6]. These systematic reviews therefore represent a distinct type of ‘non-Cochrane’ review. One such group comprises the systematic reviews conducted for the UK National Institute for Health Research (NIHR) HTA programme; the full versions of these reviews are published in that programme’s own journal series, Health Technology Assessment (Winchester) [
7]. This series of ‘non-Cochrane’ systematic reviews offers an interesting and informative comparison with Cochrane reviews because, like Cochrane reviews, they have clear reporting standards and, unlike many other ‘non-Cochrane’ reviews published in more conventional peer-reviewed journals, they are not constrained by word limits or the absence of online appendices, an issue acknowledged by Page et al. [
1]. In fact, Page et al. acknowledge that the apparent advantage in quality of reporting in favour of Cochrane reviews might not always reflect the conduct or reporting of the reviews at all, but rather might be due to the restriction of word limits [
1].
The aim of this research therefore is to compare the reporting of systematic reviews published in the UK Health Technology Assessment journal series with the reporting of systematic reviews generally, and Cochrane reviews specifically, from the years 2004 and 2014, as reported by Page et al. [
A brief assessment will be made of developments between 2004 and 2014 for UK HTA systematic reviews, but the key comparison is across the UK HTA, Cochrane and ‘non-Cochrane’ reviews from 2014, because this offers the best evidence on current practice and standards. This comparison is important because it seeks to underline not only that so-called ‘non-Cochrane’ reviews are not all the same, but also that the context in which systematic reviews are produced is a key driver of standards of conduct and reporting. Further, some of the recommendations for improved practice made by Page et al. are arguably already being met by the staff and editorial processes involved in the production of the Health Technology Assessment journal systematic reviews, and so a comparison with these reviews might offer further support for the authors’ proposals.
Methods
The data from the Cochrane and ‘non-Cochrane’ systematic review samples from a single month in each of 2004 and 2014 were published in the paper by Page et al. [
1]. In order to generate the data for the proposed comparison between UK HTA systematic reviews and the systematic reviews reported by Page et al. [
1], a simple search was performed in August 2016 to identify the systematic reviews published in the Health Technology Assessment monograph series from 2004 and 2014. This involved a structured search of MEDLINE combining title or abstract terms for ‘systematic review’ with the Health Technology Assessment journal identifier (‘health technology assessment winchester england.jn’), and the application of date limits (2004 and 2014). The current paper has included Health Technology Assessment systematic reviews from the whole of the designated years (rather than just one month) in order to generate a sufficiently sizeable sample from this single journal for comparison with the data published by Page et al. [
1]. All of the systematic reviews were published in English. The searching and screening process was conducted by one reviewer (CC) and checked by a second (EK).
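For illustration only, the MEDLINE search described above might be expressed in Ovid syntax along the following lines; this is a hedged reconstruction from the description in the text, not the strategy as actually run, and the line numbering and exact form are assumptions:

```
1. (systematic review).ti,ab.
2. health technology assessment winchester england.jn.
3. 1 and 2
4. limit 3 to yr="2004"
5. limit 3 to yr="2014"
```

Lines 4 and 5 correspond to the two date limits applied to the combined set in line 3, yielding the 2004 and 2014 samples respectively.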
Data extraction used a version of the form published by Page et al. [
1] with some modifications, e.g. the addition of a ‘Not Applicable’ option. The form was piloted by the two reviewers (CC, EK) on four Health Technology Assessment journal systematic reviews to enhance the consistency and accuracy of extraction. Each reviewer then extracted data from half of the Health Technology Assessment systematic reviews. The data extracted included the type of review, number of included studies, availability of protocols, details of search strategies, inclusion criteria, critical appraisal tools and synthesis methods. Data were summarized as frequencies and percentages for categorical items and as medians and ranges for continuous items. Results from the Health Technology Assessment systematic reviews from 2004 and 2014 were tabulated alongside the previously published results from the Cochrane and ‘non-Cochrane’ systematic reviews from the same years in the paper by Page et al. [
1]. These data were then compared and discussed.
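By way of illustration, the summary step described above (frequencies and percentages for categorical items; medians and ranges for continuous items) can be sketched in a few lines of Python. The records and field names below are invented for illustration only; they are not the study’s data:

```python
from statistics import median

# Hypothetical extracted records: one dict per included HTA review.
# Field names and values are invented for illustration.
reviews = [
    {"type": "intervention", "n_studies": 12},
    {"type": "diagnostic",   "n_studies": 45},
    {"type": "intervention", "n_studies": 200},
]

# Categorical item: frequency and percentage by review type.
counts = {}
for r in reviews:
    counts[r["type"]] = counts.get(r["type"], 0) + 1
percentages = {k: 100 * v / len(reviews) for k, v in counts.items()}

# Continuous item: median and range of the number of included studies.
n = [r["n_studies"] for r in reviews]
summary = {"median": median(n), "range": (min(n), max(n))}

print(counts)       # frequencies
print(percentages)  # percentages
print(summary)      # median and (min, max)
```

In practice the same two summaries would simply be applied to each categorical and continuous item on the full extraction form.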
Discussion
The principal findings of this assessment are twofold. First, and previously undocumented, the standard of reporting is often comparable between UK Health Technology Assessment systematic reviews and Cochrane reviews, and the reporting in both appears to be more complete than that of other ‘non-Cochrane’ reviews. Second, there were clear improvements in the conduct and reporting of Health Technology Assessment systematic reviews between 2004 and 2014, consistent with the findings of Page et al. regarding other ‘non-Cochrane’ and Cochrane systematic reviews [
1].
Based on these data, the reporting of the Health Technology Assessment journal systematic reviews was comparable to Cochrane reviews and ‘superior’ to other ‘non-Cochrane’ reviews across many characteristics: identification as a systematic review; registration of reviews and availability of protocols; conflicts of interest; reporting of the type of literature included; details of the database search strategies; searching of trial registries and grey literature; the use of PRISMA flow diagrams; the provision of details of any excluded studies; and the reporting of limitations at the level of the review and of the included studies. All of these elements have particular implications for informing decision-making, especially the importance of sourcing unpublished data [
17,
18] and clarifying uncertainties within the evidence-base and review itself. This was all achieved despite the much broader scope of the Health Technology Assessment reviews compared with Cochrane reviews: Health Technology Assessment reviews covered a range of topics and included a diversity of study designs, while the Cochrane reviews exclusively covered therapeutic topics, mostly evaluating pharmacological interventions based almost exclusively on RCT or quasi-RCT evidence (100% of reviews compared with 53% of HTA reviews). Some of the other differences between Health Technology Assessment and Cochrane reviews, such as one reviewer in the former extracting and appraising studies, and a second checking those data and judgements, compared with double-extraction in Cochrane reviews, might also be explained by the relative scale of the task being faced by Health Technology Assessment reviewers compared with Cochrane reviewers. For example, the 2014 Health Technology Assessment reviews included reports with up to 200 studies, compared with 17 at the highest end of the interquartile range (IQR) in Cochrane reviews. Given the much larger number of included studies, the diverse questions and study designs, and the time constraints that govern HTA reviews (UK HTA reviews typically must be completed six to nine months after protocol approval) [
19] (constraints that do not apply to Cochrane reviews), some ‘short-cut’ approaches to best practice are understandable. As a result, conduct and reporting are arguably much more straightforward within the narrower bounds of Cochrane systematic reviews than in the broader, more diverse context in which certain non-Cochrane reviews, especially Health Technology Assessment reviews, are produced.
At first glance, Health Technology Assessment reviews have weaker reporting than Cochrane reviews across several particular characteristics: total number of participants across the included studies; details of risk of bias assessments being reported in abstracts; specification of a primary outcome; a consideration of economics; the application of GRADE to ‘weigh’ evidence for the purposes of recommendations [
16]; and publication bias. Some of these ‘deficiencies’ are easily explained and arguably should not be considered an issue. For example, Health Technology Assessment systematic reviews tend to report the range of participants across the included studies rather than a total, conduct economic evaluations as a distinct but related piece of work, and include risk of bias assessments in the larger Executive Summary rather than the brief Abstract, where the results of the accompanying cost-effectiveness analysis must also be reported. Health Technology Assessment systematic reviews are also not expected to develop recommendations based on the quality of the evidence; that is simply not a function of this type of systematic review, and the responsibility for making recommendations lies with other groups in the HTA process [
7,
19]. As a result, not one of the Health Technology Assessment reviews in 2014 reported using GRADE [
16]. The one definite issue of poor conduct and reporting, however, concerns publication bias: it is almost never taken into account in Health Technology Assessment systematic reviews, and it unquestionably should be.
The conduct and reporting of UK Health Technology Assessment journal reviews is therefore of the same standard as Cochrane reviews and generally more complete than that of many other reviews. Consequently, they should arguably not be ‘lumped together’ with other ‘non-Cochrane’ reviews, but considered the equivalent of the perceived Cochrane ‘gold standard’. This raises the question of why this set of reviews is so well conducted and reported. The reason lies in the context in which they are produced: UK Health Technology Assessment reviews are commissioned by bodies associated with the UK Department of Health (the National Institute for Health and Care Excellence [NICE] and the National Institute for Health Research [NIHR]) with a specific policy-making and decision-making purpose and audience in mind [
7]. While all reviews theoretically exist to support evidence-based practice [
20,
21], not all reviews are produced within a process that exists to do so [
19] and such a relationship is needed if research is to stand a chance of being influential [
22]. So, while many reviews might be produced with the potential to influence practice, few are actually commissioned as an integral part of a process which is able to develop policy and inform practice [
19]. Rather, as Page et al. have pointed out, many reviews are likely undertaken and published with rather less lofty aims in mind, such as the need to publish in order to secure promotion or funding for the authors [
1]. There is also a concern that these authors do not have the skills to conduct and report systematic reviews as well as they should [
1]. By contrast, the UK Health Technology Assessment journal reviews covered here often form part of a wider policy-making process (including consideration of the cost-effectiveness of health technologies), with the aim of answering specific, policy-led questions, often about clinical decision-making and resource allocation, which can directly influence guidance and practice [
7,
23]. The fact that Health Technology Assessment reviews are produced within such a context of evidence-based medicine, with the genuine potential to influence practice and be useful, means that the processes have to be rigorous, transparent and of the highest quality: ‘bias’ in the review needs to be minimized; the possibility of ‘misleading conclusions’, attributed by Page et al. to a sizeable number of non-Cochrane reviews, cannot be permitted in Health Technology Assessment reviews because there is too much riding on their findings [
21,
23]. Given these implications, the HTA process has long included multiple specialist staff, as well as stakeholders outside of the review team who interrogate the work, and a rigorous editorial process for the Health Technology Assessment journal (which includes the requirement of a completed and checked PRISMA checklist). Such collaborations between funders, journals and academic institutions are all recommended by Page et al. as strategies to improve the conduct and reporting of systematic reviews generally [
1]. This is an important consideration given that more than 8000 systematic reviews are published each year and, based on the sample of ‘non-Cochrane’ reviews analyzed by Page et al., a sizeable proportion are likely to be of ‘questionable quality’ [
1].
Finally, as with reviews generally, there were apparent improvements in the conduct and reporting of systematic reviews published by the UK HTA programme between 2004 and 2014 across most domains; the exception was the conduct and description of the searches for the reviews, which were already of a very high standard in 2004. Overall, these improvements are most likely the result of the publication and general acceptance of the PRISMA statement as the relevant standard for the reporting of systematic reviews [
3]. The 2009 PRISMA statement has doubtless led to improved reporting across all domains, but in this sample it unquestionably led to the following: higher numbers of Health Technology Assessment teams registering their review (70% of HTA reviews in 2014 compared with 0% in 2004) and making protocols available (90% compared with 17%); more complete reporting of limitations affecting the review itself, rather than the reporting of the limitations of included studies only (73% of HTA reviews in 2014 compared with 49% in 2004); and more complete reporting of decision-making regarding the inclusion (97% of HTA reviews in 2014 compared with 57% in 2004) and exclusion (91% and 65%, respectively) of studies. Before 2009, the QUOROM (Quality of Reporting of Meta-analyses) statement existed to outline how inclusion and exclusion should be reported, but it was not applied in Health Technology Assessment journal reviews nearly as rigorously as the PRISMA statement and its related checklist [
24]. It is also clear that conduct and reporting might have benefited from greater standardization, both of guidelines for systematic reviews and of the tools for the critical appraisal of included studies. A higher proportion of Health Technology Assessment reviews in 2014 reported having followed guidelines in the conduct and reporting of the review (63% compared with 24% in 2004) and, given the context in which they were produced, they were as likely to cite the University of York Centre for Reviews and Dissemination (CRD) guidance [
15] as the PRISMA statement [
3]. The variety of critical appraisal tools used in 2004 had been replaced by a smaller, more standardized set by 2014: for example, use of the Cochrane Risk of Bias tool [
25] increased from 9% in 2004 to 43% in 2014, and the use of ‘other’ tools decreased from 61% in 2004 to 32% in 2014 (the vast majority of these ‘other’ tools in 2014 consisted of the application of CRD criteria for RCTs [
15], which is ‘standard’ for Health Technology Assessment reviews). By 2014, a version of the QUADAS tool was being used for all of the HTA diagnostic reviews and the Jadad score had been dropped completely (from 17% of HTA reviews in 2004 to 0% in 2014), despite continuing to be used in 11% of other ‘non-Cochrane’ reviews.
The limitations of this work need to be acknowledged: this study examined only a sample of HTA reviews from two years, in order to offer a direct comparison with the data presented by Page et al. [
1], so results might have differed had other years been selected. This work also considered only those characteristics previously examined by Page et al. [
1]; different characteristics might have generated some different findings. Also, the data in Page et al. [
1] were not re-checked, but taken as reported. Finally, some data are not straightforward to interpret (e.g. a search for ‘unpublished’ data might not be noted explicitly in the Methods of a Health Technology Assessment systematic review, even though trial registers and grey literature are routinely searched and often included, yielding much higher percentages for these items), and so there might be some inconsistency in data interpretation between this study and the study by Page et al. [
1]. It should also be noted that this paper does not seek to generalize about systematic reviews as a whole or about all HTA systematic reviews.
Conclusion
The aim of this research was to compare the reporting of systematic reviews published in the UK Health Technology Assessment journal series with the reporting of systematic reviews generally, and Cochrane reviews specifically. It found that UK Health Technology Assessment systematic reviews present standards of conduct and reporting equivalent to the so-called ‘gold standard’ Cochrane reviews and superior to systematic reviews more generally. It therefore makes sense to view any systematic review produced within a genuine policy context, and not just those from the UK Health Technology Assessment journal, as being on a par with Cochrane reviews and representing something quite different from other ‘non-Cochrane’ reviews. Indeed, knowing the purpose behind the production of a review (who commissioned the work, who funded it, and why it was undertaken) might be an obvious means by which users of reviews can ‘rate’ a systematic review’s likely reliability and rigour [
21] alongside assessments using the AMSTAR (A MeaSurement Tool to Assess systematic Reviews) or ROBIS tools [
26,
27]. This is because reviews produced within a genuine policy-making context will, given the implications of getting things wrong, have enjoyed resources and been through processes that apply to few other reviews. This is not to say that other ‘non-Cochrane’ reviews produced outside this context cannot adhere to high standards of conduct and reporting; of course they can. It is only that, based on the evidence presented here, UK Health Technology Assessment reviews deserve to be categorized as ‘gold standard’ in the hierarchy of systematic reviews, alongside the more universally known Cochrane Library of reviews. This is because the purpose behind these reviews most closely resembles the purpose originally intended for systematic reviews [
20,
28]: to inform evidence-based medicine. Further, given the context in which they are produced, they are more likely than many other reviews to be useful as pieces of research, in the sense that their findings almost certainly will be used, and all such research should be capable of being used [
29,
30]. Again, they satisfy many of the requirements for genuinely useful research because they form part of an explicit, policy-driven process in which time and money are expended with the specific aim of telling practitioners what to do and what not to do.