Background
Increasing numbers of systematic reviews evaluating diagnostic technologies are being published in the field of Health Technology Assessment (HTA). In response to the needs of policy-makers in this field, the National Institute for Health and Care Excellence (NICE) has in recent years established a Diagnostics Assessment Programme and a Diagnostics Advisory Committee, having run a pilot project to develop methods in this area [
1,
2]. Systematic reviews or individual studies of diagnostic test accuracy usually compare an index test with the best available test or current standard procedure for making a diagnosis. The methodological challenges of undertaking systematic reviews of diagnostic accuracy studies are well known and have been extensively discussed in the academic literature [
3,
4]. Searching for and identifying evidence is one challenge in undertaking such a systematic review. Search filters, including validated filters, are available from various sources, but some organisations no longer recommend their use because the results of applying these filters are variable [
4,
5]. This is due in part to inconsistency in the reporting and indexing of papers. Consequently, diagnostic study filters compare less favourably with other search filters, e.g. for Randomised Controlled Trials [
4]. The Cochrane Collaboration Diagnostic Test Accuracy Working Group is working on the publication of diagnostic test accuracy systematic reviews within the Cochrane Library and recognises the challenges of searching for diagnostic studies. The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy has a chapter on searching for studies, which recommends that “a range of databases be considered for searching”, including MEDLINE, EMBASE and regional databases (to account for the differing disease prevalence within different geographical regions) [
6].
Within the information science community, there is a growing interest in search efficiency, in particular whether it is possible to identify the same sample of included studies for a systematic review by searching fewer databases than the traditionally large number deemed necessary [
7,
8]. This perhaps inevitable move has been driven by several factors, including the improved indexing and searching capabilities of databases and the need to produce high-quality reviews within time and resource constraints [
9]. Consequently, it has been argued that a well-structured search undertaken in only two or three databases (supported by additional methods to identify evidence, such as reference list checking, citation searching, contact with manufacturers and experts) might identify evidence more efficiently than a similar search undertaken in more databases [
7].
Recent research evaluated whether searches for studies of diagnostic test accuracy for systematic review and meta-analysis could be limited to MEDLINE alone [
10]. Appraising 44 reviews of diagnostic test accuracy studies containing 76 meta-analyses, the authors found that in 65 of the 76 meta-analyses (85.5 %), all of the studies were identifiable in MEDLINE. In the remaining 11 meta-analyses, 87.5–99 % of the studies were identifiable in MEDLINE. The authors therefore suggest that extensive searching in databases other than MEDLINE has minimal effect on the identification of studies for inclusion in diagnostic reviews. However, this conclusion assumes that the actual searches undertaken in MEDLINE for all 44 reviews would have had 100 % sensitivity: that is, they would have retrieved all of the relevant studies indexed in that database. In a separate study by the same authors, statistical tests were also undertaken on a sub-set of those meta-analyses for which not all included studies were indexed in MEDLINE. This found that the omission of any of the “missed” studies would not have affected the basic findings of that sample, though precision might be slightly affected [
11].
An earlier study [
12] sought to estimate the yield of searches for studies of diagnostic test accuracy across seven different databases by re-running the searches as they were described for eight specific systematic reviews. Taking the included studies from these reviews, the authors created a gold standard set of included studies (
n = 522) and then categorised them as follows: 1) being indexed in the databases and retrieved by the published searches as they were described; 2) being indexed in the databases but not retrieved by those searches; and 3) not being indexed in any of the databases. The study found that no search identified all of the included studies in the gold standard set for any one of the eight reviews—even across all seven databases; that more than 20 % of the studies in any review were not identified by the search of MEDLINE (EMBASE, Science Citation Index and BIOSIS all contained studies that were not in MEDLINE); that another 22/522 were not retrieved from any of the seven databases using the reported searches, and that 8/522 studies were not indexed in any of the seven databases.
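The three-way categorisation described above amounts to simple set arithmetic over the gold standard set. The sketch below illustrates the logic only; the study identifiers and counts are hypothetical, not data from that study.

```python
# Hypothetical illustration of the three-way categorisation described above.
# The study identifiers are invented, not taken from the study.

gold_standard = {"s1", "s2", "s3", "s4", "s5"}  # included studies from the reviews
indexed = {"s1", "s2", "s3", "s4"}              # indexed in at least one database
retrieved = {"s1", "s2"}                        # retrieved by the reported searches

cat1 = gold_standard & retrieved                # 1) indexed and retrieved
cat2 = (gold_standard & indexed) - retrieved    # 2) indexed but not retrieved
cat3 = gold_standard - indexed                  # 3) not indexed in any database

print(cat1, cat2, cat3)
```

In the actual study, `cat2` corresponded to the 22/522 studies not retrieved by the reported searches and `cat3` to the 8/522 studies not indexed in any of the seven databases.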
Given the different findings of these two studies [
10,
12] (i.e. the potential value of MEDLINE alone vs the requirement to search multiple databases), there is a strong case for further exploratory research in the area of searching for diagnostic test accuracy studies for systematic reviews.
The aim of this study is therefore to examine whether it would be worthwhile to limit searching for diagnostic test accuracy studies to MEDLINE and EMBASE alone (rather than searching a longer list of databases), along with the standard systematic review supplementary technique of checking the references of included citations and relevant reviews; this combination is referred to hereafter as the proposed strategy. MEDLINE and EMBASE were chosen because they are the two major general bibliographic databases in the health sciences and have been found to be the most important sources of evidence in Health Technology Assessment [
13]. They are routinely recommended as a minimum for searches by bodies such as the Cochrane Collaboration [
6] and NICE [
1], and they are the databases with the majority of published search filters. The addition of reference checking, as a supplementary method, is also being assessed because it should be a standard technique to identify literature in all systematic reviews but its value as a search strategy has not yet been evaluated by previous research into systematic reviews of diagnostic test accuracy.
The specific objectives of this study are therefore to analyse a convenience sample of systematic reviews of diagnostic test accuracy studies in order to: 1) identify which citations were indexed in MEDLINE or EMBASE; 2) identify the number and proportion of citations that were retrieved by the MEDLINE and EMBASE search strategies reported for these reviews; 3) identify the number and proportion of studies that could have been retrieved by the searches of MEDLINE and EMBASE plus reference checking of studies identified as relevant (any that could not be found by this proposed strategy are referred to as “missing” citations); and 4) detail the reported search strategies and consider the implications for literature searching for systematic reviews of diagnostic test accuracy.
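Objectives 2) and 3) reduce to computing retrieval proportions over sets of included citations. As a minimal sketch of that calculation, using the 280-of-302 totals that this paper reports for the proposed strategy:

```python
# Sketch of the proportion calculation behind objectives 2) and 3).
# The counts correspond to the totals this paper reports for the proposed
# strategy (280 of 302 included citations identified).

def retrieval_rate(found: int, total: int) -> float:
    """Percentage of included citations identified by a given strategy."""
    return 100.0 * found / total

rate = retrieval_rate(280, 302)
print(f"{rate:.0f} %")  # → 93 %
```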
Discussion
The sample of systematic reviews covered here searched between seven and nine databases, although some of these databases principally or exclusively index systematic reviews (CDSR, DARE and HTA) and so were unlikely to yield many individual diagnostic test accuracy studies. However, the reported searches of MEDLINE and EMBASE alone, plus the checking of the reference lists of relevant papers, would have accounted for 280 (93 %) of the total included citations across all nine reports and 100 % of the included citations in four of the nine reports [
15,
16,
19,
21].
In terms of indexed citations, the findings for MEDLINE (91 %) are similar to those reported by Van Enst and colleagues [
10]. However, this percentage does not indicate what was identified by the searches that were developed and run for these particular reviews, but only what could potentially have been identified, given the proportion of indexed citations. In the present study, the proportion of citations found by the actual searches across both MEDLINE and EMBASE ranged from 60 to 100 %. Consequently, the evidence from this sample suggests that, on the whole, MEDLINE alone cannot be relied on to act as a single source database for systematic reviews of studies of diagnostic test accuracy.
The reported searches were constructed according to standard principles, but more sensitive searches might have identified all of the indexed and included citations in each of these four reviews, missing only the 15 non-indexed citations across the other five reviews. Searches could have been made more sensitive by adding further keywords or free-text terms, or by removing certain terms or sets of terms: for example, the reports by Kaltenthaler [
21] and Sutcliffe [
19] did not use filters; in addition, the former did not use terms for the population and the latter did not use terms for the index test (Table
5). However, a more sensitive search would have also increased the number of hits and the size of the task involved in screening citations for inclusion, which can create practical problems for Health Technology Assessments which are required to produce reports within time constraints [
9,
22]. For example, the searches conducted in MEDLINE and EMBASE for the review by Holmes et al. [
5] retrieved only 37 of a possible 51 included citations (73 %) that were indexed in these databases, a relatively low retrieval rate. The search strategy appears to have been less sensitive than those of most of the other reviews, with the exception of the reviews by Simpson [
15] and Goodacre [
16], which were similarly or more restrictive in that they combined all possible elements of a search (see Table
5). However, the searches from this review also generated the second largest number of citations for screening (13,075); the largest was one of the reviews with the least sensitive searches: Simpson [
15] with 15,824 citations. The need to keep the numbers manageable for study selection screening would explain why these less sensitive searches were developed.
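This trade-off can be made concrete with the figures quoted above for the Holmes et al. review (37 of 51 indexed citations retrieved; 13,075 records screened); the calculation below simply restates those numbers as rates.

```python
# Sensitivity-vs-screening-burden trade-off, using the figures quoted in
# the text for the Holmes et al. review.

retrieved, indexed, screened = 37, 51, 13_075

sensitivity = 100 * retrieved / indexed  # share of indexed included citations found
precision = 100 * retrieved / screened   # included citations per record screened

print(f"sensitivity {sensitivity:.0f} %, precision {precision:.2f} %")
# → sensitivity 73 %, precision 0.28 %
```

A more sensitive strategy would raise the first figure at the cost of lowering the second, i.e. many more records to screen per included citation.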
Although the use of filters is not recommended for reviews of diagnostic test accuracy studies, some of the reviews in this sample pre-date this guidance from Cochrane and NICE. More importantly, it should be noted that the aims of Health Technology Assessments differ from those of Cochrane reviews: HTA reports address questions that are more complex than diagnostic accuracy alone, for example, the opportunity costs of implementing diagnostic strategies vs going straight to treatment. It is often strategies that are being compared, not just tests. So, in many of these reports, there are a number of other questions also being addressed and searched for, including adverse events, quality of life and cost-effectiveness. Working within time and resource constraints to produce such reports might require a more pragmatic approach to searching, such as the application of filters, when otherwise sensitive searches produce unmanageable numbers of citations [
8,
22].
Twenty-two citations (7 %) could not be identified by the proposed method of searching just MEDLINE, EMBASE and the references of retrieved citations. Twelve of these citations were abstracts. Published abstracts should not be ignored, especially because studies included in systematic reviews of diagnostic test accuracy can take the form of ad hoc analyses rather than registered trials. They might offer key data for assessing and managing publication bias [
23] and, for some topics, these data might be vital for a review’s findings, especially for tests about which little has been published, for instance if the technique is novel [
24,
25].
It is true that the usefulness of abstracts might be limited by their lack of detail, which can prevent a meaningful assessment of risk of bias and can render data more uncertain. However, in this case study, all of the abstracts missed by the searches of MEDLINE, EMBASE and the reference lists did satisfy the reviews’ inclusion criteria and were used in their analyses. It should also be noted that the majority of the reviews performed narrative synthesis rather than meta-analysis, so the impact of their possible omission is difficult to quantify. However, given the very small proportion of “missing” studies, their impact on the findings of the respective reviews is likely to have been minimal.
The diagnostic topics covered by the nine systematic reviews were diverse and were undertaken over an extended period (2004–2014), so, other than their conduct by a single centre, this does not represent a particularly restricted sample. This evidence indicates that an approach that involves searching MEDLINE and EMBASE using strategies constructed by applying standard systematic review techniques, then carefully checking the references of included papers, is likely to be more than sufficient for a systematic review of diagnostic studies. In this way, only 22/302 (7 %) of citations would have been missed across nine reviews and, in four reviews, no citations would be missed at all. Such a level of omission is unlikely to adversely affect the findings of systematic reviews of diagnostic test accuracy: it has been demonstrated that a larger percentage of missed studies had little effect on meta-analyses of a sample of diagnostic test accuracy reviews [
11]. This approach would also save a great deal of time and effort and, given the smaller numbers of citations needing screening, would possibly also reduce the risk of reviewer error in selecting citations for potential inclusion. This would permit a more rapid evidence synthesis, whilst not compromising systematic review principles or increasing the risk of bias [
22].
Limitations
This study used a small, non-random sample of diagnostic test accuracy systematic reviews. This was done for pragmatic reasons: first, because the authors had full access to the search strategies and reference databases of these reviews and, second, because of the exploratory nature of this project. We also assumed that the vast majority of the included citations in the reviews were located through screening of titles, abstracts and full papers. We have further assumed that, because the number of studies missed by the proposed MEDLINE, EMBASE and reference-checking strategy is so small, the findings of the systematic reviews would not have been greatly affected by their omission. However, this is uncertain and could only be assessed statistically by excluding those particular studies from the many analyses reported in the reviews, although, as noted above, most of these reviews conducted narrative synthesis. Such an analysis would be a major task to undertake retrospectively and has therefore not been completed in this exploratory study. Future work should test the findings of this small study in a larger, preferably prospective, sample of systematic reviews from multiple institutions. If possible, statistical analysis should also be undertaken to quantify fully the impact of omitting any data from studies that might otherwise be missed [
11].
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
CC conceived of the study, participated in its design, coordination and conduct and drafted the manuscript. LB coordinated the project and helped to draft the manuscript. SP and EK helped to coordinate the project and to draft the manuscript. PG carried out the majority of the database searching. All authors read and approved the final manuscript.