Discussion
We described the general framework and details of the different search strategies applied during the systematic review conducted by GUiDEG to collect evidence for the GBD study. We performed this work between 2007 and 2011, soon after the STROBE consensus guideline [8] on conducting systematic reviews of observational epidemiologic studies became available, and before the GATHER guideline was published [5]. The presented results not only satisfy the requirements of these guidelines, but also contain several innovative features. Specifically, one of our main goals was to estimate the effect of different search strategies not only on full-text article retrieval, but also on the number of data rows extracted from the retrieved articles and on their geographical coverage. For this purpose we used universal metrics to compare bibliographic search strategies: the number needed to retrieve (NNR), that is, the number of records that must be retrieved to obtain one full-text article for data extraction; the mean number of extracted rows per article; and the number of covered countries. We found that PubMed was much more efficient than EMBASE, with an NNR of 77 versus 303 and a mean number of extracted rows per article of 20.3 versus 11.6, respectively. To the best of our knowledge, a comparison of PubMed and EMBASE for searching for epidemiologic evidence has not been reported so far. The PRESS guidelines [9] recommended peer review of search strategies before conducting systematic reviews, which would improve their performance, but data on this type of peer review are rarely reported in the published literature. This makes it impossible not only to judge how comprehensive a search strategy was, but also to estimate the comparative effectiveness of different systematic reviews on the same topic. Moreover, the majority of systematic reviews do not compare the effectiveness of different search strategies, and use only one strategy without clearly defining performance indicators (such as NNR, number of extracted data rows, or number of covered countries). Nevertheless, from the articles reporting results according to PRISMA, [10] we can calculate the NNR for published systematic reviews on the epidemiology of certain diseases, and it shows substantial heterogeneity. In other systematic reviews of chronic kidney disease prevalence, the NNR ranged from 42 when the search was limited by country names [11] to 157 for a word-based strategy [12]. For systematic reviews on acute kidney injury epidemiology, the NNR varied between 12 [13] and 65 [14]. Systematic reviews performed on global epidemiology for GBD conditions by other Expert Groups had an NNR of 39 for untreated caries, [15] 63 for otitis media, [16] 73 for visual impairment and blindness, [17] 132 for stroke, [18] and 220 for peripheral artery disease [19]. The NNR was also widely heterogeneous for risk factors presented in GBD: 29 for fasting plasma glucose and diabetes, [20] 105 for systolic blood pressure, [21] and 201 for total serum cholesterol [22]. Most systematic reviews for GBD conditions were not reported according to the PRISMA guidelines, which makes it impossible to calculate their NNR or other bibliographic metrics.
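As a minimal illustration of how the three metrics discussed here can be computed, the following Python sketch assumes that each article contributing data is summarized by the number of extracted rows and the set of countries it covers. The function name `search_metrics` and all counts are hypothetical and are not taken from the GUiDEG dataset.

```python
def search_metrics(records_retrieved, articles):
    """Compute bibliographic metrics for one search strategy.

    records_retrieved: number of records returned by the search (hypothetical input)
    articles: list of (rows_extracted, countries) tuples, one per full-text
              article that yielded data; `countries` is a set of country names
    """
    n_articles = len(articles)
    if n_articles == 0:
        raise ValueError("at least one article with extracted data is required")

    total_rows = sum(rows for rows, _ in articles)
    # Union of all countries covered by at least one article
    covered = set().union(*(countries for _, countries in articles))

    return {
        # Records retrieved per full-text article used for data extraction
        "nnr": records_retrieved / n_articles,
        "mean_rows_per_article": total_rows / n_articles,
        "countries_covered": len(covered),
    }


# Illustrative values only (not study data)
example_articles = [
    (12, {"Italy"}),
    (30, {"Nepal", "Panama"}),
    (6, {"Italy", "Spain"}),
]
metrics = search_metrics(240, example_articles)
print(metrics)
```

A lower NNR means fewer records have to be screened per useful article, so it is best read together with the other two metrics rather than on its own.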
NNR depends on many factors, including the availability of published evidence, restriction of the search by the controlled vocabulary of subject headings provided by the bibliographic database (MeSH in the case of MEDLINE or Emtree in the case of EMBASE) or by other specific fields, and the overlap of search terms with common clinical terminology. For this reason, NNR cannot be used to judge the quality of a systematic review itself. Nevertheless, our analysis, which focused on systematic reviews in epidemiology, suggests that an NNR below 20 indicates the exclusion of a substantial number of useful articles; with such an NNR, the authors of a systematic review should consider making their search strategy more comprehensive. Similarly, an extremely high NNR, of more than 150, raises the question of whether the search strategy was useful and cost-effective. The rather high NNR in the aforementioned systematic reviews (including ours) could be related to the frequent use of epidemiologic terminology (such as ‘incidence’, ‘prevalence’, ‘mortality’ and ‘survival’) in descriptions of clinical studies or of highly selected, non-representative populations that are inappropriate for epidemiologic estimates. The wide use of this terminology for clinical purposes in kidney-related literature is reflected in the much higher total number of records obtained in our epidemiologic search (33,707 records) compared with dentistry, [15] ophthalmology [17] or otolaryngology [16] (12,143, 14,908 and 7,168 records, respectively). Restricting a search by subject headings could exclude a substantial number of useful articles and of the data rows extracted from them, as shown by our SuHeSS strategy, which missed almost 15% of the relevant information (Table 2). Systematic reviews in clinical fields also suggest, though to a lesser extent, that a proportion of useful articles is excluded when a search is restricted to subject headings. For example, 4.2% of articles relevant to breast cancer did not have the MeSH term ‘Breast Neoplasms’, [23] and 2.6% of relevant articles on congenital vocal paralysis were not caught by the MeSH-restricted strategy [24]. Moreover, the use of MeSH precludes researchers from obtaining records that have not yet been MEDLINE-indexed, and the application of a free-text search strategy in PubMed provided, on average, an additional 160 unique records for a set of systematic reviews [25]. In our analysis, the negative effect of restriction by MeSH was accompanied by a positive effect: the NNR fell to 40, implying a reduced workload. Importantly, the intercept search strategy further reduced the NNR to 23 and excluded a proportion of relevant information similar to that of SuHeSS: about 20% of the useful full-text articles and of the data rows extracted from them, relative to our ‘gold standard’ PubMed set. Thus, when resources for conducting a systematic review are severely limited, or for a preliminary search, the intercept search strategy may be preferable to the SuHeSS strategy.
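The comparison of a restricted strategy with a ‘gold standard’ set can be sketched in a few lines of Python. The example below assumes that articles are identified by arbitrary IDs; the helper names, the article sets and the record count are hypothetical, and the NNR cut-offs of 20 and 150 are the tentative values suggested in this discussion for epidemiologic reviews, not established standards.

```python
def compare_with_gold_standard(gold_ids, strategy_ids, strategy_records):
    """Return (fraction of gold-standard articles missed, NNR of the strategy).

    gold_ids: set of useful articles found by the reference ('gold standard') search
    strategy_ids: set of useful articles found by the restricted strategy
    strategy_records: number of records the restricted strategy retrieved
    """
    found = gold_ids & strategy_ids
    if not found:
        raise ValueError("the strategy found none of the gold-standard articles")
    missed_fraction = (len(gold_ids) - len(found)) / len(gold_ids)
    # NNR: records retrieved per useful full-text article
    nnr = strategy_records / len(found)
    return missed_fraction, nnr


def flag_nnr(nnr, low=20, high=150):
    """Label an NNR using the tentative cut-offs discussed in the text."""
    if nnr < low:
        return "possibly too narrow"
    if nnr > high:
        return "possibly too costly"
    return "within suggested range"


# Illustrative values only (not study data)
gold = {f"article_{i}" for i in range(20)}
restricted = {f"article_{i}" for i in range(16)}
missed, nnr = compare_with_gold_standard(gold, restricted, strategy_records=400)
print(missed, nnr, flag_nnr(nnr))
```

In this toy case the restricted strategy lowers the screening workload at the cost of missing a fifth of the reference articles, which is the trade-off described above.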
The search engine interface itself could substantially influence the number of retrieved records. Because of the workload required for MeSH classification, which is performed at the NLM, the average time lag for a record to move from PubMed to MEDLINE In-Process was 3.3 months, and from PubMed to MEDLINE it was 10.5 months [25]. These data would favour the use of PubMed for performing systematic reviews, but Ovid MEDLINE (as opposed to Complete Ovid MEDLINE, which also covers In-Process and non-indexed content) is frequently used because of its more convenient search query construction. Moreover, due to internal mechanisms, even absolutely identical queries could return different results in different search engines. This was demonstrated by running identical searches in the Allied and Complementary Medicine Database, which is rarely used in systematic reviews: the numbers of records obtained through the DIALOG, Ovid and EBSCOhost interfaces differed almost twofold [26]. For the much more commonly used MEDLINE database, the difference in the numbers of returned records between the PubMed and Ovid interfaces could reach about 1% for similar strategies adapted to the interface, [27, 28] but the effect of identical search queries has not been studied yet. Excessively complex queries could substantially decrease the number of relevant articles found, and removing excessive limits by simplifying search queries could increase recall from 27% to 79% [29]. Last but not least, the availability of a uniform method of classification and terminology for describing diseases could substantially influence both the number of retrieved records and the NNR, as demonstrated in our analysis by year of publication: soon after the introduction of the modern classification and the term ‘chronic kidney disease’ in 2002, the term became widely used in titles, abstracts, and MeSH, which facilitated the retrieval of useful articles for data extraction. Further development of search strategies for obtaining epidemiologic evidence of disease burden could reduce the NNR and thus facilitate the initial steps of conducting systematic reviews, while maintaining the number of finally selected articles, the data rows extracted from them, and geographical coverage.
Acknowledgements
The authors acknowledge all collaborators of the GBD Genitourinary Disease Expert Group (GUiDEG), as listed below. The authors thank Kerstin Mierke for editorial assistance during preparation of the manuscript.
GUiDEG Collaborators
Boris Bikbov, Claudia Cella, Monica Cortinovis, William Couser, Patricia Veronica Espindola Estevez, Flavio Gaspari, Felipe Antonio Rodriguez de Leon, Catherine Michaud, Valeria Miglioli, Christopher Murray, Mohsen Nagavi, Bishnu Pahari, Norberto Perico, Esteban Porrini, Giuseppe Remuzzi, Andrea Alejandra Panozo Rivero, Bernadette Thomas, Marcello Tonelli, Karen Courville de Vaccaro, Theo Vos, Natasha Wiebe, Sara Wulf.
Affiliations of GUiDEG collaborators
Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Bergamo, Italy (BB, CC, MC, FG, VM, NP, GR); Azienda Socio-Sanitaria Territoriale Papa Giovanni XXIII, Bergamo, Italy (GR); University of Milan, Milan, Italy (GR); Institute for Health Metrics and Evaluation, Seattle, USA (CMi, CMu, MN, BT, TV, SW); University of Washington, Seattle, USA (WC); Hospital Maciel, Montevideo, Uruguay (PVEE); Complejo Hospitalano Metropolitano, Panama City, Panama (FARL); BP Koirala Institute of Health Sciences, Dharan, Nepal (BP); UCICEC Centre for Biomedical Research of the Canary Islands, La Laguna Tenerife, Spain (EP); Hospital Juan XXIII, La Paz, Bolivia (AAPR); University of Calgary, Alberta, Canada (MT, NW); Hospital Dr. Gustavo N Collado, Puerto Chitre, Panama (KCV).
GUiDEG Contributors
The Genitourinary Disease Expert Group acknowledges the researchers who contributed to the different steps of this systematic review: Developing search strategies (step 1): BB, CMi; Implementing search strategies (step 2): BB; Selection of potentially useful abstracts (step 3): BB, FARL, EP, AAPR; Classification of potentially useful abstracts (step 4): BB, MC, PVEE, FG; Selection of articles for the full-text retrieval (step 5): BB, CC, MC, PVEE, FG, NP, KCV, BP; Retrieving full-text articles (step 6): BB, MC, PVEE, VM, KCV; Extraction of data from the full-text articles (step 7): BB, CC, MC, PVEE, FG, NP, KCV, SW, BT; Organization of the GUiDEG: WC, CMu, GR; Coordination of the GUiDEG: BB, CMi, CMu, MN, NP, GR, TV; Development of the web-system kidneyepidemiology.org/gbd: BB; Consultation for systematic review forms: MT, NW; Preparation of the first draft of the manuscript: BB; Production of the final version of the manuscript: BB, NP, GR.