Introduction
Clinical practice guidelines (CPGs) are an integral part of medical practice, policy, and politics and, wherever possible, should be informed by systematic reviews of the best available evidence while considering benefits and harms. Guideline development groups, researchers and clinicians have considerable resources to support and inform guideline development [1], reporting [1‐3] and critical appraisal, all developed by a variety of guideline organisations across the globe [4‐7]. However, despite the advances of such international repositories, the variation in the quality of CPGs across countries and regions speaks to the overall complexity and multiplicity of guideline development [8‐10]. Moreover, development approaches vary (and perhaps rightfully so), especially across countries, ranging from developing guidelines de novo (starting anew) to adopting or adapting CPGs to local contexts [11]. Approaches that rely on existing high-quality guidelines instead of de novo development provide opportunities to save time and resources, which is especially relevant in resource-poor settings.
For example, in a study evaluating African hypertension guidelines, recommendations were reported as a mixture of standard treatment guidelines (adapting), WHO guidelines (adopting) and de novo CPG development [12]. Notably, low- and middle-income countries (LMICs) continue to face increasing challenges and complexity in developing and implementing high-quality CPGs, not only regarding capacity and funding but also because of an increased burden of disease (especially infectious diseases), healthcare worker shortages, and weaker health systems [13]. This was recently noted anew in relation to COVID-19 responses and their challenging effect on evidence synthesis groups in LMICs specifically [14], contributing to the call for guideline developers to stratify CPG recommendations more effectively for low-resource settings [15].
Over the last two decades, steady inroads have been made in levelling CPG quality using quality appraisal tools [16]. These tools have become something of a landscape architect: they support journal editors when reviewing guidelines, underpin the assessment of guideline validity, and strengthen the trust placed in guidelines. The internationally accepted standard for the quality appraisal of CPGs is the Appraisal of Guidelines for Research and Evaluation (AGREE) tool [17‐19], notably the latest AGREE II tool and its companions (AGREE-REX and AGREE GRS). AGREE II comprises six domains: ‘Scope and purpose’, ‘Stakeholder involvement’, ‘Rigor of development’, ‘Clarity of presentation’, ‘Applicability’, and ‘Editorial independence’. Appraisal of guidelines using these six domains allows CPG developers, researchers, and decision makers to critically evaluate fundamental elements of guideline construction, quality, and implementability.
The AGREE tools have been used for a variety of purposes, from appraising individual CPGs for guideline adaptation to appraising specific groups of CPGs, including a sample from the National Guideline Clearinghouse when the AGREE-REX tool was developed [20], several mixed medical topics [16], and targeted medical topics [21]. Multiple countries and regions worldwide have assessed their local, national, or regional CPGs with this tool and reported on this in methodological or review synthesis studies [10, 22‐24]. Some countries periodically appraise CPGs using AGREE to gauge the progression of guideline quality over time [8, 25, 26]. In addition, studies have presented quality assessments of CPGs for specific disciplines across various regions [12, 27, 28]. However, there is a paucity of evidence providing a worldwide overview that evaluates and compares studies assessing the quality of guidelines in specific countries and/or regions, and that explores whether CPG quality differs among these jurisdictions.
This scoping review aimed to fill this knowledge gap by describing and mapping national and regional CPG synthesis studies that used the most recent versions of the AGREE tool (i.e., AGREE II, AGREE-REX and AGREE GRS), with a view to unpacking how AGREE is used and reported in guideline assessment studies. This allowed a comprehensive global view of the characteristics of these national and regional synthesis studies; the quantity and quality of the included CPGs; and a global and regional evaluation of the six AGREE domain scores. Additionally, it allowed a focus on two specific domains, ‘Rigor of development’ and ‘Editorial independence’. These two important domains have historically been considered [20], and again recently indicated [29], as having the most direct effect on overall CPG content quality.
Methods
This scoping review described the methodology and characteristics of CPG synthesis studies and their included CPGs, and subsequently mapped and compared studies that used later versions of the AGREE tool to assess CPG quality country-wide and/or regionally. This included describing the use of the AGREE tool, comparing domain scores, and ascertaining use of the overall assessment. A protocol was developed a priori (Additional file 1: Appendix A), and the study was conducted in accordance with the Joanna Briggs Institute methodology for scoping reviews [30]; results were reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [31].
Search strategy
A predefined search strategy was used to conduct a comprehensive search of the following databases: Embase (Ovid), MEDLINE (PubMed), and Epistemonikos, together with grey literature sources (Web of Science grey literature, greylit.org, and contact with key experts). The search was conducted on 5 October 2021. Studies published in any language were included up to the full-text stage. The full MEDLINE and Embase search strategies are listed in Additional file 1: Appendix B.
Eligibility criteria
Participants
Secondary research on CPG quality, including scoping reviews; methods studies (including meta-epidemiological studies); reviews; systematic reviews of CPGs; and evaluations/analyses of the quality of CPGs, was considered. Synthesis studies on country-wide, regional, and topical CPGs were considered. Grey literature, including theses, dissertations and unpublished studies, was considered. International scoping reviews that collated different countries into a topical review were excluded.
Concept
Any guideline synthesis study that used AGREE II, AGREE GRS (Global Rating Scale: a short item tool especially useful when time and resources are limited), or AGREE-REX (designed to evaluate the clinical credibility and implementation of CPGs) was considered. The AGREE II tool uses six domains with 23 items, each scored 1–7 (strongly disagree to strongly agree), as well as two overall assessments. The overall assessment requires each assessor first to rate the overall quality of the guideline (on a scale of 1–7) and second to make a judgement as to whether the guideline is recommended for use (i.e., recommended; recommended with modifications; or not recommended). Studies that used other tools or scores to appraise the quality of their CPGs were excluded.
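To make the item scoring concrete, the following is a minimal sketch of how a scaled domain score is commonly derived from the 1–7 item ratings. It assumes the calculation described in the AGREE II user manual (obtained score minus minimum possible score, divided by the range of possible scores, expressed as a percentage); this review does not itself state the formula, and the function name and example ratings are hypothetical.

```python
def scaled_domain_score(ratings):
    """Return the scaled score (0-100) for one AGREE II domain.

    ratings: one list per appraiser, each holding that appraiser's
    item scores (integers 1-7) for the items in the domain.
    Assumption: scaling follows the AGREE II user manual, not this review.
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one appraiser and one item are required")
    n_items = len(ratings[0])
    for appraiser_scores in ratings:
        if len(appraiser_scores) != n_items:
            raise ValueError("every appraiser must score the same items")
        if any(not 1 <= s <= 7 for s in appraiser_scores):
            raise ValueError("item scores must lie between 1 and 7")

    n_appraisers = len(ratings)
    obtained = sum(sum(scores) for scores in ratings)
    minimum = 1 * n_items * n_appraisers  # every item rated 1
    maximum = 7 * n_items * n_appraisers  # every item rated 7
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings: four appraisers scoring a three-item domain.
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
print(round(scaled_domain_score(example), 1))  # (53 - 12) / (84 - 12) -> 56.9
```

Pooling all appraisers before scaling, rather than averaging per-appraiser percentages, keeps the score on a 0 to 100 scale regardless of how many items a domain contains.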
Context
All countries or regions worldwide and their medical specialities and sub-specialities, including allied health and traditional medicine, were considered. Regions according to WHO, United Nations or Sustainable Development Goals (SDG) regional groupings were considered. Only CPGs answering human, health-related questions were considered. Exclusions included humanitarian, military combat, health-system related and non-human studies.
Study/source of evidence selection
We exported the retrieved records into a Mendeley library, subsequently uploaded them to the Rayyan web platform [32], and removed duplicates. Two reviewers (MMA, SS) independently screened titles and abstracts against the inclusion criteria for the review. Potentially relevant records were retrieved in full, and citation details were imported into a Microsoft Excel sheet. A single reviewer (MMA) assessed the full text of selected records in detail against the inclusion criteria. At the full-text screening stage, only studies in English (or translated into English) and Spanish were included. Reasons for exclusion at the full-text screening stage are reported in the table of excluded studies (see Additional file 1: Appendix C). Any disagreements between the reviewers at each stage were resolved through discussion.
Data extraction and analysis
A single reviewer (MMA) extracted data from most included records using a data extraction tool created in Microsoft Excel; a second reviewer (IF) extracted the four Spanish records, and another reviewer (SS) checked the extraction. The tool was piloted on a small sample of potentially eligible studies identified in a previous review. Only English and Spanish records were included, as the study had limited access to translation services. Data extracted included study types and methodology; characteristics of included CPGs; and AGREE tool use, including domain and overall score results. Included synthesis studies were the units of analysis, and simple descriptive statistics were used to conduct the analysis in Stata 14.
Categorical variables were described as percentages, whereas continuous variables were described by means and standard deviations (SD) where data were normally distributed and otherwise by medians and interquartile ranges. Data normality was assessed graphically and using the skewness-kurtosis test. Where studies reported median overall scores for AGREE domains, these were converted to mean scores, as recommended by Hozo et al. [33]. This allowed one standard summary statistic across domains. Regions were classified according to the United Nations Sustainable Development Goals (SDG) regional groupings [34]; however, other regional groupings were also considered. SDG regions were chosen because they offer a meaningful geographical presentation of regions with comparable income status.
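The median-to-mean conversion can be illustrated with a short sketch. It assumes the simple approximation attributed to Hozo et al., in which the mean is estimated from the minimum (a), median (m) and maximum (b) as (a + 2m + b) / 4, and, as we understand that paper, the median itself is taken as the estimate when the sample exceeds 25. The review does not state which inputs were available for each study or which variant was applied, so the function name, the threshold handling and the example values are all assumptions.

```python
def estimate_mean_from_median(minimum, median, maximum, n=None):
    """Approximate a mean from a reported median and range.

    Assumption: uses the simple Hozo-style estimate (a + 2m + b) / 4.
    If a sample size above 25 is supplied, the median is returned
    unchanged, reflecting our reading of Hozo et al.
    """
    if not minimum <= median <= maximum:
        raise ValueError("expected minimum <= median <= maximum")
    if n is not None and n > 25:
        return float(median)
    return (minimum + 2 * median + maximum) / 4


# Hypothetical domain score summary: range 20-90, median 50.
print(estimate_mean_from_median(20, 50, 90))        # (20 + 100 + 90) / 4 -> 52.5
print(estimate_mean_from_median(20, 50, 90, n=30))  # large sample -> 50.0
```

The estimate moves the median towards the midpoint of the range, so it differs from the median only when the reported range is asymmetric around it.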
AGREE scores were analysed by subgroup, with each study assigned to one of the seven SDG regions. A list of included studies is provided in Additional file 1: Appendix D.
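The subgroup summaries described above can be sketched as follows, using only the Python standard library rather than the Stata routines the analysis relied on. The region labels and scores are placeholders, not data from the included studies, and the "inclusive" quantile method is an assumption that may not match Stata's percentile definition.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def summarise_by_region(records, normal=True):
    """Summarise domain scores per region.

    records: iterable of (region, score) pairs, one per synthesis study.
    normal:  True  -> report mean and SD;
             False -> report median and interquartile range.
    """
    groups = defaultdict(list)
    for region, score in records:
        groups[region].append(score)

    summary = {}
    for region, scores in groups.items():
        enough = len(scores) > 1  # SD and quartiles need two or more values
        if normal:
            summary[region] = {
                "n": len(scores),
                "mean": mean(scores),
                "sd": stdev(scores) if enough else None,
            }
        else:
            if enough:
                q1, _, q3 = quantiles(scores, n=4, method="inclusive")
            else:
                q1 = q3 = None
            summary[region] = {
                "n": len(scores),
                "median": median(scores),
                "iqr": (q1, q3),
            }
    return summary


# Placeholder records: (region label, scaled 'Rigor of development' score).
records = [("Region A", 40), ("Region A", 60), ("Region B", 70)]
for region, stats in summarise_by_region(records).items():
    print(region, stats)
```

Regions with a single study return no SD or interquartile range, which makes sparse subgroups visible instead of hiding them behind a spurious spread.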
Conclusion
When looking at the landscape of guideline quality, there have been various attempts to level the playing field, and inroads have been made. There is a current tendency to critically evaluate guideline quality using country-wide and regional approaches, and AGREE is, overall, used well in this practice. However, guideline ‘Rigor of development’ varies between high-income countries (HICs) and LMICs, necessitating further building of guideline development capacity, including the use of GRADE for guidelines. Improved reporting of funding and competing interests, as well as of guideline development approaches and their underlying evidence sources, can further enhance the regional quality of guidelines. Assessing country-wide or regional guidelines with quality appraisal tools could advance overall guideline quality globally. This is an important step towards global guideline uniformity, as clinicians are in dire need of high-quality guidelines to improve the delivery and quality of care.