Background
Thyroid nodules are a common clinical problem. They are more common in women, and their incidence increases with age. Epidemiologic studies have shown the prevalence of palpable thyroid nodules in iodine-sufficient parts of the world to be approximately 5% in women and 1% in men [1,2]. Thyroid nodules are clinically important because they can indicate thyroid cancer, which occurs in 5% to 15% of cases, depending on age, sex, history of radiation exposure, family history, and other factors [3,4]. Thyroid cancer is the most common malignant endocrine tumor, but represents only approximately 1% of all malignancies [5]. Differentiated thyroid cancer (DTC), which includes papillary and follicular cancer, comprises the vast majority (90%) of all thyroid cancers [6].
The standard for the diagnosis and management of thyroid nodules and cancer remains unsettled. Several practices are still debated, including the indications for fine-needle aspiration (FNA), the role of the thyroid scan, the extent of thyroid surgery for DTC, the role of cervical lymph-node dissection, and the indications for radioiodine ablation (I131). These issues need to be clearly addressed by valid, reliable, independent, and easily applicable clinical practice guidelines (CPGs). Several notable organizations have developed guidelines containing recommendations for thyroid nodules and cancer management. However, guidelines on the same topic can conflict with each other, and the quality and independence of the guidelines are of concern. Therefore, clinicians require guidelines that are systematically developed, and that provide transparent estimates of the benefits and harms of interventions [7-9].
The Appraisal of Guidelines, Research, and Evaluation (AGREE) instrument is a tool used for thoroughly assessing the quality of guidelines [10]. The original AGREE instrument was published in 2003 by a group of international guideline developers and researchers, the AGREE Collaboration. The updated version, the AGREE-II instrument, was released in 2010 and was funded by the Canadian Institutes of Health Research [11]. AGREE has become the standard in the evaluation and development of CPGs [12,13]. Using the AGREE-II instrument, we systematically reviewed and assessed the quality and consistency of the recommendations of CPGs on the diagnosis and management of thyroid nodules and cancer.
Methods
Selection criteria
We selected CPGs that provided recommendations on the diagnosis and management of thyroid nodules or cancer. For inclusion in our study, the guidelines were required to (1) have been published in English, and (2) examine all subgroups of the population to ensure that the CPGs catered for the needs of those with comorbidities in various settings. When more than one set of guidelines was produced by the same professional body, only the most recently issued was considered. We excluded guidelines that (1) focused exclusively on thyroid disease among special groups (for example, anaplastic thyroid cancer, pregnant women or children); (2) focused entirely on a unique technique, such as the procedure guideline for radioiodine therapy; (3) concentrated on a non-nodular disease, such as thyroid dysfunction; (4) contained recommendations for other diseases, such as neuroendocrine tumors or head and neck cancer; or (5) reported non-original recommendations (referring to other sets of guidelines).
Search strategy and guideline selection
Two reviewers (K-WT and T-WH) searched for relevant studies using keyword searches of the following electronic databases: MEDLINE, EMBASE, CINAHL, the National Guideline Clearinghouse, the National Institute for Health and Clinical Excellence, the Scottish Intercollegiate Guidelines Network (SIGN), and the Guidelines International Network (G-I-N) International Guideline Library. The following terms and Boolean operators were used in MeSH and free-text searches: thyroid, cancer OR carcinoma OR neoplasm, nodule OR mass OR tumor, and guidelines OR recommendations. The ‘related articles’ facility in PubMed was used to broaden the search. The last search was performed in June 2013.
Recommendation extraction and analysis
Two reviewers (K-WT and T-WH) independently extracted the details of the guidelines pertaining to the CPG characteristics (for example, country or region, year of dissemination, development team, and funding organization), the goals of the guidelines, the target population and audience, the recommendations related to the diagnosis of thyroid nodules, the recommendations related to the management of thyroid nodules and cancer, and the evaluation of options for postoperative follow-up. The individually recorded decisions of the two reviewers were compared, and any disagreement was resolved based on the evaluation of a third reviewer (J-HL).
We constructed a table to compare the recommendations from the selected guidelines. The table was divided into the following sections and items, based on the types of clinical practices that focus on thyroid nodules and cancer: (1) diagnosis: an indication of FNA, the role of routine serum calcitonin, and an indication of a thyroid scan; (2) treatment: an indication of total thyroidectomy for DTC, and the role of cervical lymph node dissection in node-negative patients; and (3) postoperative care: an indication of I131 ablation, and a target level of thyroid-stimulating hormone (TSH) suppression therapy.
Guideline quality assessment
Four investigators (K-WT, T-WH, J-HL and M-YW) independently appraised all the selected guidelines by using the AGREE-II instrument [10]. AGREE-II consists of 23 key items organized into 6 domains: (1) ‘scope and purpose’, (2) ‘stakeholder involvement’, (3) ‘rigor of development’, (4) ‘clarity and presentation’, (5) ‘applicability’, and (6) ‘editorial independence’. Each domain captures a separate dimension of guideline quality, and each item is rated on a seven-point scale (from 7 (strongly agree) down to 1 (strongly disagree)). For each reviewer, AGREE-II scores were calculated as a percentage by using the sum of the seven-point scale and the maximum possible score (range 0% to 100%). Item scores were discussed by the four reviewers, and large scoring discrepancies (defined as ≥3 points difference in the score assigned by the appraisers to the same item) were resolved by consensus. We considered satisfactory any guideline that scored at least 50% in all six domains, as defined by AGREE-II. Upon completing the 23 items, each reviewer provided an overall assessment of the guideline. We compared the mean values of each of the six domain scores and the overall scores obtained by the four reviewers to evaluate the possible risk of bias and the recommendation for future use for each CPG appraised.
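The scoring arithmetic can be illustrated with a minimal sketch. It assumes the scaled domain score described in the AGREE-II user's manual, (obtained score - minimum possible score) / (maximum possible score - minimum possible score), and it reads a 'large' discrepancy as a gap of three or more points between appraisers on the same item. The function names and the example ratings are hypothetical and are not taken from the study data.

```python
from typing import List, Sequence

MIN_RATING, MAX_RATING = 1, 7  # AGREE-II seven-point item scale


def scaled_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Scaled score (0-100) for one domain.

    ratings[a][i] is the rating given by appraiser a to item i of the domain.
    Assumed formula: (obtained - min possible) / (max possible - min possible).
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every appraiser must rate every item")
        if any(not MIN_RATING <= r <= MAX_RATING for r in row):
            raise ValueError("ratings must lie between 1 and 7")
    obtained = sum(sum(row) for row in ratings)
    min_possible = MIN_RATING * n_items * n_appraisers
    max_possible = MAX_RATING * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


def large_discrepancies(ratings: Sequence[Sequence[int]], threshold: int = 3) -> List[int]:
    """Indices of items whose highest and lowest ratings differ by >= threshold."""
    n_items = len(ratings[0])
    flagged = []
    for i in range(n_items):
        column = [row[i] for row in ratings]
        if max(column) - min(column) >= threshold:
            flagged.append(i)
    return flagged


def is_satisfactory(domain_scores: Sequence[float], cutoff: float = 50.0) -> bool:
    """True when every domain score reaches the cutoff (50% in this review)."""
    return all(score >= cutoff for score in domain_scores)


# Hypothetical ratings: four appraisers, one three-item domain.
example = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 5, 6],
    [5, 6, 7],
]
print(round(scaled_domain_score(example), 1))
print(large_discrepancies(example))
```

In this made-up example the first item would be flagged for discussion, because one appraiser rated it 2 while another rated it 6.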
Discussion
This study assessed the quality and consistency of the recommendations of international CPGs on the diagnosis and management of thyroid nodules and cancer to assist physicians in considering the appropriate recommendations. We identified ten guidelines involving thyroid nodules and cancer management, three of which had been published between 2000 and 2007 [16,19,22]. As a general rule, CPGs should be reassessed for validity every 3 years [24,25]. Therefore, some of the CPGs reviewed here are likely to be outdated because they have not been updated in over 10 years [25]. A distinct variation in the applicability and transparency of funding sources was found among the guidelines. After applying the AGREE-II instrument to the ten guidelines, we found that guidelines developed by the BTA [16], the IKNL [19], and the NCCN [21] scored above 50% in all six domains. Moreover, the view of the target population (item 5) on guideline development was inadequate in all ten guidelines. We found differences among guidelines with respect to the indication of FNA in low-suspicion nodules, the routine measurement of serum calcitonin, and the role of cervical lymph node dissection in node-negative patients.
The application of AGREE-II allows for an evaluation of the various aspects of guidelines. We measured the development methods of the guidelines by using the AGREE-II instrument, based on the rationale that a high methodological quality is fundamental for the integrity, reproducibility, and transparency of guidelines. Our study showed that the methodological quality of the guidelines was highest for ‘scope and purpose’ and ‘clarity and presentation’, and lowest for ‘applicability’. Most guidelines lacked explicit statements on whether the patients’ views and preferences had been sought (item 5), whether the various options for management of the condition were clearly presented (item 16), whether the potential cost implications of applying the recommendations were considered (item 20), and on key review criteria for monitoring and/or auditing purposes (item 21). Although the AGREE-II instrument provides six independent scores for six corresponding aspects of the guidelines, we believe that clinicians would be more concerned about the ‘rigor of development’. However, the quality of the ‘applicability’ domain also plays a critical role in the implementation of a guideline. For a guideline to be effective, it should provide advice as to how the recommendations can be implemented, it should present a discussion of the potential impact of recommendations on resources, and it requires clearly defined criteria derived from the key recommendations. Therefore, we recommend that clinicians rely preferentially on the guidelines that performed better regarding the ‘applicability’ domain [16,19,21].
The AGREE-II instrument is used to establish a universal standard for the rigor and transparency of guideline development, and to suggest how to improve existing guidelines [11]. However, some limitations exist. One serious limitation concerns conflicts of interest. The AGREE-II instrument advocates that guidelines always report conflicts of interest clearly, whether or not any exist, but several of the guidelines lacked statements about conflicts of interest. The use of a guideline-adaptation framework such as ADAPTE should be considered to develop high-quality CPGs in the future [26].
In general, the recommendations of the CPGs on the diagnosis, management, and postoperative care of thyroid nodules and cancer were consistent, despite the discrepancies between scores for the ‘rigor of development’. All guidelines advocated thyroid sonography, as well as measurement of TSH and free thyroxine levels, in all patients. However, no firm recommendations were made for the routine assessment of serum calcitonin and the indication of thyroid scintigraphy. Similarly, major differences across the CPGs were related to the indication of radioiodine ablation and the optimal level of TSH suppression therapy in patients with DTC (Table 4). Such situations revealed that, even when CPG developers claimed to have paired their grade of recommendations with the level of evidence, recommendations were not graded or were inconsistent. This variation may be related to the developers’ search strategy, the process of selecting scientific evidence, and the way the recommendations had been formulated [27,28].
In the absence of prospective trials, conclusions regarding the optimal selection of treatments must be based on retrospective analysis and the consensus of expert opinions [15-17,29]. Several CPGs recommend total thyroidectomy if the primary tumor is at least 1 cm in diameter or if extrathyroidal extension or metastases are present [15-18,22], but some guidelines advise that total thyroidectomy may be performed in patients with large tumors (>4 cm) in the absence of clinical suspicion [21,23]. Whereas some CPGs recommend considering routine central-neck dissection for most patients with papillary thyroid cancer [15,16,20], the guidelines from the NCCN recommend central-neck dissection only in the presence of grossly positive metastasis [16,19,21].
The strengths of our review included a comprehensive search for eligible guidelines, the systematic and explicit application of eligibility criteria, the careful consideration of guideline quality by using the AGREE-II instrument, and a rigorous analytical approach. Therefore, this study can be of additional value to already available guideline compendia and libraries such as the National Guideline Clearinghouse and the National Institute for Health and Clinical Excellence because these libraries depend on submissions from guideline organizations. However, several limitations could have biased our study. First, only CPGs written in English were included, and guidelines written entirely in other languages might have been overlooked. Second, CPGs that focus on non-DTC such as medullary thyroid cancer or unique techniques such as procedural guidelines for radioiodine therapy were excluded from our study. Third, the AGREE-II instrument is used to evaluate the guideline as a whole, and is not intended for specific, individual recommendations. However, a global appraisal of a guideline’s construction process may reflect the strength of the individual recommendations to an extent. Finally, we used only the AGREE-II instrument in evaluating the quality of the guidelines. Other instruments such as the four-item Global Rating Scale (GRS) may also play a role in guideline assessment [30]. Although the GRS is less sensitive than AGREE-II in detecting differences in guideline quality, its items did predict outcome measures related to guideline adoption.
Competing interests
The authors declare they have no conflicts of interest or financial ties to disclose.
Authors’ contributions
K-WT and T-WH devised the study. K-WT, T-WH, J-HL, and M-YW extracted, analyzed and interpreted the data. K-WT and T-WH wrote the first draft. All authors contributed to subsequent versions, and approved the final article. K-WT is the corresponding author. All authors read and approved the final manuscript.