Development of the list of criteria
Fifteen documents with recommendations and 16 with assessment tools for health information were identified. Among the recommendations, n = 2 referred to cancer screening [16, 18], n = 1 to screening [19], and n = 1 to orthopedic interventions [20]; n = 11 had no special focus [4, 9, 21‐29]. Among the assessment tools, n = 1 referred to colorectal cancer screening [15], n = 1 to diagnostic breast tests [30], n = 3 to mammography screening [31‐33], and n = 2 to patient decision aids [14, 34]; n = 7 had no special focus [11, 12, 35‐41]. Criteria for assessing health information were systematically extracted from these documents, and the individual criteria were grouped into seven categories (Table 1): formal issues, information on CRC screening, information on screening colonoscopy, information on the fecal occult blood test, readability/comprehensibility, layout, and neutrality and balance. These categories were further aggregated into four domains, one representing CRC-specific content issues and three describing generic issues applicable to different cancer screening procedures.
Table 1
Content structure of the list of criteria for evaluating consumer information materials on colorectal cancer (CRC) screening (n = 230 criteria*)
Scope | Domain | Category | Subtopic (number of criteria) | Rating dimensions |
Specific | A. Content issues (130) | Information on CRC and CRC screening (32) | CRC screening (12) | Reported: yes / no; Correct: yes / no / unclear; Presentation format: text / number / chart / table / figure; Evidence level reported: yes / no / lack of evidence indicated; Inclusion of quotes / notes |
| | | Aetiology and epidemiology of colorectal cancer (German data) (20) | |
| | Information on screening colonoscopy (66) | Colonoscopy preparation (7) | Reported: yes / no; Correct: yes / no / unclear; Presentation format: text / number / chart / table / figure; Evidence level reported: yes / no / lack of evidence indicated; Inclusion of quotes / notes |
| | | Colonoscopy sedation (4) | |
| | | Procedure (13) | |
| | | Test characteristics (7) | |
| | | Conduct in response to test results (3) | |
| | | Benefit (disease-specific incidence and total mortality) (9) | |
| | | Risks and adverse effects including overdiagnosis (23) | |
| | Information on FOBT (32) | Procedure (9) | Reported: yes / no; Correct: yes / no / unclear; Presentation format: text / number / chart / table / figure; Evidence level reported: yes / no / lack of evidence indicated; Inclusion of quotes / notes |
| | | Test characteristics (8) | |
| | | Conduct in response to test results (3) | |
| | | Benefit (disease-specific incidence and total mortality) (9) | |
| | | Risks and adverse effects including overdiagnosis (3) | |
Generic | B. Formal issues (33) | Formal issues (33) | Author and stakeholders involved (14) | Reported: yes / no; Inclusion of quotes / notes |
| | | Editorial independence (6) | |
| | | Sources and currentness of data (8) | |
| | | Aim and target group (5) | |
| C. Presentation & understandability (59) | Readability / comprehensibility (29) | Language (18) | Present: yes / mostly yes / mostly no / no / not applicable; Inclusion of quotes / notes |
| | | Sentences (4) | |
| | | Content structure (3) | |
| | | Numerical data (4) | |
| | Layout (30) | Structure (11) | Present: yes / mostly yes / mostly no / no / not applicable; Inclusion of quotes / notes |
| | | Writing/font (6) | |
| | | Visual elements (9) | |
| | | Design (4) | |
| D. Neutrality & balance (7) | Neutrality and balance (7) | Calls for participation (1) | Present: yes / no / unclear; Inclusion of quotes / notes |
| | | Fear / downplay (4) | |
| | | Uneven presentation of procedures (2) | |
Table 2
Dimensions of the list of criteria (excerpt)
Criterion | Reported | Correct | Presentation format | Evidence level reported | Quotes / notes |
Overall risk of adverse effects of screening colonoscopy is indicated | □ yes □ no | □ yes □ no □ unclear | □ text □ number □ chart □ table □ image | □ yes □ no □ lack of evidence indicated | |
Risk of pain is indicated | □ yes □ no | □ yes □ no □ unclear | □ text □ number □ chart □ table □ image | □ yes □ no □ lack of evidence indicated | |
Risk of cardiovascular symptoms is indicated | □ yes □ no | □ yes □ no □ unclear | □ text □ number □ chart □ table □ image | □ yes □ no □ lack of evidence indicated | |
The preliminary list of criteria was modified in response to the experts’ reviews, mainly by including additional criteria (e.g. inability to drive after sedation, further risks in the preparation phase of colonoscopy, possibility of being unable to work on the day of examination, and the need to sign a consent form and give a blood sample before the examination).
Final list of criteria
The final list of criteria contains 230 criteria (Table 1). Most of the individual criteria are rated on several dimensions: reporting (yes / no); correctness (yes / no / unclear); presentation (text, numbers, diagrams, tables and/or images); and level of evidence (yes / no / lack of evidence indicated) (Table 2). To make the rating of each criterion more transparent, space for free text is provided for verbatim quotes or reported numbers, for documenting whether a number was presented as a natural frequency [42, 43], for specifying whether a denominator was included, and so on.
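The multi-dimensional rating described above can be thought of as one record per criterion. The following Python sketch is only an illustration of that idea; the class and field names (`CriterionRating`, `Correctness`, `EvidenceLevel`, `notes`) are assumptions made for this example and are not part of the published list of criteria or its manual. The sample record mirrors the rating given in Example 2 of Domain A.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Correctness(Enum):
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


class EvidenceLevel(Enum):
    YES = "yes"
    NO = "no"
    LACK_INDICATED = "lack of evidence indicated"


# Presentation formats named in Table 2; several may apply to one criterion.
PRESENTATION_FORMATS = {"text", "number", "chart", "table", "image"}


@dataclass
class CriterionRating:
    """One rated criterion (hypothetical structure, for illustration only)."""
    criterion: str
    reported: bool
    correct: Optional[Correctness] = None
    presentation: set = field(default_factory=set)
    evidence_level: Optional[EvidenceLevel] = None
    notes: str = ""  # free text: verbatim quotes, reported numbers, remarks

    def __post_init__(self):
        # Reject presentation formats that are not part of the rating scheme.
        unknown = set(self.presentation) - PRESENTATION_FORMATS
        if unknown:
            raise ValueError(f"unknown presentation format(s): {sorted(unknown)}")


# Rating of Example 2 ("22,000 people die each year from CRC.")
example_2 = CriterionRating(
    criterion="Mortality is stated.",
    reported=True,
    correct=Correctness.NO,
    presentation={"number"},
    evidence_level=EvidenceLevel.NO,
    notes="Denominator is lacking, outdated number",
)
```

Keeping the free-text notes next to the categorical ratings reflects the stated aim of making each rating transparent.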
Elements of the four domains are explained in detail below, including assessment examples, where appropriate.
Domain A: (CRC-specific) content issues
Domain A includes three categories (see Table 1). The subtopic “Information on the aetiology and epidemiology of CRC” of the category “Information on CRC and CRC screening” is presented in Table 3 to illustrate how the reporting of epidemiological frequencies is assessed. The table makes clear that not all criteria have to be met for information material to qualify as being of high quality. Examples of how information from flyers and brochures was assessed in this category are shown below.
Table 3
Criteria for the aetiology and epidemiology of CRC (n = 20) (Domain A, Category: Information on CRC and CRC screening)
1 | Meaning of premalignant conditions like polyps is stated. |
2 | Frequency of polyps/adenomas is stated. |
3 | Risk factors are stated. |
4 | Protective measures are stated. |
5 | Incidence is stated. |
6 | Sex-specific incidence is stated. |
7 | Age-specific incidence (age-stratified incidence) is stated. |
8 | Mortality is stated. |
9 | Sex-specific mortality is stated. |
10 | Age-specific mortality is stated. |
11 | Residual lifetime disease risk is stated. |
12 | Residual lifetime risk of death is stated. |
13 | Age-specific disease risk within a given time interval is stated. |
14 | Age-specific mortality risk within a given time interval is stated. |
15 | The disease risk compared to other cancer disease risks is stated. |
16 | The disease risk compared to other risks is stated. |
17 | The mortality risk compared to other cancer mortality risks is stated. |
18 | The mortality risk compared to other risks of death is stated. |
19 | The natural course of CRC is stated. |
20 | Incidence and mortality are not stated in one sentence. |
Example 1: “CRC is the second most common type of cancer in both men and women.”
This statement would be rated as criterion 15 (Table 3):
–Reported? “Yes”
–Correct? “Yes”
–How presented: “Text” (not “Number”)
–Evidence level reported? “Not applicable”
–Quotes: Citation
Example 2: “22,000 people die each year from CRC.”
This statement would be rated as criterion 8 (Table 3):
–Reported? “Yes”
–Correct? “No” (number is too low for Germany)
–How presented: “Number”
–Evidence level reported? “No”
–Quotes: “Denominator is lacking, outdated number”
Both examples show the importance of having a manual that provides the correct answers and numbers and, in the second case, defines what extent of deviation from the actual number is still acceptable as “correct”. Therefore, the manual is a core part of the list of criteria.
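How a manual might operationalize an acceptable deviation can be sketched as a simple relative tolerance check. This is a minimal illustration under stated assumptions: the function name `rate_correctness`, the 10% tolerance and the reference value are all hypothetical and are not taken from the manual.

```python
def rate_correctness(reported, reference, tolerance=0.10):
    """Return "yes" if `reported` deviates from `reference` by at most
    `tolerance` (relative deviation), otherwise "no"."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    deviation = abs(reported - reference) / reference
    return "yes" if deviation <= tolerance else "no"


# Hypothetical reference value, chosen for illustration only.
HYPOTHETICAL_REFERENCE_DEATHS = 26_000

print(rate_correctness(22_000, HYPOTHETICAL_REFERENCE_DEATHS))
print(rate_correctness(25_000, HYPOTHETICAL_REFERENCE_DEATHS))
```

With these assumed values, a reported figure of 22,000 falls outside the tolerance and 25,000 falls inside it; a real manual would have to fix both the reference numbers and the tolerance explicitly.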
The categories for the two screening procedures, fecal occult blood test and colonoscopy, are constructed similarly. Both begin with information on the procedure itself; the colonoscopy category is supplemented by further criteria on colonoscopy preparation and sedation. Both procedures incorporate criteria on test characteristics (such as sensitivity, specificity and predictive value), on conduct in response to test results and, most importantly, on benefits and risks, including overdiagnosis.
Table 4 shows the criteria on the subtopics of benefits and risks of screening colonoscopy. Benefits include three relevant outcomes: CRC incidence, CRC mortality and all-cause mortality. Each outcome is divided into absolute risk reduction, relative risk reduction and the number needed to screen. Risk criteria for screening colonoscopy are divided into risks during colonoscopy preparation (including colon cleansing), risks related to adverse effects of sedative drugs, risks of the procedure itself, and risks of overdiagnosis. Overdiagnosis is included because it is known to occur in cancer screening to a varying extent depending on the type of cancer [44‐46]. Nevertheless, the extent of overdiagnosis or overtreatment of harmless polyps that would never turn into cancer is unknown in colorectal cancer screening. It may be low, as there are strong indications that colonoscopy decreases CRC incidence, as has already been shown for flexible sigmoidoscopy-based screening [47]. The rating procedure for benefits and risks is illustrated below.
Table 4
Criteria on benefits (n = 9) and risks (n = 23) of screening colonoscopy (Domain A, Category: Information on screening colonoscopy)
Outcome: CRC incidence |
1 | Absolute risk reduction is stated. |
2 | Relative risk reduction is stated. |
3 | Number needed to screen is stated. |
Outcome: CRC mortality |
4 | Absolute risk reduction is stated. |
5 | Relative risk reduction is stated. |
6 | Number needed to screen is stated. |
Outcome: All-cause mortality |
7 | Absolute risk reduction is stated. |
8 | Relative risk reduction is stated. |
9 | Number needed to screen is stated. |
Risks of screening colonoscopy |
Preparation |
1 | Common risk of side effects is stated. |
2 | Risk of cardiovascular symptoms is stated. |
3 | Risk of nausea is stated. |
4 | Risk of allergies is stated. |
5 | Risk of cramps is stated. |
6 | Risk of pain is stated. |
Sedation |
7 | Common risk of side effects is stated. |
8 | Risk of respiratory distress/failure is stated. |
9 | Risk of cardiovascular symptoms is stated. |
10 | Risk of nausea is stated. |
Procedure itself |
11 | Common risk of side effects is stated. |
12 | Number needed to harm is stated. |
13 | Risk of pain is stated. |
14 | Risk of cardiovascular symptoms is stated. |
15 | Risk of nausea is stated. |
16 | Risk of bleeding is stated. |
17 | Risk of infection is stated. |
18 | Risk of perforation is stated. |
19 | Risk of mortality is stated. |
Overdiagnosis |
20 | Risk of overdiagnosis/overtreatment is stated. |
21 | Frequency of overdiagnosis is stated. |
23 | Consequences of overdiagnosis are stated. |
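The three benefit measures in Table 4 are related by standard definitions: absolute risk reduction (ARR) is the difference between the risk without and with screening, relative risk reduction (RRR) is the ARR divided by the risk without screening, and the number needed to screen (NNS) is the reciprocal of the ARR. The sketch below computes them from event counts with a shared denominator; the function name `benefit_measures` and the input numbers are hypothetical and are not data from any study.

```python
from fractions import Fraction
import math


def benefit_measures(events_without, events_with, denominator):
    """Compute ARR, RRR and NNS from event counts per `denominator` people.

    Exact fractions are used to avoid floating-point rounding artefacts.
    """
    if denominator <= 0 or not 0 < events_without <= denominator:
        raise ValueError("need 0 < events_without <= denominator")
    if not 0 <= events_with <= denominator:
        raise ValueError("need 0 <= events_with <= denominator")
    risk_without = Fraction(events_without, denominator)
    risk_with = Fraction(events_with, denominator)
    arr = risk_without - risk_with           # absolute risk reduction
    rrr = arr / risk_without                 # relative risk reduction
    nns = math.ceil(1 / arr) if arr > 0 else None  # undefined without benefit
    return arr, rrr, nns


# Hypothetical numbers: 10 versus 8 events per 1,000 people.
arr, rrr, nns = benefit_measures(events_without=10, events_with=8, denominator=1000)
print(f"ARR = {float(arr):.3f}, RRR = {float(rrr):.0%}, NNS = {nns}")
```

The example also shows why a relative figure alone can mislead: in this hypothetical case, a 20% relative reduction corresponds to only 2 fewer events per 1,000 people.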
Example 3: “According to experts, more than three-quarters of CRC patients could be saved by early screening colonoscopy.”
This statement would be rated as criterion 5, CRC mortality (Table 4: Benefits):
–Reported? “Yes”
–Correct? “Yes”
–Presented as: “Number”
–Evidence level reported? “Yes”
–Quotes: “No natural frequency, denominator is lacking, no absolute risk reduction is given, evidence from level 3 (case-control) studies is falsely presented as experts’ evidence”.
Example 4: “80% of all CRCs can be prevented by screening colonoscopy.”
This statement would be rated as criterion 2, CRC incidence (Table 4: Benefits):
–Reported? “Yes”
–Correct? “Yes”
–Presented as: “Number”
–Evidence level reported? “No”
–Quotes: “No natural frequency, denominator is lacking, no absolute risk reduction is given, evidence from level 3 (case–control) studies
Example 5: “… is a harmless drug preparation”
This statement would be rated as criterion 1 (Table 4: Risks):
–Reported? “Yes”
–Correct? “No”
–Presented as: “Text”
–Evidence level reported? “No”
–Quotes: Citation
Example 6: “… no pain”
This statement would be rated as criterion 12 (Table 4: Risks):
–Reported? “Yes”
–Correct? “No”
–Presented as: “Text”
–Evidence level reported? “No”
–Quotes: Citation
Domain C: presentation and understandability (generic)
“Understandability (readability/comprehensibility)” assesses the language, sentences, content structure, and numerical data of information materials, whereas “presentation (layout)” concerns the structure, font, visual elements and design of the materials (see Table 1). These criteria (e.g., “Sentences are of appropriate length”) require a more graded rating than a yes/no response. Therefore, all criteria in this domain were rated on a four-point scale (yes / mostly yes / mostly no / no). Furthermore, rating the correctness of these criteria is not meaningful. For most of the assessments in this domain, it is essential to aggregate information: for example, when assessing sentence length, the assessor must search the entire health information material for sentences that are too long. To ensure an unambiguous assessment, the manual should define what counts as “too long” and which proportion of overly long sentences leads to which rating. Table 5 provides a detailed list of criteria for sentences, numerical data and visual elements, followed by a rating example for this category.
Table 5
Criteria for sentences (n = 4), numerical data (n = 4) and visual elements (n = 9) (Domain C, subtopics from both categories)
Sentences | | Visual elements | |
1 | There is one message per sentence. | 1 | Visual elements are included. |
2 | Sentences are of appropriate length. | 2 | Drawings are used instead of photos. |
3 | Complex sentences are avoided. | 3 | Visual elements are explained in the text. |
4 | Identical repetitions are avoided. | 4 | The explanatory text is near the visual element. |
Numerical data | | 5 | The visual element is not surrounded by text. |
1 | Natural frequencies are used. | 6 | Visual elements are clearly labeled. |
2 | Reference parameters are given. | 7 | Biased scaling is avoided. |
3 | Same denominators are used. | 8 | Important spots of the visual element are marked by arrows, circles etc. |
4 | Loss and gain framing is balanced. | 9 | Visual elements include a legend. |
Example 7: “every year 70,000 persons are newly diagnosed with colorectal cancer.”
Provided that this is the only number given in the health information, this statement would be rated as follows:
–Numerical data criterion 1: Natural frequencies are used? “Yes”
–Criterion 2: Reference parameters are given? “No”
–Criterion 3: Same denominators are used? “No”
–Criterion 4: Loss and gain framing is balanced? “No”
Usually, several numbers are stated in a text. In that case, an aggregated assessment is required.
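An aggregated assessment of this kind, such as the sentence-length rating discussed above, could be operationalized by mapping the share of affected items onto the four-point scale. The following sketch is purely illustrative: the function name `rate_sentence_length`, the 20-word limit and the cutoff proportions are assumptions for this example, not definitions from the manual.

```python
def rate_sentence_length(sentences, max_words=20, cutoffs=(0.05, 0.20, 0.50)):
    """Map the share of overly long sentences onto the four-point scale.

    `max_words` and `cutoffs` are hypothetical; a manual would have to
    define what counts as "too long" and which share leads to which rating.
    """
    if not sentences:
        return "not applicable"
    too_long = sum(1 for s in sentences if len(s.split()) > max_words)
    share = too_long / len(sentences)
    if share <= cutoffs[0]:
        return "yes"
    if share <= cutoffs[1]:
        return "mostly yes"
    if share <= cutoffs[2]:
        return "mostly no"
    return "no"


short = "Screening can find cancer early."
long_sentence = " ".join(["word"] * 25)
print(rate_sentence_length([short] * 8 + [long_sentence] * 2))
```

Fixing such thresholds in advance is one way to reduce disagreement between assessors, since the rating then no longer depends on an individual impression of the whole text.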
Domain D: neutrality and balance (generic)
The last domain comprises seven criteria for assessment of neutral and balanced presentation:
1 “Is free of persuasive language”
2 “Is free of scare language”
3 “Is free of scary pictures or graphs”
4 “Is free of fear appeals”
5 “Is free of downplay or minimization”
6 “Is free of one-sided presentation of benefits without risks”
7 “Is free of unbalanced presentation of screening procedures”
The first five criteria are rated “no” if any persuasive, scary or downplaying language is used to increase participation in screening. We initially phrased these criteria as “Does not contain…”, but this wording was abandoned because a possible double-negative reply might be confusing. The last two criteria concern the balance between benefits and risks and the balanced presentation of the screening procedures. To handle this aggregate information, careful operationalization within the manual is needed. Rating examples for this category are given below.
Example 8: “… should participate in bowel cancer screening.”
This statement would be rated as criterion 1:
Met? “Yes”.
Example 9: “… is a wicked disease”
This statement would be rated as criterion 2:
Met? “Yes”
Applications / practicability
For trained reviewers, the assessment takes about 15–30 minutes for flyers and 15–45 minutes for brochures. Documentation of the corresponding citations took up much of the time. Although this approach may be time-consuming, it may hasten consensus and, most importantly, ensures the transparency of quality assessment.
Inter-rater reliability was not evaluated because the final assessment was reached by consensus in each case. Discrepant findings were mainly caused by overlooked aspects. Reaching consensus usually took 5 to 15 minutes. Finally, data entry is very time-consuming because of the citations.