Background
PHASE I Topic scoping and work plan development | PHASE II Data abstraction and critical appraisal | PHASE III Data analysis and synthesis | PHASE IV Reporting and interpretation | |
---|---|---|---|---|
AAFP [42] | ||||
AAP [43] | ||||
ACOG [44] | ||||
CTFPHC [12] | ✓ | ✓ | ||
✓ | ✓ | ✓ | ||
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | |||
NICE [13] | ✓ |
Main text
Key concepts and definitions
Term | Description/definition | Example(s) |
---|---|---|
Subgroup | The term “subgroup” describes an analysis of a subset of participants (e.g., selected set of individuals with specific patient characteristics within an individual study or across studies in the case of individual patient data meta-analyses). | “Subgroup analyses are often performed to identify characteristics within the study population that are associated with greater benefit from the intervention, with no benefit, or even with harm” [37]. |
Subpopulation | The term “subpopulation” describes a specific group of individuals with common patient characteristics (e.g., race/ethnicity, age, risk factors) that is the target of an intervention or a policy recommendation. | “If a subpopulation may not benefit from the therapy, it is important to identify the subpopulation and verify this finding in an appropriate clinical trial” [37]. |
Clinical heterogeneity | Patient characteristics • Socio-demographics • Baseline risk Study characteristics • Intervention • Comparators • Outcome measurement | |
Methodological heterogeneity | Risk of bias • Study design • Study conduct • Study analysis | |
Statistical heterogeneity | Statistical tests • I² • Cochran’s Q test | |
Within study | The term “within study” refers to the framework in which comparisons or analyses are conducted; in this case, researchers are examining the variation or impact of factors (e.g., populations, interventions, outcomes) within one study or trial. | "In single trials, the comparison [between subgroups] is always within studies: that is, the two groups of patients (e.g., the older and younger) or the two alternative ways of administering the intervention (e.g., higher and lower doses) were assessed in the same RCT" [26]. |
Between study | The term “between study” refers to the framework in which comparisons or analyses are conducted; in this case, researchers are examining the variation or impact of factors (e.g., populations, interventions, outcomes) across multiple studies or trials. | "The inference regarding the effect is, however, limited because this was a between study rather than a within study comparison. As a result there are a number of competing explanations for the observed differences between the high- and low-dose studies” [26]. |
Study level | The term “study level” is used to describe the unit of inquiry or data source being considered by systematic reviewers; in this case, data from a single study or trial are evaluated. | "The ideal way to study causes of true variation is within rather than between studies. In most situations however, we will have to make do with a study level investigation and hence need to be careful about adjusting for potential confounding by artefactual factors such as study design features" [51]. |
Body of evidence level | The term “body of evidence level” is used to describe the unit of inquiry or data source being considered by systematic reviewers; in this case, data from a group of studies are evaluated. | "Systematic review and guideline authors use this [GRADE] approach to rate the quality of evidence for each outcome across studies (i.e., for a body of evidence)" [52]. |
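The statistical heterogeneity measures named in the table above (Cochran’s Q and I²) can be illustrated with a minimal sketch. It assumes fixed-effect inverse-variance weighting of study-level estimates (for example, log relative risks) and the common definition I² = max(0, (Q − df)/Q) × 100%. The function name and the input values are hypothetical and are not taken from any study cited here.

```python
import math


def cochran_q_and_i2(effects, std_errors):
    """Return (Q, df, I2 in percent) for study-level effect estimates.

    effects    -- effect estimates on an additive scale (e.g., log relative risks)
    std_errors -- the standard error of each estimate
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2: share of total variation attributed to heterogeneity rather than chance.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log relative risks and standard errors from four trials.
log_rrs = [math.log(0.80), math.log(0.95), math.log(0.60), math.log(1.10)]
ses = [0.10, 0.12, 0.15, 0.20]
q, df, i2 = cochran_q_and_i2(log_rrs, ses)
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```

Q is conventionally compared against a chi-square distribution with df degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.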
Existing guidance
Proposed approach
Phase I: topic scoping and work plan development
Topic scoping
- How other guideline groups have recently handled subpopulation considerations for the topic
- How other recent, well-conducted systematic reviews have handled subpopulation considerations for the topic
- Data on incidence, prevalence, morbidity, and mortality for the condition of interest by age, sex, race/ethnicity, and important topic-specific clinical characteristics
Potential drivers of heterogeneity of intervention effects | |||||
Screening | Baseline risk | Vulnerability/risk of harms | Variable responsiveness to preventive intervention | Specific primary and patient-important outcomes | |
A. When might screening for disease risk factors result in differing net benefits for subpopulations defined by age, race-ethnicity, sex, or other targeting factors? | |||||
Screening test for risk factors | Does [the targeting factor] modify the prognostic significance of risk factors or change their prevalence (e.g., does family history carry the same relative risk in older individuals as in younger)? Does baseline risk for the disease outcome of interest directly vary by [the targeting factor] (e.g., risk of hypertension increases with age)? | Does screening involve invasive, complicated screening tests that might influence risk of screening harms or willingness to be screened in those with [the targeting factor]? | Do risk factors for the disease outcome of interest for this screening test vary by differences in [the targeting factor]? Does presence of [the targeting factor] change approaches to risk factor detection (e.g., fat redistribution with age)? | Would patient-important outcomes or values about this test differ substantially in the subpopulation defined by [the targeting factor]? | |
Risk factor modification | Does [the targeting factor] modify the relative benefits associated with risk factor reduction (e.g., is the same degree of weight reduction in men and women associated with the same health benefits)? | Does [the targeting factor] increase the vulnerability to harms from risk factor modification (e.g., caloric restriction may induce malnutrition)? | Does [the targeting factor] modify the responsiveness to interventions for risk factor reduction (e.g., caloric restriction may not induce weight loss due to lower resting metabolism with older age)? | Would values about this intervention or about the patient-important outcomes resulting from this risk factor intervention differ substantially in the subpopulation defined by [the targeting factor]? | |
B. When might screening for diminished function result in differing net benefits for subpopulations defined by age, race-ethnicity, sex, or other targeting factors? | |||||
Screening test for reduced function | Does risk of reduced function vary substantially by [the targeting factor]? | Does screening involve invasive, complicated screening tests that might influence risk of screening harms or willingness to be screened in those with [the targeting factor]? | Does screening for reduced function vary in effectiveness or by optimal approach, depending on [the targeting factor] (e.g., is thyroid stimulating hormone equally effective in screening for hypothyroidism in all subpopulations)? | Would patient-important outcomes or values about this test differ substantially in the subpopulation defined by [the targeting factor]? | |
Usual treatment to restore function or ameliorate dysfunction | Does risk of functional impairment without treatment vary by [the targeting factor] (e.g., different natural history)? | Do potential harms of usual treatment increase due to vulnerabilities associated with [the targeting factor] (e.g., greater falls with vision correction in older adults)? | Is there variable responsiveness to interventions to improved function among those defined by [the targeting factor] that would further support the value of early intervention (e.g., visual correction also affecting cognitive development in young children)? | Would values about this functional intervention or about the patient-important outcomes resulting from this treatment to restore function differ substantially in the subpopulation defined by [the targeting factor]? | |
C. When might screening for potentially fatal or disabling conditions result in differing net benefits for subpopulations defined by age, race-ethnicity, sex, or other targeting factors? | |||||
Screening test for fatal or disabling conditions | Does risk of disease vary substantially by [the targeting factor]? Does natural history vary substantially by [the targeting factor]? | Does screening involve invasive, complicated screening tests that might influence risk of screening harms or willingness to be screened in those with [the targeting factor]? Does risk of overdiagnosis vary by [the targeting factor]? | Does screening for potentially fatal or disabling conditions vary in effectiveness or by optimal approach, depending on [the targeting factor]? | Would patient-important outcomes or values about this test differ substantially in the subpopulation defined by [the targeting factor]? Is there heterogeneity in the components of a composite outcome by [the targeting factor]? | |
Treatment of screen-detected disease | Does screen-detected disease generally have different prognosis by [the targeting factor]? | How do treatments with potential harms, med-med interactions, or comorbidity interactions affect vulnerability/risk of harms by [the targeting factor]? | Is there variable responsiveness to preventive intervention if disease detection differs by [the targeting factor]? | Would values about this disease treatment or about the patient-important outcomes resulting from this disease treatment differ substantially in the subpopulation defined by [the targeting factor]? | |
Chemoprevention | Potential drivers of heterogeneity of intervention effects | ||||
Baseline risk | Vulnerability/risk of harms | Variable responsiveness to preventive intervention | Specific primary and patient-important outcomes | ||
D. When might chronic or acute disease chemopreventive interventions result in differing net benefits for subpopulations defined by age, race-ethnicity, sex, or other targeting factors? | |||||
Chemoprevention of chronic disease | Identify candidates for chemopreventive medication | Does [the targeting factor] affect baseline risk for one or more outcomes of interest and help define candidates for chemoprevention (e.g., age and risk of gastrointestinal bleeding with aspirin use)? Is [the targeting factor] part of formal risk assessment tools to identify medication candidates? | Does [the targeting factor] interact with other factors (e.g., medical history, comorbidities, cotreatments) to modify the eligibility for chemoprevention? | Do chemopreventive medications require certain supports and systems that are variously available to subpopulations defined by [the targeting factor] in order to be effective? | Does the disutility associated with the requirements for chemoprevention vary by [the targeting factor]? Are there other barriers or facilitators that vary by [the targeting factor]? |
Deliver chemopreventive medication | Is [the targeting factor] an independent risk factor for the outcomes to be prevented, thereby increasing potential absolute benefit? | Does [the targeting factor] directly increase the risk of chemoprevention-related adverse effects, or indirectly, through a greater likelihood of treatment risk modifiers, such as comorbidities, or cotreatments? | Is [the targeting factor] associated with loss of/difference in responsiveness to mechanisms of prevention? Is [the targeting factor] associated with important differences in compliance needed for benefit (e.g., dementia)? | Would patient-important outcomes or values about chemopreventive medications differ substantially in the subpopulation defined by [the targeting factor]? | |
Acute disease chemoprevention | Identify candidates for chemopreventive medication | Is there a higher baseline risk of infectious disease diagnosis or sequelae by [the targeting factor]? | Does [the targeting factor] interact with other factors (e.g., medical history, comorbidities, cotreatments) to modify the eligibility for chemoprevention? | Do chemopreventive medications require certain supports and systems that vary in groups defined by [the targeting factor] in order to be effective? | Are there barriers or facilitators for taking chemopreventive medication that vary by [the targeting factor]? |
Deliver chemopreventive medication | Is there a larger absolute benefit by [the targeting factor] due to higher risk of disease-related outcomes? | Does risk of harms with the chemopreventive medication vary in those defined by [the targeting factor]? | Is [the targeting factor] associated with loss of/difference in responsiveness to mechanisms of prevention? Is [the targeting factor] associated with important differences in compliance needed for benefit (e.g., dementia)? | Would patient-important outcomes or values about chemopreventive medications differ substantially in the subpopulation defined by [the targeting factor]? | |
Intervention | Potential drivers of heterogeneity of intervention effects | ||||
Baseline risk | Vulnerability/risk of harms | Variable responsiveness to preventive intervention | Specific primary and patient-important outcomes | ||
E. When might complex interventions for potentially fatal or disabling conditions result in differing net benefits for subpopulations defined by age, race-ethnicity, sex, or other targeting factors? | |||||
Identify complex intervention candidates | Is [the targeting factor] a marker for selected deleterious health events that are amenable to complex interventions? | Is [the targeting factor] also a marker for potential harms from complex interventions? | Do complex interventions require certain supports and systems that vary by [the targeting factor] in order to be effective? | Are there barriers or facilitators of complex interventions that vary by [the targeting factor]? | |
Complex or behavioral intervention delivery | Is [the targeting factor] associated with increased baseline risk of some adverse health events (e.g., falls prevention, suicide, fatal motor vehicle accidents if age is the targeting factor)? | Most interventions have few harms other than opportunity costs. Are there any harms related to [the targeting factor] that may not be hypothesized (e.g., increased visual acuity correction as an age-related harm)? | Do certain conditions (e.g., dementia), decreased function, or inadequate environmental support affect intervention effectiveness? | Would patient-important outcomes from or values about complex or behavioral interventions differ substantially in the subpopulation defined by [the targeting factor]? |
Are there important advances in research or clinical thinking since [insert year of previous review] that would suggest looking at the same (e.g., age, sex, risk-defined) and/or other specific subpopulations (e.g., race/ethnicity, co-morbidities, co-interventions)? Which subpopulations are most important? What streams of evidence since [insert year of previous review] support your perspective? Are there key studies we should be aware of in formulating our approach to subpopulations? |
Greater benefits from screening can occur in those who are more likely to be undiagnosed, and from intervention in those at higher risk.
Does under-diagnosis vary by age, sex, race/ethnicity or other characteristics? Does absolute risk vary by age, sex, race/ethnicity or other characteristics? For which subpopulation(s) would benefit from screening and intervention be substantially greater than “average”? Why? |
Lesser benefits from screening and intervention can occur in those with competing risks, health states, or limited life expectancy, which reduce the likelihood of benefit from successful intervention or affect the ability to accurately screen for this condition.
Are there subpopulations that might be substantially less likely to benefit from detection and intervention? Why? |
Do the values that patients place on important outcomes (benefits or harms) associated with this topic differ by age, sex, race/ethnicity or other characteristics? Please be specific. Based on your answers to these questions, which subpopulations differ substantially enough in the likelihood of benefits (and/or risk of harms) from screening and intervention of [insert topic] that they may warrant different clinical preventive recommendations? What criteria would you use to define these clinically relevant subpopulations? Should this topic be scoped to specifically include a high-risk approach in addition to (or instead of) a general population approach? What are the validated risk assessment tools that are applicable to this topic? Are some of the tools better than others for framing a potential high-risk approach to [insert topic]? Do any tools vary in their applicability to specific subpopulations based on age, sex, race/ethnicity, comorbidities, or other factors? Is the epidemiological information below [paste data below this question] that we have located to frame this topic complete, current, and representative of the issues for subpopulations in [insert topic] (i.e., Do the data adequately capture the extent to which death or morbidity from [insert condition(s)] differ by age, sex, race/ethnicity, or other clinical characteristics?)? Are there other data sources we should use to frame this topic? |
Work plan development
Potential subpopulation | Applicable to review updates only | Applicable to new reviews and review updates | ||||
---|---|---|---|---|---|---|
(A) Previous systematic review’s approach for this subpopulation | (B) Previous separate subpopulation recommendation statement? | (C) Importance of a priori designation | (D) Rationale for importance determination for this review | (E) Policy context | (F) Proposed work plan approach | |
Age | - Age not explicitly addressed in key questions - Reported results of age-specific subgroup analyses from primary papers - Recommendation statement cites substantial evidence for differential benefits by age in the form of risk assessment tables | Yes | High | Increasing potential benefit for aspirin as people get older due to increased baseline risk for cardiovascular disease (CVD); this is balanced against increasing potential harm as people get older and experience increased risk for gastrointestinal bleeding. | - Addressed in recent meta-analyses - Age is a principal component of CVD risk and risk assessment; there is wide availability of validated risk assessment tools including age in user-friendly formats | In 2009, the USPSTF recommended aspirin for men 45–79 and women 55–79 when the potential benefit of CVD event reduction outweighs the risk of gastrointestinal bleeding. There was insufficient evidence for adults 80 and older and a recommendation against aspirin for men younger than 45 and women younger than 55. Continue to address age-specific subgroups. Establish age as an a priori subgroup; gather, analyze, and report evidence by age-specific subgroups. Attend to age 80 and older for evidence sufficiency and future research |
Sex | - Separate key questions for men and women for benefits and harms | Yes | Highest | Epidemiology of CVD events is different for men and women; men have a higher risk for events and have events at younger ages. Men are also at higher risk for gastrointestinal bleeding. | - Recently cited as most important subgroup by key informants - Controversial: recent meta-analyses including new trial data suggest no differences in the benefit of aspirin by sex, which is different from the previous review that found a significant benefit in women for stroke (but not myocardial infarction (MI)) and a significant benefit for men in MI (but not stroke) | Continue to explicitly address sex-specific subgroups. Establish sex as an a priori subgroup; gather, analyze, and report evidence by sex-specific subgroups. |
Race/ethnicity | - Race/ethnicity not addressed in the previous review | No | Unknown due to lack of evidence Perceived need for information | Due to disparities in incidence and mortality of CVD, particularly among Blacks, there is the potential for greater benefit from aspirin in this group. | - Not addressed in recent meta-analyses - Key informant indicates a lack of evidence for this subgroup | Based on the recent work of others and key informant input, evidence reported by race/ethnicity is not expected. However, if any subgroup data is reported in the literature, it should be captured, analyzed, and reported. Confirmed lack of subpopulation data should be reflected in Future Research section of the report. |
Other risk-related subgroups | - Not explicitly addressed in previous key questions - Results of subgroup analyses from primary papers reported for the following groups: • Diabetics • Baseline blood pressure levels • Smoking status • Kidney function | No | Moderate | Possible biological plausibility for subgroup differences. Factors related to diabetes (e.g., hyperglycemia, hyperinsulinemia, increased oxidative stress, advanced glycosylation end products) may influence platelet activity. Patients with peripheral artery disease (PAD) or diabetes may have less response to aspirin due to high inflammatory burden and platelet activation. Concomitant medications (e.g., statins, angiotensin-converting enzyme inhibitors, fibrates, selective serotonin re-uptake inhibitors) influence platelet activity and bleeding risk. | - 3 new RCTs since last review in higher risk populations (2/3 in diabetics and 2/3 in patients with PAD) - Key informant identified patients with PAD as a priority - Recent meta-analyses have addressed the following subgroup considerations: CVD risk, smoking, diabetes, and cholesterol and blood pressure - Recent meta-analyses in diabetic and elevated blood pressure patients - A public comment cited a subgroup analysis from the Women’s Health Study showing that aspirin use was associated with increased harm in current female smokers. Because smoking is associated with both increased cardiovascular risk and gastrointestinal complications, this reviewer called for a cautious approach to aspirin use in female smokers | Establish CVD risk groups a priori for consideration of benefits and harms, including diabetes, PAD, blood pressure, and smoking. |
Phase II: data abstraction and critical appraisal
Data abstraction
Audit of subgroup analysis results | ||||
---|---|---|---|---|
(1) List all a priori subpopulations from work plan | (2) List all studies conducted in a subpopulation only (e.g., older adults, males, females, diabetics) | (3) List all studies that reported subgroup analyses for this subpopulation | (4) List all outcomes reported for each subgroup analysis for this subpopulation | (5) Summarize decisions regarding further investigation of subgroup analysis results |
- Age | - Study A (n) - Study B (n) - Study C (n) - Study D (n) | Outcome #1 - Study A - Study B Outcome #2 - Study C | - The majority of the studies (x/y) in the review reported age-related subgroup analyses. Age-related subgroup results will be abstracted in a separate table. | |
- Sex | - Study X (men) (n) - Study Y (men) (n) - Study Z (women) (n) | - Study A (n) - Study B (n) - Study C (n) | Outcome #1 - Study A - Study B Outcome #2 - Study C | - The majority of the studies (x/y) in the review were either conducted in males or females only, or some type of sex-related subgroup analysis was conducted. Sex-related subgroup results will be abstracted in a separate table. |
Critical appraisal
Study name | Sub-population | Was a subgroup effect detected? | Likelihood that subgroup effects are SPURIOUS | Likelihood of CONFOUNDING of subgroup analysis | Likelihood of inadequate POWER to detect subgroup differences | Overall rating |
---|---|---|---|---|---|---|
Study A | Ex. Age | Indicate whether the study found a difference in effects for this subgroup (i.e., yes, no) | Enter credibility assessment here (e.g., very likely, somewhat likely, unlikely, unclear, not applicable) and any relevant notes | Enter credibility assessment here (e.g., very likely, somewhat likely, unlikely, unclear) and any relevant notes | Enter credibility assessment here (e.g., very likely, somewhat likely, unlikely, unclear) and any relevant notes | Enter overall credibility rating here (e.g., low, moderate, high, or uncertain) and any relevant notes, including overall quality concerns for the study |
Questions to consider for credibility assessment | |
---|---|
Likelihood that subgroup effects are SPURIOUS | MAIN DOMAIN: Was a statistical test for interaction performed and did it indicate effect modification? [24, 53] The statistical test of subgroup-intervention effect interaction assesses whether the effect differs significantly between subgroups, rather than only assessing the significance of the intervention effect in one subgroup or the other [54]. If the p value for the test result is <0.05 (or a more stringent alpha), this indicates that the effects differ between subgroups [54]. If there are multiple subgroup-treatment effect interactions, further statistical analyses are required to confirm whether the effects are independent [54]. When was the subgroup-specific analysis specified? Determine when the subgroup analyses were specified in the study [24, 54]. An a priori subgroup analysis is one that is planned and documented before examination of data, preferably in the study protocol, and ideally includes a hypothesized direction of effect. When reported, this information can often be found in the methods section of the article. Subgroup treatment effect interactions identified post hoc must be interpreted with caution. There are no statistical tests of significance that are considered reliable in this scenario [54]. Was the total number of subgroup analyses limited to a small number of clinically important questions (i.e., <5)? This is a study-specific factor, rather than a subgroup-specific one. Subgroup analyses should be limited to a small number of clinically important questions in each study, and ideally limited to the primary trial outcome [8, 54]. Sun et al. suggest there should be five or fewer subgroup hypotheses tested [24]. If conducting a large number of subgroup analyses, was the statistical significance threshold adjusted (e.g., using a lower p value than 0.05)? This is a study-specific factor. Because the probability of a false positive result is high when a large number of subgroup analyses are conducted, studies can correct for the inflated false positive rate by adjusting the significance threshold for their interaction tests [55]. For example, if 10 tests are conducted, each one could use a 0.005 threshold; if 20 are conducted, each one could use a 0.0025 threshold (these thresholds were calculated using 0.05/K, where K is the number of independent tests conducted; this equation ensures that the overall chances of a false positive result are no greater than 5%) [55]. |
Likelihood of CONFOUNDING of subgroup analysis | MAIN DOMAIN: Was the subgroup analysis potentially confounded by another study variable? In subgroup analyses in RCTs, the primary intervention is randomized but the secondary factors defining subgroups usually are not [56]. Controlling for confounding variables for the secondary factor that defines a particular subgroup is important when investigators are interested in intervening using the subgroup factor to increase intervention effect. This information may help judge the concern given to possible confounding. Were the intervention arms comparable at baseline for the subgroup of interest? For example, if the subgroup of interest is sex, the systematic reviewer should try to confirm that males in the intervention group were comparable to males in the control group. Similarly, females in the intervention group should be comparable to females in the control group. If the stratified intervention arms are not comparable at baseline, secondary factors affecting comparability could be confounding study variables [54]. Was the subgroup variable a characteristic specified at baseline (in contrast with after randomization)? This ensures that the benefits of randomization are maintained throughout the duration of the study, and reduces the possibility of confounding [8]. The credibility of subgroup hypotheses based on post-randomization characteristics can be severely compromised, since any apparent difference in intervention effect could potentially be explained by the intervention itself or different prognostic characteristics in subgroups that emerge after randomization [57]. Analyses based on characteristics that emerge during follow-up violate the principles of randomization and are less valid [26]. Was the subgroup variable a stratification factor at randomization? Randomization stratified for a priori subpopulations ensures comparable distribution of other characteristics, including potential confounding factors between subgroups on this factor [24, 54]. Stratified randomization ensures there is a separate randomization procedure within each subset of participants. |
Likelihood of inadequate POWER to detect subgroup differences | Was the trial powered to detect subgroup differences? If important subgroup-intervention effect interactions are anticipated, trials should be powered to detect them reliably [18, 54]. If a trial is underpowered for the main outcomes of interest, it is almost never adequately powered for a subgroup analysis. If a study did detect a difference in subgroup effect, then this domain would be assessed as very unlikely (i.e., that power was inadequate) because the power calculation, which was based on assumptions such as an estimate of the difference that might exist, is no longer very important after a significant difference has been revealed. If a study does not detect a difference, then it is very relevant to assess whether or not the study was underpowered. To inform judgments made about the evidence, the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group suggests that systematic reviewers consider the optimal information size (OIS) threshold as an additional criterion for adequate precision. OIS is reached if the total number of patients included in a systematic review is the same or more than the number of patients generated by a conventional sample size calculation for a single adequately powered trial [58]. Another potential application of the OIS criterion could be to indicate potential power issues in important subgroup analyses. |
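Two of the calculations described in the credibility questions above can be sketched in a few lines: the 0.05/K adjustment of the significance threshold, and a simple test of interaction between two subgroup estimates. The interaction test shown is a z-test on the difference of two log relative risks, with standard errors recovered from reported 95% confidence intervals. That choice is an assumption made for illustration; it is not necessarily the method used in any of the cited studies, and the subgroup values in the usage example are hypothetical.

```python
import math


def adjusted_threshold(alpha, n_tests):
    """Per-test significance threshold, alpha / K, for K independent tests."""
    return alpha / n_tests


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log relative risk recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def interaction_test(rr_a, ci_a, rr_b, ci_b):
    """z-test for a difference between two independent subgroup relative risks.

    Returns (z, two-sided p value).
    """
    diff = math.log(rr_a) - math.log(rr_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Thresholds for 10 and 20 subgroup tests, as in the 0.05/K rule.
print(adjusted_threshold(0.05, 10), adjusted_threshold(0.05, 20))

# Hypothetical subgroup results: RR 0.70 (0.50 to 0.98) versus RR 1.10 (0.85 to 1.42).
z, p = interaction_test(0.70, (0.50, 0.98), 1.10, (0.85, 1.42))
print(f"z = {z:.2f}, p = {p:.3f}")
```

The p value from the interaction test, not the separate subgroup p values, is what would be compared against the adjusted threshold.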
Phase III: data analysis and synthesis
Investigating potential sources of heterogeneity at the body of evidence level
1. Population | 2. Intervention | 3. Comparator | 4. Outcomes | 5. Timing and tools | 6. Study design and conduct |
---|---|---|---|---|---|
Heterogeneity factors for each major domain driving heterogeneity | |||||
- Baseline risk for primary outcome (without intervention) as well as for intervention-related harms - Other main population differences hypothesized to drive differences in intervention effects | - Differences in the approach, intensity, modalities, or components of interventions that could drive differences in intervention effects | - Components of comparison condition that might influence the size/direction of intervention effects | - Comparability of inpatient outcomes across studies that might influence intervention effects | - Appropriateness and comparability of outcome assessment timing considering hypothesized intervention effects and natural history | - Variability in design and conduct of studies within a body of evidence |
Potential categories of variable approaches by individual studies | |||||
- Risk based (low, average, high, unclear, mixed) - Other selected (age, race/ethnicity, sex, education, socioeconomic status) | - Approach (generic, targeted, tailored) - Intensity/dose (hours, duration, staff) - Modalities (simple, multiple) - Components (single, co-interventions) | - Placebo - Usual care - Active/alternative treatment - Incremental effect (intervention and comparator only, vary by one or minimal components) | - Type of outcome (primary, secondary, incidental) - Number and type of beneficial outcomes (one main, multiple, composite) - Number and type of harmful outcomes (one main, multiple, composite) - Validity of outcome measurement | - Appropriateness (measured or timed after intervention ended, delayed measurement at meaningful timeframes) - Comparability (consistent timeframe between studies, variable timeframe for study) | - Quality rating (good, fair, poor) - Risk of bias (lack of allocation concealment, lack of blinded outcome assessment, inappropriate randomization) |
Summarizing findings at the subpopulation level
Study name (quality rating) | Subgroup analysis credibility rating (from phase II) | (A) What is the definition of the subgroup in this study? | (B) What are the results of the subgroup-specific interaction test? | (C) What are the results of subgroup-specific analyses for this subpopulation in this study? | (D) What are the results of other subgroup-relevant analyses for this subpopulation in this study? |
---|---|---|---|---|---|
Study A (enter quality rating) | Enter subgroup analysis credibility rating from phase II | Clearly define the subgroup (or subpopulation) as described in the study (e.g., ages 65 years and older). | Abstract results of formal tests for interaction, and indicate the presence or absence of statistical significance (i.e., not significant (NS), significant (S), or not reported (NR)) and all available p values. | Abstract results of subgroup-specific stratified analyses conducted in the study (e.g., p values and intervention-effect measures of association [odds ratios, relative risks, mean changes] reported by subgroup) [53]. Enter the intervention effect with 95% confidence intervals for the main average and subgroup-specific analyses. Report results of subgroup analyses as absolute and relative risk reductions. Absolute risk reduction estimates give the probability an individual will benefit from an intervention [60]. | Abstract numerical results and statistical tests of other types of relevant subgroup analyses (e.g., findings with and without subgroup-adjustment in a logistic regression model, multivariable analyses predicting outcomes including subgroup variables). |
Women’s Health Study Ridker, 2005 [63] (Good) | Moderate | Age groups: 45–54 years; 55–64 years; ≥65 years | Interaction test for outcome of total myocardial infarction (MI): p = 0.03 (S) | Relative risk (95% CI) for total MI: Main average effect: 1.02 (0.84 to 1.25), p = 0.83; 45–54 years: 1.23 (0.87 to 1.75), p = 0.25; 55–64 years: 1.17 (0.86 to 1.59), p = 0.32; ≥65 years: 0.66 (0.44 to 0.97), p = 0.04. Absolute risk reduction (95% CI) for total MI (calculated): Main average effect: 0.000 (−0.002 to 0.002); 45–54 years: −0.001 (−0.003 to 0.001); 55–64 years: −0.002 (−0.006 to 0.002); ≥65 years: 0.010 (0.001 to 0.020), p = 0.04 | NA |
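The relative and absolute measures requested in column (C) can be derived from a study's event counts. The following is a minimal Python sketch, assuming simple Wald-type 95% confidence intervals; the function name `risk_measures` and the event counts are hypothetical and are not taken from any study in the table.

```python
import math


def risk_measures(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk and absolute risk reduction with Wald-type CIs.

    Hypothetical helper for illustration only. The absolute risk
    reduction is defined as control risk minus intervention risk, so a
    positive value indicates fewer events with the intervention.
    """
    risk_tx = events_tx / n_tx
    risk_ctl = events_ctl / n_ctl

    # Relative risk, with the CI computed on the log scale.
    rr = risk_tx / risk_ctl
    se_log_rr = math.sqrt(
        1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl
    )
    rr_ci = (
        math.exp(math.log(rr) - z * se_log_rr),
        math.exp(math.log(rr) + z * se_log_rr),
    )

    # Absolute risk reduction, with the CI computed on the risk scale.
    arr = risk_ctl - risk_tx
    se_arr = math.sqrt(
        risk_tx * (1 - risk_tx) / n_tx + risk_ctl * (1 - risk_ctl) / n_ctl
    )
    arr_ci = (arr - z * se_arr, arr + z * se_arr)

    return {"rr": rr, "rr_ci": rr_ci, "arr": arr, "arr_ci": arr_ci}


# Hypothetical subgroup: 20/1000 events with intervention, 30/1000 with control.
example = risk_measures(20, 1000, 30, 1000)
```

Reporting both measures side by side, as the table does, shows whether a subgroup with a favorable relative effect also has an absolute benefit large enough to matter.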
Phase IV: reporting and interpretation
Reporting
Report section | Reporting elements |
---|---|
Abstract | - Report valid, a priori subgroup or subpopulation findings in the structured abstract. - Report non-valid or insufficient evidence if it is a critical clinical or policy issue. |
Introduction | - Summarize the rationale for specific subpopulation considerations, including disease burden and potential differences in expected harms or benefits from the clinical preventive service, based on previous research [18]. |
Methods | - Briefly summarize the approach used to identify important subpopulation considerations in the review (e.g., literature searches, clinical and content expert consultation, and public comments). - Identify the a priori subpopulations the review addressed and the approaches taken for locating these data. - Clearly report how subgroups were defined (e.g., by categorical predictors or continuous risk scores) [18]. - Describe methods for abstracting subgroup and related analyses and any quality control processes, such as dual reviewing extracted data from primary studies [18]. - Describe methods for assessing the credibility of subgroup analyses related to a priori subpopulations at the study level and for focusing the report on clinically meaningful subpopulation results in the body of evidence. |
Results | - Summarize qualitative heterogeneity of body of evidence at methodological and clinical levels. - Report all proposed and actual investigations of clinical heterogeneity differentiating prespecified and post hoc, including all subgroups and outcomes analyzed [18, 19, 61]. - Summarize the frequency of subgroup analyses for a priori subgroups, the credibility of available subgroup analyses, and overall coherence of findings. - Report whether within-study results showed statistical evidence of effect modification by baseline subpopulation or other important characteristics across studies [64]. - Report results of meta-regression or other pooled subpopulation analyses if conducted. Report judgments or findings of clinical, methodological, or statistical heterogeneity. - Summarize results of subgroup analyses as absolute risk reductions and relative risk reductions. - Report any subpopulation differences in rates of serious harms. Report any other factors strongly associated with these harms [65]. - Any reported results from post hoc subgroups or subpopulations should be labeled exploratory. |
Discussion | Summary of evidence - Summarize the main findings for the overall body of evidence and subpopulations of interest [37]. - Report on all a priori subgroups, whether reporting on the absence of data to evaluate, an absence of detected effect modification (for relative or absolute measures), or detectable effect modification (on which scale), and its clinical significance. - Clearly report and distinguish between evidence of no effect, uncertain or incomplete evidence, or lack of evidence. - Clearly state when evidence may warrant separate considerations of net benefit in subpopulations. - Clearly indicate if caution is warranted in applying the average effect for some types of patients, even if evidence is unavailable or limited. Limitations - Summarize limitations of subgroup and subpopulation findings at the study, outcome, and review levels based on gaps in the evidence. Future research - Reference important exploratory findings from post hoc subgroups. - Provide recommendations on how future research could proceed or build upon results vis-à-vis important subpopulations. Conclusions - Provide a general interpretation of any a priori subpopulation findings in the context of other evidence. |
Interpretation
- Are the subgroup analyses upon which any subpopulation analyses are based credible and consistent across studies and outcomes?
- Do subpopulation findings avoid ecologic fallacy (i.e., are they based upon meta-regression involving only appropriate study-level variables or using appropriate individual participant data meta-analyses for patient-level variables)?
- Were the subpopulation analyses in the systematic review specified a priori in a specific hypothesized direction?
- Was the total number of subpopulation investigations in the systematic review limited to a small number?
- Does statistical analysis suggest chance is an unlikely basis for subpopulation differences?
- Are subpopulation findings supported by within-study findings rather than, or in addition to, between-study comparisons?
- To what extent are subpopulation findings biologically plausible? [10]
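One way to examine whether chance is an unlikely basis for a difference between two subgroups is to test the ratio of their relative risks on the log scale. The sketch below is a minimal illustration under several assumptions: the two subgroups are independent, the reported intervals are 95% confidence intervals that are symmetric on the log scale, and the function names are hypothetical. The inputs are the relative risks for the youngest and oldest age strata in the Women's Health Study row of the summary table above. This pairwise comparison is not the interaction test reported by that study.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of a log relative risk from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(rr_a, ci_a, rr_b, ci_b):
    """Two-sided z-test for the difference between two log relative risks.

    Assumes the two subgroup estimates are independent.
    """
    diff = math.log(rr_a) - math.log(rr_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"ratio_of_rr": math.exp(diff), "z": z, "p": p}


# Youngest stratum versus oldest stratum, total myocardial infarction.
result = interaction_test(1.23, (0.87, 1.75), 0.66, (0.44, 0.97))
```

A test of this kind compares the subgroups directly. Comparing the p values of separate subgroup-specific analyses does not test whether the effects differ.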
Availability of subpopulation-specific data | Caveats |
---|---|
Presence of subpopulation differences in intervention effects | - When interpreting the presence of subgroup or subpopulation-specific findings, recall that evidence is usually observational [7]. Consider methodological heterogeneity, confounding and other sources of bias (e.g., publication, misclassification), magnitude and direction of effect and confidence intervals, and plausibility of causal relationships. Confounding can lead to spurious or misleading subgroup results, particularly when subgroup factors are correlated [61]. - When interpreting reported subgroup effects, beware of false positive effects. If multiple subgroup analyses are conducted, the probability of a false positive finding can be high [55]. Results are more likely to be real if they are based on a priori analyses because these have prior evidence supporting them. - When claiming an intervention effect in a subgroup, consider whether appropriate methods (e.g., p value adjustment, false discovery rates, Bayesian shrinkage estimates, adjusted confidence intervals, or internal or external validation methods) were used to account for the number of contrasts examined [18]. |
Absence of subpopulation differences in intervention effects | - Subgroup analyses are typically underpowered, so the risk of false negatives is even higher than the risk of false positives. One should be aware of the remaining possibility of false negatives in the absence of relative intervention effect differences [59]. - Lack of relative intervention effect differences between subgroups may still result in clinically important variations in absolute benefit due to the impact of differences in baseline risk on absolute intervention effect. - Lack of difference between subgroups defined on single factors (e.g., age, race/ethnicity) is not sufficient reasoning that subpopulation differences do not exist. Subgroups defined through multivariable risk prediction tools are more likely to be clinically applicable and robust, particularly with larger studies. If a body of evidence has similar multivariable subgroup definitions within studies, pooling can increase power [66]. - Even without heterogeneity of intervention effects, not everyone who receives a “proven” intervention will benefit. (For an intervention with a constant 25% relative risk reduction, one-quarter of expected events will be averted, but 75% of events will still occur despite intervention) [67]. Reminding readers of this fact and emphasizing absolute effects within overall event rates is informative. Further, this approach can help clarify why even modest risk of serious harms may, in the end, exert a strong impact on net benefit calculations for the population as well as for individuals [66]. - When data are not definitive and overall benefits are modest, or overall benefits are moderate but intervention is costly, retaining the possibility of heterogeneity of intervention effects in the absence of evidence may be warranted. Consideration of individualized or targeted intervention approaches may still be applicable for future studies. - In the absence of compelling evidence, the best estimate is the average intervention effect [40]. |
Overall | - If meta-analyses were conducted, reviewers should consider possible explanations for discrepancies between clinical and statistical heterogeneity. - Caution is warranted for definitive subgroup conclusions in the absence of patient-level meta-analysis or valid study-level methods and replication (or pooling) of within-study subgroup-specific findings across trials [54]. - Intervention-related risks can be substantial (at least for some individuals), and factors that appear to predict increased risk for serious harms can be related to subpopulations. When serious harms are a key issue, consider looking for validated risk prediction tools for serious harms to assist in net-benefit considerations, whether or not reviewed data support subgroup differences [40]. - Data to robustly support evaluations of subgroups and heterogeneity of intervention effects are generally not available given the current state of clinical trial reporting [68]. As a result, prediction of individual effects remains uncommon, even though it is an area of growing interest as the field of precision medicine develops [18, 69]. Recent recommendations may improve the assessment and reporting of heterogeneity in clinical trials going forward [59]. |
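Two of the caveats in the table above are arithmetic in nature and can be illustrated with a short Python sketch. The first function assumes that subgroup tests are statistically independent, which is a simplification. The second assumes a constant relative risk reduction across baseline risk levels. All function names and baseline risks are hypothetical and chosen only for illustration.

```python
def familywise_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni adjustment."""
    return alpha / n_tests


def absolute_effect(baseline_risk, relative_risk_reduction):
    """Absolute effect implied by a constant relative risk reduction."""
    arr = baseline_risk * relative_risk_reduction
    return {
        "arr": arr,                            # events averted per person treated
        "nnt": 1 / arr,                        # number needed to treat
        "residual_risk": baseline_risk - arr,  # events that still occur
    }


# Ten independent subgroup tests at alpha = 0.05.
chance_of_spurious_finding = familywise_false_positive(10)

# The same 25% relative risk reduction at two hypothetical baseline risks.
low_risk = absolute_effect(0.02, 0.25)
high_risk = absolute_effect(0.20, 0.25)
```

With a constant relative effect, the hypothetical higher-risk group gains ten times the absolute benefit of the lower-risk group, and three-quarters of expected events still occur in both groups.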