Background
Methods
Search strategy
Keywords for exposures or interventions of interest (lifestyle) | Keywords for outcomes of interest (longevity) |
---|---|
• Diet | • Aging well |
• Nutritional status | • Longevity |
• Stress | • Successful aging |
• Psychological | • Healthy aging |
• Exercise | |
• Social support | |
• Family relations | |
• Social isolation | |
• Substance related disorders | |
• Sleep | |
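
The table lists keywords but does not give the database syntax used to combine them. A common approach, assumed here, is to OR the terms within each column and AND the two columns. In the sketch below, `build_query` and `or_block` are hypothetical helpers. Database-specific field tags (e.g., MeSH qualifiers) are deliberately left out.

```python
# Assumed structure: OR within each keyword column, AND between the two columns.
EXPOSURE_TERMS = [
    "Diet", "Nutritional status", "Stress", "Psychological", "Exercise",
    "Social support", "Family relations", "Social isolation",
    "Substance related disorders", "Sleep",
]
OUTCOME_TERMS = ["Aging well", "Longevity", "Successful aging", "Healthy aging"]


def or_block(terms):
    """Join terms with OR, quoting multi-word phrases so they match as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(exposures, outcomes):
    """Require at least one exposure keyword AND at least one outcome keyword."""
    return f"{or_block(exposures)} AND {or_block(outcomes)}"


query = build_query(EXPOSURE_TERMS, OUTCOME_TERMS)
print(query)
```

The same two lists can be reused when a database needs different quoting or field tags. Only `or_block` would need to change.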
Inclusion/exclusion criteria
1) Criteria used in abstract screening
Included exposures/interventions:
• Diet
• Exercise
• Stress or stress reduction
• Social relationships
• Addiction(s)
• Sleep
• Genetic-based factors
Excluded exposures/interventions:
• Pharmaceutical-based interventions
• Studies on research methods (e.g., validation of a health questionnaire), medical devices, tests, or other assays
• Supplement interventions (e.g., micronutrient supplements, protein supplements) without accompanying lifestyle modifications
• Genome-wide association studies (GWAS, i.e., analyses that compare the allele frequencies of all available polymorphic markers in unrelated patients with a specific symptom or disease condition against those of healthy controls, to identify markers associated with that condition)
• Studies whose target population is children or healthcare workers
2) Additional criteria used in full-text screening
Included: strength of evidence (SOE) tools that evaluated one of the following outcomes:
• Longevity
• Vitality and healthy or successful aging
• Disease risk or disease incidence
Excluded: SOE tools that addressed outcomes related to:
• Disease prevalence
• Injury severity
• Efficacy or effectiveness of diagnostic tools, medical devices, or other assays
Study selection process
Data extraction
Risk of bias (ROB) in individual studies
Data synthesis
Results
Name of SOE method, year | Audience and Purpose for Evaluation | Number of levels of SOE | Definition of the highest level of SOE | Placement of prospective cohort studies in the framework of SOE |
---|---|---|---|---|
Tools developed by major agencies, for application in a variety of domains | ||||
Grading of Recommendations, Assessment, Development and Evaluation (GRADE), 2004 (35), from the GRADE Working Group (adopted by the Cochrane Collaboration) | Audience: Users of systematically developed clinical practice guidelines and recommendations (e.g., clinicians, patients, policymakers) Purpose: To provide a systematic and explicit approach to making judgments about the quality of evidence and the strength of recommendations | 4 levels: - High - Moderate - Low - Very low | Randomized trials begin as high-quality evidence and observational studies as low-quality evidence. Randomized trials remain high if they provide: • Direct evidence without important study limitations • Low imprecision (i.e., a large number of participants and/or events, with narrow confidence intervals), and • Low publication bias | Observational studies without special strengths constitute low-quality evidence, though study characteristics can increase or decrease a study’s starting quality. The following strengths can increase the SOE rating of observational studies: • Strong evidence of association: significant relative risk (RR) > 2 (or < 0.5) based on consistent evidence from ≥2 observational studies, with no plausible confounders (+1), or • Very strong evidence of association: significant RR > 5 (or < 0.2) based on direct evidence with no major threats to validity (+2); • Evidence of a dose-response gradient (+1); • All plausible residual confounding would have reduced the observed effect (+1) Note: Rigorous observational studies provide stronger evidence than uncontrolled case series. |
Community Preventive Services Task Force (CPSTF), 2000 (47). No specifically titled tool. | Audience: Community interventionists and clinical practitioners who need effectiveness recommendations for various treatments Purpose: To develop evidence-based, clinically effective recommendations for community-based interventions, various clinical treatments, and population-based interventions | 3 levels: - Strong - Sufficient - Insufficient | 3 possible paths to a “Strong” ratingᵃ: • ≥2 studies with “good” execution, “greatest” design suitability, and consistent effect sizes of “sufficient” size • ≥5 studies with “good” execution, “greatest or moderate” design suitability, and consistent effect sizes of “sufficient” size • ≥5 studies with “good or fair” execution, “greatest” design suitability, and consistent effect sizes of “sufficient” size | A prospective cohort study can fulfill the requirements for “greatest” design suitability. Specific study designs are not rigidly placed within the framework; suitability for answering the research question is assessed with reference to potential threats to validity. |
US Preventive Services Task Force (USPSTF), 2012 (48). No specifically titled tool. | Audience: Primary: primary care clinicians Secondary: consumer organizations, federal agencies, and other stakeholders involved in primary care delivery Purpose: To develop evidence-based recommendations about clinical preventive services, health promotion, and evidence-based practice to improve the health of Americans | 5 levels: - A: High certainty of substantial net benefit - B: High certainty of moderate net benefit, or moderate certainty of moderate to substantial net benefit - C: Moderate certainty that net benefit is small - D: Recommends against the service; no net benefit, or harms outweigh benefits - I: Insufficient evidence | • >1 well-designed study • Consistent study results • Conducted in representative primary care populations • Unlikely to be strongly affected by results of future studies | Prospective cohort studies and other specific study designs are not directly mentioned in this method. The highest level of evidence is described as coming from “... well-conducted studies in representative, primary care populations… [to]… assess the effects of preventive service on health outcomes...” |
US Food and Drug Administration assessment of health claims for food products, 2003 (36). No specifically titled tool. | Audience: Consumers of products with authorized or qualified health claimsᵇ Purpose: To systematically evaluate the SOE for a proposed health claim,ᵇ including both authorized and qualified health claims | 2 levels: - (1) Authorized health claim: significant scientific agreement among qualified experts - (2) Qualified health claim: weaker scientific evidence; the claim must be accompanied by a disclaimer or qualified in its wording (e.g., limited, very little, or highly uncertain scientific evidence) | • Studies with an overall high methodologic quality rating • Results from intervention studies (as compared to observational studies) provide stronger evidence • Larger number of studies and sample sizes • Body of scientific evidence supports a health claim relationship for the US population or the target subgroup • Study results supporting the proposed claim have been replicated • Overall consistency in the total body of evidence showing a beneficial relationship | Observational studies: • Cannot be used to rule out the findings from well-done intervention studies • Only included when findings are consistent with several RCTs • Any number of observational studies are trumped by several consistent RCTs • Hierarchy of evidence: cohort studies > nested case-control or case-cohort studies > case-control studies > cross-sectional studies > ecological studies and case reports |
American College of Cardiology / American Heart Association Task Force on Practice Guidelines Levels of Evidence, 2005 (54) | Audience: Clinicians and researchers with an interest in cardiovascular health Purpose: To summarize SOE for the purpose of assigning classes of clinical practice recommendations | 3 levels: - A: Data derived from multiple randomized clinical trials (RCTs) or meta-analyses. - B: Data derived from a single RCT or non-randomized studies. - C: Consensus opinion of experts, case studies, or standard of care | Multiple RCTs or meta-analyses of RCTs | Prospective cohort studies are not referenced in this method. |
Nutrition Evidence Library Grading Rubric, 2015 (49) | Audience: Primary: US Dietary Guidelines Advisory Committee Secondary: Health professionals and the public who read the Dietary Guidelines Advisory Committee Report Purpose: To summarize the SOE so that conclusion statements can inform policy (e.g., the Dietary Guidelines) | 4 levels:* - Grade I: Strong - Grade II: Moderate - Grade III: Limited - Grade IV: Grade Not Assignable *Grading based on 5 elements: risk of bias; quantity of studies; consistency of findings; impact (directness of studied outcomes and magnitude of effect); generalizability to the US population of interest | • Bias - Studies of strong design free from design flaws, bias and execution problems • Quantity - Several good-quality studies; large number of studies with sufficiently large sample size for adequate statistical power • Consistency - Findings generally consistent in direction, effect size or degree of association, and statistical significance with very minor exceptions • Impact - Studied outcome relates directly to the question and effect size is clinically meaningful • Generalizability - Studied populations, intervention and outcomes are free from serious doubts about generalizability | Prospective cohort studies are not directly mentioned in this method. The “risk of bias” component of the rubric mentions “studies of strong design” and “studies of weaker design for answering the question” but does not define them further. |
Evidence Analysis Library® Methodology and Process Evidence Grading System from the Academy of Nutrition and Dietetics, 2016 (50) | Audience: Dietitians, clinicians, and researchers Purpose: To summarize the SOE for the purpose of making dietary recommendations | 5 levels*: - I: Good/Strong - II: Fair - III: Limited/Weak - IV: Expert Opinion Only - V: Grade Not Assignable * Levels based on quality, consistency, quantity, clinical impact, and generalizability | • Quality: Strong study design for question; free from design flaws, bias and execution problems • Consistency: Findings generally consistent in direction and size of effect or degree of association, and statistical significance with minor exceptions • Quantity: ≥1 good-quality studies with large sample sizes; studies with negative results have sufficiently large sample size for adequate statistical power • Clinical impact: Studied outcome relates directly to the question; size of effect is clinically meaningful; large, statistically significant difference • Generalizability: Studied populations, interventions and outcomes are free from serious doubts about generalizability | Specific study designs are not mentioned or explicitly tied to a specific level of evidence. The quality rating for the highest level of evidence specifies “studies of strong design for the question.” |
Evidence-based Practice Center (EPC) method for grading SOE, 2009 (51) | Audience: Clinicians, researchers, and other health professionals Purpose: To summarize SOE for the purpose of guiding clinical practice recommendations and improving the quality of healthcare | 4 levels: - High - Moderate - Low - Insufficient | Evaluation is based on 5 required domains and, where appropriate, 3 additional optional domains: 5 required domains: • Study limitations/risk of bias: Low • Directness: High • Consistency: High • Precision: High • Reporting bias: Low 3 optional domains: • Dose-response association: Present • Uncontrolled confounding that can diminish an observed effect: Low • Strength of association (i.e., large magnitude of effect): High | • Domain and total SOE grading should be done separately for RCT evidence and observational study evidence. • Initially, RCTs start with a provisional high SOE grade and observational studies with a provisional low SOE grade. • These grades are then adjusted up or down based on study limitations or other factors. |
Joanna Briggs Institute Levels of Evidence*, 2013 (52) *No longer in use; the organization has since switched to GRADE. Grading for research questions of effectiveness is presented here as the most relevant domain for lifestyle medicine-type interventions. | Audience: Researchers Purpose: Summarize the SOE | 5 levels under the effectiveness heading:* - Level 1: Experimental Designs - Level 2: Quasi-Experimental Designs - Level 3: Observational-Analytic Designs - Level 4: Observational-Descriptive Studies - Level 5: Expert Opinion and Bench Research Each level contains sub-levels | Effectiveness Level 1 categories are defined as follows: • Level 1.a – Systematic review of RCTs • Level 1.b – Systematic review of RCTs and other study designs • Level 1.c – RCTs • Level 1.d – Pseudo-RCTs | Prospective cohort studies* appear only in Level 3 categories (not Levels 1 or 2): • Level 3.a – Systematic review of comparable cohort studies • Level 3.b – Systematic review of comparable cohort and other lower study designs • Level 3.c – Cohort study with control group • Level 3.e – Observational study without a control group. “Inception cohort studies” do appear in Level 1 under the prognosis heading. |
Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence, 2011 (53) | Audience: Physicians Purpose: To provide traditional critical appraisal and summarize SOE so that clinicians and patients can quickly make decisions about clinical questions | 5 levels: - Level 1 - Level 2 - Level 3 - Level 4 - Level 5 Each of the 5 levels is defined separately for each of the 7 clinical questions. | Level 1 evidence definitions for each of the seven clinical questions: • 1. How common is the problem? Local and current random sample surveys (or censuses) • 2. Is this diagnostic or monitoring test accurate? (Diagnosis) Systematic review of cross-sectional studies with consistently applied reference standard and blinding • 3. What will happen if we do not add a therapy? (Prognosis) Systematic review of inception cohort studies • 4. Does this intervention help? (Treatment Benefits) Systematic review of randomized trials or n-of-1 trials • 5. What are the COMMON harms? (Treatment Harms) Systematic review of randomized trials, systematic review of nested case-control studies, n-of-1 trial with the patient you are raising the question about, or observational study with dramatic effect • 6. What are the RARE harms? (Treatment Harms) Systematic review of randomized trials or n-of-1 trial • 7. Is this (early detection) test worthwhile? (Screening) Systematic review of randomized trials | Prospective cohort studiesᶜ appear in the following clinical questions: • 3. What will happen if we do not add a therapy? (Prognosis) ◦ Level 1: Systematic review of inception cohort studies ◦ Level 2: Inception cohort studies ◦ Level 3: Cohort study or control arm of randomized trial (Level may be graded down on the basis of study quality, imprecision, indirectness [study PICO does not match the question's PICO], inconsistency between studies, or a very small absolute effect size; level may be graded up if there is a large or very large effect size.) • 4. Does this intervention help? (Treatment Benefits) ◦ Level 2: includes observational study with dramatic effect ◦ Level 3: Non-randomized controlled cohort/follow-up study • 7. Is this (early detection) test worthwhile? (Screening) ◦ Level 3: Non-randomized controlled cohort/follow-up study |
Author-defined / lesser-known methods | ||||
Modified form of coding system, 2000 (37) | Audience: Researchers Purpose: To evaluate SOE related to correlates of physical activity in children and adolescents | 3 levels: - Association (either positive or negative): 60–100% of studies reviewed support association - Indeterminate: 34–59% of studies reviewed support association - No association: 0–33% of studies reviewed support association | Highest level is achieved when 60% or more of studies (regardless of design or total N) reviewed have a consistent positive or negative association. | Study design is not referenced in this method. All studies’ results would count equally towards SOE score; no instructions are given with respect to weighting of different study designs. |
Topic-specific SOE rating system for evaluating research on back pain, 1996 (38, 39) | Audience: Researchers and clinicians with an interest in back pain Purpose: To guide clinical practice guidelines for back pain | 4 levels: - Strong - Moderate - Limited - No evidence | Multiple high-quality RCTs with consistent positive outcomes | Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation). |
Best evidence synthesis: a rating system based on a best-evidence synthesis used previously for physical activity (PA) interventions, 1995 (40–43) | Audience: Researchers Purpose: To summarize the SOE | 4 levels: - Level 1: Strong - Level 2: Moderate - Level 3: Limited - Level 4: No evidence | Multiple RCTs of high quality with consistent positive results. | Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation). |
Criteria for determining level of evidence in meta-analyses of RCTs for walking training in stroke, 2008 (44) | Audience: Researchers and clinicians Purpose: To determine SOE in relation to rehabilitation after stroke | 4 levels: - High - Moderate - Low - No evidence | At least 2 high-quality RCTs with similar results | Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation). |
Overall SOE, 1999 (45, 46) | Audience: Researchers and clinicians Purpose: To predict the onset of functional status decline in people without initial functional status impairment | 4 levels: +++ [Strong] ++ [Moderate] + [Limited] (+) [Weak]ᵈ | • Evidence in > 3 “high quality studies” with a consistent positive or negative association • Analyses have no identified methodological limitations • Studies exclude individuals with functional status impairment at baseline • Studies report a significant positive association between the risk factor and functional status decline | Study design is not referenced in this method. All study designs can count equally in the SOE score, provided they were not identified as having methodological limitations (and were therefore classified as “appropriate”); no instructions are given with respect to weighting of different study designs. |
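
Three of the rules in the table above are explicit enough to express as small functions: GRADE upgrading of observational evidence, the three CPSTF paths to a "Strong" rating, and the percentage bands of the modified coding system. The following is a simplified sketch, not an implementation of any of these tools. It ignores GRADE downgrades and treats the relative risk as already statistically significant. All function and parameter names are hypothetical.

```python
GRADE_LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade_observational(rr, n_consistent_studies=0, plausible_confounders=True,
                        direct_no_major_threats=False, dose_response=False,
                        confounding_would_reduce_effect=False):
    """GRADE upgrading for observational evidence. Assumes `rr` is a
    statistically significant relative risk and that no downgrades apply."""
    level = GRADE_LEVELS.index("Low")  # observational evidence starts at Low
    if (rr > 5 or rr < 0.2) and direct_no_major_threats:
        level += 2  # very strong association
    elif (rr > 2 or rr < 0.5) and n_consistent_studies >= 2 and not plausible_confounders:
        level += 1  # strong association
    if dose_response:
        level += 1
    if confounding_would_reduce_effect:
        level += 1
    return GRADE_LEVELS[min(level, len(GRADE_LEVELS) - 1)]  # cap at High


def cpstf_strong(studies, consistent_sufficient_effects):
    """CPSTF 'Strong' rating. `studies` holds (execution, design_suitability)
    pairs such as ("good", "greatest")."""
    if not consistent_sufficient_effects:
        return False

    def count(executions, designs):
        return sum(1 for e, d in studies if e in executions and d in designs)

    return (count({"good"}, {"greatest"}) >= 2
            or count({"good"}, {"greatest", "moderate"}) >= 5
            or count({"good", "fair"}, {"greatest"}) >= 5)


def coding_system_rating(n_supporting, n_total):
    """Modified coding system: share of reviewed studies supporting an
    association. Percentages falling between the published bands
    (e.g., 33.5%) are assigned to the lower band, which is an assumption."""
    pct = 100 * n_supporting / n_total
    if pct >= 60:
        return "Association"
    if pct >= 34:
        return "Indeterminate"
    return "No association"


print(grade_observational(2.5, n_consistent_studies=2, plausible_confounders=False))
print(cpstf_strong([("good", "greatest")] * 2, consistent_sufficient_effects=True))
print(coding_system_rating(7, 10))
```

The contrast the table draws shows up directly in the code. Study design affects only the GRADE and CPSTF functions. `coding_system_rating` counts every study equally, regardless of design.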
Conceptualization of an SOE approach specific to lifestyle medicine
Research Method | Unique contribution to understanding |
---|---|
Basic science | Mechanisms of action |
Intervention studies in humans / RCTs | Reliable attribution; control of bias, confounding |
Observational epidemiology; large and diverse population-based samples | Effects at scale |
Observational epidemiology; long time periods | Duration of effects |

Is the question definitively addressable with RCTs?¹
• YES if: the outcome of interest would be measurable in < 5 years, subjects can ethically be randomized, a control group is plausible and ethical, blinding is potentially possible, and a sample size of < 10,000 would provide adequate statistical power. If YES, have RCTs been conducted?
  ◦ (1) If YES, use GRADE.²
  ◦ (2) If NO, use an alternative tool; consider OCEBM.³
• NO if: adherence to the intervention is required for > 5 years, randomization is not plausible or ethical, or the exposure of interest is the cumulative, lifetime effect of health behaviors.
  ◦ (3) Consider HEALM.⁴
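
The flowchart above can be sketched as a routing function. One reading choice is made explicitly. Every YES condition is treated as required, so failing any one of them routes the question to HEALM. The flowchart does not state this outright, so it should be treated as an assumption. The `ResearchQuestion` fields are hypothetical names for the flowchart's criteria.

```python
from dataclasses import dataclass


@dataclass
class ResearchQuestion:
    years_to_outcome: float        # time until the outcome becomes measurable
    randomization_ethical: bool    # subjects can ethically be randomized
    control_group_ethical: bool    # a control group is plausible and ethical
    blinding_possible: bool        # blinding is at least potentially possible
    sample_size_needed: int        # participants needed for adequate power
    adherence_years: float         # how long adherence to the intervention is required
    lifetime_exposure: bool        # exposure is the cumulative, lifetime effect of behaviors
    rcts_conducted: bool           # have RCTs on this question been conducted?


def rct_addressable(q: ResearchQuestion) -> bool:
    """Assumption: every YES condition must hold; failing any one means NO."""
    return (q.years_to_outcome < 5
            and q.randomization_ethical
            and q.control_group_ethical
            and q.blinding_possible
            and q.sample_size_needed < 10_000
            and q.adherence_years <= 5
            and not q.lifetime_exposure)


def choose_soe_tool(q: ResearchQuestion) -> str:
    if rct_addressable(q):
        # Branches (1) and (2) of the flowchart
        return "GRADE" if q.rcts_conducted else "Alternative tool (consider OCEBM)"
    return "HEALM"  # Branch (3)


diet_trial = ResearchQuestion(2, True, True, True, 800, 1, False, True)
print(choose_soe_tool(diet_trial))  # GRADE
```

Consider a lifelong dietary pattern whose outcome is longevity. Setting `lifetime_exposure=True` sends it down branch (3) to HEALM, whatever the other fields say.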

HEALM contains three scoring** levels of SOE: Grade A (Strong/decisive); Grade B (Moderate/suggestive); Grade C (Insufficient/inconclusive).

As in other SOE evaluation methods, the methodological quality and risk of bias of included studies should be graded with established tools for rating individual study quality before assessment with HEALM. Two examples are Cochrane’s Risk of Bias Tool⁵⁴ for randomized controlled trials (RCTs) and the Newcastle-Ottawa Scale⁵⁵ for cohort and case-control studies.

Q1: Are there established mechanisms of action? (a plurality*** of evidence from bench science and animal models) Yes = 2; Uncertain*** = 1; No = 0
Q2: Are there intervention studies in people that provide evidence of causality/attribution? (a plurality*** of high-quality intervention trials, randomized controlled trials, interim measures, and surrogate markers as outcomes) Yes = 3; Uncertain = 1; No = 0
Q3: Are there observational studies to establish generalizability to large populations? (a plurality*** of high-quality evidence from large, prospective cohort studies) Yes = 2; Uncertain = 1; No = 0
Q4: Are there observational studies to support effects over time periods measured in decades, lifetimes, or generations? (a plurality*** of evidence from high-quality, long-term observational studies; retrospective cohort studies; ethnography; transcultural studies) Yes = 2; Uncertain = 1; No = 0

*The HEALM tool is presented here to illustrate potential approaches to scoring evidence across research categories; it does not represent a single, specific approach recommended by the project expert panel on the basis of a formal consensus process.

**Scoring: Answers to the scoring questions should be based on expert consensus in evaluating the available evidence. Evidence is conclusive when it is sufficient in quantity and quality and consistent in findings, fostering clear consensus among experts. This would generally mean a replicated finding and consistent effects among a clear plurality*** of high-quality, related publications. Evidence is uncertain when studies are few, small, of poor quality, or conflicting, but generally suggestive of a particular finding. While expert consensus is critical in evaluation, a framework to inform discussion based on quantitative criteria used in previous umbrella reviews⁵⁶ is suggested:
1. Total sample size and number of cases in the included studies
2. Significance of association based on p-values (highly significant defined as p < 0.0001 vs. nominally significant defined as p < 0.05) and on whether confidence intervals exclude or include the null value
3. For studies that include meta-analyses: a target threshold of 1000 cases, no evidence of small-study effects or excess significance bias, a 95% prediction interval excluding the null value, and no large, unexplained between-study heterogeneity (I² < 50%)

Grade A: Strong = ≥7. This requires decisive evidence in all three other categories AND at least suggestive evidence from intervention trials in people; OR strong evidence from intervention trials in people and decisive evidence in two other categories; OR strong evidence from intervention trials, decisive evidence in one other category, and suggestive evidence in the remaining two. This lends primacy to RCT evidence but still allows a strong rating with nothing more than suggestive evidence in the intervention trial category.
Grade B: Moderate/suggestive = 5 or 6. Achievable with decisive intervention trial evidence and strong evidence in any one other category, OR with strong evidence in all categories other than intervention trials.
Grade C: Insufficient/inconclusive = < 5.

***Plurality may vary depending on the total number of existing studies conducted on a particular research question and must be determined case by case. For example, three consistent studies from a variety of study designs with no opposing studies may constitute a plurality. If there were opposing studies, the target number would be more than three. A clear numerical plurality of studies that are of overall poor quality may warrant a rating of “Uncertain”.
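
The HEALM point values and grade cut-offs above are simple enough to check by hand, but a short sketch makes the arithmetic explicit. The maximum score is 2 + 3 + 2 + 2 = 9. The sketch also encodes the umbrella-review criteria 2 and 3. A few choices here are assumptions rather than part of HEALM: the dictionary keys and function names, reading "1000 cases" as at least 1000, and using 1.0 as the null value for ratio measures. The answers themselves still come from expert consensus.

```python
# Points per answer for each HEALM question (Q1 to Q4).
HEALM_POINTS = {
    "mechanisms":       {"yes": 2, "uncertain": 1, "no": 0},  # Q1
    "interventions":    {"yes": 3, "uncertain": 1, "no": 0},  # Q2
    "generalizability": {"yes": 2, "uncertain": 1, "no": 0},  # Q3
    "long_term":        {"yes": 2, "uncertain": 1, "no": 0},  # Q4
}


def healm_score(answers: dict) -> int:
    """Sum the points for the four expert-consensus answers (maximum 9)."""
    return sum(HEALM_POINTS[q][answers[q]] for q in HEALM_POINTS)


def healm_grade(score: int) -> str:
    if score >= 7:
        return "A (Strong/decisive)"
    if score >= 5:
        return "B (Moderate/suggestive)"
    return "C (Insufficient/inconclusive)"


def significance_label(p: float) -> str:
    """Umbrella-review criterion 2 thresholds."""
    if p < 0.0001:
        return "highly significant"
    if p < 0.05:
        return "nominally significant"
    return "not significant"


def meets_meta_analysis_target(n_cases, small_study_effects, excess_significance,
                               pi_low, pi_high, i2_percent, null_value=1.0):
    """Umbrella-review criterion 3. null_value=1.0 assumes a ratio measure
    (e.g., RR); use 0.0 for difference measures. '1000 cases' is read as >= 1000."""
    pi_excludes_null = not (pi_low <= null_value <= pi_high)
    return (n_cases >= 1000
            and not small_study_effects
            and not excess_significance
            and pi_excludes_null
            and i2_percent < 50)


# Decisive evidence everywhere except intervention trials, which are only suggestive
answers = {"mechanisms": "yes", "interventions": "uncertain",
           "generalizability": "yes", "long_term": "yes"}
print(healm_score(answers), healm_grade(healm_score(answers)))  # 7 A (Strong/decisive)
```

The usage line reproduces the first Grade A path: decisive evidence in Q1, Q3, and Q4 plus merely suggestive trial evidence gives 2 + 1 + 2 + 2 = 7.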