Study selection
Two levels of screening will be completed independently using Synthesi.SR (proprietary online software developed by the Knowledge Translation Program of St Michael’s Hospital, Toronto, Canada, http://www.breakthroughkt.ca/login.php). Two reviewers will independently review the titles and abstracts of articles retrieved from the literature search to determine whether each study is eligible for inclusion. At the initiation of article screening, a calibration exercise will occur whereby each reviewer will independently screen a random sample of 10% of the articles to ensure adequate inter-rater agreement (at least 80%). Discrepancies between the two reviewers will be resolved by consensus; otherwise, a third reviewer will be available to make a final decision about an article’s inclusion. The full text of articles retained from level one screening will then be reviewed to confirm each article’s eligibility for inclusion. If a conference abstract is retained for level two screening, study authors will be contacted to determine whether a related manuscript has subsequently been published or, as required, to confirm that the study meets our outlined eligibility criteria. Whenever it is unclear whether a study meets our outlined eligibility criteria, authors will be contacted for further information.
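The 80% calibration threshold described above amounts to a simple percent-agreement calculation. The sketch below is an illustration only (the reviewer decisions are hypothetical, and the protocol does not prescribe any code); it also computes Cohen's kappa, a chance-corrected agreement measure that is often reported alongside raw agreement.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of screening decisions on which two reviewers agree."""
    assert len(r1) == len(r2)
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement (Cohen's kappa) for two raters."""
    n = len(r1)
    po = percent_agreement(r1, r2)                      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical include/exclude decisions on a 10% calibration sample
rev1 = ["include", "exclude", "exclude", "include", "exclude",
        "exclude", "include", "exclude", "exclude", "exclude"]
rev2 = ["include", "exclude", "include", "include", "exclude",
        "exclude", "include", "exclude", "exclude", "exclude"]

agreement = percent_agreement(rev1, rev2)  # 0.9 here, above the 0.8 threshold
```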
Data abstraction
Prior to data abstraction, we will complete a charting exercise to better inform the structure of our data abstraction form in terms of: (1) the types of studies retrieved, (2) the outcomes reported, and (3) the effect measures used by study authors [23]. All data will be abstracted independently by two reviewers from the studies retained in level two screening using a data abstraction form. The form will be piloted by each team member on a random sample of five included studies to ensure adequate inter-rater reliability (at least 80% agreement), and will be modified as necessary, based on our charting exercise, to ensure clarity for reviewers. Disagreements will be resolved by a third person. When multiple studies report data from the same study population (e.g., companion reports), the study with the primary outcome of interest (fractures or aggression) or the largest sample size will be considered the major publication, and the others will be retained as supplementary material only.
Information to be abstracted as potential effect modifiers will include study characteristics (e.g., year of study publication, authorship, location(s) of study, journal of publication, study sponsorship), patient characteristics (e.g., average (mean or median) age of study population, proportion of female patients, care setting, type(s) of dementia, severity of dementia, and standard of care in each care setting), and intervention characteristics (e.g., to whom the intervention was directed (e.g., patient, caregiver, clinician, and surrounding environment), and details of the intervention (e.g., intervention protocol or medication dosing schedule)).
Primary and secondary arm- or trial-level outcomes associated with intervention safety and efficacy (Table 2) will be extracted from included studies. Outcomes of efficacy and safety will be extracted at short-term (≤30 days), medium-term (31–364 days), and long-term (>364 days) follow-up because, in our preliminary searches, many interventions had been evaluated at several different time points [7, 24, 25]. All doses and schedules of drug administration will be extracted from included studies.
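The follow-up windows above are simple cut-offs that can be encoded directly. The snippet below is an illustrative sketch of that classification rule (not part of the protocol itself):

```python
def follow_up_term(days: int) -> str:
    """Classify follow-up duration per the protocol's cut-offs:
    short-term <= 30 days, medium-term 31-364 days, long-term > 364 days."""
    if days < 0:
        raise ValueError("follow-up cannot be negative")
    if days <= 30:
        return "short-term"
    if days <= 364:
        return "medium-term"
    return "long-term"

follow_up_term(30)   # "short-term"
follow_up_term(90)   # "medium-term"
follow_up_term(365)  # "long-term"
```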
We expect this review to identify numerous interventions for BPSD. There is no established taxonomy for classifying interventions for BPSD; however, we will begin with the broad categories of patient-, care provider-, and environment-oriented interventions [6]. In order to build a framework, we propose a qualitative, consensus-based categorization procedure [26]. This will involve four steps, each carried out by two researchers: (1) identifying, coding, and defining all interventions from the systematic review; (2) independently categorizing interventions into relevant domains (e.g., all interventions coded as multi-sensory would be sorted into that domain); (3) resolving any discrepancies in categorization through discussion; and (4) emailing a representative group of stakeholders (e.g., clinicians, caregivers, allied health professionals) to review, reach consensus through discussion, and finalize the domains. This final step will provide feedback and ensure stakeholder validation of the proposed domains. At the initiation of step one, a calibration exercise will occur whereby each reviewer will independently identify, code, and define interventions from a random sample of 10% of the articles to ensure adequate inter-rater agreement (at least 80%). This process ensures a rigorous approach to the categorization of interventions by combining independent multiple coding with a consensus approach that integrates stakeholders early in the analysis.
Risk of bias and quality assessment
Risk of bias assessment of each included study will be completed independently by two reviewers; if they disagree, a third reviewer will be available. When multiple outcomes are reported in a single study, we will use the hierarchy outlined by Kirkham et al. to establish our order of preference for selecting an outcome on which to complete our assessment of bias [27].
The risk of bias of included clinical trials will be assessed as per the methodology of the Cochrane Handbook for Systematic Reviews of Interventions [28]. The quality of observational studies will be assessed with the Newcastle-Ottawa Scale [29]. In the assessment of case-control and cohort studies, a control patient might also be living in an institutional setting, in which case the study would still be awarded a star for appropriate selection of the control group. The most important confounder to adjust for in an observational study is age, but other important confounders include sex, comorbidities, dementia severity, caregiver availability, care setting, and other current or prior treatments for BPSD. For certain outcomes of intervention efficacy (e.g., change in aggression or agitation), the symptom may be present at the start of the study; a star would still be awarded if a change from baseline is reported. An appropriate length of follow-up for safety outcomes could be as little as 30 days, while most studies of efficacy outcomes would be expected to last at least 4 to 6 weeks, and many are 10 weeks or longer [24, 30]. We plan to assess other study designs with the Cochrane Effective Practice and Organisation of Care (EPOC) risk of bias tool [31].
Measures of treatment effect
If studies consistently report continuous outcomes measured on the same scale, mean differences (MDs) will be used. An odds ratio (OR) will be used if studies report an outcome as dichotomous data. To derive summary effect measures that combine both dichotomous and continuous effect measures, MDs or standardized mean differences (SMDs) will be transformed to OR estimates [28, 32, 33]. For outcomes reported with a number of different scales across studies, the SMD will be derived and transformed into an OR to facilitate the outcome’s interpretation by knowledge users [34]. The order of preference for selecting source data when multiple options are reported by study authors (e.g., 2 × 2 tables, adjusted and unadjusted ORs, MDs, SMDs) is described in Additional file 3 (Order of Preference for Combining Data Types). Where authors report several scales for the same outcome, we will use our charting exercise to inform the choice of scale used to derive our summary effect measures. If a cluster design is reported, outcome measures that account for the clustering will be extracted from the primary study; if these data are not available, the method of Rao and Scott will be used to account for the correlation in these data [28, 35, 36]. For the presentation of results, summary relative effect sizes (e.g., MDs or ORs) and associated 95% credible intervals (CrIs) will be reported for each possible pairwise comparison.
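One standard way to re-express an SMD on the odds-ratio scale, in the spirit of the transformation cited above, assumes the underlying continuous outcome follows a logistic distribution, giving ln(OR) = SMD × π/√3. The sketch below is an illustrative implementation of that approximation, not the protocol's prescribed code:

```python
import math

def smd_to_or(smd: float) -> float:
    """Convert a standardized mean difference to an approximate odds ratio,
    assuming a logistic distribution for the underlying continuous outcome:
    ln(OR) = SMD * pi / sqrt(3)."""
    return math.exp(smd * math.pi / math.sqrt(3))

def or_to_smd(odds_ratio: float) -> float:
    """Inverse conversion: SMD = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# An SMD of 0.5 (a moderate effect) corresponds to an OR of roughly 2.5
approx_or = smd_to_or(0.5)
```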
Missing data
Where adjusted summary effect measures are reported, study-level data as provided by study authors will be included in our analyses. The data imputation method used for missing data will be noted on our data abstraction form so that the quality assessment of each study reflects the appropriateness of that method. For example, attrition in a trial of dementia treatments may be related to side effects of the treatment, and using the last-observation-carried-forward approach can introduce important bias favoring the treatment because outcomes tend to deteriorate over time. Informative missingness odds ratios (IMORs) for dichotomous outcomes and informative missingness differences of means (IMDoMs) for continuous outcomes will be derived to capture the uncertainty in our estimates arising from missing data under the missing-at-random assumption [37, 38]. For continuous outcomes that are not reported as means with associated standard deviations, imputation methods will be used to derive approximate effect measures [28, 39]. Study authors will be contacted for further information, as needed, prior to applying data imputation methods.
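For the standard-deviation imputation mentioned above, the Cochrane Handbook describes recovering an SD from a reported standard error of the mean or from a 95% confidence interval. The sketch below illustrates these two standard algebraic recoveries; it assumes a normal approximation and is an illustration only, not the protocol's prescribed code.

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """Recover SD from a standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """Recover SD from a 95% CI for a mean (normal approximation):
    SE = (upper - lower) / (2 * 1.96), then SD = SE * sqrt(n)."""
    se = (upper - lower) / (2 * 1.96)
    return se * math.sqrt(n)

# A mean reported with SE = 0.5 in a group of 100 patients implies SD = 5.0
sd = sd_from_se(0.5, 100)
```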