Introduction
As evidenced by their growing presence in the scientific literature [1, 2], cluster randomised trials (CRTs) have become a key instrument to evaluate public health interventions [1, 3–7], particularly in low- and middle-income countries (LMICs) [3, 8]. Randomised controlled trials (RCTs) are widely considered to provide the highest quality of evidence on the effectiveness of health interventions [9–12], and CRTs are a form of randomised trial in which clusters of individuals (such as families, villages, hospital services, or schools) rather than independent individuals are randomly allocated to intervention or control groups [2]. Increasingly, public health researchers recognize the importance of developing health interventions that are directed not only to individuals but also to populations, communities, and a wide range of social and environmental factors influencing health [13, 14]. CRTs offer an appropriate design to assess such public health interventions and also to measure the overall effect of an intervention at the population level [3, 5, 8, 13, 15], heterogeneity of impact among population subgroups, and equity [16, 17].
Implementation fidelity in CRTs of public health interventions
Although the scientific debate is ongoing [
18], randomised trials are generally viewed as the gold standard for establishing evidence of intervention effectiveness. Despite this, the use of CRTs to evaluate public health interventions raises unique methodological challenges. Recent systematic reviews of CRT methods have found evidence of improvements in the design and analysis of CRTs while noting deficiencies in trial implementation that may compromise their validity [
19,
20]. Previous systematic reviews have emphasised the importance of process evaluation to mitigate these methodological problems, which can affect the internal and external validity of trial results [
3,
19,
21‐
23].
“Implementation fidelity” refers to the degree to which an intervention is delivered as initially planned [24]. Fidelity assessment is an aspect of process evaluation that aims to understand and measure the extent to which the intervention is being implemented as intended, with a view to clarifying relationships between the intervention and its intended outcomes and learning the specific reasons for the success or failure of the intervention [9, 24, 25]. Evaluation of implementation fidelity within trials has multiple benefits, which may include increased confidence in scientific findings, increased power to control for confounding factors and detect intervention effects, and increased ability to evaluate the performance of an intervention based on theory [26]. Several studies have found that interventions implemented with high fidelity achieved better results than low-fidelity interventions [27–33]. Fidelity assessment can improve the internal and external validity of CRTs [19] by providing evidence that trial results are due to the intervention itself rather than to confounding variables and by facilitating generalization of results to contexts that may differ substantially from the original trial setting [9, 24]. Fidelity assessment may be particularly important for trials of public health interventions, as these interventions tend to be complex and composed of multiple components [10, 34] that may act independently or interdependently [35], leading to a greater potential for variation during implementation [24].
Framework for the evaluation of implementation fidelity used in this review
Table 1 outlines the conceptual framework for evaluation of implementation fidelity used in this review. The framework is based principally on the work by Carroll et al. [24] and includes elements of implementation fidelity and moderating factors that may affect the delivery process. The framework was further refined by Hasson, who expanded the list of moderating factors considered in the framework [36]. We selected this framework to guide the review because it provides a comprehensive synthesis of previous work on implementation fidelity and has been widely influential.
Table 1
Conceptual framework for implementation fidelity used in this review
Fidelity components |
Content | Defined as an attempt to establish the “active ingredients” of the intervention, for example, in a theory of change or logic model, and to assess whether they have been delivered as planned.
Coverage | Refers to the degree to which all persons who met study inclusion criteria received the intervention.
Frequency | Refers to whether the intervention was delivered with the regularity or frequency planned by its designers.
Duration | Establishes whether the intervention was delivered with the duration planned by its designers.
Moderating factors |
Comprehensiveness of intervention description | Factors such as the degree of intervention complexity and whether the intervention description is complete or incomplete, vague or clear, may influence the degree of implementation fidelity.
Strategies to facilitate implementation | Several support strategies may be used to optimise and to standardise implementation fidelity.
Quality of delivery | Concerns whether an intervention is delivered in a way that increases the likelihood of achieving the desired health outcomes.
Participant responsiveness | Intervention uptake depends on its acceptance by and acceptability to those receiving it. Low participant involvement or responsiveness may negatively affect intervention fidelity.
Recruitment^a | Refers to the procedures used to attract potential programme participants.
Context^a | Refers to the surrounding social systems, such as structures and cultures of organizations and groups, and historical and concurrent activities and events.
Fidelity assessment in CRT reporting guidelines
The Consolidated Standards of Reporting Trials (CONSORT) group was created to provide guidance to improve the quality and transparency of reporting of RCTs [37]. The CONSORT Statement offers a checklist of essential items that should be included in reporting an RCT [37]. Due to the increasing use of CRT designs, the CONSORT group proposed a version of the CONSORT Statement for the reporting of cluster randomised trials in 2004 and updated these guidelines in 2012 [2, 38].
The CONSORT Statement recognises that the trial protocol for a given study may not have been followed fully for some trial participants for a wide variety of reasons, including failure to receive the entire intervention as planned [37]. Cases of protocol nonadherence may influence the interpretation and credibility of the results and thus the validity of the conclusions [19, 26, 39, 40]. To preserve the ability to make strong inferences about the intervention effect, CONSORT offers recommendations on how issues of nonadherence should be handled at the level of analysis. Specifically, it recommends that all randomised participants be retained in the analysis and analysed according to their originally assigned groups, an approach known as “intention-to-treat” (ITT) analysis. This approach ignores noncompliance, protocol deviations, and anything that occurs after randomisation. The rationale for the ITT approach is that random allocation procedures avoid bias when assigning interventions to trial participants and thus facilitate causal inference; any exclusion of participants from the analysis risks compromising the randomisation and may lead to biased results. The ITT approach can be contrasted with a “per protocol” (PP) analysis, which restricts the analysis to participants who fulfil the protocol in terms of eligibility, interventions, and outcome assessment [19, 26, 39, 40]. According to CONSORT, although a PP analysis may be appropriate in some instances, the exclusion of participants means it should be considered a non-randomised, observational comparison.
The CONSORT guidance on handling protocol nonadherence has been developed primarily in relation to individually randomised parallel group trials. However, reasons for protocol nonadherence in individually randomised RCTs may differ from those in CRTs. In a clinical trial setting, nonadherence depends largely on the actions of the trial participant (e.g. failure to adhere to therapy) and the treatment provider (e.g. failure to follow the treatment protocol), which may in turn be related to issues such as treatment side effects and safety. In CRTs of public health interventions, protocol nonadherence may instead occur because complex, multi-component interventions are delivered with poor fidelity. Despite the scientific importance of protocol nonadherence, the current CONSORT guidelines for individually randomised parallel group trials [37] and CRTs [2, 38] offer no advice on methods to assess its occurrence during the course of a trial.
Rationale for undertaking this review
LMIC governments and other development partners have strengthened research and intervention efforts to support the UN Millennium Development Goals (MDGs) and Sustainable Development Goals (SDGs) agenda. As the global community intensifies the search for the best evidence on public health interventions to improve health and development outcomes in LMICs, CRTs have become an essential tool. Policymakers are interested in using the best available evidence to make decisions about the effectiveness of specific interventions in LMICs facing considerable budget constraints. Although CRTs have been widely implemented to evaluate public health interventions in both high-income countries and LMICs, country context, interventions, approaches, and outcomes may differ substantially between settings. We therefore limit our focus to LMICs.
As earlier methodologically oriented systematic reviews have demonstrated, CRTs of complex public health interventions may be particularly at risk of protocol deviations and nonadherence, which may compromise the validity of their findings [19, 20]. Although process evaluation techniques such as the evaluation of implementation fidelity can help to assess the extent of these problems and mitigate their negative effects, current reporting guidelines for CRTs offer no specific guidance on the assessment of intervention fidelity within CRTs. Wide divergence in current practices is therefore likely. We will undertake a methodologically oriented systematic review of current practices related to the assessment of intervention fidelity within CRTs of public health interventions in LMICs, with a view to informing best practices for these CRTs. To our knowledge, no other systematic review has been conducted on this question.
Objective
We will conduct a systematic review of the published scientific literature to study current practices concerning the assessment of intervention fidelity in CRTs of public health interventions in LMICs.
This review will address the following research questions:
1. Based on information from the trial registry (and the published study protocol, if applicable): what proportion of recent CRTs of public health interventions in LMICs planned to assess implementation fidelity (IF)?
2. Based on information from the published trial report (or a complementary document such as a published article, a grey literature report, or an online appendix reporting the assessment of IF): what proportion of recent CRTs of public health interventions in LMICs reported assessing IF?
3. For those studies that assessed IF, which fidelity components were examined, and which data collection methods were employed to assess each component?
4. Is there evidence of divergent practices between planned and reported studies, or of outcome reporting bias related to the assessment of IF?
   a. Based on a comparison of the results of questions 1 and 2, what is the overall agreement between planned and reported assessment of IF?
   b. Are trial reports with negative findings for the ITT analysis more likely to report a PP analysis?
   c. For the subset of studies that included both ITT and PP analyses, what is the overall agreement between the ITT and PP analyses concerning the intervention’s effectiveness?
   d. Does the magnitude of the intervention effect differ for PP as compared to ITT analyses?
To answer our research questions, we will first identify all CRTs from 2012 onwards of public health interventions conducted in LMICs with an available study protocol registered in a public trial registry. A given CRT will be included in the review if the protocol, the trial report, or both address IF. For each CRT included in the review, we will compare planned assessment methods for IF as described in the trial registry (and published study protocol, if applicable) with published methods and results from the main trial report (and related documents, if relevant). We will use a variety of measures to summarise the results.
Methods
We describe the study methods in seven steps adapted from the 2015 Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Protocols (PRISMA-P) reporting guidelines for systematic review and meta-analysis protocols [41]. The PRISMA-P checklist is provided as an additional file (see Additional file 1). As this review focuses on methodological issues rather than on health-related outcomes, it was not eligible for inclusion in the PROSPERO registry [42]. In the event of protocol amendments, we will provide the date of each amendment, a description of the change, and the rationale for the change.
Eligibility criteria
Studies will be selected from the peer-reviewed scientific literature according to the following study and report characteristics.
Literature search strategies were developed in collaboration with an academic librarian experienced in conducting systematic review searches. Search strategies use Medical Subject Headings (MeSH) and text words related to cluster randomised trials, developing countries, and public health interventions. The electronic database search was developed first for MEDLINE (Ovid) (for the full search strategies, see Additional file 2) and then adapted for the following electronic databases: EMBASE (Ovid), CINAHL (Ovid), PubMed, and EBM Reviews (Ovid). Search terms are a combination of “cluster-randomized”, “cluster analysis”, “health program”, “public health service”, “health education”, “public health”, “health promotion”, “health behavior”, “health knowledge/attitudes practice”, “preventive health services”, “health care system”, and “developing countries”. The search strategy will span the period from January 2012 to May 2016 and will be updated towards the end of the review. Searches will be filtered to articles concerning humans and written in English, French, or Spanish. To augment this list, we will add relevant studies suggested by members of the systematic review team. Identified records will be uploaded into the EndNote reference management software (version X7.5.3, Thomson Reuters, 2016), and duplicates will be eliminated.
Study screening and selection
Study screening and selection will be done manually within EndNote based on the inclusion and exclusion criteria for this systematic review. To ensure the availability of study protocols, we will limit the search to CRTs that have the word stem “regist*” in the abstract and use these results to begin the process of screening and selection. We validated this procedure by examining a subset of excluded articles. Screening and selection will be done in two stages by two independent reviewers (MCP and NM). In the first stage, reviewers will independently screen the titles and abstracts of each identified reference against the inclusion criteria to eliminate irrelevant publications. In the second stage, we will screen the full text of all studies that appear to meet the inclusion criteria or for which eligibility is uncertain. For each study, we will identify additional articles of potential relevance, such as a published protocol or a process evaluation, by reviewing references from the main trial report, consulting the trial registry record, and searching the PubMed database for new publications by the lead trial author. To aid in article screening and selection, the team will develop and test a screening sheet for full-text review. Any disagreement between reviewers will be resolved through discussion and, as necessary, through arbitration by a third author (MJ). The process of study selection will be documented in a flow diagram describing the studies identified and excluded at each stage. We will also provide a summary table describing studies excluded at the stage of full-text review, along with reasons for their exclusion.
Outcomes and prioritisation
The search and selection process for this review is designed to identify two quantities required to calculate outcomes based on proportions: (1) Numerator: studies that meet all the inclusion and exclusion criteria. As in all systematic reviews, these studies are our principal focus and will be included in the review and analysed in detail. (2) Denominator: the total N for the study, defined as all studies that satisfy all the inclusion and exclusion criteria except the outcome criterion (planned or reported IF assessment); this is essentially the universe of cluster randomised trials of public health interventions in LMICs. Both quantities will be clearly indicated in the study flow diagram.
Primary outcome
The primary outcome for this study will be the proportion of overall agreement between the protocol and trial report concerning occurrence of IF assessment. This corresponds to research question 4a.
Data will be summarised in a two-by-two table comparing the assessment of intervention fidelity in the trial report to that in the protocol.
N represents the set of recent CRTs of public health interventions in LMICs that have registered the study protocol in a publicly available trial registry. For each CRT in N, we will determine whether IF was assessed in the registered (or published) protocol and in the trial report (or associated documents). Studies judged to have assessed IF will be coded as “1”; others will be coded as “0”. Judgements will represent reviewer consensus (MCP and NM, with appeal to MJ in case of divergence). The proportion of overall agreement is defined as the proportion of eligible CRTs for which judgements concerning the occurrence of implementation fidelity assessment agree in the protocol and in the trial report (i.e. both positive or both negative). It will be computed as (a + d)/N.
| | | Protocol | | |
| | | + | − | Total |
| Trial report | + | a | b | a + b |
| | − | c | d | c + d |
| | Total | a + c | b + d | N |
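As a concrete illustration, the primary-outcome computation from the 2×2 table can be sketched in Python (the function name and example counts below are hypothetical; only the formula (a + d)/N comes from this protocol):

```python
def overall_agreement(a, b, c, d):
    """Proportion of overall agreement between protocol and trial report
    on whether implementation fidelity was assessed: (a + d) / N."""
    n = a + b + c + d  # N: all eligible CRTs in the 2x2 table
    if n == 0:
        raise ValueError("empty table: N must be positive")
    return (a + d) / n

# Hypothetical counts: 12 CRTs agree positively (both protocol and
# report assess IF), 10 agree negatively, and 5 + 3 disagree.
print(overall_agreement(a=12, b=5, c=3, d=10))  # 22/30 ≈ 0.733
```

The same cell counts also yield the secondary proportions (e.g. row and column margins) reported in the flow of the review.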
Secondary outcomes
To address research questions 1, 2, and 3, we will also calculate the following:
- The frequency and proportion of trial protocols reporting the assessment of intervention fidelity, out of N
- The frequency and proportion of trial reports reporting the assessment of intervention fidelity, out of N
- The proportion of positive agreement among the studies that agree, computed as a/(a + d)
- Frequency counts and percentages summarising the fidelity components examined and the data collection methods proposed or employed
To address research question 4b, for all studies included in the review, we will also record the authors’ judgements as to whether the intervention was effective. Studies concluding that the intervention is more effective than the control will be coded as “1”; studies unable to reject the null hypothesis of no significant difference between groups will be coded as “0”. We will calculate the following:
- The conditional probability that a PP analysis is performed, given that the ITT analysis shows no difference between groups
- The conditional probability that a PP analysis is performed, given that the ITT analysis shows a positive intervention effect
These measures will be calculated using a standard formula for conditional probabilities:
$$ P(B \mid A) = \frac{P(A\ \mathrm{and}\ B)}{P(A)} $$
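A minimal sketch of how these conditional probabilities could be computed from coded study records (the record structure, field names, and counts below are hypothetical; only the formula P(B | A) = P(A and B)/P(A) comes from this protocol):

```python
def conditional_probability(studies, condition, event):
    """Estimate P(event | condition) as the proportion of studies
    satisfying `condition` that also satisfy `event`."""
    conditioned = [s for s in studies if condition(s)]
    if not conditioned:
        return float("nan")  # P(A) = 0: conditional probability undefined
    return sum(1 for s in conditioned if event(s)) / len(conditioned)

# Hypothetical coded records: itt_null=True means the ITT analysis
# could not reject the null hypothesis of no group difference.
studies = [
    {"itt_null": True,  "pp_reported": True},
    {"itt_null": True,  "pp_reported": False},
    {"itt_null": False, "pp_reported": True},
    {"itt_null": False, "pp_reported": False},
    {"itt_null": True,  "pp_reported": True},
]

# P(PP analysis reported | ITT analysis null)
print(conditional_probability(studies,
                              lambda s: s["itt_null"],
                              lambda s: s["pp_reported"]))  # 2/3 ≈ 0.667
```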
To address research questions 4c and 4d, we will examine the subset of trial reports containing both ITT and PP analyses. For studies comparing several interventions (e.g. factorial design), data on each intervention will be extracted separately.
To address research question 4c, we will study the proportion of overall agreement between the ITT and PP analyses concerning intervention effectiveness.
Data will be summarised in a two-by-two table comparing the assessment of intervention effectiveness in the ITT analysis to that in the PP (intervention fidelity) analysis.
T is the total number of included CRTs reporting both an ITT and a PP analysis. Studies that concluded in favour of the intervention group will be coded as “1”; those unable to reject the null hypothesis of no significant difference between groups will be coded as “0”. Judgements will represent reviewer consensus (MCP and NM, with appeal to MJ in case of divergence). The proportion of overall agreement is defined as the proportion of trial reports for which judgements concerning intervention effectiveness agree in the ITT and PP analyses (i.e. both positive (favouring the intervention group) or both negative (unable to reject the null hypothesis of no difference between groups)). It will be computed as (w + z)/T.
| | | ITT analysis | | |
| | | + | − | Total |
| PP analysis | + | w | x | w + x |
| | − | y | z | y + z |
| | Total | w + y | x + z | T |
We will also calculate the following:
- The frequency and proportion of ITT analyses that conclude in favour of the intervention, out of T
- The frequency and proportion of PP analyses that conclude in favour of the intervention, out of T
To address research question 4d, we will compare the intervention effect sizes reported for the ITT and PP analyses. Comparisons will be summarised as the PP effect size expressed as a percentage of the ITT effect size, computed as (PP effect size / ITT effect size) × 100.
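This comparison reduces to a simple ratio; a sketch follows (the function name and effect values are hypothetical, and the computation assumes both analyses report effects on the same scale):

```python
def pct_change(pp_effect, itt_effect):
    """Express the PP effect size as a percentage of the ITT effect size:
    (PP effect size / ITT effect size) * 100."""
    if itt_effect == 0:
        raise ValueError("ITT effect size of zero: ratio undefined")
    return pp_effect / itt_effect * 100

# Hypothetical example: risk ratio 0.75 under ITT and 0.60 under PP,
# i.e. the PP analysis suggests a stronger protective effect.
print(pct_change(pp_effect=0.60, itt_effect=0.75))  # ≈ 80
```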
Risk of bias in individual studies
To assess possible risk of bias for included studies, we will use the Cochrane Collaboration tool for assessing risk of bias in randomised trials [51], based on the following factors: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other sources of bias. Because the Cochrane Collaboration tool was developed for individually randomised studies whereas our study focuses on CRTs, we will also include several additional criteria specifically relevant to assessing risk of bias in CRTs, recommended by the Cochrane Collaboration [51] and other key sources [51–53]. These additional criteria will consider issues related to the following: recruitment bias (the potential for participant self-selection if individuals are recruited to the trial after the clusters have been randomised); baseline imbalances (because CRTs generally randomise a limited number of clusters, chance imbalances may affect the comparability of intervention and control groups); loss of clusters (complete clusters may sometimes be lost from a trial and thus omitted from the analysis; these missing data may lead to biased outcome assessments); and unit-of-analysis errors (failure to properly account for clustering in the analysis) [51]. We will assess each domain or criterion of interest as low, high, or uncertain risk and provide sample text illustrating the reasons for each judgement. This evaluation will be done independently by two reviewers (MCP and NM). Disagreements between reviewers will be resolved by consensus or, if consensus cannot be achieved, by consulting a third reviewer (MJ). Judgements related to risk of bias will be summarised graphically using RevMan 5.1 [51]. Risk of bias assessments will be used to create categories of high-, uncertain-, and low-risk studies for use in subgroup analyses.
Systematic reviews of health outcomes often assess the quality of a body of evidence using standardised tools such as the GRADE system [
54]. However, as this review focuses on methodological issues rather than on health-related outcomes, we will not use this tool.
Data extraction and data items
Two review authors (MCP and NM) will extract data independently. From each study protocol and trial report, reviewers will extract data on (i) the study characteristics (study location, aims, intervention); (ii) all applicable descriptors of the CRT design (for example, parallel group, stepped wedge, factorial, adaptive, pragmatic); (iii) concepts related to the assessment of IF (assessment of fidelity reported in the protocol and/or main study, fidelity components and moderating factors evaluated, data collection methods, and any dimension used by the authors to evaluate intervention fidelity distinct from those proposed by Carroll and Hasson [24, 32]); (iv) whether events taking place in the control group were monitored, as these can influence the effectiveness of the intervention [27, 55, 56]; and (v) information for assessing the risk of bias of included studies. We will also extract (vi) statistical results concerning intervention effectiveness and the authors’ qualitative conclusions regarding the intervention effect for the primary (generally, ITT) analysis and for one or more subgroup analyses relevant to intervention fidelity (generally, the PP analysis). If studies investigate more than one intervention, we will extract data relevant to each comparison. To reduce bias and errors in data extraction, reviewers will use a pre-defined template, pilot tested on a subset of studies, and a guide for data extraction. To ensure consistency, reviewers will receive training prior to commencing extraction for the review and undertake calibration exercises. Reviewers will resolve disagreements by discussion and by appeal to a third author (MJ) where necessary. All data extraction tools will be made available as online supplementary documents.
Data synthesis
Results will be presented in accordance with the PRISMA Statement [41]. A narrative synthesis will be provided, with information presented in tables to summarise key data. The narrative synthesis will explore relationships and findings within and between the included studies. It will highlight the four key dimensions of intervention fidelity identified from the literature (content, coverage, frequency, and duration), the moderating factors for intervention fidelity (participant responsiveness, comprehensiveness of intervention description, strategies to facilitate implementation, quality of delivery, recruitment, and context), any new dimensions explored, and the data collection methods used to evaluate each key dimension.
We will present quantitative data for all primary and secondary outcomes proposed. Where appropriate, data will be presented in tabular form.
We will investigate the possible sources of heterogeneity by performing subgroup analysis. Specifically, we will recompute the main quantitative outcomes for subgroups of studies with high, uncertain, and low risk of bias to better understand potential sources of variation in results. If the data permit, we will conduct a sensitivity analysis to explore whether studies at lower risk of bias undertake more comprehensive assessment of intervention fidelity. Because of the study question and the nature of the outcomes assessed, we do not intend to perform meta-analyses.
We recognize that data may be biased due to non-study-related processes and plan to assess specific meta-biases. This study compares results from protocols and published trial reports and is thus designed to address potential reporting bias and to investigate potential outcome reporting bias. As our review focuses on methodological issues rather than on outcome assessment, we will not assess potential publication bias.
Discussion
Development initiatives require high-quality evaluations to determine whether programmes work and how to improve them [57, 58]. According to Rychetnik et al. [48], evaluation of public health interventions requires detailed information about the “design and implementation of an intervention; contextual circumstances in which the intervention was implemented; and how the intervention was received”.
We will conduct a methodological systematic review to evaluate current practices for evaluating implementation fidelity in CRTs of public health interventions carried out in LMICs. Fidelity assessment may be a key tool for making studies more reliable, internally valid, and externally generalizable [59]. In the absence of fidelity assessment, it may be difficult to determine whether CRT results are due to the intervention design, to its implementation, or to unknown or external factors that may influence results. The rejection of effective interventions or the acceptance of ineffective interventions incurs substantial costs, through the waste of financial and scientific resources and the inability to extrapolate results [26]. Improved assessment and reporting of intervention fidelity may be important for researchers, for those who finance health interventions, and for decision-makers who seek the best evidence on public health interventions to promote health, prevent disease, and reduce health inequalities.
Acknowledgements
We would like to acknowledge Daniela Ziegler, librarian at the University of Montreal Hospital (CHUM), for her help with the database search strategy and Professor Christina Zarowsky, University of Montreal, for the helpful comments on an earlier manuscript draft.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.