The paper’s methods were shaped around its overall aim: to advance clarity in the language used to describe the outcomes of implementation. We convened a working group of implementation researchers to identify concepts for labeling and assessing the outcomes of implementation processes. One member of the group was a doctoral student research assistant (RA) who coordinated, conducted, and reported on the literature search and constructed tables reflecting successive iterations of the heuristic taxonomy.

The RA conducted literature searches using keywords and search programs to identify literature on the current state of conceptualization and measurement of these outcomes, primarily in the health and behavioral sciences. We searched a number of databases, with a particular focus on MEDLINE, CINAHL Plus, and PsycINFO. Key search terms included the name of the implementation outcome (e.g., “acceptability,” “sustainability”) along with relevant synonyms, combined with any of the following: innovation, EBP, evidence based practice, and EST. We scanned the titles and abstracts of the identified sources and read the methods and background sections of the studies that measured or attempted to measure implementation outcomes. We also drew on relevant conceptual articles in developing nominal definitions. Although our primary focus was on the implementation of evidence-based practices in the health and behavioral sciences, the keyword “innovation” broadened this scope by also identifying studies in other areas, such as physical health, that may inform the implementation of mental health treatments. Because terminology in this field currently reflects widespread inconsistency, we followed leads beyond what our keyword searches “hit” upon, reading additional articles cited by authors whose work our electronic searches identified. We also conducted searches of CRISP, TAGG, and NIH RePORTER for funded mental health research studies with “implementation” in their titles or abstracts, looking for examples of the outcomes pursued in current research.
Group processes included iterative discussion, checking additional literature for clarification, and subsequent discussion. The aim was to collect and portray, from the extant literature, the similarities and differences across investigators’ uses of various implementation outcomes and the definitions given for those outcomes. Discussions often led us to preserve distinctions between terms, maintaining two different implementation outcomes in our “nominated” taxonomy when the literature or our own research revealed possible conceptual distinctions. We assembled the identified constructs into the proposed heuristic taxonomy to portray the current state of the vocabulary and conceptualization of terms used to assess implementation outcomes.
Taxonomy of Implementation Outcomes
Through our process of iterative reading and discussion of the literature, we worked to nominate definitions that (1) achieve as much consistency as possible with any existing definitions (including multiple definitions we found for a single construct), yet (2) serve to sharpen distinctions between constructs that might be similar. For several of the outcomes, the literature did not offer one clear nominal definition.
Table 1 depicts the resultant working taxonomy of implementation outcomes. For each implementation outcome, the table nominates a level of analysis, identifies the theoretical basis for the construct in the implementation literature, shows the different terms used for the construct in the literature, suggests the point or stage within implementation processes at which the outcome may be most salient, and lists the types of existing measures for the construct that our search identified. The implementation outcomes listed in Table 1 are probably only the “more obvious” ones, and we expect that other concepts may emerge from further analysis of the literature and from the kind of empirical work we call for in our discussion below. Many of the implementation outcomes can be inferred or measured in terms of expressed attitudes and opinions, intentions, or reported or observed behaviors. We now list and discuss our nominated conceptual definitions for each implementation outcome in our proposed taxonomy. We reference similar definitions from the literature and comment on marked differences between our definitions and others proposed for the term.
Table 1. Taxonomy of implementation outcomes

Implementation outcome | Level of analysis | Theoretical basis | Other terms in the literature | Salient implementation stage | Available measures
--- | --- | --- | --- | --- | ---
Acceptability | Individual provider; individual consumer | Rogers: “complexity” and, to a certain extent, “relative advantage” | Satisfaction with various aspects of the innovation (e.g., content, complexity, comfort, delivery, and credibility) | Early for adoption; ongoing for penetration; late for sustainability | Survey; qualitative or semi-structured interviews; administrative data; refused/blank
Adoption | Individual provider; organization or setting | RE-AIM: “adoption”; Rogers: “trialability” (particularly for early adopters) | Uptake; utilization; initial implementation; intention to try | Early to mid | Administrative data; observation; qualitative or semi-structured interviews; survey
Appropriateness | Individual provider; individual consumer; organization or setting | Rogers: “compatibility” | Perceived fit; relevance; compatibility; suitability; usefulness; practicability | Early (prior to adoption) | Survey; qualitative or semi-structured interviews; focus groups
Feasibility | Individual provider; organization or setting | Rogers: “compatibility” and “trialability” | Actual fit or utility; suitability for everyday use; practicability | Early (during adoption) | Survey; administrative data
Fidelity | Individual provider | RE-AIM: part of “implementation” | Delivered as intended; adherence; integrity; quality of program delivery | Early to mid | Observation; checklists; self-report
Implementation cost | Provider or providing institution | TCU Program Change Model: “costs” and “resources” | Marginal cost; cost-effectiveness; cost-benefit | Early for adoption and feasibility; mid for penetration; late for sustainability | Administrative data
Penetration | Organization or setting | RE-AIM: necessary for “reach” | Level of institutionalization? Spread? Service access? | Mid to late | Case audit; checklists
Sustainability | Administrators; organization or setting | RE-AIM: “maintenance”; Rogers: “confirmation” | Maintenance; continuation; durability; incorporation; integration; institutionalization; sustained use; routinization | Late | Case audit; semi-structured interviews; questionnaires; checklists
Acceptability is the perception among implementation stakeholders that a given treatment, service, practice, or innovation is agreeable, palatable, or satisfactory. Lack of acceptability has long been noted as a challenge in implementation (Davis 1993). The referent of the implementation outcome “acceptability” (or the “what” that is acceptable) may be a specific intervention, practice, technology, or service within a particular setting of care. Acceptability should be assessed based on the stakeholder’s knowledge of or direct experience with various dimensions of the treatment to be implemented, such as its content, complexity, or comfort. Acceptability is different from the larger construct of service satisfaction, as typically measured through consumer surveys. Acceptability is more specific, referencing a particular treatment or set of treatments, while satisfaction typically references the general service experience, including such features as waiting times, scheduling, and office environment. Acceptability may be measured from the perspective of various stakeholders, such as administrators, payers, providers, and consumers. We presume rated acceptability to be dynamic, changing with experience; thus ratings of acceptability may differ when taken, for example, pre-implementation and later throughout the various stages of implementation. The literature reflects several examples of measuring provider and patient acceptability. Aarons’ Evidence-Based Practice Attitude Scale (EBPAS) captures the acceptability of evidence-based mental health treatments among mental health providers (Aarons 2004). Aarons and Palinkas (2007) used semi-structured interviews to assess case managers’ acceptance of evidence-based practices in a child welfare setting. Karlsson and Bendtsen (2005) measured patients’ acceptance of alcohol screening in an emergency department setting using a 12-item questionnaire.
Adoption is defined as the intention, initial decision, or action to try or employ an innovation or evidence-based practice. Adoption also may be referred to as “uptake.” Our definition is consistent with those proposed by Rabin et al. (2008) and Rye and Kimberly (2007). Adoption may be measured from the perspective of the provider or the organization. Haug et al. (2008) used pre-post items to capture substance abuse providers’ adoption of evidence-based practices, while Henggeler et al. (2008) report interview techniques for measuring therapists’ adoption of contingency management.
Appropriateness is the perceived fit, relevance, or compatibility of the innovation or evidence-based practice for a given practice setting, provider, or consumer, and/or the perceived fit of the innovation to address a particular issue or problem. “Appropriateness” is conceptually similar to “acceptability,” and the literature reflects overlapping and sometimes inconsistent terms when discussing these constructs. We preserve a distinction because a given treatment may be perceived as appropriate but not acceptable, and vice versa. For example, a treatment might be considered a good fit for treating a given condition, but its features (for example, a rigid protocol) may render it unacceptable to the provider. The construct “appropriateness” is deemed important for its potential to capture some “pushback” to implementation efforts, as is seen when providers feel a new program is a “stretch” from the mission of the health care setting, or is not consistent with providers’ skill set, role, or job expectations. For example, providers may vary in their perceptions of the appropriateness of programs that co-locate mental health services within primary medical, social service, or school settings. Again, a variety of stakeholders will likely have perceptions about a new treatment’s or program’s appropriateness to a particular service setting, mission, providers, and clientele. These perceptions may be a function of the organization’s culture or climate (Klein and Sorra 1996). Bartholomew et al. (2007) describe a rating scale for capturing the perceived appropriateness of training in dual diagnosis and therapeutic alliance among substance abuse counselors.
Cost (incremental or implementation cost) is defined as the cost impact of an implementation effort. Implementation costs vary according to three components. First, because treatments vary widely in their complexity, the costs of delivering them will also vary. Second, the costs of implementation will vary depending upon the complexity of the particular implementation strategy used. Finally, because treatments are delivered in settings of varying complexity and overheads (ranging from a solo practitioner’s office to a tertiary care facility), the overall costs of delivery will vary by the setting. The true cost of implementing a treatment, therefore, depends upon the costs of the particular intervention, the implementation strategy used, and the location of service delivery.
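One simple way to summarize this three-part decomposition (the additive form and the symbols are our own illustration, not a formula drawn from the costing literature) is:

$$
C_{\text{implementation}} = C_{\text{intervention}} + C_{\text{strategy}} + C_{\text{setting}},
$$

where $C_{\text{intervention}}$ reflects the complexity of the treatment itself, $C_{\text{strategy}}$ the complexity of the implementation strategy employed, and $C_{\text{setting}}$ the overhead of the delivery setting. In practice these components may interact rather than simply add, so the expression is best read as a heuristic starting point for cost accounting.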
Much of the work to date has focused on quantifying intervention costs, e.g., identifying the components of a community-based heart health program and attaching costs to these components (Ronckers et al. 2006). These cost estimations are combined with patient outcomes and used in cost-effectiveness studies (McHugh et al. 2007). A review of the literature on guideline implementation in professions allied to medicine notes that few studies report anything about the costs of guideline implementation (Callum et al. 2010). Implementing processes that do not require ongoing supervision or consultation, such as computerized medical record systems, may carry lower costs than implementing new psychosocial treatments. Direct measures of implementation cost are essential for studies comparing the costs of implementing alternative treatments and of various implementation strategies.
Feasibility is defined as the extent to which a new treatment, or an innovation, can be successfully used or carried out within a given agency or setting (Karsh 2004). Typically, the concept of feasibility is invoked retrospectively as a potential explanation of an initiative’s success or failure, as reflected in poor recruitment, retention, or participation rates. While feasibility is related to appropriateness, the two constructs are conceptually distinct. For example, a program may be appropriate for a service setting (in that it is compatible with the setting’s mission or service mandate) but may not be feasible due to resource or training requirements. Hides et al. (2007) tapped aspects of the feasibility of using a screening tool for co-occurring mental health and substance use disorders.
Fidelity is defined as the degree to which an intervention was implemented as it was prescribed in the original protocol or as it was intended by the program developers (Dusenbury et al. 2003; Rabin et al. 2008). Fidelity has been measured more often than the other implementation outcomes, typically by comparing the original evidence-based intervention and the disseminated/implemented intervention in terms of (1) adherence to the program protocol, (2) dose or amount of program delivered, and (3) quality of program delivery. Fidelity has been the overriding concern of treatment researchers who strive to move their treatments from the clinical lab (efficacy studies) to real-world delivery systems. The literature identifies five dimensions of implementation fidelity: adherence, quality of delivery, program component differentiation, exposure to the intervention, and participant responsiveness or involvement (Mihalic 2004; Dane and Schneider 1998). Adherence, or the extent to which the therapy occurred as intended, is frequently examined in psychotherapy process and outcomes research and is distinguished from other potentially pertinent implementation factors such as provider skill or competence (Hogue et al. 1996). Fidelity is measured through self-report, ratings, and direct observation and coding of audio- and videotapes of actual encounters or provider-client/patient interactions. Achieving and measuring fidelity in usual care is beset by a number of challenges (Proctor et al. 2009; Mihalic 2004; Schoenwald et al. 2005). The foremost challenge may be measuring implementation fidelity quickly and efficiently (Hayes 1998).
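As a concrete illustration, the adherence and dose dimensions lend themselves to simple tabulation from session records. The sketch below is our own hypothetical construction (the data structures, component names, and scoring rules are assumptions for illustration, not any published fidelity instrument):

```python
# Illustrative fidelity scoring sketch: adherence as the proportion of
# prescribed protocol components actually delivered, and dose as the
# proportion of prescribed sessions actually delivered.

def adherence(delivered: set[str], prescribed: set[str]) -> float:
    """Proportion of prescribed protocol components that were delivered."""
    return len(delivered & prescribed) / len(prescribed)

def dose(sessions_delivered: int, sessions_prescribed: int) -> float:
    """Amount of the program delivered relative to the amount prescribed."""
    return sessions_delivered / sessions_prescribed

# Hypothetical protocol prescribing four components over twelve sessions.
prescribed_components = {"agenda setting", "skills training",
                         "homework review", "relapse planning"}
observed_components = {"agenda setting", "skills training", "homework review"}

print(f"Adherence: {adherence(observed_components, prescribed_components):.2f}")  # 0.75
print(f"Dose: {dose(9, 12):.2f}")  # 0.75
```

The third dimension, quality of delivery, typically requires trained observer ratings and does not reduce to this kind of simple count.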
Schoenwald and colleagues (2005) have developed three 26- to 45-item measures of adherence at the therapist, supervisor, and consultant levels of implementation (available from the MST Institute, www.mstinstitute.org). Ratings are obtained at regular intervals, enabling examination of the provider, clinical supervisor, and consultant. Other examples from the mental health literature include Bond et al.’s (2008) 15-item Supported Employment Fidelity Scale (SE Fidelity Scale) and Hogue et al.’s (2008) Therapist Behavior Rating Scale-Competence (TBRS-C), an observational measure of fidelity in evidence-based practices for adolescent substance abuse treatment.
Penetration is defined as the integration of a practice within a service setting and its subsystems. This definition is similar to Stiles et al.’s (2002) notion of service penetration and to Rabin et al.’s (2008) notion of niche saturation. Studying services for persons with severe mental illness, Stiles et al. (2002) apply the concept of service penetration to service recipients: the number of eligible persons who use a service, divided by the total number of persons eligible for the service. Penetration also can be calculated in terms of the number of providers who deliver a given service or treatment, divided by the total number of providers trained in or expected to deliver the service. From a service system perspective, the construct is also similar to “reach” in the RE-AIM framework (Glasgow 2007b). We found infrequent use of the term penetration in the implementation literature, though studies seemed to tap into this construct with terms such as a given treatment’s level of institutionalization.
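Expressed as ratios (the subscript labels are ours, added only to distinguish the two perspectives described above), the two calculations are:

$$
\text{penetration}_{\text{recipients}} = \frac{\text{eligible persons who use the service}}{\text{total persons eligible for the service}}
$$

$$
\text{penetration}_{\text{providers}} = \frac{\text{providers delivering the service or treatment}}{\text{providers trained in or expected to deliver it}}
$$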
Sustainability is defined as the extent to which a newly implemented treatment is maintained or institutionalized within a service setting’s ongoing, stable operations. The literature reflects quite varied uses of the term “sustainability,” but our proposed definition incorporates aspects of those offered by Johnson et al. (2004), Turner and Sanders (2006), Glasgow et al. (1999), Goodman et al. (1993), and Rabin et al. (2008). Rabin et al. (2008) emphasize the integration of a given program within an organization’s culture through policies and practices, and distinguish three stages that determine institutionalization: (1) passage (a single event, such as the transition from temporary to permanent funding); (2) cycle or routine (i.e., repetitive reinforcement of the importance of the evidence-based intervention through its inclusion in organizational or community procedures and behaviors, such as the annual budget and evaluation criteria); and (3) niche saturation (the extent to which an evidence-based intervention is integrated into all subsystems of an organization). Thus the outcomes of “penetration” and “sustainability” may be related conceptually and empirically, in that higher penetration may contribute to long-term sustainability. Such relationships require empirical testing, as we elaborate below. Indeed, Steckler et al. (1992) frame sustainability in terms of attaining long-term viability, as the final stage of the diffusion process during which innovations settle into organizations. To date, the term sustainability appears more frequently in conceptual papers than in empirical articles measuring the sustainability of innovations. As we discuss below, the literature often uses the same term (niche saturation, for example) to reference multiple implementation outcomes, underscoring the need for the conceptual clarity we seek to advance in this paper.
1992) emphasize sustainability in terms of attaining long-term viability, as the final stage of the diffusion process during which innovations settle into organizations. To date, the term sustainability appears more frequently in conceptual papers than actual empirical articles measuring sustainability of innovations. As we discuss below, the literature often uses the same term (niche saturation, for example) to reference multiple implementation outcomes, underscoring the need for conceptual clarity as we seek to advance in this paper.