Main findings
This paper describes the development of the first measure of model fidelity for Crisis Resolution Teams: the CORE CRT Fidelity Scale. The CRT model specified within the scale is based on the best available evidence and stakeholders’ priorities for CRT service delivery and organisation. The development of the scale followed a structured and transparent concept mapping process. The 75-team pilot shows that the resulting CRT Fidelity Scale and review process have good feasibility and acceptability for use in CRTs. Results from the pilot show that the scale can distinguish higher- and lower-fidelity services overall, and that most items generate a fairly balanced spread of scores across their five-point scoring range. A minority of items on which either very few or very many teams scored highly were retained with minimal changes, for two reasons: first, further consultation with stakeholders confirmed that these items accurately describe important components of the CRT service model; second, a national survey of CRTs undertaken as development work for the fidelity scale [22], and the fact that at least some teams in the 75-team pilot scored good or excellent fidelity, both suggest that the scoring criteria are attainable in practice.
A 9-team pilot of version 2 of the CORE CRT Fidelity Scale suggests that, compared with version 1, it produces only modest changes to total and item scores while improving the clarity of the scale. Inter-rater reliability testing of version 2 indicates promising initial psychometric properties.
Limitations
Two limitations of the scale development work reported here are: the scope of consultation in the development of the scale; and the extent of psychometric testing conducted.
First, regarding consultation, we aimed to include all stakeholder groups’ views at all stages of the development of the scale through: i) a thorough review of available evidence, including qualitative research, surveys and guidance as well as empirical studies, at the statement-generating stage; ii) inclusion of service user, clinical staff and other stakeholder representatives at the statement selection, concept mapping and cluster solution stages; and iii) consultation with service users, carers and clinicians at the fidelity item development stage through research forums and national networks. However, participants in our concept mapping exercise were recruited pragmatically: clinicians slightly outnumbered service users and carers, and some important stakeholder groups (e.g. General Practitioners, service commissioners, and emergency services staff such as police and ambulance) were not represented. Most participants in consultations were UK stakeholders: although a small number of participants from the Netherlands, Norway and Australia contributed to the concept mapping exercise, how far the CRT Fidelity Scale reflects the priorities of stakeholders in other countries, and whether it is suitable for use outside the UK, remains to be investigated.
Second, more extensive testing of the psychometric properties of the CRT Fidelity Scale is desirable: for example, assessing the test-retest reliability of the scale, which was not attempted in our study. A fidelity review day, and the preparation for it, place considerable demands on participating CRT services. Expecting CRT teams to participate in a second review soon after the first, which would provide no additional benefit to the service, was therefore not feasible.
Exploring inter-rater reliability in vivo was also not attempted. During a fidelity review, the three reviewers each collect evidence from different sources, then meet at the end of the day to share information and agree fidelity scores. Because no single reviewer holds all the required information until scores are discussed, individual reviewers could not assess a service’s fidelity independently to allow comparison of scores.
The vignette exercise used to assess inter-rater reliability drew on information collected from actual reviews (anonymised, and combined from multiple services), giving the exercise a high degree of realism. However, it tested only the reliability of reviewers’ ratings of the available evidence, not how consistently reviewers gather evidence during a review day. Moreover, conclusions about the inter-rater reliability of the CRT Fidelity Scale based on ratings of a single vignette can only be viewed as provisional. Arguably, however, the vignette exercise provided a harder test of inter-rater reliability than an actual review, in that participants made their ratings entirely independently. In a real fidelity review, by contrast, the three reviewers discuss the evidence collected and how to score the service, and refer back to the CRT manager or others to collect more information where necessary. In this light, the moderately good inter-rater reliability indicated by the vignette exercise is promising, and suggests that the reviewer training and the scoring guidance within the scale are adequate for the scale to be used reliably.
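As an illustration of how agreement between independent vignette raters can be quantified, the sketch below computes a quadratic-weighted kappa for two raters’ 1–5 item scores. This is a minimal sketch under stated assumptions: the paper does not specify which agreement statistic was used, and the function name and the example rater scores are hypothetical.

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, min_score=1, max_score=5):
    """Quadratic-weighted Cohen's kappa for two raters' ordinal item scores.

    Disagreements are penalised by the squared distance between scores, so a
    4-vs-5 near-miss counts far less against agreement than a 1-vs-5 split.
    """
    assert len(a) == len(b), "both raters must score the same items"
    n = len(a)
    # Observed disagreement: mean squared difference between the raters.
    observed = sum((x - y) ** 2 for x, y in zip(a, b)) / n
    # Expected disagreement under chance, from each rater's score distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(
        counts_a[i] * counts_b[j] * (i - j) ** 2
        for i in range(min_score, max_score + 1)
        for j in range(min_score, max_score + 1)
    ) / (n * n)
    # kappa = 1 - observed/expected; 1.0 means perfect agreement.
    return 1.0 - observed / expected if expected else 1.0

# Hypothetical 1-5 item scores from two independent raters of one vignette.
rater1 = [5, 4, 3, 4, 2, 5, 3, 4]
rater2 = [5, 3, 3, 4, 2, 4, 3, 5]
print(round(quadratic_weighted_kappa(rater1, rater2), 2))  # → 0.81
```

Weighted kappa suits an ordinal five-point scoring range because it credits near-misses more than large disagreements; values in the 0.61–0.80 band are conventionally read as substantial agreement.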
The rigorous initial evidence gathering, and positive feedback from stakeholders and the CRT teams which participated in piloting, demonstrate that the CRT Fidelity Scale has good face validity. The concept mapping grouping exercise, and the cluster structure underpinning the scale, also afford it good content validity. However, the criterion validity of the scale, i.e. the relationship between a high fidelity score and outcomes for CRT services, has yet to be explored. This step is critical to establishing the utility of a fidelity scale [38]. Establishing criterion validity for the scale as a whole, as well as for individual items, is particularly important given the lack of empirical evidence about the critical ingredients of CRT services [23] available to inform the development of statements about CRT best practice, or stakeholders’ prioritisation of statements, in the concept mapping process.
Clinical implications
The successfully completed 75-team pilot suggests that the fidelity scale and accompanying audit process, involving clinician and service user or carer reviewers, are feasible and acceptable to CRT services. Experience from piloting suggests three features of the review process that can help to maximise the reliability of fidelity reviews and scoring, and thus enhance the potential clinical utility of the scale. First, reviewing teams should include at least three reviewers, to manage the workload of a review day, to provide a range of expertise to inform the evidence review, and to support the reliability of scoring by moderating any outlying views of individual reviewers. Second, using the interview guides and checklists developed for reviewers aids consistent collection and recording of the information with which to score a team. Third, a right of reply for the CRT team manager to an initial draft of the fidelity report and scores allows any additional evidence to be provided, or misunderstandings by the reviewing team to be corrected.
The CRT Fidelity Scale can be a useful tool for service planners, managers and commissioners in three ways. First, it provides a clear specification of the CRT model, in more detail than previously provided in statutory or expert guidance [7, 39]. This can guide service planners in setting commissioning specifications for Crisis Resolution Teams, and guide CRT managers and staff in setting service improvement goals. Second, a fidelity review using the CRT Fidelity Scale, which provides services with a score and feedback on each of the scale’s 39 items, can help CRT teams recognise their existing strengths and the areas where service development is required. It can thus help teams to make service improvement plans and to assess the impact of quality improvement initiatives. We have demonstrated that an external audit is feasible, acceptable and, preliminary evidence suggests, adequately reliable. The CRT Fidelity Scale could in principle also be used for internal audit within health organisations, or for self-audit by teams (although the reliability of scores when the scale is used in these ways is yet to be tested). Third, at a local or national level, the CRT Fidelity Scale can be used to generate benchmarking data about CRTs’ model fidelity, against which individual teams’ model fidelity can be assessed, or with which changes to service provision across a region or country over time can be explored. Our 75-team survey demonstrates that such large-scale use of the CRT Fidelity Scale is achievable: all 75 reviews in our study were completed in less than a year.
The value of the CRT Fidelity Scale to service planners and policy-makers is demonstrated by the speed with which it has been promoted and used by expert bodies and policy-making organisations nationally in the UK. The CRT Fidelity Scale is advertised as a “national inspiration” in the Crisis Care Concordat campaign led by NHS England [40]. Benchmarking data from our 75-team survey have been used by the Care Quality Commission, the body responsible for quality inspections of health and social care services in England [20], and by the Royal College of Psychiatrists [41] in recent reports presenting recommendations for CRT care.
Research implications
An important next step in developing the CRT Fidelity Scale is investigating its criterion validity. Fidelity scales assess elements of service or intervention structure and process: these can only legitimately be used as measures of service quality if a positive relationship to outcomes has been demonstrated [42]. Investigating the relationship of teams’ CRT Fidelity Scale scores to important outcomes, such as service users’ experience of and satisfaction with services, recovery following CRT care, and inpatient admission rates and bed use across a catchment area, is of high research interest. An initial exploration of the criterion validity of the CRT Fidelity Scale will be carried out in connection with a current trial of a CRT service improvement programme, also conducted as part of the CORE Study [30].
A further question is whether the CRT Fidelity Scale can be an internationally useful measure, as other fidelity scales have become [34]. There is a need to explore its feasibility and utility in non-UK contexts. Norway, as the other country where CRTs have been implemented nationally, is an obvious site for further testing of the scale.
While a fidelity review may in itself help services plan service improvement, there is also a pressing clinical need to develop and test resources to help CRT services meet the priorities of stakeholders and achieve high model fidelity, with the aim of improving service outcomes. The development of a programme manual, which can then be used in the training and supervision of programme staff, is advocated as a key tool in supporting high-fidelity implementation of evidence-based practices [43]. Development of a CRT manual, informed by the CORE CRT Fidelity Scale, is required. The CORE CRT Service Improvement Programme Trial [30] will test a package of service improvement resources in a cluster randomised trial: 15 CRT teams will receive the package of resources over a 1-year period, and 10 teams will act as controls. Fidelity reviews will be used to support and evaluate the service improvement intervention.
The development of the CRT Fidelity Scale also has two implications for research methods. First, concept mapping [29] proved a useful method of developing and defining the CRT service model. It allowed the views of several stakeholder groups to contribute to developing the model, and provided a transparent basis for deciding which of many competing elements of CRT services to prioritise for inclusion in a fidelity scale. Second, the development and piloting of the CRT Fidelity Scale provides a model for patient and public involvement in research and clinical audit, as recommended to improve the quality and credibility of research [44].