Background
Implementing new evidence-based interventions, technologies, and ways of organizing health care, with the purpose of improving clinical outcomes and patient experiences, is a complex challenge [1]. If the implementation is not well executed, there is a risk that money and other resources will be wasted because few or no real, sustainable improvements are made. Implementation research therefore urgently needs to understand what interventions work, why, and how, and to test approaches to improve them [2]. Additionally, the importance of establishing theoretical bases for such research to (i) describe and guide, (ii) understand and explain, and (iii) evaluate implementation processes has been emphasized [3].
The Normalization Process Theory (NPT) [4] is an established middle-range theory [5] that has been categorized as a theory for enhancing the understanding and explanation of specific aspects of implementation [3]. The NPT is grounded in social theory and aids structured interpretation of the issues being researched [6]. It supports understanding of the dynamics of implementing, embedding, and integrating interventions into routine practice, a process that the theory defines as normalization. It can be used as a conceptual tool, primarily for investigating the implementation of complex interventions in health care [7].
The NPT is concerned with explaining what work people do, or need to do, to implement new practices. This work is conceptualized in a set of four core constructs, or organizing ideas, that represent human processes: Coherence, Cognitive Participation, Collective Action, and Reflexive Monitoring (see Table 1). Coherence concerns the sense-making work that people do individually and collectively to operationalize new practices, while Cognitive Participation is the relational work that people do to build and sustain a community of practice. Collective Action is the operational work that people perform to enact a set of practices, and Reflexive Monitoring is the appraisal work that people conduct to assess and understand the ways in which a new set of practices affects them and others [4, 8]. According to NPT, it is also possible to investigate the probability or potential of a practice to normalize and become a work routine. The normalization potential [9] can be understood by assessing the factors that are known to affect the implementation process in a specific setting and the readiness of actors to implement and accept a new practice. The NPT has been widely used for qualitative analyses of the implementation of complex interventions in a diverse range of health care contexts, such as care of chronic kidney disease, chronic heart failure, tuberculosis treatment, maternity care, mental health care, e-health, and tele-treatment interventions [5].
Table 1
Overview of the constructs of the Normalization Process Theory and NoMAD items by constructs
Coherence | Differentiation | I can see how the [intervention] differs from usual ways of working |
Communal specification | Staff in this organisation have a shared understanding of the purpose of this [intervention] |
Individual specification | I understand how the [intervention] affects the nature of my own work |
Internalization | I can see the potential value of the [intervention] for my work |
Cognitive Participation | Initiation | There are key people who drive the [intervention] forward and get others involved |
Legitimation | I believe that participating in the [intervention] is a legitimate part of my role |
Enrolment | I am open to working with colleagues in new ways to use the [intervention] |
Activation | I will continue to support the [intervention] |
Collective Action | Interactional workability | I can easily integrate the [intervention] into my existing work |
Relational integration | The [intervention] disrupts working relationships |
Relational integration | I have confidence in other people’s ability to use the [intervention] |
Skill set workability | Work is assigned to those with skills appropriate to the [intervention] |
Skill set workability | Sufficient training is provided to enable staff to use the [intervention] |
Contextual integration | Sufficient resources are available to support the [intervention] |
Contextual integration | Management adequately support the [intervention] |
Reflexive Monitoring | Systematization | I am aware of reports about the effects of the [intervention] |
Communal appraisal | The staff agree that the [intervention] is worthwhile |
Individual appraisal | I value the effects the [intervention] has had on my work |
Reconfiguration | Feedback about the [intervention] can be used to improve it in the future |
Reconfiguration | I can modify how I work with the [intervention] |
The growing interest in implementation research has also brought about the development and validation of an increasing number of instruments for measuring implementation activity and progress from different theoretical perspectives [10]. Martinez et al. advocate, and provide guidance for, the careful development and reporting of instruments for use in implementation science, in order to advance work in the field. So far, only a limited number of studies have developed NPT-based quantitative approaches [11]. The Normalization Process Theory Measure (NoMAD) is one of the first instruments for measuring implementation from an NPT perspective [8, 12]. The NoMAD is a 23-item instrument for assessing implementation processes. Its items reflect the constructs of NPT (Table 1), and it can be adapted to specific contexts and study protocols [13]. It is intended to be a sophisticated, yet simple to administer, NPT-based assessment tool [14] and is therefore anticipated to be useful in a Swedish context.
The current study presents the translation, adaptation, and pilot testing of NoMAD to make it available for use in Sweden. We have named the Swedish version of the instrument S-NoMAD. In addition, we aimed to create a digital version of S-NoMAD to make it easy to adapt for use in different contexts. The objectives were therefore to (1) translate the original (UK) version of NoMAD for use in the Swedish context and (2) undertake initial psychometric testing of the instrument in terms of reliability and validity, in a sample of staff involved in the implementation of co-ordinated care planning across health and social services. In doing so, we test the proposition that a Swedish translation of NoMAD can adequately assess the NPT constructs of Coherence, Cognitive Participation, Collective Action, and Reflexive Monitoring.
Results
The translation and adaptation process
There was a high degree of consistency between the backward translation and the original version. Most of the adjustments concerned only precision of language. Some items in the original NoMAD were found to be difficult to translate as they contained words that do not convey the same meaning to Swedes. For example, in the backward translation the word ‘understand’ was used instead of ‘see’, ‘competence’ instead of ‘skills’, and ‘relevant’ instead of ‘legitimate’. How these slight differences in the use of words may change their meaning was discussed with the translator before the final version was decided upon.
The response options were the most challenging in terms of semantic equivalence. The options in the original instrument (strongly agree, agree, neither agree nor disagree, disagree, strongly disagree) could not be used in full: when translated close to the English wording, they became difficult to distinguish in Swedish. We therefore chose strongly agree, agree, neither agree nor contradict, contradict, and strongly contradict to make the options clearer. In the pilot test of the instrument, the participants had no queries concerning the wording of the questions in the S-NoMAD questionnaire.
The first content validity analysis showed an S-CVI of 0.84, which is slightly above the recommended level of 0.80, and I-CVI values ranging from 0.5 to 0.92. Four of the items had values below the critical value of 0.78. The experts thought that the items with a low I-CVI were expressed in a complicated way and were difficult to understand because of ambiguous and vague wording. For example, the item ‘There are key people who drive the intervention forward and get others involved’ addresses two activities in one question, which the experts thought was misleading. Another item with a low I-CVI was ‘The staff agree that the intervention is worthwhile’. In the first version of the Swedish translation, this had defensive and negative connotations (worth the effort).
A second content validation analysis was carried out on the revised set of items, with I-CVI scores above 0.83 and an S-CVI of 0.96.
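The content validity indices reported above can be illustrated with a minimal sketch. It assumes the common convention in which each expert rates an item's relevance on a 4-point scale, the I-CVI is the proportion of experts giving a rating of 3 or 4, and the S-CVI is the average of the I-CVIs; this section does not state which variant was used. The ratings, item names, and function names below are hypothetical and are not the study's data.

```python
from statistics import mean

def item_cvi(ratings, relevant_from=3):
    """Proportion of experts who rated the item as relevant (3 or 4 by assumption)."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)

def scale_cvi_ave(ratings_by_item):
    """Scale-level CVI taken as the mean of the item-level CVIs (assumed variant)."""
    return mean(item_cvi(r) for r in ratings_by_item.values())

# Hypothetical ratings from 12 experts for three items (illustration only).
ratings = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 2],  # 11 of 12 rate it relevant
    "item_b": [4, 3, 2, 2, 1, 3, 4, 2, 3, 1, 2, 4],  # 6 of 12 rate it relevant
    "item_c": [4, 4, 4, 3, 3, 4, 4, 3, 4, 4, 3, 3],  # 12 of 12 rate it relevant
}

i_cvi = {item: item_cvi(r) for item, r in ratings.items()}

# Items below the critical value of 0.78 cited in the text are flagged for review.
flagged = [item for item, value in i_cvi.items() if value < 0.78]

s_cvi = scale_cvi_ave(ratings)
```

In this toy data only `item_b` falls below the critical value, so it would be the one taken back to the expert panel for rewording before a second round of rating.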
The analysis of the interview data showed that the majority of the experts welcomed the underlying idea of assessing how well an implementation has been embedded in normal work and of using an instrument to do this. Several of the experts ran implementation projects and had been looking for a suitable instrument. They welcomed the translated instrument and planned to use it in their future work.
Psychometric results
Internal construct validity
The first factor, Coherence, consists of four items. The statistical analysis indicated that one factor is sufficient (p = 0.824), and all measures of fit were acceptable or better. The same applies to the second factor, Cognitive Participation, which consists of four items (p = 0.326) and also showed an appropriate fit.
For the third factor, Collective Action, which has seven items, one factor was not sufficient (p < 0.01) and the fit measures indicated a poorly fitting model. Further analysis indicated that two items caused the poor fit. Removing these two items yielded a one-factor model with good fit (p = 0.339). For identification reasons, it is not possible to estimate a factor model in which only two items load on one factor. Hence, we did not estimate a two-factor model, but a one-factor model with the remaining items.
For the fourth factor, Reflexive Monitoring, with five items, we had a similar problem: one item caused rejection of the one-factor model (p = 0.027) and poor fit. Discarding this item gave a one-factor model with four items and good fit (p = 0.873).
Most of the factor loadings are between 0.52 and 0.97 when the factor variances are normalized to one (Appendix 2). The exception is one loading of 0.39 on the factor Coherence, whereas the other loadings on that factor were 0.77, 0.9, and 0.83. In general, the loadings of the remaining three factors are of a similar size, indicating factor models with no problems of interpretation.
Due to the limited number of observations (n = 144), we did not estimate a full four-factor model. Instead, to estimate the correlations between the latent factors, we estimated a four-factor model with restrictions on the loadings. The restrictions came from the one-factor models estimated above, so the only free parameters were the correlations between the factors. Table 4 displays the estimated correlations. The correlations between the factors are high, or even very high, ranging from 0.356 up to over 0.9.
Table 3
Results from analysis of internal construct validity and internal consistency, after exclusion of three items
Coherence | 0.386 | 0.824 | 0.000 | 0.016 | 1.000 | 1.007 | 0.806 |
Cognitive Participation | 2.239 | 0.326 | 0.030 | 0.034 | 1.000 | 0.999 | 0.793 |
Collective Action | 5.674 | 0.339 | 0.032 | 0.039 | 0.999 | 0.998 | 0.831 |
Reflexive Monitoring | 0.271 | 0.873 | 0.000 | 0.012 | 1.000 | 1.016 | 0.782 |
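As a rough plausibility check of the one-factor results, the sketch below recomputes p values from chi-square statistics. It assumes that the first two columns of Table 3 are the chi-square statistic and its p value (the column headers are not shown here), and that each one-factor model has one loading and one residual variance per item with the factor variance fixed to one, giving k(k − 3)/2 degrees of freedom for k items. The function names are illustrative.

```python
import math

def one_factor_df(n_items):
    """Degrees of freedom of a one-factor model with the factor variance fixed to one."""
    moments = n_items * (n_items + 1) // 2  # unique variances and covariances
    free_params = 2 * n_items               # one loading and one residual variance per item
    return moments - free_params

def chi2_sf(x, df):
    """Upper-tail chi-square probability via Q(a+1, z) = Q(a, z) + z**a * exp(-z) / gamma(a+1)."""
    if df <= 0:
        raise ValueError("df must be positive")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # start from Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # start from Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# (items retained, assumed chi-square statistic) per factor, read from Table 3.
fits = {
    "Coherence": (4, 0.386),
    "Cognitive Participation": (4, 2.239),
    "Collective Action": (5, 5.674),
    "Reflexive Monitoring": (4, 0.271),
}

p_values = {
    name: chi2_sf(chi2, one_factor_df(k)) for name, (k, chi2) in fits.items()
}
```

Under these assumptions the computed p values agree with the reported 0.824, 0.326, 0.339, and 0.873 to three decimals. The same parameter count gives a negative number of degrees of freedom for a factor measured by only two items, which is consistent with the identification problem noted for a separate two-item factor.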
Internal consistency
The four factors of the S-NoMAD, comprising 17 items in total, had Cronbach’s alpha values of above 0.79 (Table 3). An alpha of about 0.8 implies a random error of 0.36, indicating that the factor models yield good reliability.
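The arithmetic behind the random error figure can be sketched as follows. The example assumes the standard formula for Cronbach's alpha and the interpretation used in the text, in which the share of random error is one minus the squared reliability (1 − 0.8² = 0.36). The response data and function names are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def random_error(alpha):
    """Share of random error implied by a reliability coefficient: 1 - alpha**2."""
    return 1 - alpha ** 2

# Hypothetical 5-point responses from six respondents to four items (not study data).
responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(responses)
error_at_08 = random_error(0.8)  # the case discussed in the text
```

Sample variances are used for both the items and the total score; using population variances instead would give the same alpha, because the correction factor cancels in the ratio.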
Table 4
Correlation between the constructs (factors) of the Normalization Process Theory
Coherence | 1 | | | |
Cognitive Participation | 0.647 | 1 | | |
Collective Action | 0.797 | 0.356 | 1 | |
Reflexive Monitoring | 0.920 | 0.698 | 0.909 | 1 |
Discussion
The current study presents the translation process, pilot testing, and psychometric analysis of the Swedish version of the original NPT-based British instrument NoMAD [4, 12, 14], known as S-NoMAD. This study contributes to the development, pilot testing, and evaluation of a questionnaire for measuring success in the implementation of complex interventions in different Swedish health care contexts. The analysis of construct validity, based on the CFA and goodness-of-fit indices (SRMR, RMSEA, and CFI), showed good fit to the hypothesized model after deleting three items with low internal consistency. These deleted items were ‘The intervention disrupts working relationships’, ‘I have confidence in other people’s ability to use the intervention’, and ‘Feedback about the intervention can be used to improve it in the future’ (see Table 1); they might need to be revised in future versions of the S-NoMAD. However, the final factor analysis yielded satisfactory factor loadings, suggesting that S-NoMAD reflects the constructs of the NPT [4, 13]. The Cronbach’s alpha values for the four constructs reflected by S-NoMAD (Coherence, Cognitive Participation, Collective Action, and Reflexive Monitoring) ranged from 0.76 to 0.83 and can be considered to indicate good reliability, in concordance with other studies, including the results from the still ongoing initial psychometric evaluations of the original NoMAD instrument [14].
The methods used in the present study were chosen with care so that the psychometric results would be as credible as possible. The translation methodology used here, including forward and backward translation, has been recommended as a reliable method for translating instruments for research use [18]. Additionally, several experts participated in the translation process to secure the cross-cultural validity of S-NoMAD [23]. We also used two rounds of expert panels and extensive discussions with the panels and with others at researcher seminars to ensure that the nuances of the languages were correctly interpreted. The cultural adaptation was performed throughout the entire translation and development process and was therefore not treated as a separate step. This meant that words and expressions were questioned and discussed at all stages of the process until consensus was reached.
Despite the changes that we made in wording, we consider the adaptation of the original NoMAD to the Swedish version, with its four steps [18] including forward and backward translation, to have been carefully and methodically performed, with sensitivity to the original purpose and theoretical foundation of the instrument. Thus, the core of the instrument should remain the same. However, there is no gold standard for instrument translation and adaptation; rather, the use of multiple methods, which we applied in our study, is commonly recommended [24]. The CVI methodology proved to be an important way of making visible problematic expressions or items that the expert panel considered less relevant. It therefore served as a basis for further analysis and discussions with the research team, but was not the sole criterion for item removal or alteration. In combination with the assessment of CVI, we also interviewed the experts on the panel, which contributed to the adjustment of the Swedish instrument and guided the development process. The interviews enabled us to interpret why some items received low CVI scores, which helped us to improve some of them. It should be noted that the CVI methodology can aid the handling of already existing items, which corresponds to the aim of the present study. It is not, however, useful for generating other (new) items that might be important for adequately measuring the underlying construct [24]. On the other hand, the original NoMAD was tested for relevance earlier, in the item generation process [12]. However, semantic meaning may be lost in a translation process, and a new test of the relevance of the translated instrument is strongly recommended [23].
The very high response rate of over 98% for the pilot test of S-NoMAD, which we used for the psychometric analyses, is a clear strength. By comparison, a response rate of > 50% is commonly viewed as sufficient for most purposes, even though lower response rates are the norm. However, the sample size and population size are also of importance when determining what response rate is sufficient [25]. Item non-response varied across the questionnaire, with a higher response rate for items at the beginning and a lower response rate for items in the last section of the S-NoMAD (see Appendix 1). This variation might be related to the length of the questionnaire rather than to a lack of relevance or comprehensibility, since the respondents did not express any doubts when filling in the questionnaire [26]. A shorter questionnaire will obviously take less time and effort for the respondents to complete [27] and might be preferable. Our findings on statistically lower performing items, if replicated in other studies reporting the use of NoMAD, can contribute to a future reduction of the item set through further validation. It may also be worth noting that the Reflexive Monitoring items, which appear at the end of the S-NoMAD, concern appraisal of impact, and some respondents may find these more difficult to answer. A possible explanation is that some of the questions in the Reflexive Monitoring section concern future issues, such as the provision of resources in the implementation project, which most of the participants in our pilot study were neither assigned to handle nor able to influence. This is supported by the synthesis of 108 NPT-framed studies by May et al. [11], which found Reflexive Monitoring to be the least often applied theoretical construct. This was because many of the studies reported were feasibility studies, in which the impact of monitoring was under-explored. Nevertheless, the respondents in the pilot study gave seemingly adequate responses to the items (see Appendix 1) without notable problems.
As mentioned above, the analysis of internal construct validity, based on the pilot results and a CFA [21], indicated a poorly fitting model for three of the S-NoMAD items, which were therefore excluded from the final model. Our interpretations of these results include speculations about culture- and language-related differences between expressions in Swedish and in English. For example, the question asking whether the intervention disrupts working relationships might be semantically problematic in the Swedish context. The word ‘disruption’ might be too strong in this context, since Swedish professional relationships are typically built on consensus. The item ‘I have confidence in other people’s ability to use the intervention’ also showed a poor fit according to the CFA. This might reflect that it can be more demanding to judge others’ ability to carry out work tasks than to report on one’s own performance. However, the item is relevant, since according to NPT [4] and other implementation theories, such as the theory of organizational readiness for change [28], the implementation of more substantial changes in health care requires collective action, reflection, and peer support to build communal engagement.
Another result that needs to be considered is the partly high correlation between the four factors (representing the NPT constructs). This was unexpected, since earlier studies of the original NoMAD showed more moderate correlations among constructs [14]. The high correlations between the S-NoMAD factors may be related to the relatively small sample size and to data collection on only one occasion, in connection with an introduction and before the intervention had been initiated in daily practice. In the present study, the sample size was just above the size recommended in psychometric testing for reaching stable covariation among the items (10 respondents per item) [18, 29]. However, the result may also be traced back to the conceptual and semantic equivalence of the translated instrument. For example, the wording of the response options used in the pilot test (strongly agree, agree, neither agree nor disagree, disagree, strongly disagree) might be too similar for respondents to discriminate correctly between them. In the translation process, we wanted to stay as close to the original wording as possible, which might have resulted in a translation that differs somewhat semantically from the original. In a later version of S-NoMAD, we slightly changed the wording of one ‘middle response’ alternative of the scale to improve language clarity, but we did not adjust any endpoints. Given that all of the psychometric tests are relational (and all use the same scale) rather than comparative in any absolute sense, we judge that this adjustment of the response options will have only a very minor influence on the results. A high correlation could also be a sign of item redundancy, which risks diminishing content validity if the items do not each provide one item’s worth of new information related to the NPT construct in question [30]. On the other hand, all the items tapping different attributes of NPT should be at least moderately correlated. Otherwise, the homogeneity and internal consistency of the instrument are at risk of being reduced [31]. Considerations concerning appropriate levels of correlation should be allowed to influence the interpretation of the current results and should be revisited in future revisions and in more extensive tests of the S-NoMAD instrument.