Abstract

Objectives: With regard to pain-related fear, this study aimed to: (1) identify existing measures and review their measurement properties, and (2) identify the optimum measure for the specific constructs of fear-avoidance, pain-related fear, fear of movement, and kinesiophobia. Design: Systematic literature search for instruments designed to measure fear of pain in patients with persistent musculoskeletal pain. Psychometric properties were evaluated by adjusted Wind criteria. Results: Five questionnaires (Fear-Avoidance Beliefs Questionnaire (FABQ), Fear-Avoidance of Pain Scale (FAPS), Fear of Pain Questionnaire (FPQ), Pain Anxiety Symptoms Scale (PASS), and the Tampa Scale for Kinesiophobia (TSK)) were included in the review. The main finding was that, for most questionnaires, there was no underlying conceptual model to support the questionnaire's construct. Psychometric properties were evaluated by diverse methods, which complicated comparisons of different versions of the same questionnaire. Construct validity and responsiveness were generally not supported and/or untested. Conclusion: The weak construct validity implies that no measure can currently identify who is fearful. The lack of evidence for responsiveness restricts the current use of the instruments to identify clinically relevant change from treatment. Finally, more theoretically driven research is needed to support the construct and thus the measurement of pain-related fear.

1. Introduction

Over the last decades, fear of pain and/or injury has become an integral part of the explanation of disability in patients with persistent musculoskeletal pain [1]. Pain-related fear has been identified in the general population [2] as well as in various patient groups with persistent musculoskeletal pain [3–9]. Subsequently, treatment strategies have been developed to reduce fear and disability in patients with low back pain [10, 11], as well as neuropathic pain [4]. In the clinical setting, fear is acknowledged as an important aspect of patients’ disability that needs to be addressed to achieve a successful outcome. Although the role of fear in pain-related disability thus seems well established, consensus concerning proper assessment and interpretation of fear in relation to pain is still lacking.

One of the reasons for this is that fear in relation to pain has been described with a variety of conceptual definitions, among which pain-related fear, fear-avoidance beliefs, fear of movement, and kinesiophobia are the most commonly used. These are, however, constructs rather than disorders or other pathological states in and of themselves. The relationship between fear and pain was first described by Lethem et al. in 1982 in “the fear-avoidance model of exaggerated pain perception” [12], in which fear and pain were both presented as associated with behaviour through avoidance learning. Almost a decade later, Kori et al. presented their thoughts about kinesiophobia [13]. Kinesiophobia was originally defined as a condition in which a patient has “an excessive, irrational, and debilitating fear of physical movement and activity resulting from a feeling of vulnerability to painful injury or reinjury”. The model that has gained most interest among researchers and clinicians alike is the “cognitive-behavioural fear-avoidance model” [14]. In that model, fear of movement was identified as an important factor leading to disability, disuse, and depression in patients with persistent musculoskeletal pain. A subsequent construct is pain-related fear [15], defined as fear that incorporates fear of pain, fear of injury, fear of physical activity, and so forth. Closely related constructs are “fear-avoidance beliefs” [16] and “pain-related fear-avoidance beliefs” [17], but a conceptual definition is available for neither of them. In the literature, the above-mentioned constructs are often used interchangeably, although they are not synonyms. It therefore seems that the conceptual framework of the various constructs used in relation to fear and pain is currently far from clear.

Despite the limited evidence available concerning the underlying constructs, several questionnaires have been developed over the years for the assessment of fear in relation to pain [16, 18, 19]. The Tampa Scale for Kinesiophobia (TSK) is the oldest existing measure and was designed in 1991 by Miller et al. [19] to measure the patient’s subjective estimation of kinesiophobia. The Fear-Avoidance Beliefs Questionnaire (FABQ) [16] and the Pain Anxiety Symptoms Scale (PASS) [18] were subsequently developed. These questionnaires are currently all used in research as well as in clinical practice, in order to identify problem areas and to aid in the design of targeted treatment strategies for the patient. However, no systematic evaluation of the psychometric properties of the existing measures has ever been performed.

Therefore, the aims of the present study were threefold: first, to identify the existing measures; second, to review their measurement properties; and third, to identify the optimum measure for the specific constructs of fear-avoidance, pain-related fear, fear of movement, and kinesiophobia.

2. Methods

2.1. Search Strategy

A systematic literature search for instruments specifically designed to measure fear of pain in patients with persistent pain was performed. The databases used for this search were PubMed, Cinahl, Embase, and PsycINFO, and articles published between January 1990 and June 2009 were saved. The search was conducted in two steps. First, the search terms “assessment” and “pain” were combined with the subordinate terms “fear avoidance”, “pain-related fear”, “fear of movement”, and “kinesiophobia”. Once the relevant questionnaires were found, the second step was performed to assess the measurement properties of the various questionnaires using the search terms “reliability”, “validity”, and “psychometric”.

2.2. Inclusion Criteria

Article abstracts were read to determine which questionnaires were included. An article was included if it contained research on adults (>18 years of age) with persistent pain and if it was written in English. All articles that met the inclusion criteria were reviewed by two of the authors (M. Lundberg and A. Grimby-Ekman). In case of disagreement between the reviewers, an independent reviewer (J. Verbunt) was consulted in order to obtain a consensus.

2.3. Assessment Criteria

The review criteria included an evaluation of the following psychometric steps: Conceptual and measurement model, reliability, validity, responsiveness, interpretability, practicality, and cross-cultural applicability [20]. To assess the levels of the psychometric steps mentioned above we used criteria adjusted after Wind et al. [21] (Table 1).

2.4. Psychometric Terms Defined for This Study

The following definitions were used in the current study.

A conceptual model is a rationale for and a description of a concept(s) that the measure is intended to assess and describes the relationship between the concepts included [20].

Reliability is the extent to which a measurement is consistent and free from error [22, 23]. Reliability can be assessed based on various subconstructs. For the purpose of this study we have chosen to divide reliability into internal consistency [24] and reproducibility [20].
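To make the notion of internal consistency concrete, the following minimal sketch computes Cronbach's alpha, one commonly used index of internal consistency. The definitions above do not prescribe a particular statistic, so the choice of alpha, the function name `cronbach_alpha`, and the example data are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    `scores` is a list of respondents, each a list of item scores of
    equal length. The function name and data layout are hypothetical.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("all respondents must answer the same items")

    # Sample variance of each item across respondents.
    item_columns = list(zip(*scores))
    sum_item_variances = sum(variance(column) for column in item_columns)

    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Illustrative (made-up) data: three respondents, three items.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
```

In this made-up example the items rank respondents identically, so alpha reaches its upper bound; real questionnaire data would typically yield lower values.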

Measurement Validity refers to the extent to which an instrument measures what it is intended to measure [22, 23, 25]. Validity can be assessed based on different components [22, 23, 26]. For the purpose of this study validity will be presented as face-, content-, construct-, and criterion-related validity [20].

Responsiveness describes the instrument’s ability to detect change. Less frequently used, but clinically important, terms are interpretability and practicality. Interpretability describes the degree to which one can assign easily understood meaning to an instrument’s quantitative scores. Practicality refers to aspects such as time, effort, and other demands placed on those to whom the instrument is administered (respondent burden) or on those who administer the instrument (administrative burden).

The cross-cultural adaptation is about assessment of conceptual and linguistic equivalence and evaluation of their psychometric properties. For the purpose of this study, we have chosen to evaluate the cross-cultural adaptation separately for each instrument.

3. Results

The initial search strategy identified 588 abstracts from the PubMed, Cinahl, Embase, PsycINFO, and Web of Science databases. After removing duplicates, 37 abstracts were retrieved for further review. Of the 37 articles, 13 articles addressed the FABQ, one article the FAPS, 2 articles the FPQ, 10 articles the PASS, and 12 articles the TSK. The results are presented as a brief summary of the assessment of the measurement properties of all questionnaires (Table 2). In addition, a more detailed overview of the included articles is presented in Table 3.

3.1. Questionnaires Identified in Relation to the Conceptual Definitions

Fear of Pain Questionnaire (FPQ) and Pain Anxiety Symptoms Scale (PASS) were identified as being designed to measure the construct “pain-related fear”. The Fear Avoidance Beliefs Questionnaire (FABQ) and the Fear-Avoidance of Pain Scale (FAPS) were identified as designed to measure the construct “fear-avoidance beliefs”. The Tampa Scale for Kinesiophobia (TSK) was identified to measure “kinesiophobia”. No questionnaire was found to assess the construct “fear of movement”. Figure 1 demonstrates the relationship between the constructs and the identified measures.

3.2. Descriptions of the Included Questionnaires

The Fear-Avoidance Beliefs Questionnaire (FABQ) was originally developed by Waddell et al. [33] to assess fear-avoidance beliefs in patients with back pain. The original version of the FABQ contains 16 items divided into two subscales: fear-avoidance beliefs about physical activity (5 items) and fear-avoidance beliefs about work (11 items). Each of the 16 items is rated on a seven-point Likert scale (“do not agree at all” = 0 to “completely agree” = 6). The FABQ is available in nine languages, with each language version evaluated in relation to its psychometric properties.

The Fear Avoidance of Pain Scale (FAPS) was constructed by Crowley and Kendall [42] to measure fear avoidance of activities. The FAPS comprises 21 items ranging from 0 (never) to 6 (all the time). Patients are asked to rate how often the mentioned activity occurs.

The Fear of Pain Questionnaire (FPQ) is presented as a measure of fear of pain and is based on the work of Lethem et al. [12]. The FPQ versions used in the two included articles both consist of 30 items, but the individual items differ between the two articles. The FPQ items are statements briefly describing painful situations, and the patient is asked to mark the “amount of fear” on a scale of 1 (not at all) to 5 (extreme) for each item. Three subscales are reported: fear of minor, severe, and medical pain. A total score is used.

The Pain Anxiety Symptoms Scale (PASS) was originally designed to assess fear of pain [18] in relation to three response modalities: cognitive, physiological, and motor. Patients are asked to rate each item on a 6-point Likert scale from 0 (never) to 5 (always). Different versions of the PASS exist, including the 53-item questionnaire and an abbreviated version consisting of 20 items.

The Tampa Scale for Kinesiophobia (TSK) was designed to measure kinesiophobia [19]. The original TSK consists of 17 items. Each item is rated on a 4-point Likert scale with scoring alternatives ranging from “strongly disagree” to “strongly agree”. A total sum is calculated after inversion of the individual scores of items 4, 8, 12, and 16. The total score can vary from 17 to 68. The TSK is available in four languages, and several versions may exist within one language. For instance, the various English language versions contain 4, 11, 13, or 17 items.
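The scoring rule for the 17-item TSK described above can be sketched as follows. This is a minimal illustration, not an official scoring routine: it assumes that responses are coded 1 (“strongly disagree”) to 4 (“strongly agree”), which is consistent with the stated range of 17 to 68, and that inversion means replacing a score x by 5 − x. The names `REVERSED_ITEMS` and `tsk_total` are hypothetical.

```python
# Items whose scores are inverted before summing (from the text above).
REVERSED_ITEMS = (4, 8, 12, 16)


def tsk_total(responses):
    """Total score for the 17-item TSK.

    `responses` holds the 17 item scores in item order, each coded
    1..4 (assumed coding). Reversed items are inverted as 5 - score
    (assumed inversion rule).
    """
    if len(responses) != 17:
        raise ValueError("the original TSK has 17 items")
    total = 0
    for item_number, score in enumerate(responses, start=1):
        if score not in (1, 2, 3, 4):
            raise ValueError(f"item {item_number}: score must be 1, 2, 3, or 4")
        total += (5 - score) if item_number in REVERSED_ITEMS else score
    return total


# A respondent who answers "strongly agree" (4) to every item:
# 13 regular items score 4 each, 4 reversed items score 1 each.
all_agree_total = tsk_total([4] * 17)
```

Under these assumptions, the extreme totals of 17 and 68 are reached only when the reversed items are answered in the opposite direction to the other items.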

3.3. General Summary according to the Psychometric Properties for All Questionnaires

None of the articles reviewed provided a rationale for the conceptual and measurement model related to the questionnaire. Extensive work on the definition of psychometric properties has been performed on the FABQ and the TSK. At present, the FABQ appears to be the best available measure, in terms of psychometric properties, of the concept “fear-avoidance beliefs”, the PASS seems to be the best available measure of “pain-related fear”, and the TSK is the best available measure of “kinesiophobia”. As shown in Table 2, information concerning reliability and validity is often lacking for various questionnaires. In addition, evidence to confirm responsiveness is lacking for almost all measures. Interpretability and practicality were rarely or never addressed and hence not evaluated. Each identified measure is analysed in depth below.

3.4. Fear-Avoidance Beliefs Questionnaire (FABQ)

In total, 16 articles were included in the review of the FABQ. These articles addressed the psychometric properties of the Chinese [29, 30], Dutch [31, 32], English [16, 34], French [35], German [36–38], Greek [39], Hebrew [40], Norwegian [27], and Spanish language versions [41], with each language version presented separately (Tables 2 and 3).

As shown in Table 2 the internal consistency of the FABQtotal as well as its subscale FABQwork was assessed as high in all language versions. However, the internal consistency of the subscale FABQphysical activity appeared to be only low to moderate in most language versions. The FABQ reproducibility as reported ranged from low to high, whereas the reported FABQ validity could be classified as low. The quality of the FABQ validity analysis procedures as performed could be classified as compromised. Study objectives were often not clearly focussed, which complicated identifying the validation procedure as performed. As shown in Table 2, only limited information appeared to be available related to the FABQ’s content validity. Most often, the construct validity was assessed using factor analyses. Various factor solutions were presented ranging from one to four, with two being the most common factor solution. The items within the factor solutions varied, but were mostly robust. As shown in Table 3, a variety of measures were used to either converge or diverge in relation to the FABQ. Testing conditions were not reported in any of the studies.

The criterion-related validity, both concurrent and predictive, of the FABQ was ranging from low to moderate. The presentation of the rationale for the choice of the criterion-related measures was often lacking, which complicated the interpretation.

3.5. Fear Avoidance of Pain Scale (FAPS)

One article appeared to be available that evaluated the psychometric properties of the FAPS. Based on this information, the internal consistency is considered as high whereas its reproducibility seems low. Concurrent validity was ranging from low to moderate. Moreover, the evidence supporting validity of the FAPS is poor in all versions. It is also unclear what subscale of the FAPS would be preferable to use.

3.6. Fear of Pain Questionnaire (FPQ)

Our selection of articles identified 16 articles that evaluated psychometric properties of the FPQ. However, only two of these articles [43] fulfilled our inclusion criteria. The FPQ measures fear of pain and is reported to be based on the work of Lethem et al. [12]. A remarkable finding, however, is that several of the selected articles refer to one specific document, classified as proceedings of a meeting, in which the origin of the FPQ [45] is reported to be described. The availability of this key FPQ manuscript seems rather limited, since we were not able to retrieve it even after a thorough search of the international literature. The FPQ as referred to in the two articles included in the final selection consisted of 30 items. All items describe painful situations, and the patient is asked to mark the “amount of fear” he/she is experiencing related to each item on a scale of 1 (not at all) to 5 (extreme). The three subscales mentioned are fear of minor, severe, and medical pain. A total score is used. As reported, the internal consistency varied between low for the medical scale and high for the total scale. The content validity was considered low.

3.7. Pain Anxiety Symptoms Scale (PASS)

In total, 12 articles were included in the review of the PASS. Articles discussing the psychometric properties of the English [18, 28, 34, 47–49] and Dutch language [46] versions were addressed (Table 1). The reliability was high in all PASS language versions, except for the subscale PASSphysiological symptoms of anxiety in the English 20-item version, for which it was reported to be only moderate. As for the FABQ, the description of the validation procedure for the PASS was often lacking in the articles reviewed. In addition, the diversity of versions of the PASS complicated comparisons between studies. Content validity ranged between low and moderate in the English and the French versions of the PASS. Construct validity was found to be low in the combined Dutch and American study [46]. As shown in Table 3, a variety of measures were used as criterion variables in the various validation procedures. Again, the rationale for the selection of these criterion measures was often not available. Concurrent validity was low to moderate in the Dutch and English versions of the PASS [59]. Responsiveness was addressed in only one study. In addition to the original version, two versions of the PASS are available in Dutch.

3.8. Tampa Scale for Kinesiophobia (TSK)

A total of 12 articles addressed the psychometric properties of the Dutch [50–52], English [53–56], Norwegian [57], and Swedish versions of the TSK [7]. Reliability was not evaluated at all in the Norwegian version of the TSK. The reliability of the English version ranged from low in the 13-item version subscale TSKtsf [55] to high in TSKtotal. The fact that three different variations of the English language version are available complicates its final qualification. The reliability of the various Dutch versions ranged from low on the 13-item subscale TSKpathologic somatic focus to high on the TSKtotal in a group of patients with fibromyalgia. The support for reliability of the Swedish version was found to be high in a group of patients with persistent low back pain [7]. Validity was found to be low in all versions of the TSK.

4. Discussion

Five questionnaires, Fear of Pain Questionnaire (FPQ), Pain Anxiety Symptoms Scale (PASS), Fear-Avoidance Beliefs Questionnaire (FABQ), the Fear Avoidance of Pain Scale (FAPS), and the Tampa Scale for Kinesiophobia (TSK), were identified to assess fear in relation to pain. All questionnaires had weaknesses in relation to their psychometric properties, poor reliability and validity indicating a lack of construct validity. At present the FABQ seems to be the best available questionnaire to measure the concept “fear-avoidance beliefs”, the PASS seems to be the best available questionnaire to measure “pain-related fear”, and the TSK is the best used to measure “kinesiophobia”.

4.1. Conceptual and Measurement Models

As shown in this study, the description of the underlying conceptual model appeared to be poor (FABQ, FAPQ, PASS) to nonexistent (TSK). The fact that no or only limited information is available concerning the conceptual model has a direct impact on the manner in which the concepts are subsequently operationalised and measured. As a consequence, it is currently unclear how individual scores obtained from any of these questionnaires should be interpreted in research and clinical settings. Although several of the presented questionnaires are already frequently used in clinical practice to screen individual patients for fear related to pain, the results of this review suggest that the scores have to be interpreted with caution. For example, what would be the clinical relevance of a high score on the FABQ? Based on the results of the current review, it can only be concluded that a patient who has elevated scores on the FABQ cannot, on that basis alone, be interpreted as having high fear-avoidance beliefs. A reported observation of Linton et al. [60] agrees with this finding. In one of their studies, patients showed a clinically relevant decrease in their fear-avoidance beliefs as observed by the rehabilitation staff, although this change was not reflected in a change in TSK scoring. Patients continued to have high scores even after successful participation in the treatment. This finding may suggest that the construct validity of the TSK version used was poor, as the questionnaire did not target fear as it was intended to, or that the measure was not sensitive enough to pick up the actual clinical change. It can be concluded that there is ambiguity in relation to the conceptual framework, and that several different constructs are used to define what fear is in relation to pain.

It should be emphasized that none of the measures seem to be explicitly based on the frequently cited and well-established “cognitive-behavioural fear-avoidance model” [14]. All of the identified questionnaires (FABQ, FAPS, FPQ, PASS, and TSK) were designed before the introduction of Vlaeyen’s fear-avoidance model in 1995. This fear-avoidance model has been further elaborated on by several other researchers [61–63]. In addition, two recent models have tried to incorporate alternative activity-related behavioural strategies: the Ergomania model [64] and the Avoidance-Endurance model [65]. In summary, there seems to be a discrepancy between the time at which the measures were introduced and the time at which the conceptual models of fear related to pain became available. This raises the question of what the questionnaires in their current versions really measure.

4.2. The Link between Models and Measures

At this stage, the reader might wonder whether the questionnaires can be used at all. Closely linked to the conceptual framework is the validity of each questionnaire. In the current study, the evaluation of the evidence concerning validity was complex. It must however be noted that establishing the validity of an instrument is extremely difficult [26], which seems to imply that no questionnaire can be labelled as 100% valid [66]. In the current study, firstly, the validation procedure was often poorly described, complicating quality judgment of this procedure. Some of the authors, however, succeeded in guiding the reader through the procedure [27, 54]. Secondly, a variety of measures were used as the external criterion in the concurrent/criterion validity process. As shown in Table 3, all of the selected questionnaires were validated against other questionnaires as part of the validation process. However, the validity of the instrument used as the “gold standard” appeared on many occasions to be less well established than was assumed. The results of the current review show that the FABQ, PASS, and TSK were all used to evaluate the validity of a reference instrument that had itself been included in the original validity evaluation of the questionnaire. Given this circular reasoning, it can only be concluded that both instruments measure the same construct; whether this construct is indeed the desired construct remains unknown. For instance, studying the association between the TSK and the FABQ will result in information regarding the construct validity [67] of the questionnaires, but will not evaluate the concurrent/criterion validity, for which a gold standard scale is required. In the articles reviewed, most authors did indeed refer to the validation as an evaluation of construct/convergent rather than concurrent/criterion validity. However, it should be noted that even if the term for the validation is used properly, the weakness of construct/convergent validity is still present. Moreover, whether a test is valid or not is ultimately a matter of opinion based on the evidence available describing its validity [26].

Some researchers have argued that an evaluation of construct validity is the only proper way to evaluate the validity of an instrument [22, 26]. During construct validity evaluations, three essential steps have to be taken. Firstly, the domain of observables related to the construct has to be specified; secondly, the extent to which the observables tend to measure the same construct has to be determined; thirdly, studies and/or experiments have to be performed to determine the extent to which supposed measures of the construct are consistent with “best guesses” about the construct. According to Nunnally and Bernstein [22], researchers often tend to develop a measure of a construct and then leap to the third step, for example, correlating a particular measure of anxiety with a particular measure of shyness, instead of tightly defining the initial domain of observables for the construct. The challenge that lies ahead is to pool the observations made with all these measures and place them into a new, elaborated framework.

4.3. Responsiveness

Based on the results of our review, it can be concluded that the information currently available on the psychometric properties of the instruments is too limited to establish their quality as diagnostic tests. For example, evidence-based cut-off scores for the various instruments are still not available. In addition, the limited information regarding responsiveness of the questionnaires complicates their use in clinical practice, since information on what constitutes a clinically relevant change on an instrument is not available. Information on responsiveness was only available for the FABQ (German, French, Norwegian, Spanish, and Chinese versions) and the TSK (11-item English version). Whether these instruments are applicable for screening patients for pain-related fear seems an important topic for future research. To support their appropriateness as outcome measures, more information must be gathered on the responsiveness of the various measures.

4.4. Interpretability and Practicality

Both interpretability and practicality are seldom discussed in published articles, even though the evaluation of both seems highly relevant in the questionnaire selection process. In the present review, these issues were only addressed in relation to the Spanish version of the FABQ [41]. In clinical practice, both criteria seem relevant and need to be highlighted in future research.

4.5. Cross-Cultural Applicability

During cross-validation, both language and cultural differences should be taken into account. For example, can a questionnaire that has been reported as reliable and valid in Chinese, based on cross-cultural validation between languages, be used throughout all of China, given the various cultural differences within this enormous country? As for the TSK, in Sweden and Norway, two countries with a comparable cultural background, two separate versions of the TSK are used, whereas in Belgium and Holland the same language version is used without considering possible cultural differences. Furthermore, the focus on language excludes immigrants, which limits the generalization of the results.

4.6. Methodological Considerations

This review was based on an evaluation of psychometric properties. It is evident that there is no “gold standard” set of criteria for psychometric testing. Even within the domain of research on psychometric evaluation, there is no consensus as to what should be included in an analysis of reliability and validity [66]. We therefore chose to base our analyses of the psychometric properties on a modified version of the Wind criteria [21]. For this purpose, the original Wind criteria were adjusted based on the modifications used in the criteria of Grotle et al. [68] and Larsen and Marx [69]. We would also like to clarify that the rationale for using criteria, instead of simply presenting the data as shown in Table 3, was to provide guidance for the reader who is trying to decide which measure to use.

There are, however, other methods available to evaluate reliability and validity. Psychometric theory assumes that the data is normally distributed and treated as data on at least an interval level, while others argue that data from questionnaires are ordinal data [70, 71]. Svensson [72] has developed a family of non-parametric rank-invariant methods that are valid for all types of ordered data without assumptions about their distribution. Bunketorp et al. [58] applied Svensson’s method to evaluate the reliability of a slightly different version of the TSK-SV and found it reliable.

Another model available for evaluation of measurement quality is the Rasch model [73]. Rasch models [50, 52] are logistic models in item response theory in which a person’s level on a latent trait and the various items on the same latent variable can be estimated independently. The Rasch model was applied to the Norwegian version of the TSK by Damsgard et al. [57].
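As a brief illustration of the logistic form mentioned above, the sketch below implements the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person parameter and the item parameter. Note that this is an assumption made for simplicity: the TSK uses a 4-point response scale, so applications to it would typically require a polytomous extension that is not shown here. The function name `rasch_probability` and the parameter names `theta` (person location) and `b` (item location) are illustrative.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta: person's location on the latent trait (logits).
    b:     item's location on the same latent scale (logits).
    Both parameter names are illustrative conventions.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person located exactly at the item's location has an even chance
# of endorsing it; a person one logit above has a higher chance.
p_equal = rasch_probability(0.0, 0.0)
p_above = rasch_probability(1.0, 0.0)
```

Because only the difference between person and item locations enters the formula, shifting both by the same amount leaves the probability unchanged, which is the property that allows persons and items to be placed on one common scale.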

4.7. Limitations of the Study

Other instruments often presented in association with the evaluated constructs are the Photographs Series of Daily Activities (PHODA) [74], the Pain Beliefs Screening Instrument (PBS) [75], the Örebro Musculoskeletal Pain Questionnaire [76], and the Pain and Impairment Relationship Scale (PAIRS) [77]. These instruments were not included in this review for various reasons. The PHODA was not identified as part of the search, since it was originally presented in Dutch. Furthermore, it is designed to determine the level of perceived harmfulness of various physical activities and movements. Both the PBS and the Örebro Musculoskeletal Pain Questionnaire are designed to screen patients at risk of developing persistent pain, whereas the PAIRS was designed to assess beliefs about chronic pain and functional impairment; these instruments were therefore not included for further evaluation in the current review.

Another issue to be raised is the statement that there is an (over)reliance on self-reports [78] for both constructs (fear and pain). Verbal reports and the other methods (observation by others and instrument/apparatus) both have their limitations [78]. The best methods of assessment are multimodal and multimethod. However, such an analysis is not always possible in a clinical setting. Once again, we need to be cautious when interpreting the results.

In conclusion, evidence supporting the psychometric properties of questionnaires which are currently available to measure fear of pain in patients with musculoskeletal pain is still incomplete. Future research on the validity of fear of pain measures seems warranted. In order to facilitate the clinical application of these instruments, it is recommended that the focus of research be on an agreement of the conceptual and operational definitions of the various constructs. One way of starting that process is to combine the more established psychometric procedures with a qualitative approach, in order to be able to incorporate the patient’s perspective.

Acknowledgment

The authors would like to express their sincere gratitude to Rita Zakarian for her help with the language revision and the format of the paper.