Introduction
Pain-related catastrophizing, defined as an exaggerated negative orientation towards pain [1], has become a widely recognized construct for explaining significant variance in experimental [2, 3] and clinical [4, 5] pain conditions. Pain-related catastrophizing has also demonstrated consistent prognostic association with outcomes following motor vehicle collision [6], surgery [7, 8] and conservative rehabilitation [9]. The Pain Catastrophizing Scale (PCS [3]) is a widely used tool for quantifying pain-related catastrophizing. It consists of 13 items related to the experience of pain, including ‘I feel I can’t stand it anymore’ and ‘I can’t seem to keep it out of my mind’. Prior authors have used exploratory [10], confirmatory [11, 12] and Rasch-based [13] approaches to identify 3 sub-factors within the PCS: rumination, helplessness/hopelessness, and magnification. Despite this, the subscale scores are rarely reported separately in clinical research or practice, raising the possibility that the subscales are not always necessary.
If a single summative scale score is preferred, then 13 items may be unnecessary for many routine clinical encounters. Prior authors have reported high internal consistency of the original PCS (α ≥ 0.95) [12, 14], which suggests a degree of item redundancy and the potential for an abbreviated scale [15]. Bot et al. [18], McWilliams et al. [16] and Darnall et al. [17] have all previously published shortened (4-item, 6-item, and 3-item, respectively) versions of the PCS. Both Bot et al. [18] and McWilliams et al. [16] used Classical Test Theory methods in samples of upper extremity or mixed chronic pain to derive their shortened versions, and both intentionally included items drawn from each of the 3 PCS sub-domains. McWilliams et al. [16] found that the 4-item version endorsed by Bot et al. did not satisfy all a priori criteria for adequate measurement properties, favoring the 6-item version instead. Neither has yet been tested for invariance across clinically relevant subgroups such as sex, age, or symptom duration. Darnall et al. [17] employed qualitative cognitive interviews to adapt and refine the scale contents and instructions for use as a brief tool for daily administration. Employing three approaches, they derived a new 3-item ‘daily PCS’ that included one item from each of the 3 original subscales.
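To illustrate the internal-consistency argument above, Cronbach’s α can be computed directly from the item variances and the variance of the summed scale. The sketch below uses hypothetical toy data (not QPR data) to show how a block of near-duplicate Likert items, scored 0–4 as on the PCS, pushes α toward 1, which is the redundancy signal that motivates abbreviation:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Toy data: 6 respondents x 4 items with near-identical response patterns.
scores = np.array([
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 4, 4],
    [2, 3, 2, 2],
])
print(round(cronbach_alpha(scores), 2))  # -> 0.98
```

Values this high suggest the items are largely interchangeable, so dropping some of them loses little information about the underlying construct.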
To our knowledge, the prior efforts at abbreviation have yet to see widespread adoption. We believe that, in the absence of a clear rationale for retaining the 3-factor structure in most clinical encounters, a better priority would be to create an abbreviated version that can easily be used to predict full-scale scores rather than one that adheres to a 3-factor structure. Newer measurement approaches, including Rasch modeling for ordinal scales [8], offer the potential to conduct deeper exploration of individual item function, and large databases allow scale evaluation from different statistical and theoretical perspectives in the same population. Qualitative or theoretical review is rarely prioritized over statistical methods for scale interpretation, though we suggest that triangulating classical and newer statistical methods with theoretical review will lead to better scale development. To that end, the purpose of this study was to triangulate findings across qualitative, Classical Test Theory, and Rasch-based analyses to arrive at a psychometrically sound, unidimensional, abbreviated version of the PCS that could be confidently applied across conditions and clinical subgroups, reducing burden while providing scores comparable to those of the original version.
Methods
The database for this analysis consisted of 5646 PCS scores obtained through the Quebec Pain Registry (QPR) for chronic pain problems. The QPR is a province-wide administrative and research database that provides standardized data on a large cohort of patients with chronic pain referred to tertiary care pain clinics. Participant phenotypes are described using a set of common demographic and clinical measures based on uniform and validated tools [19]. The registry has undergone considerable data fidelity checking, with each dataset checked by two independent research nurses. Data were provided for participants enrolled between October 2008 and December 2014. Inclusion criteria were adult (18 years and older), community-dwelling males and females with non-cancer pain who could read and understand conversational French or English. Participants completed the full PCS as originally described by Sullivan [3]; the Brief Pain Inventory Interference subscale [20], which provided two scores, Physical Interference and Affective Interference [21]; and the Beck Depression Inventory – II, a well-supported measure of depressive symptoms that has been used extensively in pain studies [22]. The database also included longitudinal follow-up data, but only the baseline data were used in the current study. Raw score responses to each individual item on the PCS and all other scales were extracted from the QPR data to form the study database for the current analysis. Ethical approval was obtained from the relevant institutions prior to data collection. No additional ethical approval was sought for this secondary analysis of de-identified data.
Discussion
The Pain Catastrophizing Scale has become a popular self-report tool in pain-related research. It is often used as a screening tool to discriminate between people high and low in catastrophic beliefs rather than as an evaluative measure to track change over time. While scales with more items are generally thought to be more responsive to change by virtue of a larger range of scores, discriminative/screening tools can function adequately with fewer items. While there is no ‘right’ number of items for a screening tool, scales of 3 to 5 items have been shown to have adequate discriminative properties for several clinical conditions [27, 28]. If the intention of the PCS is to discriminate between high and low catastrophizers for patient phenotyping, then the smallest number of items that retains adequate measurement properties should reduce barriers to implementation in practice. Further, while the original PCS was described as a 3-factor scale, the 3 separate subscale scores are rarely reported in pain research, and rarer still is any evidence that those scores can or have been used to inform different treatment directions. With these pragmatic considerations in mind, our team set out to intentionally reduce the number of PCS items as far as possible while retaining sound measurement properties for screening purposes and a strong association with the full original scale.
The approach was rigorous, harnessing knowledge and methods from qualitative and quantitative fields of psychometrics and scale development. No single piece of evidence was used in isolation to decide on item retention/removal. As a deviation from traditional approaches that tend to heavily prioritize statistical methods, we have demonstrated an approach to triangulating findings across techniques to inform revision decisions. The size of the QPR database provided the opportunity to test and retest the scales as needed. Through conscientious, informed decisions a new brief version of the scale was created that satisfied theory, classical, and newer statistical methods in a way that prior attempts to shorten the scale have not done.
The results of the CFA revealed some ambiguity in model fit; despite unidimensionality in the Rasch analysis, RMSEA was higher than desirable in the 5-item model. The goals of this study continued to guide decision making, and so one item was removed that, upon retesting in Rasch and CFA, fit both the newer and classical approaches and satisfied the conceptual review. Through this iterative and stepwise approach of testing across multiple perspectives, revising and retesting where necessary using several independent samples, we are confident that the BriefPCS version defined here is adequate for routine use. The strong correlations between the brief version and the full-scale analog (r ≥ 0.94) suggest that the brief version is also an adequate proxy of the full version. With only 4 items, the ability to detect change over time at the individual level has likely suffered (as evidenced partly by a PSI < 0.80 in the sample), but as a screening/discriminative tool the 4-item version still offers 17 levels of discrimination (0–16). While not formally tested in this study, it is reasonable to expect that the score thresholds often cited in the existing PCS literature (20/52 moderate, 30/52 high catastrophizing) can be proportionally applied to the new scale (6/16 moderate, 9/16 high), though this is an area for additional study.
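The threshold mapping described above amounts to a simple proportional rescaling of the original cut-offs onto the 4-item range. A minimal sketch (the function name is ours, not part of the PCS literature, and the rescaled cut-offs remain empirically untested, as noted):

```python
def rescale_threshold(full_cutoff: int, full_max: int = 52, brief_max: int = 16) -> int:
    """Proportionally map a cut-off on the 13-item PCS (0-52) onto the 4-item BriefPCS (0-16)."""
    return round(full_cutoff * brief_max / full_max)

print(rescale_threshold(20))  # moderate catastrophizing: 20/52 -> 6/16
print(rescale_threshold(30))  # high catastrophizing:     30/52 -> 9/16
```

This reproduces the 6/16 and 9/16 values quoted in the text (20 × 16/52 ≈ 6.2; 30 × 16/52 ≈ 9.2).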
The brief scale offers additional benefits. Perhaps the most notable is the removal of item 2 (‘I feel I can’t go on’), which was deemed through conceptual analysis to be potentially tapping depression or even suicidal ideation. While these are important constructs that should be explored, especially in people with chronic pain, anecdotal experience with clinicians using this scale reveals that few are aware of this potential overlap, and fewer still act upon it when endorsed. While this may be a trivial concern, the team unanimously decided to remove the item from the brief version, due partly to statistical considerations but also in the interest of protecting clinicians who may be ill-equipped to address emotional crises, and to highlight that depression/suicidality in chronic pain warrants its own dedicated investigation. The removal of items 8 and 12 also fits with results from our prior Rasch analysis of the PCS in an independent sample [13]. That analysis found disordered response thresholds requiring rescoring of those two items, and evidence of considerable location dependence between other items throughout the scale. Collectively, those prior findings lent additional justification to the effort to create an abbreviated scale that overcomes some of the challenges to scoring and interpretation in the original version.
Prior attempts at abbreviating the PCS have prioritized the retention of a 3-factor structure. Bot and colleagues [18] derived a 4-item version of the tool that retained the 3 factors but showed lower correlation with the full 13-item version than the version derived here, with Pearson r values ranging from 0.60 to 0.82. McWilliams and colleagues [16] derived a 6-item version that also prioritized the inclusion of items from all 3 subscales. That version retained items 4, 5, 6, 10, 11 and 13, three of which (4, 10 and 11) overlapped with our version, and correlated with the original version at a similar magnitude (r = 0.95). Further, in our larger sample, correlations between the PCS-4 and depression (r = 0.46) and functional interference (r = 0.43) were similar in absolute magnitude to those of McWilliams’ 6-item version (r = 0.47 for depression and 0.38 for functional interference). We believe the rigour with which our 4-item version was derived, and its brevity, represent an advantage over the 6-item version of McWilliams, though the potential ability to tap the 3 different subscales of the original PCS in the McWilliams version may be attractive for those who find value in doing so. The 3-item version of Darnall and colleagues [17] also prioritized inclusion of items from all 3 subscales. That version retained items 4, 6, and 10, two of which (4 and 10) were again consistent with the version described herein. Those authors did not analyze correlation with the original version, making comparison difficult. That group also made other changes to the scale instructions and scoring to make it appropriate as a tool for daily ‘state’ administration rather than as a clinical phenotyping tool of catastrophizing as a ‘trait’. The availability of these different versions has value, and users are encouraged to consider the intended use of a scale before choosing the abbreviated version that will best fit their needs.
Despite considerable rigor, there are limitations that should be kept in mind when interpreting these results. The primary one is that this was a secondary analysis of an existing database, meaning our research team had no control over the methods through which the primary data were collected. While this is less concerning with patient self-report data than it would have been with clinician-administered tools, it does mean that logistics such as the instructions given to patients, the time allowed for completion, and even the environment in which the tools were completed very likely differed across sampling contexts. The extent to which this may have influenced results is unknown, and this is one reason for conducting the analysis on multiple random samples. Another potential influence on our results is that fit indices for the CFA may have been biased because no covariates were built into the model. By that point, covariates had already been explored through the DIF analyses in Rasch. However, it is possible that the CFA fit indicators could have been improved further by including, for example, sex or age in the CFA model. With CFI/TLI estimates already at or near 1.00, it is doubtful that adding complexity to an already complex analysis would have led to any meaningful change in the results or interpretation, and we were wary of creating an unstable model through overfitting. Finally, as with any scale revision, confidence in the results will be strengthened when the new version is tested in another independent sample and the results are replicated.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.