Background
The disease-specific questionnaire, Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), is the most widely used instrument to evaluate symptomatology and function in patients with hip or knee osteoarthritis (OA) [
1‐
5]. The measure was developed to evaluate clinically important, patient-relevant changes in health status resulting from treatment interventions [
6]. The WOMAC, which is self-administered and covers three dimensions: pain (5 items), stiffness (2 items), and physical function (17 items), is reliable, valid, and sensitive to changes in the health status of patients with hip or knee OA [
1,
7‐
10].
A major use of health measurement scales is to detect changes in health status over time, and a priority may be efficiency, i.e., responses achieved using the shortest possible questionnaire [
11,
12]. A shorter version of the WOMAC would further enhance its applicability in epidemiologic studies, clinical trials, and daily clinical practice [
13], since short questionnaires result in improved patient compliance and response rates and are thought to improve the quality of the response [
14,
15]. Traditionally, one of the major disadvantages of self-administered questionnaires has been the low response rate, which greatly affects the study validity [
15,
16], but it has been shown that shorter versions of questionnaires significantly increase the response rate [
15]. In addition, several studies have reported that the WOMAC function scale is redundant and suggested that the scale should be shortened by omitting the repetitious items [
17,
18]. Therefore, it would be very useful to have a shortened WOMAC version in Spanish that retains the good psychometric properties of the original version.
The WOMAC questionnaire has been shortened recently [
11,
19‐
21]. Some versions were shortened using statistical approaches [
19,
20], and others by considering the perspective of patients and rheumatologists [
11,
21]. The stiffness domain of the WOMAC is largely redundant and is often excluded from the questionnaire [
21]. Therefore, some authors have centred their studies on shortening the function domain [
11,
21], while others have shortened the pain and function domains [
19,
20], but these shortened domains have not been validated together as a complete shortened WOMAC version, including a check for the existence of two underlying domains. Since each shortened scale is essentially a component of the fully shortened version, the underlying structure of the reduced version should be analyzed.
The goal of the current study was to propose a shortened Spanish WOMAC version based on previously shortened versions and to evaluate the validity, reliability, and responsiveness of this shortened questionnaire for patients with hip OA, combining classical and modern statistical techniques, such as Rasch analysis.
Discussion
The results of the current prospective study with two independent and large cohorts of patients who underwent THR at different hospitals and who were followed to 6 months support the validity, reliability, and responsiveness of the new 11-item version of the WOMAC. To the best of our knowledge, this is the first study to validate a shortened WOMAC version as a whole tool, including both pain and function dimensions, and most importantly, the first valid, reliable, and responsive WOMAC short version proposed in Spanish.
The WOMAC questionnaire is widely used both in research studies of orthopedic or rheumatologic conditions and in clinical practice [
1‐
5,
7]. One of the major disadvantages of self-administered questionnaires has been the burden of their completion [
41]. In some epidemiological and clinical studies, patients have to complete several questionnaires, which implies a great burden. In clinical practice, where information is collected to evaluate response to treatment, the goal is to require as little effort as possible from both the patient and the physician. Therefore, an instrument would be useful if its shortened version collected the same information with less burden. In addition, another disadvantage of self-administered questionnaires has been the low response rate, which greatly affects the study validity [
15,
16]. Missing items have important implications for data collection, completion, and analysis. However, it has been shown that shorter versions of questionnaires significantly increase the response rate [
15], and that compliance increases when the respondent is asked to complete an appreciably smaller set of questions [
42]. Therefore, a shorter version of the WOMAC would further enhance its applicability in epidemiologic studies and daily clinical practice [
13]. On the other hand, a consequence of the reduction of items is a loss in content validity, the comprehensiveness with which each domain is sampled, and investigators must be cognizant of this issue when they reduce the number of items [
12]. Because of its greater length, the full questionnaire provides detailed insight into the different dimensions. However, this length might also be a disadvantage because of reduced patient compliance and incomplete responses [
14]. Therefore, it would be very useful to have a shortened WOMAC version in Spanish that retains the good psychometric properties of the original version.
The aim of the current study was to propose a new short WOMAC form and validate it in Spanish. Fairclough [
43] commented that it is preferable to select a previously validated instrument than to create a new one. Considering this, and according to the different short versions of the WOMAC pain domain proposed by other investigators [
19,
20], we selected the shortened pain scale proposed by Davis et al. [
19]. They shortened the WOMAC pain domain using Rasch analysis in a community sample of 773 patients with a hip or knee complaint. The authors concluded that the pain short scale fits the Rasch model and has interval-level scaling properties, and the stability of the model also was supported by a sample of 1,151 surgical patients. Rothenfluh et al. [
20] proposed a different three-item pain short version that had two items in common with the version proposed by Davis et al. [
19], but the authors based it on a very small sample of patients with hip OA (n = 57). Taking into account our objectives, the methodology used by Davis et al. [
19] for the reduction study, the larger sample size, and that both shortened pain domains had the same number of items, we decided that the pain short form proposed by Davis et al. [
19] was the more appropriate choice.
Regarding the WOMAC function short forms, other versions have been proposed by different authors [
11,
19‐
21]. Davis et al. [
19], who based their new version on the Rasch model, also proposed a shortened version of the function scale. Nevertheless, they excluded only three items from the original version, which we did not consider short enough. Rothenfluh et al. [
20] also proposed a nine-item short version of the function scale based on the Rasch model but used a very small sample of patients with hip OA (n = 57). Given that our target population is composed of patients with hip OA, we did not consider the sample they used to be large enough. Whitehouse et al. [
21] reduced the 17-item function scale to seven items by a clinically driven process based on the opinions of 36 orthopaedic and rheumatology personnel. The authors studied the validity, reliability, and responsiveness of the short scale in patients with hip or knee OA [
21], and the criterion validity and repeatability of this reduced function scale were also assessed in a sample of 100 patients, of whom only 30 had THR [
42]. This short function scale also was validated in an independent cohort, but using a sample of patients with knee OA [
14]. Finally, Tubach et al. [
11] reduced the function scale from 17 items to eight, based on the opinions of 1,362 patients with hip or knee OA and 399 rheumatologists. This short function scale was validated in an independent sample of patients with hip or knee OA, and it was found to be responsive, reproducible, and valid [
25]. Although Whitehouse et al. [
21] and Tubach et al. [
11] used similar methods for shortening the scales, the latter considered more expert opinions, added patient opinions, and the scale was validated by also considering patients with hip OA. Therefore, we selected the function short scale proposed by Tubach et al. [
11].
The validation studies of the various shortened WOMAC versions [
11,
14,
19‐
21,
25,
42] have consisted of studying the measurement properties of the corresponding shortened WOMAC pain or function scales individually. In our study, we validated our new 11-item WOMAC-SF as an entire tool, including both pain and function dimensions, and studied the construct validity of the short version to test the hypothesis that the 11 items in the questionnaire comprise two separate factors. Validation of the 11-item WOMAC-SF using CFA provides the questionnaire with greater construct validity. The CFA results confirmed the hypothesized internal structure of the two latent factors, given that all fit indices were satisfactory and all factor weights exceeded the recommended thresholds [
26‐
29]. We also confirmed the internal structure of the 11-item WOMAC-SF by CFA performed in an independent cohort. A possible limitation could be the violation of the normal distribution of items when using the CFA. However, it has been argued that the maximum likelihood estimation procedure appears to be fairly robust against moderate violation of this assumption [
29]. In addition, some studies, based upon experience or computer simulations, have claimed that scales with as few as five points yield stable factors [
37]. Therefore, taking into account that we used a 5-point Likert scale and a maximum likelihood estimation procedure, and that we had a large sample size with practically equal results in both cohorts, we think that our CFA results are reliable and stable.
The Rasch method applied to the three-item pain short domain and the eight-item function short domain provided fit levels (infit and outfit) and unidimensionality sufficient to be considered adequate, providing major evidence of construct validity. Two items, "pain on sitting or lying" in the pain scale and "putting on socks" in the function scale, presented infit or outfit statistics slightly above the recommended threshold of 1.3. However, taking into account the satisfactory results of the remaining analyses, such as the PCA of the residuals, the functioning of the rating scale categories, the absence of DIF by gender in both items, and the item and person separation indexes, we do not consider this slight excess over the recommended limit large enough to conclude that these two items are misfitting. Regarding the three-item pain short form, the results were similar to those reported by Davis et al. [
19]. Considering that the criteria were satisfactory, we concluded that the shortened WOMAC pain scale fit the Rasch model. Regarding the eight-item function short form, we obtained a scale that shows the fundamental properties of model fit and unidimensionality.
Analysis of the internal consistency allowed us to confirm the hypothesis that the items that comprised the pain short scale and those that comprised the function short scale each measured the same concept, as Cronbach's alpha coefficient exceeded the threshold of 0.70 [
36]. For the function short scale, the results were similar to or slightly higher than those reported by the original authors of the short form [
11,
25]. Further, although the reliability of the 11-item WOMAC-SF was not as high as that of the original Spanish WOMAC questionnaire (0.82 for the pain domain and 0.93 for the function domain) because of the reduction in the number of items, it was only slightly lower, indicating that it maintained excellent internal consistency [
8].
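As a minimal illustration of the internal consistency criterion discussed above, Cronbach's alpha can be computed from item-level responses as sketched below. This is not the analysis code used in the study; the function name `cronbach_alpha` and the example responses are hypothetical, and the sketch assumes complete data with every item scored on the same 0 to 4 scale.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of lists: one inner list per item, each holding the
    scores of the same respondents in the same order (no missing values).
    """
    k = len(items)                      # number of items in the scale
    n = len(items[0])                   # number of respondents
    # Total scale score for each respondent.
    totals = [sum(item[i] for item in items) for i in range(n)]
    # Sum of the individual item variances (sample variance, n - 1).
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five patients to a three-item scale (0-4 each).
example_items = [
    [1, 2, 3, 4, 0],
    [1, 2, 3, 3, 1],
    [0, 2, 4, 3, 1],
]
alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.4f}, above 0.70 threshold: {alpha > 0.70}")
```

With these invented data the coefficient is well above the 0.70 threshold; real scales with weakly related items would yield lower values.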
The convergent and discriminant validity of the WOMAC-SF was assessed by examining the relationship between the pain and function short scales and the factors of the SF-36. Validity was demonstrated by correlation coefficients lower than the internal consistency of the short forms and by confirming the hypothesis that the highest correlation coefficients were found between the WOMAC pain short form and the SF-36 bodily pain domain and between the WOMAC function short form and the physical function domain of the SF-36. Baron et al. [
25] also reported satisfactory convergent validity of the eight-item function WOMAC short form, but they used measures other than the SF-36. Whitehouse et al. [
21] studied the convergent validity of their proposed seven-item function short form using the SF-36 physical function domain, and although the results were similar to those we obtained, in our case the correlation coefficient was slightly higher. Further, we obtained similar results to those of the original WOMAC questionnaire [
8], since they also found the highest correlation coefficient between the WOMAC pain and function long scales and the SF-36 bodily pain and physical functioning domains (-0.55 and -0.59, respectively). Otherwise, the WOMAC-SF maintained excellent known-groups validity similar to that of the original WOMAC questionnaire [
8], since those authors also observed that the greater the severity level, the higher the WOMAC pain and function long-form scores were.
The 11-item WOMAC-SF showed good responsiveness 6 months after the intervention. Responsiveness parameters were substantially above the 0.80 threshold for designating large change [
39]. Tubach et al. [
11] and Baron et al. [
25] also reported this finding for the function short form, although we found much higher responsiveness parameters, probably due to the follow-up period. We considered a follow-up of 6 months, whereas they considered 4 weeks. Whitehouse et al. [
21], who proposed a seven-item function WOMAC-SF, studied the responsiveness considering follow-up periods of 3 months and 1 year, and Auw Yang et al. [
14], who validated the previous seven-item function WOMAC-SF in a different cohort, also studied the responsiveness considering follow-up periods of 3 and 6 months. Nevertheless, the responsiveness parameters of the seven-item function WOMAC-SF that they reported [
14,
21] were much lower than those we obtained for the eight-item function short form, indicating that the eight-item function short form is more responsive than the seven-item function short form proposed by Whitehouse et al. [
21]. Further, the responsiveness results of the 11-item WOMAC-SF we obtained were similar to those of the original WOMAC questionnaire [
9], given that they also found minor floor and ceiling effects (< 2%) before the intervention, and the SES and SRM responsiveness parameters were practically equal (2.10 and 1.86, respectively, for the pain domain and 2.34 and 1.80, respectively, for the function domain).
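The two responsiveness parameters mentioned above can be sketched as follows, assuming their usual definitions: the standardized effect size (SES) is the mean change divided by the standard deviation of the baseline scores, and the standardized response mean (SRM) is the mean change divided by the standard deviation of the change scores. The function name `responsiveness` and the scores are hypothetical, and change is taken as baseline minus follow-up on the assumption that higher scores indicate worse status.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (SES, SRM) for paired scores of the same patients.

    Change is baseline minus follow-up, so improvement is positive when
    higher scores mean worse status (an assumption of this sketch).
    """
    changes = [b - f for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    ses = mean_change / stdev(baseline)   # relative to baseline variability
    srm = mean_change / stdev(changes)    # relative to variability of change
    return ses, srm


# Hypothetical scores (0-100) before and 6 months after the intervention.
baseline_scores = [60, 70, 50, 80, 40]
follow_up_scores = [20, 30, 10, 30, 10]
ses, srm = responsiveness(baseline_scores, follow_up_scores)
print(f"SES = {ses:.2f}, SRM = {srm:.2f}, large change: {min(ses, srm) > 0.80}")
```

Values above 0.80 on either parameter are conventionally read as a large change, which is the threshold referred to in the text.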
The strong correlation between the long and short WOMAC pain or function scales and the high agreement in scores examined by the Bland-Altman approach [
40] support the hypothesis that the shortened scale captures pain and functional status as well as the original WOMAC version. Our results are similar to those found by Tubach et al. [
11] and Baron et al. [
25].
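For readers unfamiliar with the Bland-Altman approach, the sketch below shows the core computation under common assumptions: the bias is the mean of the paired differences between short-form and long-form scores, and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The function name `bland_altman` and the scores are hypothetical, and both forms are assumed to be expressed on the same 0 to 100 scale.

```python
from statistics import mean, stdev


def bland_altman(long_scores, short_scores):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are short-form minus long-form scores for the same
    patients; limits are bias +/- 1.96 SD of the differences.
    """
    differences = [s - l for l, s in zip(long_scores, short_scores)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical standardized scores (0-100) for five patients.
long_form = [10, 20, 30, 40, 50]
short_form = [12, 18, 31, 39, 50]
bias, lower, upper = bland_altman(long_form, short_form)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits of agreement would support the claim that the short form scores patients much as the long form does.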
A possible limitation of the current study was the use of the data provided by the original WOMAC long form to validate the 11-item WOMAC-SF [
25]. This might constitute a framing bias and lead to overestimation of the similarity between the two forms [
21,
25]. Although this problem is inherent in many validation studies [
11,
25], in the current study we analyzed separate samples whenever possible to compensate for it. Nevertheless, the 11-item WOMAC-SF must be validated in a new independent sample of patients with hip OA and in different languages. In addition, because the original WOMAC has been used in patients with hip or knee OA, this 11-item short form could probably be applied to patients with either hip or knee OA. However, we based our study only on patients undergoing total hip replacement; therefore, further validation studies in patients with different arthroplasties would be necessary to be completely sure about the applicability of this short WOMAC form.
In addition, an instrument must be reliable, valid, and responsive to be useful. Although we studied the reliability of the 11-item WOMAC-SF by means of the Cronbach alpha coefficient to measure the internal consistency, the reliability study should be complemented with a test-retest study. Regarding responsiveness, missing data are a key limitation of the prospective cohort design and a usual finding when conducting follow-up studies [
11,
21,
25]. In our case, there was a very good response rate before the intervention (about 80%) in both cohorts, and 6 months after it (about 75%). These losses occurred despite our mailing up to two reminders and contacting nonresponders by telephone. However, no differences were observed in relevant variables when responders were compared with nonresponders. Therefore, although a bias may have been present in our responsiveness study due to missing data, it is likely to be minor and we believe the results are generalizable to the entire sample.
Authors' contributions
AB has participated in the conception and design of the study, in the analysis and interpretation of data, and has been involved in drafting the manuscript; JMQ and AE have participated in the conception, design and coordination of the study, have helped to draft the manuscript and have been involved in revising it critically for important content; CLH and MO have made substantial contribution to acquisition of data, and have helped to draft the manuscript. All authors have read and approved the final manuscript.