Background
Methods
Stage 1: Item generation & pre-testing
Stage 2: Field test 1
Stage 3: Field test 2
Stage 2 & 3 statistical analyses
Rasch measurement theory analysis
2.1 Do the response categories work as intended?

Response category thresholds were examined for disordering. The RMT expects thresholds to be ordered sequentially (i.e., “0 = very uncertain”, “1 = somewhat uncertain”, “2 = somewhat certain”, “3 = very certain”) when plotted on the measurement continuum, reflecting the decreasing level of uncertainty that the responses denote [32, 41].
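As a minimal illustration of the ordering check described above, the Python sketch below flags adjacent thresholds that are not in increasing order. The function name and the threshold locations (in logits) are hypothetical and are not taken from the study.

```python
def disordered_thresholds(thresholds):
    """Return indices k where threshold k+1 is not located above threshold k."""
    return [
        k
        for k in range(len(thresholds) - 1)
        if thresholds[k + 1] <= thresholds[k]
    ]


# Hypothetical threshold locations (logits) for items with four response
# categories, i.e. three thresholds each. These are NOT study estimates.
ordered_item = [-1.8, 0.1, 1.6]
disordered_item = [-0.4, -0.9, 1.2]

print(disordered_thresholds(ordered_item))
print(disordered_thresholds(disordered_item))
```

An empty list means the thresholds are ordered as expected; any returned index marks a pair of adjacent categories that may not be working as intended.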
2.2 Do the PUQ-R scale items define a single variable?

RMT expects items within a scale to be cohesive in defining a single measurement continuum [41, 46]. Three “fit” indicators were examined to assess this. Item fit residuals assess whether the item-person interaction is in line with the RMT. Fit residuals reflect the difference between the observed scores and those expected by the Rasch model (i.e., residual = observed − expected) and are expected to fall between −2.5 and +2.5 [32]. Chi-square statistics assess whether the item-trait interaction is in line with the RMT. Chi-square is a summary statistic computed by dividing the sample into six groups (class intervals) based on their trait level (i.e., level of uncertainty). For items to fit the RMT, the chi-square probabilities are expected to be non-significant (p > 0.01) [32, 47, 48].
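The observed-minus-expected logic can be sketched in Python for a single person-item response under a polytomous Rasch model. This is only an illustration: the person location, thresholds and function names are hypothetical, and the item-level fit residual reported by Rasch software is typically a further transformation of many such standardized residuals, which is not reproduced here.

```python
import math


def category_probabilities(theta, thresholds):
    """Polytomous Rasch model: P(X = k) is proportional to
    exp(sum over j <= k of (theta - tau_j)), with an empty sum for k = 0."""
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - tau))
    largest = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - largest) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Model-expected item score for a person located at theta."""
    probs = category_probabilities(theta, thresholds)
    return sum(k * p for k, p in enumerate(probs))


def standardized_residual(observed, theta, thresholds):
    """(observed - expected) divided by the model standard deviation."""
    probs = category_probabilities(theta, thresholds)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return (observed - expected) / math.sqrt(variance)


# Hypothetical values: a person at 0 logits answering an item scored 0-3.
thresholds = [-1.0, 0.0, 1.0]
print(expected_score(0.0, thresholds))
print(standardized_residual(3, 0.0, thresholds))
```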
2.3 Do responses to one item bias responses to others?

RMT expects that the response to one item should not directly influence the response to another, as this biases measurement estimates (inflating or deflating reliability). Response dependency is assessed via correlations between item residuals (residual = observed score − expected score). As the RMT expects local independence of items, item residuals should be uncorrelated and reflect only random error. Residual correlations were examined for response bias [43, 44] against the r > 0.30 rule of thumb, although residual correlations below 0.40 were considered acceptable [49].
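A residual-correlation screen of this kind can be sketched as follows. The residual values and item labels are hypothetical, the helper names are illustrative, and the default cutoff of 0.30 simply mirrors the rule of thumb quoted above.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_dependent_pairs(residuals, cutoff=0.30):
    """Return (item_a, item_b, r) for item pairs whose residuals correlate above cutoff."""
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical standardized residuals for six respondents on three items.
residuals = {
    "item_a": [0.5, -1.0, 1.2, -0.3, 0.8, -1.1],
    "item_b": [0.4, -0.9, 1.0, -0.2, 0.9, -1.2],
    "item_c": [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],
}
print(flag_dependent_pairs(residuals))
print(flag_dependent_pairs(residuals, cutoff=0.40))
```

Passing `cutoff=0.40` applies the more lenient criterion instead of the 0.30 rule of thumb.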
2.4 Is the performance of the scales stable across relevant groups?

The RMT expects the measurement continuum to perform consistently across different sample groups. Item stability was assessed through differential item functioning (DIF) [32, 41, 50]. DIF explores the relationship between item responses and group membership by examining the observed response differences between class intervals within groups [51]. DIF was assessed between the SLE and RA groups using ANOVA.
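The group comparison behind a DIF test can be illustrated with a one-way ANOVA on an item's residuals. This is a deliberately simplified sketch: it covers only the group main effect and ignores class intervals, the residual values are hypothetical, and it stops at the mean square and F ratio without computing a probability.

```python
def one_way_anova(groups):
    """Return (mean square between groups, F ratio) for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    group_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}

    ss_between = sum(
        len(vals) * (group_means[name] - grand_mean) ** 2
        for name, vals in groups.items()
    )
    ss_within = sum(
        (v - group_means[name]) ** 2
        for name, vals in groups.items()
        for v in vals
    )
    df_between = len(groups) - 1           # 1 when comparing two groups
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_between / ms_within


# Hypothetical standardized residuals for one item, split by diagnosis.
item_residuals = {
    "SLE": [0.6, 0.9, 0.2, 1.1, 0.4],
    "RA": [-0.5, -0.2, -0.9, 0.1, -0.6],
}
ms, f_ratio = one_way_anova(item_residuals)
print(ms, f_ratio)
```

A large F ratio suggests that, at comparable trait levels, the two groups respond to the item differently.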
3.1 Is the sample separated by the PUQ-R scales?

A scale is expected to detect differences in trait levels within a sample and to detect changes in trait levels over time. Within the RMT paradigm, the person separation index (PSI) is calculated to assess this [32, 41]. The PSI is computed as the ratio of the variation in person estimates relative to the estimated error for each person [52]. In other words, the PSI shows how much of the variation in person-location estimates can be attributed to random error, where a score of 0 indicates all error and a score of 1 indicates no error at all [32].
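One common formulation of the PSI is the share of the observed variance in person estimates that is not attributable to measurement error. The Python sketch below assumes that formulation; the person locations and standard errors are hypothetical, and the exact estimator used by the authors' software is not confirmed by the text.

```python
from statistics import mean, variance


def person_separation_index(person_estimates, standard_errors):
    """PSI = (observed variance - mean error variance) / observed variance."""
    observed_variance = variance(person_estimates)        # sample variance
    error_variance = mean(se ** 2 for se in standard_errors)
    return (observed_variance - error_variance) / observed_variance


# Hypothetical person locations (logits) and their standard errors.
locations = [-2.0, -1.0, 0.0, 1.0, 2.0]
errors = [0.5, 0.5, 0.5, 0.5, 0.5]
print(person_separation_index(locations, errors))
```

With these values the observed variance is 2.5 and the mean error variance is 0.25, so nine tenths of the variation reflects differences between persons rather than error.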
3.2 To what extent are raw scores linear?

The extent to which ordinal raw scores approach linear (interval) measurement, and their subsequent transformation onto an interval scale, was assessed. This is important because a one-point difference does not necessarily represent the same amount of change across the breadth of the scale [41, 53]. Given the stringent mathematical criteria of the RMT, minor deviations of raw scores from interval (linear) measurement are expected.
Traditional test theory analysis
Property | Definition | Criteria |
---|---|---|
Acceptability: Data quality | The extent to which total scores can be computed – data completeness | • Item level missing data <10 %, • scale level missing data <50 % |
Acceptability: Targeting | The extent to which the range of uncertainty measured by the scale matches the range of uncertainty in the study sample | • floor & ceiling effects <15 % • skewness statistic range: -1 to 1 • precision of scores and means to scale possible scores & mid-points |
Scaling assumptions | The extent to which it is legitimate to sum a set of items, without weighting or standardization, to produce a single total score. Summing PUQ-R scores is considered legitimate when items (i) are measured at the same point on the scale; (ii) contribute similarly to the variation of the total score; (iii) measure a common underlying construct; and (iv) contain a similar proportion of information about the construct being measured. | • CITCs ≥0.30 • mean IIC ≥0.30 • ITCs ≥0.30 • item mean scores & standard deviations |
Reliability | The extent to which a scale’s scores are not associated with random error. Scale precision is based on the homogeneity of items at a single point in time. | • Cronbach’s alpha ≥0.70 • homogeneity coefficient • ITCs ≥0.30 |
Validity | The extent to which a scale measures what it intends to measure. The extent to which a scale measures a single construct was assessed through internal consistency. Item convergent and discriminant validity were assessed with an item-total scale correlation criterion of >0.30 for the item’s own scale, with that correlation exceeding the correlations with other scales by >2 standard errors. | • Cronbach’s alpha ≥0.70 • ITC between item and own scale: 0.30–0.70 • ITC between item and other scale: >2 standard errors below the ITC with own scale |
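Several of the criteria in the table above can be computed directly from item scores. The Python sketch below shows Cronbach’s alpha, a corrected item-total correlation (CITC) and floor and ceiling percentages; the response data and function names are hypothetical, and the standard textbook formulas are assumed rather than taken from the study.

```python
import math
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items, index):
    """Correlation of one item with the sum of the remaining items."""
    rest = [
        sum(score for j, score in enumerate(row) if j != index)
        for row in zip(*items)
    ]
    return pearson(items[index], rest)


def floor_ceiling(totals, lowest, highest):
    """Percentage of respondents at the lowest and highest possible total."""
    n = len(totals)
    floor_pct = 100 * sum(t == lowest for t in totals) / n
    ceiling_pct = 100 * sum(t == highest for t in totals) / n
    return floor_pct, ceiling_pct


# Hypothetical responses: three items scored 1-4, six respondents.
items = [
    [1, 2, 3, 4, 2, 3],
    [2, 2, 3, 4, 1, 3],
    [1, 3, 3, 4, 2, 2],
]
totals = [sum(row) for row in zip(*items)]
print(cronbach_alpha(items))
print([corrected_item_total(items, i) for i in range(len(items))])
print(floor_ceiling(totals, lowest=3, highest=12))
```

The results can then be compared with the thresholds in the table, for example alpha ≥0.70, CITC ≥0.30 and floor and ceiling effects <15 %.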
Results
Stage 1: Item development & pre-testing
Stage 2: Field test 1
Characteristic | Field test 1: Total (n = 383) | Field test 1: SLE (n = 173) | Field test 1: RA (n = 210) | Field test 2: Total (n = 279) | Field test 2: SLE (n = 165) | Field test 2: RA (n = 114) |
---|---|---|---|---|---|---|
Age (years) | ||||||
Mean (SD) | 52.3 (16.28) | 43.8 (15.2) | 59.4 (13.3) | 49.93 (14.8) | 45.31 (14.3) | 56.95 (12.5) |
Range | 18–86 | 18–80 | 23–86 | 18–84 | 18–76 | 20–84 |
Disease Duration (years) | ||||||
Mean (SD) | 12.3 (10.8) | 11.1 (9.7) | 13.3 (11.7) | 15.87 (11.2) | 16.04 (10.1) | 15.60 (12.5) |
Range | 0.08–54 | 0.08–39 | 0.25–54 | 0.50–52 | 1–40 | 0.50–52 |
Gender n (%) | ||||||
Female | 320 (83.6) | 157 (90.7) | 163 (77.6) | 245 (87.8) | 158 (95.8) | 87 (76.3) |
Male | 63 (16.4) | 16 (9.3) | 47 (22.4) | 34 (12.2) | 7 (4.2) | 27 (23.7) |
Ethnicity n (%) | ||||||
White | 283 (73.9) | 101 (58.4) | 182 (86.7) | 191 (68.5) | 97 (58.8) | 94 (82.5) |
Black | 45 (11.7) | 33 (19.1) | 12 (5.7) | 43 (15.4) | 40 (24.2) | 3 (2.6) |
Indian/Pakistani/Bangladeshi | 27 (7.0) | 21 (12.1) | 6 (2.9) | 21 (7.6) | 15 (9) | 6 (5.3) |
Mixed race | 11 (2.9) | 7 (4.0) | 4 (1.9) | 6 (2.2) | 5 (3.0) | 1 (0.9) |
Other | 11 (2.9) | 9 (5.2) | 2 (1.0) | 11 (3.9) | 8 (4.8) | 3 (2.6) |
Missing | 6 (1.6) | 2 (1.2) | 4 (1.9) | 7 (2.5) | – | 7 (6.1) |
Stage 3: Field test 2
RMT analysis: How adequate is the sample-to-scale targeting?
RMT analysis: To what extent has a measurement scale been constructed successfully?
Item | String | Loc. | SE | Fit Res. | Chi Sq. | Prob. | res. r | DIF ANOVA (df = 1): MS | DIF ANOVA (df = 1): F | DIF ANOVA (df = 1): Prob. |
---|---|---|---|---|---|---|---|---|---|---|
PUQ-R Symptoms & Flares Scale (PSI = 0.91) | ||||||||||
1 | straight away | −1.82 | 0.11 | −2.00 | 14.09 | 0.00 | 0.33 | 5.93 | 8.68 | 0.00 |
2 | specific symptoms | −1.64 | 0.10 | −0.98 | 3.39 | 0.34 | <0.30 | 1.31 | 1.57 | 0.21 |
3 | everyday symptoms | −1.22 | 0.10 | −0.40 | 5.48 | 0.14 | 0.33 | 2.51 | 2.90 | 0.09 |
4 | serious symptoms | −1.12 | 0.09 | 0.40 | 0.87 | 0.83 | <0.30 | 0.90 | 0.93 | 0.34 |
5 | getting older | −1.04 | 0.09 | 0.80 | 2.75 | 0.43 | <0.30 | 1.91 | 1.91 | 0.17 |
6 | side-effects | −0.87 | 0.09 | 2.35 | 4.48 | 0.21 | <0.30 | 0.36 | 0.31 | 0.58 |
7 | all different | −0.80 | 0.09 | −0.89 | 10.36 | 0.02 | <0.30 | 0.08 | 0.10 | 0.76 |
8SLE | symptom triggers | −0.39 | 0.10 | 2.09 | 11.03 | 0.01 | 0.37 | 0.00 | 0.00 | 1.00 |
9 | flare type | 0.36 | 0.08 | −0.24 | 2.99 | 0.39 | <0.30 | 2.09 | 2.33 | 0.13 |
10 | symptom timing | 0.46 | 0.09 | 0.84 | 1.79 | 0.62 | 0.37 | 2.27 | 2.24 | 0.14 |
8RA | symptom triggers | 0.80 | 0.14 | 2.05 | 7.49 | 0.06 | <0.30 | 0.00 | 0.00 | 1.00 |
11 | future effect | 1.21 | 0.09 | 1.44 | 3.75 | 0.29 | 0.42 | 0.29 | 0.26 | 0.61 |
12 | flare timing | 1.39 | 0.10 | 0.11 | 7.08 | 0.07 | 0.32 | 8.93 | 10.13 | 0.00 |
13 | flare severity | 2.11 | 0.11 | −0.38 | 1.67 | 0.64 | 0.51 | 2.40 | 2.77 | 0.10 |
14 | flare frequency | 2.57 | 0.12 | −1.12 | 5.15 | 0.16 | 0.51 | 0.67 | 0.88 | 0.35 |
PUQ-R Medication Scale (PSI = 0.91) | ||||||||||
15RA | need medication | −1.54 | 0.17 | −0.96 | 1.81 | 0.61 | 0.32 | 0.00 | 0.00 | 1.00 |
16 | help symptoms | −0.91 | 0.10 | −0.75 | 1.38 | 0.71 | 0.44 | 27.30 | 39.46 | 0.00 |
17 | controls condition | −0.56 | 0.09 | −1.23 | 5.21 | 0.16 | 0.49 | 1.02 | 1.34 | 0.25 |
15SLE | need medication | −0.52 | 0.11 | 2.56 | 31.68 | 0.00 | 0.49 | 0.00 | 0.00 | 1.00 |
18 | stronger dose | −0.24 | 0.09 | −0.68 | 3.32 | 0.34 | 0.48 | 2.01 | 2.44 | 0.12 |
19 | will help symptoms | −0.20 | 0.10 | 0.63 | 1.80 | 0.61 | 0.51 | 0.22 | 0.23 | 0.63 |
20 | need additional | −0.08 | 0.09 | −1.61 | 5.27 | 0.15 | 0.52 | 0.25 | 0.33 | 0.57 |
21 | need alternative | 0.05 | 0.09 | 0.16 | 0.21 | 0.98 | 0.52 | 0.66 | 0.71 | 0.40 |
22 | will control | 0.07 | 0.10 | 0.43 | 2.44 | 0.49 | 0.51 | 0.04 | 0.04 | 0.84 |
23 | will need stronger | 1.22 | 0.09 | −0.30 | 2.18 | 0.54 | 0.75 | 4.38 | 5.13 | 0.02 |
24 | will need additional | 1.32 | 0.10 | 0.66 | 4.21 | 0.24 | 0.75 | 2.98 | 3.24 | 0.07 |
25 | will not alternative | 1.37 | 0.09 | 1.05 | 2.26 | 0.52 | 0.62 | 0.29 | 0.28 | 0.60 |

Item | String | Loc. | SE | Fit Res. | Chi Sq. | Prob. | res. r | DIF ANOVA (df = 1): MS | DIF ANOVA (df = 1): F | DIF ANOVA (df = 1): Prob. |
---|---|---|---|---|---|---|---|---|---|---|
PUQ-R Trust in Doctor Scale (PSI = 0.73) | ||||||||||
26 | best dose | −1.16 | 0.11 | −2.75 | 16.24 | 0.00 | 0.71 | 0.76 | 1.29 | 0.26 |
27 | which medication | −1.10 | 0.11 | −2.66 | 22.06 | 0.00 | 0.71 | 1.43 | 2.49 | 0.12 |
28 | help physical | −0.88 | 0.10 | −1.15 | 10.30 | 0.02 | <0.30 | 3.69 | 5.09 | 0.02 |
29 | what’s wrong | −0.67 | 0.10 | 0.37 | 4.05 | 0.26 | <0.30 | 1.95 | 2.16 | 0.14 |
30 | physically active | −0.19 | 0.10 | −1.59 | 8.02 | 0.05 | <0.30 | 0.13 | 0.18 | 0.67 |
31 | help non-physical | 0.67 | 0.09 | 1.28 | 1.41 | 0.70 | <0.30 | 0.21 | 0.21 | 0.65 |
32 | future progress | 1.61 | 0.09 | 1.63 | 1.52 | 0.68 | <0.30 | 4.56 | 4.45 | 0.04 |
33 | cause | 1.72 | 0.08 | 4.23a | 36.72 | 0.00b | <0.30 | 0.24 | 0.17 | 0.68 |
PUQ-R Self-management Scale (PSI = 0.86) | ||||||||||
34 | questions | −0.52 | 0.10 | 0.93 | 10.46 | 0.02 | <0.30 | 0.55 | 0.59 | 0.44 |
35 | symptom report | −0.49 | 0.11 | −0.56 | 3.69 | 0.30 | <0.30 | 4.20 | 5.60 | 0.02 |
36 | test results | 0.11 | 0.09 | 0.91 | 3.65 | 0.30 | <0.30 | 6.49 | 7.34 | 0.01 |
37 | activities to avoid | 0.15 | 0.09 | 0.10 | 1.84 | 0.61 | <0.30 | 0.06 | 0.07 | 0.79 |
38 | how to manage | 0.32 | 0.10 | −1.11 | 9.91 | 0.02 | <0.30 | 3.19 | 4.61 | 0.03 |
39 | help control | 0.43 | 0.09 | 0.93 | 7.34 | 0.06 | <0.30 | 2.50 | 2.81 | 0.10 |
PUQ-R Impact Scale (PSI = 0.87) | ||||||||||
40 | education | −1.24 | 0.16 | 0.56 | 1.20 | 0.75 | <0.30 | 0.48 | 0.47 | 0.49 |
41 | relationship | −0.91 | 0.10 | 2.11 | 15.09 | 0.00b | 0.32 | 9.20 | 8.89 | 0.00 |
42 | children | −0.53 | 0.13 | 1.66 | 5.36 | 0.15 | 0.32 | 4.41 | 4.21 | 0.04 |
43 | plan life | 0.01 | 0.10 | −1.92 | 13.60 | 0.00 | <0.30 | 1.57 | 2.39 | 0.12 |
44 | finances | 0.02 | 0.09 | 1.10 | 1.73 | 0.63 | <0.30 | 0.00 | 0.00 | 0.97 |
45 | functionality | 0.12 | 0.10 | −3.60a | 14.73 | 0.00b | <0.30 | 8.29 | 16.21 | 0.00c |
46 | exercise | 0.46 | 0.10 | 0.48 | 1.46 | 0.69 | <0.30 | 0.55 | 0.61 | 0.44 |
47 | mobility | 0.54 | 0.09 | −1.36 | 7.06 | 0.07 | <0.30 | 5.21 | 7.45 | 0.01 |
48 | job prospects | 0.54 | 0.11 | −0.58 | 5.71 | 0.13 | <0.30 | 0.71 | 0.94 | 0.33 |
49 | pregnancy | 0.99 | 0.16 | 3.31a | 27.10 | 0.00b | <0.30 | 0.63 | 0.32 | 0.58 |
RMT analysis: How has the sample been measured?
Traditional psychometrics
Column groups, left to right: data quality; scaling assumptions; targeting; reliability; item convergent–discriminant validity (ITC range with scales 1–5).

Scale | Missing data % | Possible range (mid-point) | Actual score range | Mean (SD) | CITC range | Floor effect % | Ceiling effect % | Skewness | Cronbach’s alpha | IIC mean | ITC range: scale 1 | ITC range: scale 2 | ITC range: scale 3 | ITC range: scale 4 | ITC range: scale 5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Symptoms & flares | 7.52 | 14–56 (35) | 14–56 | 35.29 (7.99) | 0.44–0.69 | 0.72 | 0.72 | −0.35 | 0.90 | 0.40 | 0.54–0.73 | 0.05–0.20 | 0.12–0.34 | 0.16–0.45 | 0.02–0.25 |
Medication | 4.30 | 11–44 (27.50) | 11–44 | 30.96 (6.77) | −0.35–0.71 | 0.72 | 3.94 | −0.15 | 0.90 | 0.60 | 0.02–0.18 | 0.46–0.77 | 0.23–0.40 | 0.25–0.33 | 0.04–0.43 |
Trust in doctor | 2.15 | 8–32 (20) | 8–32 | 22.37 (4.90) | 0.40–0.71 | 0.72 | 1.43 | −0.24 | 0.86 | 0.61 | 0.14–0.32 | 0.23–0.49 | 0.56–0.78 | 0.20–0.39 | 0.24–0.49 |
Self-management | 3.58 | 6–24 (15) | 6–24 | 18.80 (3.68) | 0.53–0.67 | 0.72 | 8.60 | −0.61 | 0.82 | 0.60 | 0.18–0.48 | 0.19–0.44 | 0.26–0.40 | 0.67–0.78 | 0.04–0.21 |
Impact | 2.51 | 10–40 (25) | 10–40 | 24.95 (8.15) | 0.39–0.79 | 1.43 | 0.36 | −0.13 | 0.93 | 0.73 | 0.00–0.24 | 0.11–0.42 | 0.20–0.43 | 0.04–0.29 | 0.49–0.84 |
PUQ-R | CQR | HADS-A | HADS-D | PCS | MCS |
---|---|---|---|---|---|
SLE | |||||
Symptoms & flares | – | – | – | – | – |
Medication | 0.22* | −0.18* | −0.29** | 0.28** | 0.22** |
Trust in doctor | 0.33** | −0.23** | −0.36** | 0.21** | 0.20** |
Self-management | – | −0.28** | −0.21** | – | 0.16* |
Impact | 0.18* | −0.38** | −0.52** | 0.47** | 0.31** |
RA | |||||
Symptoms & flares | 0.26* | – | – | – | – |
Medication | 0.31** | – | – | 0.19** | 0.19* |
Trust in doctor | 0.39** | – | – | – | – |
Self-management | – | −0.20* | −0.24* | 0.20** | 0.20* |
Impact | – | −0.43** | −0.57** | 0.35** | 0.34** |