Background
Methods
Field test one design
Sample
Rasch analysis
| Psychometric property | Traditional methods - test and criteria | Rasch methods - test and criteria |
|---|---|---|
| Acceptability and data quality - Completeness of item- and scale-level data. | ● Score distributions (floor/ceiling effects and skew of scale scores) | ● Even distribution of endorsement frequencies across response categories (>80%) |
| | ● % of item-level missing data (<10%) [30] | ● Low number of persons at extreme (i.e. floor/ceiling) ends of the measurement continuum |
| | ● % of computable scale scores (>50% completed items) [31] | |
| | ● Items in scales rated ‘not relevant’ <35% | |
| Scaling assumptions - Legitimacy of summing a set of items (items should measure a common underlying construct). | | ● Positive residual r between items (<0.30) |
| | ● Items have adequate corrected ITC (ITC ≥0.3) [34] | ● High negative residual r (>0.60) suggests redundancy |
| | ● Items have similar ITCs [34] | ● Items sharing common variance suggests uni-dimensionality |
| | ● Items do not measure at the same point on the scale | ● Evenly spaced items spanning whole measurement range |
| Item response categories - categories in a logical hierarchy. | ● NA | ● Ordered set of response thresholds for each scale item |
| Targeting - extent to which the range of the variable measured by the scale matches the range of that variable in the study sample. | ● Scale scores spanning entire scale range | ● Person-item threshold distribution: person locations should be covered by items and item locations covered by persons when both calibrated on the same metric scale [35] |
| | ● Floor and ceiling (proportion sample at minimum and maximum scale score) effects should be low (<15%) [36] | |
| | ● Skewness statistics should range from −1 to +1 [37] | ● Good targeting demonstrated by the mean location of items and persons around zero |
| | ● No published criteria for item level targeting | |
| Reliability | | |
| Internal consistency - extent to which items comprising a scale measure the same construct (e.g. homogeneity of the scale). | ● Cronbach's alphas for summary scores (adequate scale internal consistency is ≥0.70) [22] | ● High person separation index >0.7 [38]; quantifies how reliably person measurements are separated by items |
| | ● Item-total r between +0.4 and +0.6 indicate items are moderately correlated with scale scores; higher values indicate well correlated items with scale scores [22] | |
| | | ● Power-of-tests indicate the power in detecting the extent to which the data do not fit the model [24] |
| | | ● Items with ordered thresholds |
| *Test-retest reliability - stability of a measuring instrument. | ● Intra-class r coefficient >0.70 between test and retest scores [11] | ● Statistical stability across time points (no uniform or non-uniform item DIF (p≥0.05 or Bonferroni adjusted value)) |
| | ● Pearson r: >0.7 indicates reliable scale stability | |
| Validity | ● Involves accumulating evidence from different forms | |
| Content validity - extent to which the content (items) of a scale is representative of the conceptual construct it is intended to measure. | ● Consideration of item sufficiency and the target population | ● Clearly defined construct |
| | ● Qualitative evidence from individuals for whom the measure is targeted, expert opinion and literature review (e.g. theoretical and/or conceptual definitions) [9]. | ● Validity comes from careful item construction and consideration of what each item is meant to measure, then testing against model expectations |
| Construct validity | | |
| i) Within-scale analyses - extent to which a distinct construct is being measured and that items can be combined to form a scale score. | ● Cronbach alpha for scale scores >0.70 | ● Fit residuals (item-person interaction) within given range +/−2.5 |
| | ● ITC >0.30 | |
| | ● Homogeneity coefficient (IIC mean and range >0.3) | ● Non-significant chi square (item-trait interaction) values |
| | ● Scaling success | ● No under- or over-discriminating ICC |
| | | ● Mean fit residual close to 0.0; SD approaching 1.0 [39] |
| | | ● Person fit residuals within given range +/−2.5 |
| Measurement continuum - extent to which scale items mark out the construct as a continuum on which people can be measured. | ● NA | ● Individual scale items located across a continuum in the same way locations of people are spread across the continuum [26] |
| Response dependency - response to one item determines response to another. | ● NA | |
| ii) Between scale analysis | | |
| Criterion Validity - hypotheses based on criterion or ‘gold standard’ measure. | ● NA | |
| *Convergent validity - scale correlated with other measures of the same/similar constructs. | ● Moderate to high r predicted for similar scales; criteria used as guides to the magnitude of r, as opposed to pass/fail benchmarks (high r >0.7; moderate r=0.3-0.7; low r <0.3) [43] | ● NA |
| *Discriminant validity - scale not correlated with measures of different constructs | ● Low r (<0.3) predicted between scale scores and measures of different constructs (e.g. age, gender) | ● NA |
| *Known groups differences - ability of a scale to differentiate known groups | ● ^Generate hypotheses (based on subgroups known to differ on construct measured) and compare mean scores (e.g. predict a stepwise change in PU-QOL scale scores across 3 PU severity groups and that mean scores would be significantly different) | ● Hypothesis testing (e.g. clinical questions are formulated and the empirical testing comes from whether or not data fit the Rasch model) |
| | ● Statistically significant differences in mean scores (ANOVA) | |
| *Differential item functioning (item bias) - The extent of any conditional relationships between item response and group membership. | ● NA | ● Persons with similar ability should respond in similar ways to individual items regardless of group membership (e.g. age) [44] |
| | | ● Uniform DIF - uniformity amongst differences between groups |
| | | ● Non-Uniform DIF - non-uniformity amongst differences between groups; can be considered at 1% (Bonferroni adjusted) and 5% CIs |
Traditional analysis
Field test two design
Sample
Rasch analysis
Traditional analysis
Results
Field-test one: scale construction and preliminary psychometric evaluation
Sample
Characteristics | Field test 1 | Field test 2 |
---|---|---|
Age, range (mean, SD) | 24 - 98 (72, 13.5) | 20 - 103 (71.3, 16.5) |
Gender, total n (%) | | |
Total | n=227 | n=229 |
Male | 90 (39.6) | 119 (52.0) |
Female | 137 (60.4) | 110 (48.0) |
Ethnicity | | |
White | 223 (98.2) | 227 (99.1) |
Asian | 1 (0.4) | 2 (0.9) |
Black/African | 2 (0.4) | 0 |
Chinese | 0 | 0 |
Not stated | 1 (0.4) | 0 |
Setting | | |
Hospital (surgery) | 99 (43.6) | 62 (27.1) |
Hospital (medicine) | 21 (9.3) | 74 (32.3) |
Community | 107 (47.1) | 88 (38.4) |
PU severity | | |
Category 1 | 38 (10.6%) | 76 (18.1%) |
Category 2 | 144 (40.2%) | 170 (40.5%) |
Category 3/4 | 153 (42.7%) | 170 (40.5%) |
Missing | 1 (0.3%) | 4 (0.9%) |
PU risk classification | | |
Short-term risk | 39 (17.2) | 36 (15.7) |
Medium to long-term risk | 71 (31.3) | 87 (38.0) |
On-going long-term risk | 116 (51.1) | 103 (45.0) |
Missing | 1 (0.4) | 3 (1.3) |
Marital status | | |
Single | 59 (26.0) | 71 (31.0) |
Married | 85 (37.5) | 77 (33.6) |
Cohabiting | 81 (35.7) | 75 (32.8) |
Missing | 2 (0.8) | 6 (2.6) |
Living arrangements | | |
Live alone | 84 (37.0) | 86 (37.6) |
Cohabit with carer | 63 (27.8) | 51 (22.3) |
Cohabit with other | 61 (26.9) | 48 (20.9) |
Missing | 19 (8.4) | 44 (19.2) |
Education | | |
No formal education | 129 (56.8) | 125 (54.6) |
GCSE or equivalent | 39 (17.2) | 40 (17.5) |
A-Level or equivalent | 25 (11.0) | 16 (6.9) |
Degree or higher | 15 (6.6) | 21 (9.2) |
Missing | 19 (8.4) | 27 (11.8) |
Rasch analysis: item reduction and scale formation
Scale (No. of items) | Rasch: items with disordered thresholds | Rasch: item locations logits range | Rasch: fit residuals outside +/−2.5 | Rasch: items with Chi square probability significance ≥0.001 | Rasch: person separation index | Traditional: Cronbach alpha | Traditional: range IIC | Traditional: scaling assumptions corrected ITC |
---|---|---|---|---|---|---|---|---|
Pain (8) | 5 | -0.94 − 0.80 | 0 | 4 | 0.78 | 0.89 | 0.24 – 0.66 | 0.53 – 0.70^ |
Exudate (8) | 4 | -0.51 − 0.48 | 0 | 0 | 0.59 | 0.92 | 0.40 – 0.86 | 0.56 – 0.84^ |
Odour (6) | 2 | -1.47 − 0.60 | 0 | 0 | 0.62 | 0.96 | 0.74 – 0.91 | 0.83 – 0.92^ |
Sleep (6) | 3 | -0.54 − 0.31 | 0 | 0 | 0.62 | 0.92 | 0.48 – 0.84 | 0.67 – 0.86^ |
Vitality (3) | 0 | -0.48 − 0.44 | 0 | 0 | 0.03 | n/a | n/a | n/a |
Movement/mobility (11) | 4 | -0.33 − 0.48 | 0 | 0 | 0.58 | 0.93 | 0.23 – 0.91 | 0.67 – 0.80^ |
ADL (9) | 8 | -0.54 − 0.57 | 0 | 0 | 0.29 | 0.95 | 0.41 – 0.90 | 0.58 – 0.90^ |
Emotional well-being (17) | 4 | -1.15 − 1.46 | 1 | 0 | 0.82 | 0.93 | 0.24 – 0.79 | 0.54 – 0.76^ |
Appearance & self-consciousness (7) | 4 | -0.83 − 0.65 | 0 | 0 | 0.56 | 0.90 | 0.41 – 0.75 | 0.60 – 0.79^ |
Participation (9) | 4 | -0.56 − 0.54 | 0 | 0 | 0.65 | 0.96 | 0.53 – 0.89 | 0.73 – 0.90^ |
Traditional analysis
Field-test two: final psychometric evaluation
Sample
Rasch analysis
Scale (No. of items) | Disordered thresholds | Item locations logits range | Fit statistics fit residuals outside +/−2.5 | Items with Chi square probability significance ≥0.001 | Person separation index | DIF age (uniform) | DIF age (non-uniform) | DIF gender (uniform) | DIF gender (non-uniform) | DIF HC setting (uniform) | DIF HC setting (non-uniform) |
---|---|---|---|---|---|---|---|---|---|---|---|
Pain (8) | 0 | -1.11 − 1.03 | 0 | 0 | 0.72 | 0 | 0 | 0 | 0 | 0 | 0 |
Exudate (8) | 1 | -0.75 – 0.84 | 1 | 1 | 0.69 | 0 | 0 | 0 | 0 | 0 | 0 |
Odour (6) | 0 | -1.31 – 0.91 | 0 | 0 | 0.66 | 0 | 0 | 0 | 0 | 0 | 0 |
Sleep (6) | 0 | -0.91 – 0.45 | 1 | 1 | 0.62 | 0 | 0 | 0 | 0 | 0 | 0 |
Mobility and movement (9) | 2 | -0.46 – 0.57 | 0 | 0 | 0.42 | 0 | 0 | 0 | 0 | 2 | 0 |
Activity (8) | 4 | -0.30 – 0.56 | 0 | 0 | 0.27 | 0 | 0 | 0 | 0 | 0 | 0 |
Vitality (6) | 0 | -0.50 – 0.81 | 0 | 0 | 0.38 | 1 | 0 | 0 | 0 | 0 | 0 |
Emotional well-being (15) | 2 | -1.48 – 2.44 | 0 | 0 | 0.86 | 0 | 0 | 0 | 0 | 0 | 0 |
Self-consciousness (7) | 0 | -1.27 – 1.02 | 0 | 0 | 0.58 | 0 | 0 | 0 | 0 | 0 | 0 |
Participation (9) | 7 | -0.91 – 1.00 | 0 | 0 | 0.57 | 0 | 0 | 0 | 0 | 0 | 0 |
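As a reading aid, the person separation index (PSI) column above can be screened against the >0.7 criterion given in the methods table. This is only an illustrative sketch: the dictionary and variable names are hypothetical, and the values are copied from the table rows above rather than recomputed from data.

```python
# PSI values transcribed from the field test two Rasch table above.
psi_by_scale = {
    "Pain": 0.72,
    "Exudate": 0.69,
    "Odour": 0.66,
    "Sleep": 0.62,
    "Mobility and movement": 0.42,
    "Activity": 0.27,
    "Vitality": 0.38,
    "Emotional well-being": 0.86,
    "Self-consciousness": 0.58,
    "Participation": 0.57,
}

PSI_CRITERION = 0.7  # "High person separation index >0.7" from the criteria table

# Split the scales by whether they exceed the criterion.
meets = [scale for scale, psi in psi_by_scale.items() if psi > PSI_CRITERION]
below = [scale for scale, psi in psi_by_scale.items() if psi <= PSI_CRITERION]

print("PSI above criterion:", meets)
print("PSI at or below criterion:", below)
```

The criterion is a guide rather than a pass/fail rule, so a scale falling below it is a prompt for closer inspection, not an automatic rejection.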
Traditional analysis
Scale (No. of items) | Internal consistency: Cronbach’s alpha | IIC | Scaling assumptions: corrected ITC | Test-retest reproducibility: ICC consistency | Test-retest reproducibility: ICC absolute | Test-retest reproducibility: correlation | Convergent validity: related SF12 scale r1 | Convergent validity: PU-QOL HRQL item r1 (n) | Discriminant validity: gender r2 (n) | Discriminant validity: age r2 (n) |
---|---|---|---|---|---|---|---|---|---|---|
Pain (8) | 0.89 | 0.24 – 0.66^ | 0.53 – 0.70^ | 0.80 | 0.81 | 0.80 | 0.48b | 0.38b (206) | 0.13b (214) | 0.11b (214) |
Exudate (8) | 0.91 | 0.32 – 0.72^ | 0.51 – 0.75^ | 0.62 | 0.63 | 0.62 | n/a | 0.25a (216) | 0.08b (225) | -0.14b (224) |
Odour (6) | 0.97 | 0.72 – 0.93^ | 0.79 – 0.94^ | 0.68 | 0.68 | 0.70 | n/a | 0.20a (217) | 0.05b (228) | -0.14b (227) |
Sleep (6) | 0.92 | 0.49 – 0.81^ | 0.68 – 0.85^ | 0.82 | 0.82 | 0.82 | n/a | 0.32b (171) | 0.21b (178) | 0.10b (178) |
Vitality (6) | 0.90 | 0.49 – 0.90^ | 0.63 – 0.90^ | 0.74 | 0.74 | 0.74 | 0.36b | 0.52b (135) | 0.03b (137) | -0.16b (137) |
Movement/Mobility (9) | 0.93 | 0.23 – 0.91^ | 0.67 – 0.80^ | 0.87 | 0.86 | 0.88 | -0.50b | 0.39b (37) | 0.04b (39) | 0.22b (39) |
ADL (8) | 0.95 | 0.41 – 0.90^ | 0.58 – 0.90^ | 0.87 | 0.87 | 0.87 | -0.38b | 0.35b (48) | -0.05b (49) | -0.19b (49) |
Emotional well-being (15) | 0.94 | 0.24 – 0.79^ | 0.54 – 0.76^ | 0.83 | 0.82 | 0.83 | -0.44b | 0.58b (133) | 0.16b (135) | -0.15b (135) |
Appearance & self-consciousness (7) | 0.89 | 0.37 – 0.79^ | 0.62 – 0.76^ | 0.81 | 0.81 | 0.81 | -0.40b | 0.50b (176) | 0.23b (179) | -0.03b (178) |
Participation (9) | 0.93 | 0.36 – 0.88^ | 0.60 – 0.86^ | 0.63 | 0.64 | 0.63 | -0.52b | 0.51b (75) | 0.01b (76) | -0.29b (76) |