The Rasch model
Data from the scale were fitted to the Rasch measurement model [30]. The purpose here is to see if the data satisfy, in a probabilistic manner, the axioms of additive conjoint measurement, and so conform to the requirements for effecting a transformation to interval scaling, rather than having to use the raw score of the scale, which is at the ordinal level [31-33]. This involves testing a series of assumptions, including the stochastic ordering of items, local response dependency, and unidimensionality [34]. Stochastic ordering is evaluated through fit to the model, which reflects a probabilistic Guttman ordering [35]. A series of fit statistics is used to indicate adequacy of fit, and their ideal values are shown at the bottom of the summary fit table (Table 2).
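As an illustration of what a probabilistic Guttman ordering means, the following minimal sketch uses the dichotomous form of the Rasch model. This is a simplification (the items of the scale analysed here are polytomous), and the item difficulties and person location are hypothetical values chosen only for the example.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of affirming a dichotomous item under the Rasch model.

    theta is the person location and b the item difficulty, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties (logits), ordered from easiest to hardest.
difficulties = [-1.0, 0.0, 1.5]
person = 0.5  # hypothetical person location (logits)

# For any given person, easier items are more likely to be affirmed than
# harder ones: the probabilistic counterpart of a Guttman pattern.
probs = [rasch_probability(person, b) for b in difficulties]
```

When person location and item difficulty coincide, the probability of affirming the item is 0.5.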
Table 2. Fit of PGWBI to the Rasch model
1. Pos Well Being | 4 | -0.24 | 0.96 | -0.42 | 0.84 | 27.9 | 0.68 | 0.81 | 5.59 (3.8-11.2) |
2. General Health | 3 | 0.42 | 0.92 | -0.40 | 0.95 | 12.6 | 0.18 | 0.59 | 1.12 (-0.2-4.3) |
3. Depressed Mood | 3 | -0.06 | 0.67 | -0.40 | 0.89 | 21.4 | 0.77 | 0.88 | 5.03 (1.8-8.2) |
4. Self Control | 3 | 0.21 | 0.34 | -0.30 | 0.77 | 9.5 | 0.40 | 0.65 | 2.79 (0.0-6.0) |
5. Anxiety | 5 | 0.28 | 1.74 | -0.41 | 1.15 | 51.8 | <0.00 | 0.81 | 7.26 (4.1-10.5) |
6. Anxiety | 3 | -0.10 | 1.66 | -0.58 | 1.19 | 15.8 | 0.08 | 0.72 | 3.37 (0.2-6.5) |
7. Vitality | 4 | 0.26 | 1.39 | -0.38 | 0.87 | 9.89 | 0.62 | 0.85 | 3.91 (0.7-7.1) |
8. Time 1 | 22 | 0.58 | 2.39 | -0.15 | 1.55 | 158.5 | < 0.01 | 0.92 | 25.7 (22.5-28.9) |
9. Six testlet | 6 (22) | 0.25 | 1.78 | -0.35 | 1.15 | 18.3 | 0.11 | 0.85 | 7.8 (4.6-11.0) |
10. Time 2 | 22 | 0.30 | 3.01 | -0.13 | 1.59 | 399.2 | <0.01 | 0.94 | 21.2 (18.0-24.4) |
11. Six testlet 2 | 3 (22) | 0.21 | 2.23 | -0.36 | 1.21 | 22.5 | 0.03 | 0.89 | 8.9 (5.7-12.1) |
12. Five testlet 2 | 5 (22) | 0.12 | 2.11 | -0.42 | 1.23 | 20.1 | 0.17 | 0.88 | 7.3 (4.1-10.5) |
13. Pooled | 2 (22) | 0.14 | 0.92 | -0.64 | 0.99 | 3.8 | 0.95 | 0.86 | 4.9 (2.6-7.1) |
Ideal Values | | 0.0 | <1.4ᵃ | 0.0 | <1.4 | | >0.05ᵇ | >0.85 | (LCI <5%) |
The item-trait interaction, together with the standardized mean person and item fit, was evaluated using χ² statistics, with a non-significant χ² probability value indicating fit. A significant χ² indicates that the hierarchical ordering of the items varies across the trait being measured (i.e., psychological well-being), which compromises the required property of invariance. The χ² is available as a summary fit statistic and for each individual item, and Bonferroni corrections are applied to the χ² at the 0.05 level.
For the summary person and item fit residuals, a standardized mean (SD) of 0.0 ± 1.0 indicates perfect fit. At the individual item and person level, a non-significant χ² probability value and standardized fit residuals between -2.5 and +2.5 indicate adequate fit, the latter being consistent with the 99% confidence interval for the residuals and thus allowing for some recognition of multiple testing (i.e., setting the significance level at 0.01).
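The item-level criteria described above can be sketched as a simple check. This is an illustrative sketch only: the function names are hypothetical, and it assumes that the Bonferroni correction divides the 0.05 level by the number of items tested.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Bonferroni-adjusted significance level (assumed here: alpha / n_tests)."""
    return alpha / n_tests


def item_fits(residual: float, chi2_p: float, n_items: int,
              alpha: float = 0.05, residual_limit: float = 2.5) -> bool:
    """Return True when an item meets both fit criteria.

    residual: standardized fit residual of the item
    chi2_p:   chi-square probability value of the item
    n_items:  number of items tested (used for the Bonferroni adjustment)
    """
    adjusted = bonferroni_alpha(alpha, n_items)
    within_limits = abs(residual) <= residual_limit
    non_significant = chi2_p >= adjusted
    return within_limits and non_significant
```

With 22 items, for example, the adjusted level is 0.05/22 ≈ 0.0023, so an item with a χ² probability of 0.04 would not be flagged on that criterion.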
Local response dependency arises where items are linked in some way, for example two items about climbing stairs, where one asks about difficulty climbing a single flight and the second about several flights. If a respondent has no difficulty climbing several flights of stairs, then they must also have no difficulty climbing a single flight. This breaches the local independence assumption, which states that, conditional on the trait being measured, responses to items must be independent [36]. The presence of local dependency inflates reliability and compromises parameter estimation [37]. Local response dependency can be identified through the correlation of residuals; in the current analysis, dependency was indicated by a residual correlation 0.2 above the average residual correlation. The problem can be accommodated through testlets, where the items are simply summed together into a ‘super item’ or testlet (in the climbing stairs example this would form the equivalent of one question asking how many flights of stairs can be climbed without difficulty) [38]. Where all items are reduced to a set of testlets, this is formally equivalent to a bi-factor model [39]. The latent correlation between testlets can also be determined, as well as the proportion of non-error variance accounted for when the testlets (super items) are added together to make a total score [40].
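The residual correlation criterion and the formation of a testlet can be sketched as follows. The item names and residual values are invented for illustration, and the use of Pearson correlations with the average taken over all item pairs is an assumption rather than a description of the software's internal calculation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_dependent_pairs(residuals, margin=0.2):
    """Flag item pairs whose residual correlation exceeds the average by margin.

    residuals: dict mapping item name -> list of person residuals.
    """
    corrs = {pair: pearson(residuals[pair[0]], residuals[pair[1]])
             for pair in combinations(residuals, 2)}
    cutoff = sum(corrs.values()) / len(corrs) + margin
    return [pair for pair, r in corrs.items() if r > cutoff]


def make_testlet(responses, items):
    """Sum the scored responses of the named items into one 'super item'."""
    return [sum(values) for values in zip(*(responses[i] for i in items))]


# Hypothetical residuals for three items and six persons.
residuals = {
    "stairs_one": [1.0, -1.0, 0.5, -0.5, 0.8, -0.8],
    "stairs_many": [0.9, -1.1, 0.6, -0.4, 0.7, -0.7],
    "mood": [0.5, 0.5, -1.0, -1.0, 0.5, 0.5],
}
dependent = flag_dependent_pairs(residuals)
```

In this invented example only the two stairs items are flagged, and their responses would then be summed with `make_testlet` before the model is refitted.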
As a basic assumption of summating any set of items to make a total score is that the set is unidimensional, it is crucial to ensure that this is the case [41]. In RUMM2030, the software used in the current study, Smith’s test of unidimensionality is implemented, whereby items loading positively and negatively on the first principal component of the residuals are used to make two independent person estimates (in this case of well-being), and these are contrasted through a series of independent t-tests [42]. Person estimates from these subtests were compared, and if more than 5% of these tests were found to be significant, then the scale was considered multidimensional.
A binomial confidence interval of proportions can be used to show that the lower confidence limit of the observed proportion falls below the 5% level, which supports unidimensionality.
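The confidence interval check on the proportion of significant t-tests might be sketched as below. This assumes a normal approximation to the binomial, which may differ from the exact calculation used by the software, and the counts in the usage note are hypothetical.

```python
from math import sqrt


def proportion_ci(n_significant: int, n_tests: int, z: float = 1.96):
    """Observed proportion with an approximate 95% confidence interval.

    Uses the normal approximation to the binomial (an assumption).
    """
    p = n_significant / n_tests
    half_width = z * sqrt(p * (1 - p) / n_tests)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def supports_unidimensionality(n_significant: int, n_tests: int,
                               limit: float = 0.05) -> bool:
    """True when the lower confidence limit falls below the 5% level."""
    _, lower, _ = proportion_ci(n_significant, n_tests)
    return lower < limit
```

For instance, 12 significant tests out of 200 (6%) would give a lower limit of roughly 2.7%, below the 5% level, whereas 50 out of 200 (25%) would not.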
In addition, the process of Rasch analysis allows for an investigation of polytomous item threshold ordering and Differential Item Functioning (DIF). Threshold ordering is important to ensure that an increase in the response category of an item, represented by the transition point (threshold) between categories, reflects an increase in the underlying trait. Where this fails, the item is said to have a ‘disordered threshold’, which can be adjusted by collapsing categories.
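The threshold ordering check can be illustrated with a short sketch. The function name and the threshold values are hypothetical; the only assumption is that ordered thresholds increase strictly from one category transition to the next.

```python
def disordered_thresholds(thresholds):
    """Return the indices k where threshold k+1 is not above threshold k.

    thresholds: threshold locations of one item, in logits, in category order.
    An empty list means the thresholds are correctly ordered.
    """
    return [k for k in range(len(thresholds) - 1)
            if thresholds[k + 1] <= thresholds[k]]


# Hypothetical threshold estimates (logits) for two items.
ordered_item = [-1.2, -0.3, 0.4, 1.5]
disordered_item = [-1.2, 0.6, 0.1, 1.5]
```

For the second item the check returns index 1, suggesting that the two categories either side of that transition are candidates for collapsing.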
For DIF, the response to an item, conditional upon the level of the trait, should not differ across group membership such as gender. When it is found to differ, this can be dealt with by ‘splitting’ items such that, for example, an item becomes two items, one for each gender, with structural missing values for the excluded gender. In this paper, DIF by age, gender, and whether or not the patient was working was tested.
A reliability index (Person Separation Index, PSI) is also reported. Where data are normally distributed this can be interpreted as similar to Cronbach’s alpha, and thus values of 0.7 and 0.85 are indicative of reliability sufficient for group and individual use respectively [43]. Where the distribution of data departs from normality, it is useful to view both the PSI and alpha, to gain insight into the effect of skewness and floor and ceiling effects. Under these circumstances the PSI reflects the number of statistically significant groups of patients (strata) that can be identified by the instrument [44].
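One commonly cited conversion from a reliability coefficient to a number of strata is sketched below. It first computes the separation ratio G = sqrt(R / (1 - R)) and then the strata as (4G + 1) / 3. Whether the cited source uses exactly this conversion is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def strata_from_reliability(reliability: float) -> float:
    """Number of statistically distinct strata implied by a reliability value.

    Assumes the conversion G = sqrt(R / (1 - R)), strata = (4G + 1) / 3.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in the interval [0, 1)")
    separation = sqrt(reliability / (1.0 - reliability))
    return (4.0 * separation + 1.0) / 3.0
```

Under this conversion, a PSI of 0.70 corresponds to a little over two strata and a PSI of 0.85 to about three and a half.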
Data from each time point were initially analysed separately. Once the data were shown to fit the Rasch model, they were pooled and tested for invariance (lack of DIF) over time. The procedure by Mallinson was used to accommodate repeated measures [45]. Given fit to the Rasch model, an interval scale latent estimate (in logits) is available for further analysis, with the raw score transformed into a suitable range, for example 0-100.
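A linear rescaling of the logit estimates into a 0-100 range might look like the following sketch. The logit bounds in the usage note are hypothetical; in practice they would be the estimates corresponding to the minimum and maximum raw scores of the scale.

```python
def rescale_logit(logit: float, logit_min: float, logit_max: float,
                  new_min: float = 0.0, new_max: float = 100.0) -> float:
    """Linearly map a logit estimate onto a convenient range such as 0-100.

    A linear transformation preserves the interval properties of the logits.
    """
    if logit_max <= logit_min:
        raise ValueError("logit_max must be greater than logit_min")
    fraction = (logit - logit_min) / (logit_max - logit_min)
    return new_min + fraction * (new_max - new_min)
```

With hypothetical bounds of -4 and +4 logits, an estimate of 0 logits maps to 50 on the transformed scale.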
Targeting of persons and items (person-item threshold distribution) was assessed by comparing the mean location score obtained for the patients with that of the items (almost always zero at the center of the scale). In the Rasch model, the center of the scale represents the item (in the dichotomous case) of average difficulty [46].
The data from this scale were also subjected to a Confirmatory Factor Analysis (CFA) to gain insight into both the comparison with the Rasch analysis and previously published factor analysis of the scale. In the case where the scale fails the CFA (where the correlation of error terms would be allowed), an Exploratory Factor Analysis (EFA) with a PROMAX rotation would be considered. The Root Mean Square Error of Approximation (RMSEA) is reported here, where a value of <0.10 is considered weak support and a value of <0.08 moderate support for a unidimensional structure. Additional statistics, including the Tucker Lewis Index (TLI) and the Comparative Fit Index (CFI), are indicative of a unidimensional construct when their values exceed 0.95.
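The fit index cut-offs stated above can be collected into a small helper for reading CFA output. This is only a sketch of the decision rules given in the text; the function name and the returned labels are hypothetical.

```python
def cfa_support(rmsea: float, cfi: float, tli: float) -> dict:
    """Summarise support for a unidimensional structure from CFA fit indices.

    RMSEA < 0.08 is read as moderate support and RMSEA < 0.10 as weak support;
    CFI and TLI are taken to indicate unidimensionality above 0.95.
    """
    if rmsea < 0.08:
        rmsea_label = "moderate"
    elif rmsea < 0.10:
        rmsea_label = "weak"
    else:
        rmsea_label = "none"
    return {
        "rmsea_support": rmsea_label,
        "cfi_ok": cfi > 0.95,
        "tli_ok": tli > 0.95,
    }
```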
Mplus 6 was used for the CFA and EFA [47], and RUMM2030 for the Rasch analysis [48]. All other analyses were undertaken with SPSS 18 [49].
The study was approved by the Regional Ethical Review Board in Gothenburg (243-05) and conducted in compliance with the Declaration of Helsinki.