Study design and population
The data for the present study are baseline data from a longitudinal cohort study that aimed to evaluate psychosocial working conditions, stress, health and well-being among employees in two human service organisations in Western Sweden. The data were collected in 2004 through a postal questionnaire sent to a random sample (n = 5,300) of the 48,600 employees of Region Västra Götaland, a large public healthcare organisation, and to a random sample (n = 700) of the 2,200 social insurance office workers in the same geographical area. An inclusion criterion of at least one year of employment (at least 50% of full-time) was applied. Two reminders were sent to non-responders. Written informed consent for participation in the study was obtained from the participants. The study was approved by the Regional Ethical Review Board in Gothenburg, Sweden, and was conducted in accordance with the 1964 Declaration of Helsinki. The total response rate was 62%. Because of the selection criteria, the participants were mainly employed in the healthcare sector (86%). The three most common professions were nurse, assistant nurse and physician, and the mean age was 48 years. Further demographic and study-specific details are available in published studies [19, 25, 26].
In this study, 2,817 individuals who responded to the SEQ items were available for analysis (2,378 women, 439 men). For the evaluation of differential item functioning (DIF), it is recommended that the groups compared be of approximately equal size, so that any DIF detected is not dominated by the group with the larger sample [27, 28]. Consequently, to obtain a data set balanced by gender, approximately 20% of the women were randomly sampled from the original data set. The final study population thus comprised 441 women and 439 men. This sub-sample and the total sample were comparable in terms of age and profession.
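The balancing step can be sketched in Python as follows. This is a minimal illustration, assuming each respondent is stored as a record with a hypothetical `gender` field coded 'F' or 'M'; the function name, field names and seed are illustrative, and the records are synthetic placeholders with the study's group sizes, not the study data or its actual random draw.

```python
import random

def balanced_subsample(records, n_women, seed=None):
    """Keep every man and a simple random sample (without replacement) of n_women women.

    records: list of dicts with a hypothetical 'gender' field coded 'F' or 'M'.
    """
    rng = random.Random(seed)
    women = [r for r in records if r["gender"] == "F"]
    men = [r for r in records if r["gender"] == "M"]
    return rng.sample(women, n_women) + men

# Synthetic placeholder records with the study's group sizes; no real responses.
records = ([{"id": i, "gender": "F"} for i in range(2378)]
           + [{"id": 2378 + i, "gender": "M"} for i in range(439)])
subsample = balanced_subsample(records, n_women=441, seed=1)
print(len(subsample))  # 880
```

Sampling without replacement guarantees that no respondent appears twice in the balanced data set.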
Measures
The SEQ is an adjective checklist with two dimensions, stress and energy, hypothesised to describe two critical aspects of mood at work. The original overall question to be answered through the checklist is: “How do you usually feel at the end of a normal working day?” In the modified version used in this study, the time perspective was changed to “during the past week”. Based on the theory of allostatic overload [29], we postulated that the dominant level of arousal during the past week, rather than at the end of a working day, would be more closely related to long-term stress exposure.
The SEQ is based on the circumplex model of affect proposed by Russell [30]. According to this model, stress and energy are bipolar dimensions, yielding two total scores: one for the stress dimension and one for the energy dimension. The stress dimension ranges from positively evaluated low activation to negatively evaluated high activation, whereas the energy dimension ranges from negatively evaluated low activation to positively evaluated high activation. Each dimension is operationalised using three positively oriented items (stress: rested, relaxed, calm; energy: active, energetic, focused) and three negatively oriented items (stress: tense, stressed, pressured; energy: dull, inefficient, passive). The response alternatives are: not at all, hardly, somewhat, fairly, much and very much. The response categories are interpreted in opposite directions for positive and negative items. For positively loaded items, very much implies the lowest stress level and the highest energy level (the most favourable response), while not at all is the least favourable response. The opposite is true for negatively loaded items. Response categories are coded numerically (0–5). The coding of the positively loaded stress items and of the negatively loaded energy items was reversed before a total mean score was calculated for stress and for energy, respectively.
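The scoring rule above can be written out as a short Python sketch. It assumes complete responses to all six items of a dimension; the dictionary layout, the function name `dimension_score` and the example respondent are hypothetical and serve only to make the reverse coding explicit.

```python
# Response labels in order; their position gives the numeric code 0-5.
CATEGORIES = ["not at all", "hardly", "somewhat", "fairly", "much", "very much"]
CODE = {label: code for code, label in enumerate(CATEGORIES)}
MAX_CODE = len(CATEGORIES) - 1  # 5

# For each item: True if its coding is reversed before scoring.
STRESS_ITEMS = {"rested": True, "relaxed": True, "calm": True,
                "tense": False, "stressed": False, "pressured": False}
ENERGY_ITEMS = {"active": False, "energetic": False, "focused": False,
                "dull": True, "inefficient": True, "passive": True}

def dimension_score(responses, items):
    """Mean 0-5 score for one dimension; higher means more stress or more energy.

    Assumes all six items of the dimension were answered.
    """
    codes = []
    for item, reverse in items.items():
        code = CODE[responses[item]]
        codes.append(MAX_CODE - code if reverse else code)
    return sum(codes) / len(codes)

# Hypothetical respondent (not study data).
answers = {"rested": "fairly", "relaxed": "much", "calm": "fairly",
           "tense": "hardly", "stressed": "somewhat", "pressured": "hardly",
           "active": "much", "energetic": "fairly", "focused": "much",
           "dull": "hardly", "inefficient": "not at all", "passive": "hardly"}
print(dimension_score(answers, STRESS_ITEMS))  # 1.5
print(dimension_score(answers, ENERGY_ITEMS))  # 4.0
```

After reversal, a high mean consistently means high stress (or high energy), whichever item wording produced it.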
Data analysis
The Rasch model, named after the Danish mathematician Georg Rasch [31], is based on latent trait theory and belongs to the modern psychometric approach. The model is intended for the development and evaluation of multi-item instruments. The Rasch model operationalises the axioms of additive conjoint measurement, which are the requirements for constructing measurement [32-35]. The SEQ data were fitted to the Rasch measurement model with the RUMM2030 software [27], using the unrestricted (partial credit) model for polytomous data, which allows the distances between thresholds to vary across items [36, 37]. A threshold is the point between two adjacent response categories at which either response is equally probable. The stress and energy dimensions were analysed separately, because one of the assumptions of Rasch analysis is unidimensionality.
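To make the threshold definition concrete, the partial credit model can be sketched in Python. In this sketch the probability of category x for a person at location θ is P(X = x | θ) = exp(Σ_{k=1}^{x}(θ − b_k)) / Σ_{h=0}^{m} exp(Σ_{k=1}^{h}(θ − b_k)), with an empty sum equal to 0. For simplicity it assumes absolute threshold locations b_k in logits rather than the item-location-plus-centred-threshold parameterisation that RUMM2030 reports; the threshold values are hypothetical.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Partial credit model category probabilities for one item.

    theta: person location in logits.
    thresholds: absolute threshold locations b_1..b_m in logits, where b_k is
    the point at which categories k-1 and k are equally probable.
    Returns [P(X=0), ..., P(X=m)].
    """
    # Unnormalised log-weight of category x is the sum of (theta - b_k) for k <= x;
    # category 0 has an empty sum, i.e. 0.
    log_weights = [0.0]
    for b in thresholds:
        log_weights.append(log_weights[-1] + (theta - b))
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds for a six-category (0-5) item.
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = pcm_probabilities(0.0, b)
print([round(p, 3) for p in probs])
```

At θ = 0, which equals the third threshold, categories 2 and 3 come out equally probable, which is exactly the threshold definition given above.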
The aim of the Rasch analysis is to assess how well the observed data satisfy the model expectations. The process involves testing a series of assumptions, including stochastic ordering of items (monotonicity), unidimensionality, local independence and the principle of invariance [38]. The adequacy of fit is evaluated using multiple fit statistics; their ideal values are shown in the bottom row of the summary fit table (Table 1) [39].
Table 1
Fit to the Rasch model
| Analysis | Item fit residual, mean | Item fit residual, SD | Person fit residual, mean | Person fit residual, SD | Item-trait χ2 | p | PSI | Significant t-tests, % (95% CI) |
|---|---|---|---|---|---|---|---|---|
| 1 Stress, 6 items | 0.48 | 1.30 | −0.49 | 1.19 | 52.79 | 0.52 | 0.92 | 10.5 (9.1;12.0) |
| 2 Stress, 2 testlets | 0.31 | 0.56 | −0.62 | 1.03 | 13.57 | 0.75 | 0.87 | 4.4 (3.0;5.9) |
| 3 Energy, 6 items | −0.008 | 2.05 | −0.43 | 1.09 | 70.61 | 0.06 | 0.80 | 8.4 (7.0;9.9) |
| 4 Energy, 4 items | −0.31 | 2.35 | −0.42 | 0.89 | 47.76 | 0.04 | 0.75 | 6.1 (4.6;7.5) |
| 5 Energy, re-scoring | 0.03 | 2.07 | −0.42 | 1.09 | 72.17 | 0.05 | 0.80 | 8.3 (6.9;9.8) |
| 6 Energy, DIF split | −0.13 | 1.83 | −0.42 | 1.08 | 74.73 | 0.15 | 0.80 | |
| 7 Energy, 2 testlets | 0.13 | 0.33 | −0.49 | 0.83 | 23.20 | 0.18 | 0.70 | 3.3 (2.2;5.1) |
| Ideal values | 0.0 | <1.4 | 0.0 | <1.4 | | >0.05 | >0.7 | lower CI limit <5% |
Stochastic ordering of items is evaluated through the fit of the data to the model. The response structure required by the Rasch model is a stochastically consistent item order, i.e. a probabilistic Guttman pattern [40]. This means that persons who experience higher levels of stress or energy are expected to obtain higher scores, whereas persons with lower levels of stress or energy are expected to obtain lower scores. The intended increase in stress or energy across the response categories of each item must be reflected in the observed data. Rasch analysis can show whether the response categories of each item function as intended, and threshold ordering was examined for this purpose. In the case of disordered thresholds, an item can be rescored by collapsing adjacent categories [38]. Threshold disordering can be viewed graphically by plotting category probability curves.
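The check for disordered thresholds and the subsequent rescoring can be sketched in Python. The threshold estimates below are hypothetical, and the choice to merge "hardly" with "somewhat" is purely illustrative; in practice the categories to collapse are chosen by inspecting the category probability curves and the item content.

```python
def disordered_pairs(thresholds):
    """Return 1-based index pairs (k, k+1) of adjacent thresholds that are not increasing."""
    return [(k + 1, k + 2) for k in range(len(thresholds) - 1)
            if thresholds[k + 1] <= thresholds[k]]

def merge_mapping(n_categories, lower):
    """Old-code -> new-code mapping that merges category `lower` with `lower + 1`."""
    return [c if c <= lower else c - 1 for c in range(n_categories)]

# Hypothetical threshold estimates for one six-category (0-5) item:
# threshold 3 lies below threshold 2, so category 2 is never the most probable one.
item_thresholds = [-2.1, -0.4, -0.9, 0.8, 2.0]
print(disordered_pairs(item_thresholds))  # [(2, 3)]

mapping = merge_mapping(6, lower=1)  # merge 'hardly' (1) with 'somewhat' (2)
print(mapping)  # [0, 1, 1, 2, 3, 4]

responses = [0, 2, 5, 1, 3]
print([mapping[x] for x in responses])  # [0, 1, 4, 1, 2]
```

After rescoring, the item has five categories (0–4) and four thresholds, and the model is refitted to check whether the remaining thresholds are ordered.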
The invariance criterion implies that the items need to work in the same way (invariantly) across the whole continuum of the latent trait for all individuals. In that case, the relative position of the items, i.e. the difference between the locations (in logits) of any two items, must be constant along the trait. Moreover, at a given level of the latent trait (stress or energy), the scale should function in the same way for all comparable groups (e.g. women and men). A violation of this requirement is known as differential item functioning (DIF). In the presence of DIF, women and men with the same level of stress or energy would score differently on a specific item.
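The invariance property can be illustrated numerically. The sketch below uses the dichotomous Rasch model as a simplification of the polytomous model applied in this study, with two hypothetical item locations: on the log-odds (logit) scale, the difference between the two items is the same at every person location.

```python
import math

def rasch_p(theta, delta):
    """Dichotomous Rasch model: probability of endorsing an item located at delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def log_odds(p):
    return math.log(p / (1.0 - p))

# Two hypothetical item locations in logits.
delta_a, delta_b = -0.5, 1.2
for theta in (-2.0, 0.0, 2.0):
    diff = log_odds(rasch_p(theta, delta_a)) - log_odds(rasch_p(theta, delta_b))
    print(theta, round(diff, 6))  # the difference is 1.7 at every theta
```

Because the log-odds of endorsement are θ − δ, the person location cancels in the difference, leaving δ_b − δ_a. If an item had a different location for women and men (DIF), this difference would depend on group membership.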
Three overall fit statistics were considered. The item-trait interaction is a χ2 statistic and reflects the property of invariance across the trait; a significant value indicates that the hierarchical ordering of the items varies across the stress or energy trait. The other two statistics are the item-person interaction statistics (item and person fit residuals), transformed to approximate a standardised normal distribution; under perfect fit, their expected mean is 0 and their standard deviation (SD) is 1. In addition to the overall fit, individual item and person fit were also considered, both as residuals and as χ2 statistics. Adequate fit is indicated by standardised fit residuals within ±2.5 and a non-significant χ2.
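The two building blocks of these fit statistics can be sketched in Python under the partial credit model: the standardised residual of a single response, (observed − expected) / √variance, and a class-interval χ2 for one item, in which persons are grouped by location and observed and expected group totals are compared. RUMM2030 aggregates residuals into item and person fit residuals through a transformation not reproduced here, so the ±2.5 criterion should not be applied directly to these raw residuals. Summing item χ2 values over items gives an item-trait χ2 of the general kind reported in Table 1. The thresholds, locations and responses are hypothetical, and the number of class intervals is an assumption.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Partial credit model category probabilities (absolute thresholds, logits)."""
    log_w = [0.0]
    for b in thresholds:
        log_w.append(log_w[-1] + (theta - b))
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    s = sum(w)
    return [x / s for x in w]

def expected_and_variance(theta, thresholds):
    """Model-expected item score and its variance for a person at theta."""
    p = pcm_probabilities(theta, thresholds)
    e = sum(k * pk for k, pk in enumerate(p))
    v = sum((k - e) ** 2 * pk for k, pk in enumerate(p))
    return e, v

def standardized_residual(x, theta, thresholds):
    """(observed - expected) / sqrt(variance) for one person-item response."""
    e, v = expected_and_variance(theta, thresholds)
    return (x - e) / math.sqrt(v)

def class_interval_chi2(responses, thetas, thresholds, n_groups=3):
    """Item chi-square over class intervals of roughly equal size.

    Persons are sorted by location and split into n_groups class intervals;
    each interval contributes (observed total - expected total)^2 / variance total.
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    n = len(order)
    bounds = [g * n // n_groups for g in range(n_groups + 1)]
    chi2 = 0.0
    for g in range(n_groups):
        group = order[bounds[g]:bounds[g + 1]]
        if not group:  # only happens when there are fewer persons than groups
            continue
        observed = sum(responses[i] for i in group)
        stats = [expected_and_variance(thetas[i], thresholds) for i in group]
        expected = sum(e for e, _ in stats)
        variance = sum(v for _, v in stats)
        chi2 += (observed - expected) ** 2 / variance
    return chi2

# Hypothetical four-category item and six persons (not study data).
thresholds = [-1.0, 0.0, 1.0]
thetas = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
responses = [0, 1, 1, 2, 2, 3]
print([round(standardized_residual(x, t, thresholds), 2) for x, t in zip(responses, thetas)])
print(round(class_interval_chi2(responses, thetas, thresholds), 3))
```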
In this study, DIF for gender was tested by an analysis of variance (ANOVA) of the standardised residuals, which enables separate estimation of misfit along the latent trait, uniform DIF and non-uniform DIF. In the case of uniform DIF, there is a consistent systematic difference in the response to an item across the whole range of the latent trait [41]. In non-uniform DIF, the magnitude of DIF is not constant across the trait. Detected DIF can be dealt with by splitting the misfitting item into two items: one for women, with missing values for men, and one for men, with missing values for women [38]. To understand the nature and magnitude of DIF, the initial and resolved analyses can be compared in terms of parameter estimates, provided the data fit the model [28, 42].
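The item split described above amounts to a simple data transformation, sketched here in Python. It assumes one record per person holding the item responses and a hypothetical `gender` label; the function name, field names and the choice of "focused" as the split item are illustrative only. In the study itself the split was handled within the Rasch analysis software.

```python
def split_item_by_group(rows, item, group_key="gender", groups=("F", "M")):
    """Replace `item` with one group-specific copy per group.

    rows: list of dicts (one per person) holding item responses and a group label.
    The copy for group g keeps the response for members of g and is None
    (missing) for everyone else, so each copy is calibrated on one group only.
    """
    new_rows = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != item}
        for g in groups:
            new_row[f"{item}_{g}"] = row[item] if row[group_key] == g else None
        new_rows.append(new_row)
    return new_rows

# Hypothetical records (not study data).
rows = [{"id": 1, "gender": "F", "focused": 3},
        {"id": 2, "gender": "M", "focused": 1}]
print(split_item_by_group(rows, "focused"))
```

Each resulting item then receives its own location estimate, so the difference between the women's and men's versions shows the size and direction of the DIF.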
Local dependency manifests in two ways, through response dependency and trait dependency, both analysed by means of residual correlations [43]. Response dependency occurs when items are linked in such a way that the response to one item depends on the response to another. Response dependency inflates reliability and compromises parameter estimation; it can be detected through the correlation of residuals [44], and in the current analysis a residual correlation more than 0.2 above the average residual correlation was taken to indicate it. Trait dependency is a violation of unidimensionality.
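The response-dependency criterion can be sketched with NumPy, assuming a matrix of standardised residuals (persons by items) has already been obtained from the Rasch fit. The function name is hypothetical, and the residuals below are synthetic, constructed so that items "c" and "d" share most of their noise.

```python
import numpy as np

def flag_response_dependency(residuals, item_names, margin=0.2):
    """Flag item pairs whose residual correlation exceeds the average
    off-diagonal residual correlation by more than `margin`.

    residuals: array of shape (n_persons, n_items) of standardised residuals.
    Returns (average correlation, list of (item_i, item_j, r)).
    """
    corr = np.corrcoef(residuals, rowvar=False)  # item-by-item correlation matrix
    i_upper, j_upper = np.triu_indices(corr.shape[0], k=1)  # each pair once
    mean_r = float(corr[i_upper, j_upper].mean())
    flagged = [(item_names[i], item_names[j], float(corr[i, j]))
               for i, j in zip(i_upper, j_upper)
               if corr[i, j] > mean_r + margin]
    return mean_r, flagged

# Synthetic residuals (not study data): items "c" and "d" share most of their noise.
rng = np.random.default_rng(0)
res = rng.normal(size=(500, 4))
res[:, 3] = res[:, 2] + 0.1 * rng.normal(size=500)
mean_r, flagged = flag_response_dependency(res, ["a", "b", "c", "d"])
print(round(mean_r, 2), flagged)
```

Only the upper triangle of the correlation matrix is used, so each item pair is counted once and the diagonal of ones does not inflate the average.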
Unidimensionality is a basic prerequisite for combining any set of items into a total score. Smith’s test of unidimensionality is implemented in RUMM2030 [45]. In this test, the items loading positively and the items loading negatively on the first principal component of the residuals are used to make two independent person estimates (in this case, of stress or energy), which are then contrasted through a series of independent t-tests [45]. If fewer than 5% of these tests are significant, the unidimensionality of the scale is supported. A 95% binomial confidence interval of the proportion can be used to show that the lower limit of the observed proportion is below the 5% level [45]. If local dependency is detected, it can be accommodated by combining locally dependent items into a ‘super item’ or testlet [46].
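The t-test step of Smith’s procedure can be sketched in Python, assuming the two person estimates and their standard errors have already been obtained from the positively and negatively loading item subsets. The confidence interval here uses the normal (Wald) approximation to the binomial, which is an assumption of this sketch and not necessarily the interval computed by RUMM2030; an exact (Clopper-Pearson) interval is an alternative when the proportion is small. The data are synthetic.

```python
import math

def smith_t_test_summary(theta_pos, se_pos, theta_neg, se_neg, crit=1.96):
    """Proportion of persons whose two subset estimates differ significantly,
    with a normal-approximation 95% confidence interval for that proportion.

    theta_pos/se_pos: person estimates and standard errors from the items loading
    positively on the first residual component; theta_neg/se_neg: the same from
    the negatively loading items.
    """
    n = len(theta_pos)
    significant = 0
    for a, sa, b, sb in zip(theta_pos, se_pos, theta_neg, se_neg):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > crit:
            significant += 1
    p = significant / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Synthetic example: 100 persons, 6 of whom get clearly different subset estimates.
theta_pos = [0.0] * 100
theta_neg = [0.0] * 94 + [2.0] * 6
se = [0.4] * 100
p, lower, upper = smith_t_test_summary(theta_pos, se, theta_neg, se)
print(round(p, 3), round(lower, 3), round(upper, 3))  # 0.06 0.013 0.107
print("unidimensionality supported:", lower < 0.05)
```

In this example the observed proportion (6%) exceeds 5%, but the lower limit of the interval lies below 5%, which is the situation in which the text above regards unidimensionality as supported.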
In Rasch analysis, both persons and items are calibrated on the same logit scale, which makes it possible to evaluate how well the items are targeted to the persons in the sample. Targeting can be assessed by comparing the observed mean location of the persons with the mean location of the items, which is set at zero. For a well-targeted instrument, the mean person location would be around zero. A positive mean value for persons would indicate that the sample as a whole was located at a higher level of stress or energy than the average of the scale; the reverse is true for negative values. This is presented graphically in a person-item distribution graph. Reliability is reported as the Person Separation Index (PSI), which is interpreted in a similar way to Cronbach’s alpha. Values of 0.7 and 0.9 indicate sufficient reliability for group and individual use, respectively [47].
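Targeting and reliability can be summarised with a short Python sketch, assuming person location estimates and their standard errors are available. The PSI is computed here as the share of the observed variance of person locations that is not measurement error, (observed variance − mean squared standard error) / observed variance. The use of the sample variance (n − 1) is an assumption of this sketch, and RUMM2030's exact computation may differ. The person estimates are hypothetical.

```python
import statistics

def targeting_and_psi(person_locations, person_ses):
    """Mean person location (items are centred at 0 logits) and a Person Separation Index.

    PSI = (observed variance of person locations - mean squared standard error)
          / observed variance of person locations.
    """
    mean_location = statistics.fmean(person_locations)
    observed_var = statistics.variance(person_locations)  # sample variance (n - 1)
    error_var = statistics.fmean(se ** 2 for se in person_ses)
    psi = (observed_var - error_var) / observed_var
    return mean_location, psi

# Hypothetical person estimates and standard errors (logits), not study results.
locations = [-1.0, 0.0, 1.0, 2.0]
ses = [0.5, 0.5, 0.5, 0.5]
mean_location, psi = targeting_and_psi(locations, ses)
print(mean_location, round(psi, 2))  # 0.5 0.85
```

Here the positive mean location would indicate a sample located somewhat above the average item location, and a PSI of 0.85 would fall between the thresholds for group (0.7) and individual (0.9) use.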