Introduction
The prevalence of disability due to Low Back Pain (LBP) increases from the third decade of life onwards, peaking between the ages of 35 and 55 years [1]. LBP causes substantial absenteeism and work productivity losses [2]. This makes LBP the most common health problem in the European workforce. In the Netherlands, costs of LBP have been estimated at 1.7% of its Gross National Product [3].
The greatest potential for cost reduction lies in decreasing work absenteeism and disability due to LBP [4]. Absenteeism and disability at work are influenced by the work ability of a person [5]. Higher work ability is associated with less disability and pain, and higher quality of life [6]. The Work Ability Index (WAI) was developed as a measure of self-reported work ability. The Work Ability Score (WAS) is an item of the WAI and compares current work ability with lifetime best [7]. It is an acceptable brief alternative to the WAI in determining work ability [8]. Convergent validity between the WAI and WAS is sufficient [9]. Measurement properties are sufficient in a secondary vocational rehabilitation setting [10], but have not been analysed in secondary and tertiary spine care. The interference of chronic pain with daily activities can be assessed by the Pain Disability Index (PDI). The PDI has been validated in patients with chronic pain [11]. The PDI Work item (PDI-W) measures interference of chronic pain with the ability to engage in occupational activities. The PDI-W has not yet been validated either.
The WAS and PDI-W are Patient Reported Outcome Measures (PROMs). PROMs are highly recommended in clinical guidelines to assess the quality of care, treatment effects and change in health status from the patient’s perspective. Selection of PROMs should be based on the strength of relevant measurement characteristics (i.e. validity, responsiveness) [12]. To meet conditions for construct validity, a measurement instrument should be consistent with hypotheses regarding relationships with other measures. The ability to detect changes in health status within individuals over time (responsiveness) and interpretation of change scores are important characteristics of PROMs [13]. Minimal Clinically Important Change (MCIC) and measurement error (Smallest Detectable Change, SDC) can be used to interpret change scores. The MCIC is useful because it represents a change score that patients perceive as beneficial and meaningful [14].
Despite the usefulness of PROMs, these measurements can be a burden for patients and caregivers. The time needed to fill out the questionnaires, difficulty in completing them independently, and the time needed to analyse the results were the most frequently mentioned reasons for not using the measurements [15]. Therefore, if measurement characteristics are sufficient, the WAS and PDI-W single items may be used in routine care instead of lengthy questionnaires. The aim of the present study was to assess construct validity, responsiveness, and MCIC of the WAS and PDI-W in patients with chronic low back pain (CLBP).
Discussion
The aim of this study was to assess construct validity, responsiveness, and MCIC of the WAS and PDI-W in patients with CLBP. For the WAS and PDI-W, respectively, 70% and 60% of predefined hypotheses were not rejected, which is below the predefined threshold of ≥ 80%. Therefore, construct validity was not supported. The WAS and PDI-W are responsive to change. MCICs of 1.5 points (WAS) and -2.5 points (PDI-W) were found. Nevertheless, clinically important change could not be distinguished from measurement error, since the MCICs were smaller than the SDC values. Individual change scores up to 5 points should be interpreted with caution.
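The decision rule for construct validity described above can be illustrated with a minimal sketch. It assumes ten predefined hypotheses per instrument, a hypothetical count chosen only so that 70% and 60% correspond to whole numbers; the function name is also hypothetical and not taken from the study.

```python
def construct_validity_supported(n_not_rejected, n_hypotheses, threshold=0.80):
    """Return (share, supported) for a set of predefined hypotheses.

    share     : proportion of hypotheses that were not rejected
    supported : True if that proportion reaches the threshold (>= 80% here)
    """
    if n_hypotheses <= 0:
        raise ValueError("n_hypotheses must be positive")
    share = n_not_rejected / n_hypotheses
    return share, share >= threshold


# Hypothetical counts: 7 of 10 (WAS) and 6 of 10 (PDI-W) hypotheses not rejected.
was_share, was_supported = construct_validity_supported(7, 10)
pdiw_share, pdiw_supported = construct_validity_supported(6, 10)
```

Under these assumed counts both instruments fall below the 80% threshold, which matches the conclusion that construct validity was not supported.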
For construct validity, the rejection of more hypotheses than expected may have several explanations. For the WAS, measurement scales of reference instruments might have contributed to the rejection of hypotheses. The WAS asks patients to compare current work ability to lifetime best, whereas reference instruments only ask for current functioning. Consequently, loss of functioning might have been scored differently, resulting in lower correlations. Additionally, the work demands hypothesis focused on work pace, emotional and quantitative demands (i.e. evaluation of time available to finish work), because these questions were classified as work demands by the COPSOQII questionnaire. In retrospect, this construct could have been expanded by inclusion of physical demands, commitment to work, and job satisfaction. These factors are considered important in predicting work (dis)ability [30]. Hypotheses on partial sick leave were rejected. Whereas WAS and PDI-W scores of patients on sick leave or fully working were heavily skewed to the lower or higher end of the scale, scores of patients on partial sick leave were normally distributed with a high variance. Therefore, we observed lower correlations than hypothesized a priori. Finally, in hindsight, permanent disablement was an insufficient reference test. The majority of permanently disabled patients scored the PDI-W as ‘not applicable’, because this item was irrelevant to these patients.
Floor (WAS; 25%) and ceiling (PDI-W; 15%) effects were also observed, both indicating the most severe interference of LBP. These effects might have affected correlations with reference tests. Data were collected from patients receiving secondary and tertiary multispecialty care. Consumption of medical care and the influence of LBP on work ability are higher in this patient sample than in patients receiving primary-level care [16]. Therefore, the WAS and PDI-W might not be adequate instruments for distinguishing work ability levels in patients with severe CLBP. Further research should investigate the validity of these items in patients receiving primary-level care.
Regarding longitudinal validity, measurement error should be considered in decision-making in individual patients. The SDC_individual for the WAS (4.9 points) and PDI-W (5.2 points) both exceeded the MCIC values (1.5 and -2.5 points, respectively). This corresponds with results of previous research on PROMs in back pain [11, 43]. Individual change scores larger than the MCIC but smaller than the SDC_individual should be interpreted with caution. These scores fall within the measurement error, which results in the risk of incorrect classification of patients as improved. The WAS and PDI-W are better at detecting changes at a group level, as the SDC_group was smaller than the SDC_individual. Results of the present study indicate that small changes in work ability can be considered important by CLBP patients. Because CLBP is very disabling [7, 8], small improvements can have a meaningful effect on the well-being of patients.
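The interpretation of an individual change score relative to the MCIC and the SDC can be sketched as follows. This is a minimal illustration and not the analysis code of the study: the function names are hypothetical, the group-level formula SDC_group = SDC_individual / √n is a commonly used convention that is assumed here rather than stated in the text, and the group size of 100 is an arbitrary example value.

```python
import math


def sdc_group(sdc_individual, n_patients):
    """Group-level SDC, assuming SDC_group = SDC_individual / sqrt(n)."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return sdc_individual / math.sqrt(n_patients)


def interpret_individual_change(change, mcic, sdc_individual):
    """Classify one patient's change score.

    The sign of `mcic` gives the direction of improvement:
    positive for the WAS (1.5), negative for the PDI-W (-2.5).
    """
    if mcic == 0:
        raise ValueError("mcic must be non-zero")
    direction = 1.0 if mcic > 0 else -1.0
    improvement = change * direction  # positive means change towards improvement
    if improvement < abs(mcic):
        return "below MCIC"
    if improvement < sdc_individual:
        return "at or above MCIC, within measurement error"
    return "at or above MCIC, beyond measurement error"


# Values reported in the text: WAS (MCIC 1.5, SDC 4.9), PDI-W (MCIC -2.5, SDC 5.2).
was_example = interpret_individual_change(3, mcic=1.5, sdc_individual=4.9)
pdiw_example = interpret_individual_change(-3, mcic=-2.5, sdc_individual=5.2)
was_group_sdc = sdc_group(4.9, n_patients=100)  # hypothetical group size
```

A 3-point change on either item exceeds the MCIC but stays below the SDC_individual, so it falls in the range that should be interpreted with caution. Under the assumed formula, the SDC_group shrinks with group size, which is why changes are easier to detect at group level.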
For interpretation of individual change scores, the effect of baseline scores should be taken into account [40]. Higher (PDI-W) or lower (WAS) baseline values (both indicating worse work ability) require higher MCIC values, because there is a greater potential for improvement [44]. The results of the present study confirm that MCICs for the WAS and PDI-W are baseline dependent. This is supported by the (inverted) percentage change scores, which were 39% (WAS) and 56% (PDI-W).
Patient burden is an important consideration in selecting measurement instruments. If patient burden is decreased by using single items instead of lengthy questionnaires, then slightly less sufficient measurement characteristics might be acceptable. This applies, for example, when patients have to fill out multiple questionnaires or when work ability trends are assessed in frequent evaluations (e.g. daily or weekly). In addition, the WAS can be considered for use at group level and in large-scale surveys [45]. The WAS is also suitable for systematic application during medical examinations in occupational health care or in public health surveys [9].
A methodological consideration is the dichotomization of the external criterion into improved and unimproved patient groups. The improved group consisted of patients who reported being ‘much improved’ or ‘completely improved’. Only 20% of patients were classified as improved. Previous research stated that ‘little improved’ patients can be added to the improved group [46]. However, other research stated that little improvement is in the range of natural fluctuation [47]. When ‘little improved’ patients are considered improved, the accuracy of differentiating between improved and unimproved patients decreases [44]. In order to better reflect the concept of meaningful improvement, ‘little improved’ patients were not classified as improved.
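The effect of this choice on group composition can be illustrated with a short sketch. It is not the study's code: the function names are hypothetical, the ratings list is invented example data, and the category labels other than ‘much improved’, ‘completely improved’ and ‘little improved’ (here ‘unchanged’ and ‘worse’) are assumptions about the rating scale.

```python
# Categories counted as improved under the strict definition used in the text.
IMPROVED_STRICT = {"much improved", "completely improved"}


def dichotomize(ratings, include_little_improved=False):
    """Map each rating on the external criterion to True (improved) or False."""
    improved = set(IMPROVED_STRICT)
    if include_little_improved:
        improved.add("little improved")
    return [rating in improved for rating in ratings]


def share_improved(ratings, include_little_improved=False):
    """Proportion of patients classified as improved."""
    flags = dichotomize(ratings, include_little_improved)
    return sum(flags) / len(flags)


# Invented example data, not results from the study.
ratings = [
    "much improved", "little improved", "unchanged", "worse", "unchanged",
    "completely improved", "little improved", "unchanged", "worse", "unchanged",
]
strict_share = share_improved(ratings)
lenient_share = share_improved(ratings, include_little_improved=True)
```

In this invented sample the strict definition classifies 2 of 10 patients as improved, whereas adding ‘little improved’ patients doubles the improved group to 4 of 10, which shows how strongly the cut-off shapes the groups being compared.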
In addition, the patient sample was chosen based on relevance, because increasing work ability is not a treatment goal for all patients. Therefore, retired and permanently disabled patients, and stay-at-home parents were excluded. We included these patients in a sensitivity analysis to test the accuracy of the applied relevance criterion. This analysis yielded the same MCICs for the total group and the baseline score groups as were found for the patient sample selected on relevance. Only the PDI-W percentage change score differed, which was 41% instead of 56%. The PDI-W takes unpaid work into account, including housework and volunteer work. Such work is also carried out by the excluded patients. It is possible that small improvements in interference of pain with unpaid work are considered important, resulting in a lower percentage change score.
The effect of treatment should also be included in future research. For measurement of responsiveness, it is required to ensure that a proportion of patients is likely to change [37]. Because it was not known for how many patients, when, and what type of treatment took place, it would normally be difficult to predict whether a proportion of patients was likely to change within the one-year interval between baseline and follow-up. However, previous studies on patients from the GSC have shown that approximately a third show clinically relevant improvement on measures of disability and impact of LBP one year after baseline measurement [16, 48]. Therefore, we expected a similar proportion of our patient sample to improve on work ability during follow-up.