Introduction
Shoulder pain is the third most common musculoskeletal complaint, after back and knee pain [1]. It is associated with considerable disability for the patient and costs to society. Depending on the diagnosis, many different surgical and non-surgical treatment modalities have been described. In research and clinical practice, determining whether a treatment results in meaningful improvement of symptoms requires the use of high-quality measurement tools.
Over the past decade, there has been a shift in interest from pathophysiological measurements to measuring patient-perceived health. This has resulted in increased use of patient-reported outcome measures (PROMs, also known as PROs). PROMs are self-evaluated measurements of any aspect of a patient's health status, without interpretation of the patient's response by a clinician or anyone else [2]. PROMs are often questionnaires specifically evaluating pain and function from the patient's perspective. The quality of a PROM can be determined by assessing the measurement properties of the instrument. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative provides a checklist of standards for assessing the measurement properties of validity, reliability, and responsiveness [3, 4]. This list does not include interpretability, which is a very important attribute of a questionnaire used in daily clinical practice. Interpretability refers to what a PROM score means; for example, a given score can be interpreted by providing reference data from the general population.
Interpretability is also important in regard to change scores; it is important to know when it can be said that a patient has improved. With many PROMs, change scores are often difficult or impossible to interpret, simply because we do not know exactly what a given difference in score means. Interpreting change in PROM scores requires two benchmarks: the measurement error, expressed as the smallest detectable change (SDC), and the minimal important change (MIC). The SDC is a measure of the variation in a scale due to measurement error. Thus, a change score can only be considered to represent a real change if it is larger than the SDC. The SDC is also known as the minimal detectable change; when using its 95% confidence interval, it can be abbreviated as MDC95%.
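As a worked illustration, the SDC at the 95% level is conventionally derived from the standard error of measurement (SEM) as SDC95 = 1.96 × √2 × SEM, where √2 accounts for the fact that a change score combines the error of two measurements. A minimal sketch in Python; the SEM value is hypothetical:

```python
import math

def sdc95(sem: float) -> float:
    """Smallest detectable change at the 95% level: 1.96 * sqrt(2) * SEM.
    The sqrt(2) reflects that a change score combines the measurement
    error of two administrations (test and retest)."""
    return 1.96 * math.sqrt(2) * sem

# Hypothetical instrument with an SEM of 5.0 points
print(round(sdc95(5.0), 1))  # -> 13.9
```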
The MIC is defined as the smallest measured change score that patients perceive to be important [5–7]. If the SDC is smaller than the MIC, it is possible to distinguish a clinically important change from measurement error with a large degree of certainty. However, this is much more difficult if the SDC is larger than the MIC, since there is a considerable chance that the observed change is caused by measurement error [8]. The MIC is also known as the minimal clinically important difference (MCID).
Both the SDC and MIC are expressed in the same units as the original measure, and thus these numbers have considerable value for clinical use. Using these two benchmarks to interpret change scores is particularly beneficial when PROMs are applied to individual patients, such as in clinical practice. At the group level, knowledge of the MIC also provides clinicians with better options for interpreting study results. The MIC can be used to calculate the percentage of patients who report a change greater than the MIC (responders) in each arm of a trial, and these percentages of responders can be compared [9]. Researchers can also use the SDC and the MIC at the group level to calculate an adequate sample size or to perform power analyses, as described by Terwee et al. [8].
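The responder comparison described above can be sketched as follows; the trial arms and the MIC of 10 points are hypothetical illustrations, not data from this study:

```python
def responder_rate(change_scores, mic):
    """Fraction of patients whose change score reaches or exceeds the MIC."""
    responders = [c for c in change_scores if c >= mic]
    return len(responders) / len(change_scores)

# Hypothetical change scores (positive = improvement) in two trial arms,
# with a hypothetical MIC of 10 points
arm_a = [15, 3, 22, 9, 12, 30, 1, 18]
arm_b = [5, 2, 14, 0, 8, 11, 4, 6]
print(responder_rate(arm_a, 10))  # -> 0.625
print(responder_rate(arm_b, 10))  # -> 0.25
```

The two responder percentages can then be compared directly, for example with a chi-squared test, instead of comparing mean change scores.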
Some studies have already assessed the measurement error (SDC) and interpretability (MIC) of body part-specific PROMs for patients with shoulder problems [10–18]. The present study aimed to determine, and compare, the SDC and MIC of four commonly used shoulder PROMs: the Disabilities of the Arm, Shoulder, and Hand (DASH); the shortened Disabilities of the Arm, Shoulder, and Hand questionnaire (QuickDASH); the Simple Shoulder Test (SST); and the Oxford Shoulder Score (OSS).
Discussion
Monitoring the effects of treatment is of well-recognized importance and is the foundation of modern evidence-based health care. SDC and MIC can be used as benchmarks for the interpretability of a PROM to determine whether the observed change is beneficial to the patients. Here, we determined the SDC and MIC of four commonly used shoulder PROMs in a heterogeneous group of shoulder patients. We found an SDC of 2.8 and a MIC of 2.2 for the SST, an SDC of 16.3 and a MIC of 12.4 for the DASH, and an SDC of 17.1 and a MIC of 13.4 for the Quick DASH. For the OSS, we found an SDC of 6.0 and MIC values of 6.0 and 4.7 for function and pain, respectively. Overall, the SDC was slightly larger than the MIC for all four PROMs.
To determine whether a change score at the individual patient level is clinically important and not just measurement error, the SDC must not exceed the MIC [8]. In our study, all PROMs had an SDC that was slightly larger than the MIC. This means that if an individual patient has a change score as large as the MIC, we cannot be 95% sure that this change is not due to measurement error. In other words, the risk of measurement error is larger than 5%, and individual patients' change scores should be interpreted with caution. However, as the differences between the SDC and the MIC were rather small, we think that these four PROMs are suitable for use in clinical practice. In research, the measurement error is much less problematic because group mean changes are analyzed, and the SDC of a mean change is equal to SDC/√n. In research, the MIC can also be used to calculate the percentage of patients who report a change greater than the MIC (responders) in each arm of a trial, and these percentages of responders can be compared [9].
Although the observed differences between SDC and MIC were very small, it is desirable to find ways to minimize the SDC. One way of decreasing the SDC in a clinical setting is by averaging multiple measurements (i.e., repeated measurements at one point in time) in order to decrease the measurement error. However, this is difficult with questionnaires, because repeated completion burdens patients and carries a high risk of recall bias. It might also be possible to improve the quality of the questionnaires by adding extra questions or improving the wording of existing questions.
The observed difference between SDC and MIC is less problematic in research because mean scores of groups of patients are used instead of individual patient scores; therefore, the measurement error should be calculated for a mean score instead of a single score. The SDC of a mean score is much smaller (by a factor of the square root of the sample size) than the SDC of a single score [5, 37].
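This shrinkage can be made concrete with a short sketch; the group size of 100 is hypothetical, while the individual-level SDC of 16.3 is the value we found for the DASH:

```python
import math

def sdc_of_mean(sdc_individual: float, n: int) -> float:
    """SDC of a group mean score: the individual-level SDC divided by sqrt(n)."""
    return sdc_individual / math.sqrt(n)

# Individual-level DASH SDC of 16.3 in a hypothetical group of 100 patients
print(round(sdc_of_mean(16.3, 100), 2))  # -> 1.63
```

With 100 patients, a mean change well below the individual-level SDC can thus still be distinguished from measurement error.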
Table 5 presents an overview of the previously reported measurement error (SDC) and MIC of the PROMs evaluated in this paper [10–18]. Our results for the SST are comparable with the results published by Roy et al. [18], who found a MIC of 3.0 at 6 months after shoulder arthroplasty, and by Tashjian et al. [15], who determined the MIC in 81 patients with rotator cuff tears. Although Tashjian et al. used a comparable anchor-based mean change score method, they determined the MIC by subtracting the change score of the ‘unchanged’ group from that of the slightly improved group (MIC-subtract). While there is no consensus on whether this subtraction should be performed, Hays et al. [42] have argued that if the mean change in the unchanged group is 2 points and the mean change in the slightly improved group is 4 points, a 2-point change is insufficient and it takes a greater change of 4 points to constitute a MIC [42]. We agree with Hays et al. [42] that the change score of the unchanged group should not be subtracted from that of the slightly improved group. However, it is possible to calculate the MIC-subtract from our data (see Tables 1 and 2). For example, for the SST, the MIC-subtract for the functional anchor would be -2.2 - (-0.1) = -2.1, and for the OSS, -6.0 - (-1.0) = -5.0. Both techniques give almost the same MIC values for the SST, DASH, and QuickDASH; only for the OSS is there a small difference.
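The MIC-subtract arithmetic above can be written out explicitly; this sketch simply re-expresses the numbers already given for the SST and OSS functional anchors:

```python
def mic_subtract(change_slightly_improved: float, change_unchanged: float) -> float:
    """MIC-subtract: mean change score of the slightly improved group minus
    the mean change score of the unchanged group."""
    return change_slightly_improved - change_unchanged

# SST functional anchor: -2.2 - (-0.1) = -2.1
print(round(mic_subtract(-2.2, -0.1), 1))  # -> -2.1
# OSS functional anchor: -6.0 - (-1.0) = -5.0
print(mic_subtract(-6.0, -1.0))  # -> -5.0
```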
Table 5 Overview of previously published SDC and MIC values for the SST, DASH, QuickDASH, and OSS

| PROM | Study | n | SDC | MIC |
| --- | --- | --- | --- | --- |
| SST | | 81 | n.m. | 2.8 |
| SST | | 120 | - | 3.0 |
| DASH | Schmitt and Di Fabio [14]a | 53 | 14.6a | 10.2 |
| DASH | | 361 | 10.7 | 11.5 |
| DASH | | 109 | n.m. | 10 |
| DASH | | 41 | 7.9 | - |
| QuickDASH | | 101 | 13.3a,b | 8.2 |
| QuickDASH | | 35 | n.m. | 13.1c |
| QuickDASH | | 47 | 18.6a | |
| OSS | - | - | - | - |
Our results for the DASH were comparable with the results found in the literature. Schmitt and Di Fabio [14] used the anchor-based mean change method to analyze a heterogeneous group of 53 shoulder patients and found an SEM of 5.22 and a MIC of 10.2. They used a 90% interval for the SDC calculation; to improve comparability, we recalculated their data to a 95% interval, resulting in an SDC of 14.6. Beaton et al. [10] studied a cohort of 361 heterogeneous shoulder patients treated by physiotherapists, using a comparable anchor-based mean change method; they found an SEM of 3.9, an SDC of 10.7, and a MIC of 11.5. Gummesson [11] found a MIC of 10 in a comparable study of 109 upper extremity patients. Gabel et al. [16] found a lower SDC of 7.9; this is probably because the test-retest was done within 48 h, increasing the chance of recall bias.
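The recalculation of a 90% interval SDC to a 95% interval, as performed above, amounts to swapping the z-value 1.645 for 1.96; a minimal sketch with a hypothetical input value:

```python
def sdc_90_to_95(sdc90: float) -> float:
    """Rescale an SDC reported with a 90% interval to a 95% interval
    by replacing the z-value 1.645 with 1.96."""
    return sdc90 * 1.96 / 1.645

# Hypothetical SDC of 12.1 points reported with a 90% interval
print(round(sdc_90_to_95(12.1), 1))  # -> 14.4
```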
The results of the QuickDASH were also comparable with those in the current literature. Mintken et al. [12] analyzed 101 shoulder patients. Using a comparable anchor-based technique, they found a MIC of 8.2. They calculated the SDC using the unchanged group at follow-up, which is a suboptimal technique for determining the measurement error because of the risk of bias due to the lack of validity of the anchor [43]. They also used a 90% interval for the SDC calculations; we recalculated their SDC to a 95% interval, resulting in an SDC of 13.3. Polson et al. [13] analyzed 35 upper extremity patients with an anchor-based mean change technique. They found a higher MIC of 19 points, most likely because they used the ‘much improved’ group for the MIC calculations instead of the slightly improved group, as we did in this study. Polson et al. [13] also reported the change score of the slightly improved group to be 13.1; this value is used in Table 5 to improve the comparability of our results. Gabel et al. [17] found results comparable to ours, with a 95% recalculated SDC of 18.6 for the QuickDASH. There is no international consensus on the optimal cut-off point on an anchor; however, we think that the slightly improved group reflects a minimal important change better than the much improved group.
Our method of calculating the MIC is comparable with those of Jaeschke et al. and Redelmeier and Lorig [6, 7]. Jaeschke et al. used a 15-point rating scale and took the mean change in patients who reported being ‘almost the same,’ ‘a little better or a little worse,’ or ‘somewhat better or somewhat worse’ as the MIC value. Redelmeier and Lorig used a similar 15-point scale and took the mean change in patients who reported being ‘a little better or a little worse’. We used a 7-point rating scale and took the change in patients who reported being slightly improved as the MIC.
To the best of our knowledge, there are no previously published data on the SDC and MIC of the Oxford Shoulder Score [44]. One-third of the questions in the OSS are pain related, so we used both anchors. We found an SDC of 6.0 points on a scale from 12 to 60. The MIC was 6.0 for the functional anchor and 4.7 for the pain anchor.
Strengths of this study are that there were almost no missing data and we had very high response rates at all time-points. This is a clear advantage of web-based questionnaire administration. Furthermore, we included twice the recommended minimal number of patients.
There are several limitations to our study. First, we used a heterogeneous population to calculate the MIC. There is no evidence in the literature that the MIC differs among (sub)populations with different diagnoses or between surgical and non-surgical treatment, but it has been suggested that this should be evaluated [35, 45]. This was not possible in our study because the subgroups would have been too small. The advantage of using a heterogeneous cohort is that it provides a MIC estimate that can be used across all kinds of shoulder disorders and for both surgical and non-surgical treatments. Future studies should examine whether and how the MIC varies among subgroups. Second, our patients had to complete three different PROMs at the same time. This could impose a response burden on the patient, which might lead to loss of interest during completion. Theoretically, this could result in increased measurement error and a higher SDC. Third, we computed the QuickDASH from the full DASH questionnaire. This is not the same as completing the QuickDASH questionnaire independently. Fourth, the test-retest interval was 1–4 weeks (average 12.8 days). We cannot be completely sure that none of the patients changed within this time frame. However, in the Netherlands, patients generally do not start physiotherapy treatment earlier than 1–2 weeks after their initial visit, and none of the patients were treated surgically within the test-retest period, so we do not expect patients to have changed within this time frame. Fifth, although anchor-based techniques are considered the best method for assessing the MIC [35], there is debate in the literature about the validity of anchors and the best statistical approach for calculating the MIC [46]. For example, a disadvantage of the mean change method is that it uses only the average change score of one patient subgroup for the MIC calculation, meaning that only 23 patients determined the MIC value in this study. For these methodological reasons, it has been recommended that the MIC of PROMs be determined in multiple studies [47]. Our study therefore contributes to a better understanding of the change scores of PROMs in shoulder patients.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DAvK designed the study and wrote the protocol. He managed the database and performed the analysis. He wrote and revised the manuscript. WJW helped in designing the study and included all the patients at the outpatient clinic. He critically reviewed the study. LWAHvB helped with the database management and performed parts of the analysis. She critically reviewed the manuscript. VABS helped in designing the study and performed parts of the analysis. She critically reviewed the manuscript. RMC helped in designing the study. He critically reviewed the manuscript. CBT helped in designing the study, advised on the statistical analysis, and critically reviewed the manuscript. All authors read and approved the final manuscript.