Data source and study population
FIMPACT is a randomised, placebo-surgery controlled three-arm efficacy trial of subacromial decompression for treating SAPS. The trial was conducted at three orthopaedic clinics in Finland. One hundred ninety-three patients aged 35 to 65 years with SAPS were randomised to arthroscopic subacromial decompression (ASD), diagnostic arthroscopy (DA) or exercise therapy (ET), and followed for 24 months. At the eligibility screening visit, an experienced shoulder surgeon examined the patients to rule out shoulder instability, rotator cuff rupture, frozen shoulder or other causes of shoulder symptoms. All potentially eligible participants had standard x-rays and MRI to rule out rotator cuff rupture and other shoulder pathology. Baseline characteristics of participants are presented in Table S1 in the supplementary appendix, and full details of the study can be found in the original articles [16, 19].
Data time points
Pain and global rating of change (GRC) were collected at baseline, 6-, 12- and 24-month follow-ups; SST and Constant-Murley score were measured at baseline, 6- and 24-month follow-ups.
Data analysis for MID
We used the GRC as the anchor question for calculating the MID. An adequate transition anchor should correlate with the change in outcome and, ideally, correlate equally but in opposite directions with the outcome scores at baseline and at the follow-up time points (post scores) [21]. When the GRC captures true change, its correlation with the change score should be larger than its correlation with the post score [22]. To explore this, we calculated Spearman’s rho between the GRC responses and (1) the baseline scores, (2) the post scores and (3) the change scores of each outcome, separately for each follow-up time point and for the combined dataset. We calculated 95% CIs for these correlations by bootstrapping 1000 samples.
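The correlation step can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the percentile form of the bootstrap interval, the fixed seed and the example data are all assumptions made for this sketch.

```python
import random

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rho (assumed variant), resampling pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman_rho([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate resamples
            stats.append(rho)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Hypothetical data: GRC response (1 = best, 5 = worse) and change score.
grc = [1, 2, 2, 3, 3, 4, 4, 5]
change = [40, 35, 30, 20, 15, 5, 0, -10]
rho = spearman_rho(grc, change)
ci_low, ci_high = bootstrap_ci(grc, change)
```

Because lower GRC responses indicate greater satisfaction in this coding, a valid anchor yields a negative rho against change scores in which higher values mean more improvement.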
We used three approaches to determine the MID for improvement: 1) the ROC method, 2) the mean difference of change (MDoC) method and 3) the mean change (MC) method.
For the ROC method [23], we dichotomised the GRC into improved (responses 1–3; Table 1) and no change (response 4; Table 1). Participants who reported worsening (response 5; Table 1) were excluded from the ROC analyses to obtain MID estimates for improvement [24]. Because very few patients deteriorated, we could not estimate MIDs for worsening. We chose the cut-off value for each outcome with the closest-point-to-top-left-corner method, which selects the point on the ROC curve nearest to perfect sensitivity and specificity [25]. For the target measures, we calculated change from baseline to each follow-up point.
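The cut-off selection can be illustrated with a short Python sketch. It assumes change scores are oriented so that larger values mean more improvement (pain scores would need their sign reversed first) and that participants reporting worsening have already been removed. The function name and data are hypothetical.

```python
def closest_top_left_cutoff(change, improved):
    """Return (cutoff, sensitivity, specificity) for the rule
    'change >= cutoff means improved', choosing the cutoff whose ROC
    point lies closest to the top-left corner (sens = 1, spec = 1)."""
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    best = None
    for c in sorted(set(change)):
        tp = sum(1 for x, y in zip(change, improved) if y and x >= c)
        tn = sum(1 for x, y in zip(change, improved) if not y and x < c)
        sens, spec = tp / n_pos, tn / n_neg
        dist2 = (1 - sens) ** 2 + (1 - spec) ** 2  # squared distance to corner
        if best is None or dist2 < best[0]:
            best = (dist2, c, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical change scores and dichotomised GRC (1 = improved, 0 = no change).
change = [0, 2, 3, 5, 6, 8, 9, 11, 13, 15]
improved = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cutoff, sens, spec = closest_top_left_cutoff(change, improved)
```

In this toy dataset the selected cut-off is 8, with sensitivity 5/6 and specificity 1.0. When two cut-offs are equally close to the corner, this sketch keeps the smaller one, which is an arbitrary tie-breaking choice.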
To evaluate how well each measure could discriminate between those who were improved and those who were not improved, we calculated the area under the ROC curve (AUC). We determined the confidence intervals for the AUC using DeLong’s method [26]. The AUC ranges from 0.5 (no accuracy in distinguishing improved from not improved) to 1.0 (perfect accuracy) [27, 28]. In musculoskeletal conditions, AUC values between 0.7 and 0.8 are considered acceptable, and values greater than 0.8 are considered to indicate good to excellent discrimination [29].
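As a rough illustration, the AUC point estimate equals the probability that a randomly chosen improved participant has a larger change score than a randomly chosen unchanged participant, with ties counted as one half. The sketch below computes only that point estimate; it does not implement DeLong’s confidence interval, and the function name and data are hypothetical.

```python
def auc(change, improved):
    """AUC as the proportion of (improved, not improved) pairs in which
    the improved participant has the larger change score; ties count 0.5."""
    pos = [x for x, y in zip(change, improved) if y]
    neg = [x for x, y in zip(change, improved) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical change scores and dichotomised GRC (1 = improved, 0 = no change).
change = [0, 2, 3, 5, 6, 8, 9, 11, 13, 15]
improved = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
area = auc(change, improved)  # 23 of 24 pairs are correctly ordered
```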
In the MDoC method, we calculated the mean difference in the change scores of each outcome from baseline to the follow-up time point (with 95% CIs) between the participants who answered “Somewhat satisfied” and those who answered “Dissatisfied” (responses 3 and 4; Table 1). In the MC method, we determined the mean of the change scores from baseline to the follow-up time points (with 95% CIs) of those who reported “Somewhat satisfied” (response 3; Table 1). For both the MDoC and MC methods, the 95% CIs for the MID values were calculated by bootstrapping 1000 samples.
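The two mean-based estimators can be sketched as follows. This is an illustration under stated assumptions rather than the study’s code: the function names are hypothetical, the bootstrap resamples whole participants and takes a percentile interval, and resamples that happen to contain no participant in a required response group are skipped.

```python
import random
from statistics import StatisticsError, mean

def mean_change(change, grc, response=3):
    """MC method: mean change score among one GRC response group."""
    return mean([c for c, g in zip(change, grc) if g == response])

def mean_difference_of_change(change, grc, improved=3, unchanged=4):
    """MDoC method: difference in mean change between two response groups."""
    return mean_change(change, grc, improved) - mean_change(change, grc, unchanged)

def bootstrap_ci(stat, change, grc, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed variant) for either estimator."""
    rng = random.Random(seed)
    n = len(change)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(stat([change[i] for i in idx], [grc[i] for i in idx]))
        except StatisticsError:
            continue  # resample had no participants in a required group
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lo, hi

# Hypothetical change scores with GRC responses 1-4.
change = [20, 25, 30, 5, 10, 0, 40, 35]
grc = [3, 3, 3, 4, 4, 4, 1, 2]
mc = mean_change(change, grc)                  # mean of 20, 25, 30
mdoc = mean_difference_of_change(change, grc)  # 25 minus mean of 5, 10, 0
mc_low, mc_high = bootstrap_ci(mean_change, change, grc)
```

The MC estimate is the raw mean change in the minimally improved group, whereas the MDoC estimate subtracts the mean change of the comparison group, so MDoC is smaller whenever the comparison group also improves on average.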
We combined the data across all time points (6, 12, 24 months) and used the whole dataset irrespective of treatment for analyses to provide an estimate derived from a larger number of GRC-outcome pairs. We explored the ROC curves, and MID and PASS estimates at different time points and found them to be very similar, supporting our decision to pool data for our primary analysis. To explore whether the different treatments affected the MIDs, we performed sensitivity analyses and calculated MIDs for patients who underwent surgery (ASD and DA groups combined) and for patients who received exercise therapy. In the FIMPACT trial, the blinding between ASD and DA held well, and the patients in both ASD and DA groups subjectively underwent “surgical treatment”.
Data analysis for PASS
For PASS, we used the ROC and the 75th percentile [30] methods for the combined dataset. The ROC method was applied in the same way as for the MID. We used the closest-point-to-top-left-corner method [25] to determine the cut-off point, and the AUCs were used to evaluate how well each measure could discriminate between participants who reported “Very satisfied, my shoulder has healed completely” and the rest of the cohort (responses 2–5, Table 1). In the 75th percentile method, PASS was defined as the 25th percentile score for the Constant-Murley score and Simple Shoulder Test, and the 75th percentile score for pain VASs, from the distribution of the patients who answered “Very satisfied, my shoulder has healed completely”. Because the choice of whether to use GRC response 1 only or both responses 1 and 2 is debatable, we also calculated the PASS thresholds between participants who reported “Very satisfied, my shoulder has healed completely” or “Satisfied—I have only minor, activity related symptoms. My shoulder is much better than before treatment.” (responses 1–2, Table 1) and the rest of the cohort (responses 3–5, Table 1).
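The percentile rule can be sketched in Python as below. The direction of the percentile depends on the scale: for measures where higher is better the threshold is the 25th percentile of the satisfied group, and for pain, where lower is better, it is the 75th percentile. The function name, the choice of the inclusive quantile definition and the data are assumptions for this sketch; other quantile definitions give slightly different thresholds.

```python
from statistics import quantiles

def pass_threshold(scores, grc, higher_is_better, satisfied=(1,)):
    """PASS by the percentile method: the score that 75% of the
    satisfied group meet or exceed (in the favourable direction)."""
    group = [s for s, g in zip(scores, grc) if g in satisfied]
    q1, _, q3 = quantiles(group, n=4, method="inclusive")
    return q1 if higher_is_better else q3

# Hypothetical follow-up scores with GRC responses 1-4.
grc = [1, 1, 1, 1, 1, 2, 3, 4]
cms = [60, 70, 80, 90, 100, 50, 40, 30]   # higher is better
pain = [0, 5, 10, 15, 20, 25, 40, 60]     # lower is better

pass_cms = pass_threshold(cms, grc, higher_is_better=True)
pass_pain = pass_threshold(pain, grc, higher_is_better=False)
# Alternative definition of the satisfied group: responses 1 and 2.
pass_cms_wide = pass_threshold(cms, grc, higher_is_better=True, satisfied=(1, 2))
```

Widening the satisfied group to responses 1 and 2 lowers the threshold for the Constant-Murley score in this toy example, which shows how sensitive the PASS estimate is to the anchor definition.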