Background
Osteoarthritis (OA) is a leading cause of joint pain and disability worldwide [1], with the overall prevalence of hip and knee OA in the adult population approximately 11% and 24%, respectively [2]. Osteoarthritis costs Australia’s economy $22 billion annually, and the burden of OA is expected to rise due to the ageing population and obesity [3, 4]. Physiotherapists play an integral role in providing non-pharmacological management for OA. A systematic review of patients’ perceived health service needs for OA showed that patients generally perceive physiotherapists to be important in helping them manage their condition and prescribing exercises [5].
Despite international OA guidelines recommending exercise and weight loss [1, 6, 7] as first-line treatments for OA, their uptake in physiotherapy practice is suboptimal [8-11]. Quality indicators (QIs) can be used to assess physiotherapists’ adherence to clinical guideline recommendations and are accepted tools for assessing OA care [12-14]. They represent minimum acceptable standards of practice [15, 16] and are typically developed via consensus techniques [17, 18]. Quality indicators can be assessed by auditing medical records [18]; however, these records do not always include information pertaining to quality of care. Self-reporting by health professionals is another method but may introduce bias. To overcome the limitations of these methods, patient-reported QIs are an alternative option for assessing the quality of OA care. Patient involvement in quality assessment is also valuable for enhancing the quality and relevance of research [19], as well as for promoting patient-centred care, one of the six pillars of high-quality care [20].
A systematic review conducted in 2013 identified QIs from 32 papers pertaining to non-pharmacological, pharmacological and surgical management of OA [13] but found only one study [21] (from Norway) that developed QIs in a patient-reported format. Blackburn and colleagues [22] later developed a similar QI questionnaire in the United Kingdom (UK) by including items from the Norwegian questionnaire. The UK QI questionnaire was subsequently used by patients across several European countries to assess the care they received from a range of health professionals for their OA management [23, 24]. However, the UK questionnaire was not tailored to specifically evaluate physiotherapy care. In the Netherlands in 2016, a set of QIs for physiotherapy management of hip and knee OA was established using a Delphi technique [25] but was not developed into a patient-reported tool. Furthermore, these QIs were based on older clinical guidelines from 2011 [26] and lacked an international perspective, as only a national group of experts was recruited for the Delphi panel.
Given the lack of specific patient-reported QIs to assess physiotherapy care for hip and/or knee OA, this study aimed to develop a patient-reported QI tool and to evaluate its clinimetric properties. It is vital to establish the validity and reliability of QI tools if the results are to accurately reflect physiotherapy practice and/or be used to guide decision-making to improve clinical services [27]. These measurement properties are prerequisites for any quality measure and should be established prior to implementation of the QIs [16, 18].
Discussion
This study developed a patient-reported QI tool to measure and benchmark physiotherapy care for people with hip and/or knee OA. A clinimetric evaluation of the QI tool was then performed to establish its reliability and validity in assessing physiotherapy care for this patient group.
Test-retest reliability for each subscale and the total score of the QUIPA tool was acceptable (ICC ≥ 0.70), although in most cases the lower bound of the confidence interval (CI) fell below 0.70, reflecting variability in the data and/or the limited sample size. However, reliability for individual items varied. The item on exercise prescription (item #10) was the only QI that achieved ‘almost perfect’ agreement, while the item relating to discussing the benefits of weight loss (item #13a) reached ‘substantial’ agreement. The better reliability of these two items compared with the others suggests that it was easier for patients to understand their intent and to recall whether or not these aspects of physiotherapy care were provided.
Most of the other items (n = 13) achieved ‘moderate’ agreement, with three attaining ‘fair’ agreement. Ten achieved high observed agreement (> 70%) despite high variability in their Kappa estimates (as indicated by wide CIs). This may be due to the statistical effect of a high or low prevalence of a specific answer for those items: because Cohen’s Kappa corrects for agreement expected by chance, a high or low prevalence reduces the Kappa estimate despite high observed agreement [36, 37]. When most participants select the same response to an item, the expected (chance) agreement is high, so even high observed agreement yields a low Kappa. For example, for the QI related to OA assessment (item #1), despite the high observed agreement (78%), the Kappa estimate was low (0.38; 95% CI 0.11, 0.62) due to the high prevalence of ‘yes’ responses (53 of 63), which inflated the expected agreement. Conversely, for the QI related to OA pain (item #8), the observed agreement (76%) was comparable to that of item #1, but the more balanced prevalence of ‘yes’ responses (34 of 63) yielded a higher Kappa estimate (0.54; 95% CI 0.32, 0.72) (Additional file 9). The three items with the lowest Kappa estimates related to OA assessment, the management plan and exercise preference. Despite efforts to maximise specificity, these items likely remained ambiguous and could be interpreted differently across participants. Another potential source of disagreement between test and retest scores was poor recall, as we observed interchanges between “yes/no” and “don’t remember” response options within individuals at W12 and W13. We deliberately chose a 3-month recall window when developing the QUIPA tool in order to capture multi-session episodes of physiotherapy care. Thus, we evaluated test-retest reliability at thirteen weeks, the point of maximum recall demand, in order to establish reliability under a ‘worst case’ scenario. Reliability may be greater with shorter recall periods.
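The prevalence effect described above can be illustrated with a small numerical sketch. The 2×2 agreement counts below are hypothetical, chosen only to mirror the pattern of item #1 versus item #8 (same observed agreement, different response prevalence); they are not the study data:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's Kappa from a 2x2 test-retest agreement table:
    a = both 'yes', b and c = discordant pairs, d = both 'no'."""
    n = a + b + c + d
    p_obs = (a + d) / n                     # observed agreement
    p1_yes = (a + b) / n                    # 'yes' prevalence at test
    p2_yes = (a + c) / n                    # 'yes' prevalence at retest
    # chance (expected) agreement from the marginal prevalences
    p_exp = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (p_obs - p_exp) / (1 - p_exp)

# Skewed prevalence (most responses 'yes'): observed agreement 84%,
# but chance agreement ~79%, so Kappa is only ~0.24
k_skewed = cohens_kappa(40, 4, 4, 2)

# Balanced prevalence: identical 84% observed agreement,
# chance agreement 50%, so Kappa rises to 0.68
k_balanced = cohens_kappa(21, 4, 4, 21)

print(round(k_skewed, 2), round(k_balanced, 2))  # 0.24 0.68
```

Both tables have the same number of discordant pairs and the same observed agreement; only the prevalence of ‘yes’ differs, which is enough to halve the Kappa estimate.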
Overall, despite generally low Kappa values for single items of the QUIPA tool, the test-retest Kappa estimates and observed agreement were comparable to [21], or only slightly lower than [30], those of previous patient-reported QI tools for OA care that have since been rolled out and used in practice. However, it must be noted that these studies used a recall time frame of 2 weeks for evaluating test-retest reliability despite the tools having a maximum recall period of 3 months [21, 30]. It is therefore not known whether the reliability estimates they reported would have been lower had they used a three-month recall as we did.
In terms of validity, the QUIPA tool has acceptable construct validity, with all three pre-defined hypotheses confirmed (P < 0.05). These hypotheses were similar to those used to assess construct validity in other QI tools for OA care [21, 30], although the sample sizes in our subgroups were smaller. While construct validity was supported, our data indicate that the tool does not have acceptable criterion validity, as assessed via comparison of participants’ responses at W1 to responses provided by the physiotherapists.
The subscale scores, total scores and most individual items of the QUIPA tool achieved low agreement between participants and physiotherapists. Although the recall period for participants was shorter for validity testing, it is possible that treating physiotherapists delivered the care described by the QIs but that participants did not remember receiving it or misinterpreted the care received. Despite consumer input into the development of the QI items, some items appear to have been ambiguous and open to different interpretations, particularly between the perspectives of a patient and a clinician. For example, for the item relating to review (item #6), a treating physiotherapist might suggest that the participant see a physiotherapist for their hip and/or knee OA only when their symptoms flared up, and would select the ‘yes’ response to this QI, whereas participants might select ‘yes’ only if their treating physiotherapist proposed a specific date for the next physiotherapy review. It was also unclear to clinicians which response to select if the participant voluntarily offered information relating to certain QIs without any prompting from their treating physiotherapist. Finally, for QIs that were not applicable to all participants (e.g. benefits of weight loss, walking aid, appliances and aids, work advice and depression referral), there were large inconsistencies between participants and their treating physiotherapists concerning whether the ‘no’ or the ‘not overweight’ / ‘no such problems’ option was selected. For the item relating to discussing the benefits of weight loss (item #13a), perceptions of being overweight or obese can also differ between the participant and their physiotherapist. Overall, it appears difficult to generate items that are unambiguous, are interpreted in the same way by different users and capture all variations in the provision of care.
Future directions
This study lays the groundwork for future refinement of the QUIPA tool, a patient-reported QI tool for benchmarking the quality of physiotherapy care in hip and/or knee OA. Further refinement and re-evaluation are required to improve its validity. Considerations for future refinement include a patient recall period shorter than 3 months, removal of ambiguous items, development of more comprehensive instructions for patients about what to consider when answering the items, and a reduction in the number of response options.
Strengths and limitations
This study has several strengths. The QIs were generated from an international physiotherapy consensus exercise [28] that used high-quality clinical guidelines for hip and knee OA [1, 29]. Other strengths include the robust methodology used to develop the QIs (e.g. a defined scope and purpose, involvement of patients and physiotherapists, and formulation of specific and measurable QIs [16, 38]) and good response rates to all surveys. In addition, no previous study has comprehensively evaluated the validity of patient-reported QIs by assessing agreement between patients and their treating clinicians.
Despite achieving the recommended sample size for clinimetric testing, there was limited variation in participant profiles. As such, we had few participants within subgroups such as those who were overweight, had problems with walking, daily activities or work due to OA, or had depression. Given these small sample sizes, and the aim of this work, we elected not to adjust for patient characteristics, as doing so may introduce bias and noise into our estimates of interest. In addition, during the course of this study, we became aware of the use of pre-treatment ‘registration’ forms in some physiotherapy clinics, which may contain questions relating to the QIs. If information is collected via a form before a consultation rather than through discussion during a consultation, patients may have difficulty deciding how to answer the QUIPA items. Finally, despite attempts to increase variability in the data, the majority of participants and treating physiotherapists chose ‘yes’ as their response to the QIs.