Background
Foot pain affects one in four people over the age of 45 years [1] and has a significant impact on mobility [2],[3] and quality of life [4],[5]. In recent years, several patient-reported outcome measures have been developed to assess the severity and impact of foot pain in clinical practice, epidemiological studies and clinical trials [6]-[11]. In order to provide useful information, these questionnaires need to be both valid (i.e. they actually measure what they are intended to measure) and reproducible (i.e. they are able to produce the same scores in identical conditions on different occasions). In addition, to be useful outcome measures in clinical trials, such tools need to be capable of detecting changes in health status over time, a construct commonly referred to as ‘responsiveness’ [12]-[16].
Broadly speaking, there are two main approaches for assessing the responsiveness of an outcome measure, most commonly referred to as anchor-based and distribution-based [17].
Anchor-based approaches compare interval changes in outcome measure scores to a dichotomised ‘global’ rating of change score (using a question such as “Overall, how has your condition changed as a result of your treatment?”). The outcome measure score that corresponds to a meaningful change (generally defined as a response of “somewhat better” or above) is considered to be the smallest difference which participants perceive as beneficial, and is termed the minimal important difference, or MID [12]. The MID can then be used as a benchmark for interpreting the effectiveness of an intervention.
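The anchor-based calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rating labels, the choice of which labels count as "improved", and the function name `minimal_important_difference` are hypothetical, and taking the mean change score of the improved group is one common way of estimating the MID rather than the only one.

```python
import statistics

# Hypothetical labels for a global rating of change question.
# Responses of "somewhat better" or above are treated as meaningful improvement.
IMPROVED = {"somewhat better", "much better"}


def minimal_important_difference(change_scores, ratings, improved=IMPROVED):
    """Estimate the MID as the mean change score among participants
    whose global rating falls in the 'improved' category (an assumption)."""
    if len(change_scores) != len(ratings):
        raise ValueError("change_scores and ratings must have the same length")
    # Dichotomise the anchor: keep only change scores from improved participants.
    improved_changes = [
        change for change, rating in zip(change_scores, ratings)
        if rating in improved
    ]
    if not improved_changes:
        raise ValueError("no participants reported improvement")
    return statistics.mean(improved_changes)


# Made-up change scores (follow-up minus baseline) and anchor responses.
example_changes = [12, 3, 20, -1, 8]
example_ratings = [
    "somewhat better", "no change", "much better",
    "somewhat worse", "somewhat better",
]
example_mid = minimal_important_difference(example_changes, example_ratings)
```

Because the anchor is dichotomised, the estimate depends on where the cut-point is placed, so a different threshold would yield a different MID from the same data.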
Distribution-based approaches involve examining the statistical distribution of change scores for interval outcome measures (for either between-group or within-group change over time comparisons), and applying a range of statistics to calculate effect sizes [18]. The smaller the MID or larger the effect size, the more responsive an outcome measure is considered to be.
Recent reviews of measures of foot function, foot health and foot pain indicate that few have undergone adequate evaluation of responsiveness, thereby limiting their use in clinical trials [6]-[11]. Two of the most commonly used and most extensively validated measures of foot pain and disability are the Foot Health Status Questionnaire (FHSQ) [19] and the Manchester Foot Pain and Disability Index (MFPDI) [20], but neither has undergone detailed analysis of responsiveness. In a study of people with plantar heel pain receiving foot orthoses, the FHSQ was shown to be more responsive than the Foot Function Index (FFI), based on the observation of significant improvements in all four subscales of the FHSQ but only two out of three subscales of the FFI [21]. No responsiveness data have been published for the English language version of the MFPDI, although a modification of the MFPDI - the Manchester-Oxford Foot Questionnaire - has been shown to be responsive to improvements in foot health status following hallux valgus surgery [22], and a recent study concluded that a Dutch version of the MFPDI demonstrated only moderate responsiveness [23].
Given the increasing use of the FHSQ and MFPDI in foot and ankle research, there is a need for a more detailed evaluation of responsiveness of these instruments in order to determine whether it is appropriate to employ them as outcome measures in clinical trials of interventions for foot disorders. Therefore, as part of a randomised controlled trial assessing the effectiveness of extra-depth footwear in older people with foot pain [24],[25], we compared the responsiveness of the FHSQ and MFPDI subscales, using a range of recommended statistical approaches [12]-[16].
Discussion
The aim of this study was to evaluate the responsiveness of two commonly used measures of foot pain and disability: the Foot Health Status Questionnaire (FHSQ) [19] and the Manchester Foot Pain and Disability Index (MFPDI) [20]. To do this, we applied four of the most widely used responsiveness statistics to FHSQ and MFPDI subscale data obtained at baseline and at 16 weeks of follow-up from a clinical trial of off-the-shelf footwear for reducing foot pain in older people [24],[25]. Overall, the FHSQ pain subscale exhibited the highest responsiveness, as evidenced by a highly significant paired t-test and effect sizes ranging from medium to huge. The next most responsive measure was the FHSQ function subscale (borderline paired t-test and effect sizes ranging from small to very large). Based on these findings, it would appear that the FHSQ is preferable to the MFPDI as an outcome measure in clinical trials evaluating the effectiveness of interventions in reducing foot pain and improving foot function in older people.
The FHSQ footwear and MFPDI concern about appearance subscales performed poorly in this analysis, with negligible to small effect sizes. However, these subscales are not particularly useful in the context of a trial involving a standardised footwear intervention, as the FHSQ footwear subscale items reflect difficulty with obtaining suitable footwear, and the MFPDI concern about appearance subscale items focus on participants' self-consciousness regarding the appearance of their feet and shoes. These two subscales were included in the analysis for the sake of completeness, as the FHSQ and MFPDI are generally administered in their entirety rather than as selected subscales. The poor responsiveness reported here is therefore not a useful indicator of the potential value of these subscales when applied to other interventions. For example, Bennett et al. [42] have demonstrated significant improvements in the FHSQ footwear subscale following foot surgery, which is likely due to the surgery allowing a wider range of shoes to be worn postoperatively.
Our observation of the limited responsiveness of the MFPDI is consistent with van der Zwaard et al. [23], who reported that a Dutch translation of the MFPDI was only moderately responsive to change in people aged 50 years or over who were enrolled in a randomised trial of treatment for forefoot pain. The authors suggested several reasons for this, including: (i) the three-level response options (‘none of the time’, ‘on some days’ and ‘on most/every day/s’) are too widely spaced, (ii) pain intensity is not directly addressed, and (iii) the concern about appearance subscale (consisting of only two items) has a large floor effect. In developing the Manchester-Oxford Foot Questionnaire (a modification of the MFPDI for use in foot surgery), Dawson et al. [43] addressed many of these issues by increasing the response categories from three to five, adding a pain severity item, and combining the concern about appearance and ability to undertake social, recreational and work activities items into a separate construct referred to as ‘social interaction’. The high responsiveness of this amended scale in patients undergoing hallux valgus surgery [22] suggests that there may be some scope for improving the MFPDI as an outcome measure.
When interpreting these findings, it should be noted that there is currently no accepted gold standard approach for assessing responsiveness of outcome measures, and that each statistical approach has limitations (for a detailed discussion, see Husted et al. [13] and Revicki et al. [16]). Paired t-tests provide an indication of the statistical significance of the observed change in the outcome measure scores, but this is influenced not only by the magnitude of change, but also by the sample size (i.e. larger sample sizes are more likely to detect statistically significant differences). Cohen's d and the standardised response mean are influenced by the variability of the denominator (baseline scores for Cohen's d and change scores for the standardised response mean), so higher variability in the denominator will result in smaller effect sizes. Finally, although the Guyatt index is considered by some to be the most appropriate effect size statistic [13],[44], it requires the calculation of a minimal important difference, which may vary across different populations, conditions and interventions [16]. For this reason, we determined the minimal important difference of the FHSQ subscales by dichotomising a 5-point Likert scale response of perceived overall improvement (i.e. a positive outcome defined as moderate or marked improvement) and calculating mean change scores from our data, rather than using the minimal important difference scores that Landorf et al. [45] calculated from people with heel pain. As no minimal important difference scores have been reported for the MFPDI, we used the same approach for each of the MFPDI subscales, which also allowed us to make direct comparisons between the two outcome measures. However, there is currently no consensus regarding the most appropriate question or number of response levels in determining the anchor used to define the minimal important difference, and anchor-based approaches have limited discriminative ability in trials where most participants report improvement in their condition [16].
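To make the dependencies described above concrete, the following Python sketch computes the four statistics from paired baseline and follow-up scores. It is an illustration only, not the analysis code used in the trial. The function names and sample data are hypothetical, and the Guyatt index is assumed here to be the minimal important difference divided by the standard deviation of change scores in participants who reported no change, which is a commonly cited form of the statistic.

```python
import math
import statistics


def change_scores(baseline, follow_up):
    """Within-person change: follow-up minus baseline."""
    return [f - b for b, f in zip(baseline, follow_up)]


def cohens_d(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    changes = change_scores(baseline, follow_up)
    return statistics.mean(changes) / statistics.stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return statistics.mean(changes) / statistics.stdev(changes)


def paired_t_statistic(baseline, follow_up):
    """Mean change divided by its standard error (SD of change / sqrt(n))."""
    changes = change_scores(baseline, follow_up)
    standard_error = statistics.stdev(changes) / math.sqrt(len(changes))
    return statistics.mean(changes) / standard_error


def guyatt_index(mid, stable_changes):
    """Assumed form: MID divided by the SD of change in stable participants."""
    return mid / statistics.stdev(stable_changes)


# Made-up subscale scores on a 0-100 scale for five participants.
baseline = [40, 50, 60, 45, 55]
follow_up = [55, 60, 80, 50, 75]

d = cohens_d(baseline, follow_up)
srm = standardised_response_mean(baseline, follow_up)
t = paired_t_statistic(baseline, follow_up)
gri = guyatt_index(mid=10, stable_changes=[-2, 0, 2, 1, -1])
```

Written this way, the paired t statistic equals the standardised response mean multiplied by the square root of the sample size, which is why a fixed magnitude of change becomes more statistically significant as the sample grows, while Cohen's d and the standardised response mean depend only on the spread of their respective denominators.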
Despite these limitations, and the fact that each test calculates the magnitude of the effect size in different ways, the four statistics we used resulted in a reasonably consistent pattern of responsiveness across the outcome measure subscales. We can therefore be more confident of the superiority of the FHSQ pain and function subscales using this combined approach rather than using one statistical test alone. Nevertheless, this approach only addresses ‘internal’ responsiveness (the ability of a measure to change over time), not ‘external’ responsiveness, which Husted et al. [13] have defined as the extent to which changes in a measure over time relate to a corresponding change in an established reference measure of health status. We were unable to evaluate external responsiveness in this study due to the absence of an appropriate reference measure for comparison. Although we collected Short Form 12 data from this sample, generic health outcome measures are generally less responsive than condition-specific measures and are therefore not considered to be suitable reference standards [6],[22]. Outcome measures more directly related to mobility and physical function in older people, such as the disability index of the Health Assessment Questionnaire [46], may be more appropriate reference standards for future evaluation of external responsiveness.
Our findings provide further support for the continued use of the FHSQ as an outcome measure in clinical trials of foot disorders. However, although the recoding of FHSQ responses from a Likert scale to a 100-point scale enables the FHSQ subscales to be expressed as interval data, the FHSQ has so far not undergone Rasch analysis - a statistical technique which evaluates whether overall scores summed from ordinal items can be considered to be linear, interval-level variables [47]. Such an analysis is necessary to confirm whether it is indeed appropriate to analyse FHSQ subscales using parametric statistical approaches.
In summary, this study has shown that the FHSQ pain and function subscales are the most responsive to change over time in older people receiving a footwear intervention to alleviate foot pain. The FHSQ footwear and general foot health subscales and the three subscales of the MFPDI exhibited lower responsiveness, and so may not be appropriate outcome measures in this population. Further research is required to determine the internal responsiveness of the FHSQ as an outcome measure in trials of other foot disorders, interventions and clinical populations, and to evaluate external responsiveness against an appropriate reference standard.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
HBM and NF conceived the idea and obtained funding for the study. HBM, NF and SEM designed the study protocol. MA and SR collected and entered the data. HBM conducted the statistical analysis and drafted the manuscript. All authors assisted with the writing of the manuscript, and read and approved the final manuscript.