The cross-cultural translation and adaptation of the Hand Function Sort for Dutch-speaking patients was performed thoroughly and successfully. As such, the HFS-DLV can be used for research purposes and in clinical practice. The psychometric properties of the HFS-DLV appeared to be good, although the construct validity needs further study.
Part 1: cross-cultural adaptation of the HFS-DLV
A careful procedure, such as the 5-step translation and adaptation process applied in this study, should be followed. In testing the prefinal version of the HFS-DLV, 98% of the participants made comments about the items and the comprehensibility in general. In contrast, Konzelmann et al. [18] stated that only 32% of participants made comments about the prefinal version of the French HFS. Having a researcher present in our setting might explain this difference. Therefore, for future translations of questionnaires, the presence of a researcher who receives comments orally should be considered.
Participants frequently commented that it was unclear which hand to use for the described tasks. The developers of the HFS were consulted regarding this comment. They explained that participants’ self-selection, either to demonstrate their inability to perform the task with the injured hand or to demonstrate their ability to perform it with their residual capacity, is an important psychological variable. This variable cannot be identified if participants are instructed which hand to use. Thus, allowing the participants to self-select gives the researchers the opportunity to consider whether and to what degree the participants may be magnifying their symptoms. We recommend adding to the examiner’s manual an explanation of this concept of self-selection and a response to participants’ questions about using the injured or uninjured hand for the described tasks.
Another frequent comment was that several items were too masculine. This was also described by Konzelmann et al. [18], who stated that the tasks depicted in items 53–62 are heavy activities more specific to men. Overall, in the development of the HFS, the authors tried to balance gender [10]. Making the HFS less masculine would require substantial changes to the tasks and would therefore change the construct.
The HFS was developed in the early 1990s and uses pictures from that era. In the past 25 years, some activities and tools have changed; for example, the use of a rotary opener and of cash has become less common. The pictures should be updated to match the current time frame.
In testing the prefinal version of the HFS-DLV, some of the participants had a diagnosis not classified as specific or nonspecific CANS. We assumed this would not affect the comments on the comprehensibility of the items. To prevent bias, none of the participants contributing to part 1 of the study were involved in the analysis of the psychometric properties of the final HFS-DLV, although we did not change any of the items.
Part 2: measurement properties of the HFS-DLV
In total, 6 out of 12 (50%) predefined hypotheses were accepted, which was below the goal of 75%. The highest correlation was found between the HFS-DLV and the QuickDASH, which is in line with the high correlation between the HFS-F and the DASH [18]. The HFS-DLV was also strongly correlated with the PRWHE, which might be explained by the finding that the PRWHE and DASH correlate strongly because they assess comparable constructs [42].
Our hypotheses for the correlations between HFS-DLV and NRS pain, RAND-36 vitality, and RAND-36 mental health could not be accepted. For all three, a slightly higher correlation than predicted was found.
For the NRS pain, a weak to moderate correlation was predicted, but a strong correlation was found. The predefined hypothesis was based on previous literature and a recent study that found a weak correlation between the HFS and VAS pain (coefficient of −0.247) [18]. The average score on the NRS pain was similar to that reported by Konzelmann et al. [18] (4.6 vs. 4.9). On the other hand, the pathology underlying the pain was different: in the study of Konzelmann et al. [18], more than half of the participants had shoulder pathology, and only one third had hand/wrist pathology. For all items in the HFS, an individual needs the functionality of the hands and wrists; only a small portion of items require intensive use of the shoulders. This might explain why pain correlates more strongly with the HFS in patients with hand/wrist disorders.
We hypothesized a weak to moderate correlation between the HFS-DLV and the RAND-36 vitality but found a strong correlation, although it was only marginally higher than expected. It might be that participants who experience more fatigue and have less energy have more difficulty performing the tasks in the HFS-DLV than predicted. For the RAND-36 mental health, a weak correlation was assumed, but a moderate correlation was found. Based on the biopsychosocial model [43], it can be argued that not only hand/wrist function but also psychological well-being plays an important role when a person determines his or her ability to perform a specific task. Konzelmann et al. [18] found a weak correlation with the SF-36 mental component summary; however, their sample consisted almost completely of men (84%), and this might play a role in the observed difference.
For all three known-groups validity hypotheses, the differences were in the predicted direction but were not statistically significant, although employment status showed a trend toward significance. For employment status, only participants with a paid job were counted as employed; participants with voluntary employment and students were categorized as unemployed. This could have affected the outcome, since these participants might have been able to perform a paid job. Nearly half of the participants had complaints of both hands, which meant the dominant side was affected in almost all cases. It was, however, not known whether one hand was more affected than the other. Considering the relatively small number of participants, a significant difference might be hard to detect.
Since there was no gold standard to determine the validity of the HFS-DLV, using predefined hypotheses for construct validity seems appropriate. Possibly the hypotheses were too strict, since the three correlation hypotheses that were rejected differed only slightly from the predicted correlations. Alternatively, validity could be assessed by comparing the HFS-DLV with more objective methods of determining work capacity, such as Functional Capacity Evaluation (FCE) testing, as has been done previously for the English version of the HFS by Matheson et al. [10].
The internal consistency of the HFS-DLV appeared to be higher than deemed acceptable. Although the recommended total of 434 participants was not reached, an adequate interpretation could be made with 119 participants. A remarkable finding was the very high Cronbach’s alpha (0.98). Alpha tends to increase with the number of items in a questionnaire, and a value this high suggests redundancy. A similarly high internal consistency has been described before [18]. Since the HFS has 62 items, redundancy might indeed be present. A high number of items can reduce motivation toward the end of the questionnaire, especially when all the questions have the same outline and instructions. Furthermore, for a quick evaluation of a person’s functioning in clinical practice, fewer items are preferable. In further research, the assumed redundancy of the HFS-DLV should be investigated, for example, using factor analysis.
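The link between item count and alpha can be illustrated with the standardized form of Cronbach’s alpha, which depends only on the number of items and the mean inter-item correlation. The sketch below is an illustration under the assumption of equal item variances, not the computation used in this study; the function names are hypothetical, and only the values 62 and 0.98 come from the text.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_r(k: int, alpha: float) -> float:
    """Invert the formula: mean inter-item correlation implied by alpha and k."""
    return alpha / (k - alpha * (k - 1))


# Values from the text: 62 items, alpha = 0.98.
r_62 = implied_mean_r(62, 0.98)
print(round(r_62, 2))

# Same mean correlation applied to hypothetical shorter forms.
for k in (10, 20, 62):
    print(k, round(standardized_alpha(k, r_62), 3))
```

Under these assumptions, an alpha of 0.98 with 62 items corresponds to a mean inter-item correlation of roughly 0.44, and the same correlation would yield a lower alpha for a shorter form. A very high alpha on a long questionnaire therefore does not by itself show how strongly individual items overlap, which is why a factor analysis is the more informative next step.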
The test-retest reliability determined by the ICC was good and appeared to be comparable with previous research [18]. The Bland-Altman method showed a centered distribution, with limits of agreement slightly wider than those found by Konzelmann et al., who used a shorter interval (48 h instead of up to 3 weeks) between the two administrations of the HFS [18]. However, although we did not actually assess whether the clinical situation changed, we did not expect these patients to improve or deteriorate considerably within this interval because of their generally long-standing complaints and the absence of treatment during this interval. The low degree of measurement error implies that the HFS-DLV can be used for repeated measures in clinical practice. We determined the measurement properties in a group of patients with CANS from an outpatient hospital and from peripheral hand therapy practices. The test-retest reliability of the original HFS was tested in 48 patients with various upper extremity impairments, including hand fractures, carpal tunnel syndrome, and lacerations [10]. Konzelmann et al. [18] investigated a population of hospitalized patients admitted for rehabilitation with upper limb complaints. In all these populations with various upper extremity diseases, the HFS was found to have reasonable to good test-retest reliability.
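For readers less familiar with the Bland-Altman method mentioned above, the following minimal sketch shows how the mean difference and the 95% limits of agreement are commonly computed from paired test and retest scores. The scores are hypothetical and are not study data, and the function name is an assumption made for illustration.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    if len(first) != len(second):
        raise ValueError("paired scores must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical HFS total scores (out of 248) at test and retest; not study data.
test_scores = [100, 120, 140, 160]
retest_scores = [104, 118, 146, 156]
bias, lower, upper = bland_altman_limits(test_scores, retest_scores)
print(f"bias={bias:.1f}, limits of agreement=({lower:.1f}, {upper:.1f})")
```

A mean difference close to zero corresponds to the centered distribution described above, and the width of the interval between the two limits reflects the measurement error between administrations.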
Responsiveness determined by the AUC was good, although the SDC and MIC were quite high (45/248 and 37/248, respectively). Our SEM of 16.2 is similar to that found by Benhissen et al., but the MIC they reported is lower (26/248) [44]. This might be explained by a different method of determining the ROC cut-off point or by actual differences in MIC, e.g., due to differences in patient characteristics. Although the HFS is able to discriminate between subjects who have and who have not improved, an improvement in score between 37 and 45 points should be interpreted with caution [33], because it exceeds the MIC but remains below the SDC and may therefore reflect measurement error. Good responsiveness is clinically important for using the HFS-DLV in daily practice or research to evaluate treatment effects, an important objective of PROs in general.
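The relation between the reported SEM and SDC can be checked with the conventional formula SDC = 1.96 × √2 × SEM. The sketch below assumes that this formula applies here (the section does not state which formula was used); the function names are hypothetical, and only the SEM of 16.2 and the MIC of 37 come from the text.

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC at the individual level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def interpret_change(change: float, mic: float, sdc: float) -> str:
    """Classify an observed score change against the MIC and the SDC."""
    if change < mic:
        return "below MIC"
    if change < sdc:
        return "at or above MIC but within measurement error"
    return "at or above SDC"


sdc = smallest_detectable_change(16.2)  # SEM reported in the text
print(round(sdc, 1))                    # 44.9, i.e. about 45 out of 248
print(interpret_change(40, mic=37, sdc=sdc))
```

Under this assumption the computed value agrees with the reported SDC of 45, and a change of, for example, 40 points falls in the range that calls for cautious interpretation.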
We observed that some participants filled in more than six question marks on the HFS-DLV, which indicates that their questionnaires were only marginally reliable. A question mark is scored the same as an inability to do the task, which could have led to an underestimation of the participants’ abilities. Answering with a question mark was not observed in testing the prefinal version of the HFS-DLV, so the presence of a researcher seemed to make a difference. In the additional comments of the HFS-DLV, participants explained that they chose a question mark when they had never done the tasks stated in the questionnaire. The current HFS participant instructions do not state what participants should fill in when they have never done the task before. The general procedure for administration of the HFS states that, under guidance of an evaluator, the participant should complete the first two items of the questionnaire. If the evaluator is assured that the participant understands the instructions adequately, the participant can complete the remaining items independently. However, the first two items are frequently encountered tasks with which all participants are familiar. A statement that participants should make a good guess for tasks they have never performed before could be a valuable addition to the instructions. It would be more practical and less time consuming if a participant could complete the HFS-DLV without the presence of an evaluator. Another possibility would be to exclude the option of the question mark, which would force people to make a choice, but this could lead to incomplete questionnaires. Unreliable questionnaires (≥4 points difference between the similar items of the internal check) were observed more often in the test-retest reliability and responsiveness analyses. This can be explained by the fact that participants had to complete the HFS-DLV twice. This observation is also an argument for trying to reduce the number of items on the HFS.
A strength of this study was the adherence to COSMIN recommendations for assessing measurement properties, in particular the use of a wide variety of six questionnaires to determine construct validity.
The limitations of this study include the high number of marginally reliable questionnaires, which could possibly be reduced if a researcher were present at completion of the questionnaires. We investigated patients with specific and nonspecific CANS, so the presented results could be less applicable to patients with hand/wrist pathology caused by trauma and/or systemic disease. Furthermore, the various measurement properties were not all assessed in the same sample, but generally in either a UH or PHTP group. While the majority of patient characteristics were similar, the distribution of diagnoses differed, which might limit generalization of the results. If so, this would probably hold true more for construct validity and responsiveness than for internal consistency and test-retest reliability. Further research might focus on determining or confirming the measurement properties of the HFS-DLV in other groups of patients.