Background
Musculoskeletal disorders are among the principal causes of activity limitation and long-term disability [1–3]. In 2004, musculoskeletal disorders accounted for 38% of work-related problems compensated by the Quebec Workers' Compensation Board (CSST) [4, 5]. For the same year, new cases of musculoskeletal disorders (including low back pain) resulted in nearly 130 million dollars in salary compensation alone [4, 5]. In the United States, musculoskeletal disorders accounted for 30% of the injuries and illnesses with days away from work in 2005 [6]. According to the US Bureau of Labor Statistics, the median length of absence resulting from musculoskeletal disorders was 9 days; among those problems, shoulder disorders resulted in the longest absences from work, with a median of 15 days [6].
In epidemiological studies, data on neck-shoulder disorders are often collected by physical examination [7, 8], by questionnaire [9–15] or with both instruments [16–24]. Physical examination by health professionals is usually recognized as more objective than questionnaires. However, questionnaires permit data collection on many participants for a fraction of the cost and time of a physical examination. Few epidemiological studies on neck and upper extremity musculoskeletal disorders have systematically compared the findings of questionnaires with those obtained by physical examination [16, 19, 22–25]. Only four studies published in English have reported the sensitivity and specificity of a questionnaire compared to clinical examination of the neck-shoulder region to identify individuals with neck-shoulder disorders [16, 19, 23, 24].
The present study was part of a larger investigation on the prevalence of musculoskeletal disorders among video display unit (VDU) users [26]. The main objective of the present study was to assess the agreement between a self-administered questionnaire and the physical examination made by a health professional on the presence of musculoskeletal disorders of the neck-shoulder region. Secondary objectives were to assess the effects on the agreement of different questionnaire and physical examination definitions and the importance of the time interval elapsed between the administrations of the tests.
Results
The participation rate was 84% (89.2% for the cases and 77.7% for the non-cases according to the primary questionnaire definition). The VDU users in the agreement study were similar on demographic and occupational characteristics to all VDU users. Study participants were primarily female (83%). The mean age was 44 years. More than 80% of the participants were clerical workers, 11% were professionals and executives and 7% were technicians. Average VDU use was 20 hours per week.
According to the questionnaire definitions, the prevalence of musculoskeletal disorders varied from 2.9% to 17.1% (Table 1). More positive neck-shoulder findings were reported from the physical examination than from the self-administered questionnaire.
The distribution of participants according to the primary definitions (questionnaire and physical examination) and agreement values are presented in Table 2. The comparison of the primary definitions yielded a Kappa of 0.44 and a 72% global agreement. Among questionnaire cases, 79% had a positive physical examination while among non-cases, 66% were negative on examination.
Table 2
Distribution of study participants according to the primary questionnaire and physical examination case definitions (n = 187)
| | Physical examination + | Physical examination - | Total |
| | N (expected value) | N (expected value) | |
| Questionnaire + | 67 (46) | 18 (39) | 85 |
| Questionnaire - | 35 (56) | 67 (46) | 102 |
| Total | 102 | 85 | 187 |
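The agreement measures quoted for the primary definitions can be recomputed from the four cell counts of Table 2. The sketch below is an illustration rather than the authors' analysis code: the function name `agreement_stats` and the dictionary keys are hypothetical, and the physical examination is assumed to be the reference standard for sensitivity and specificity.

```python
def agreement_stats(a, b, c, d):
    """Agreement measures for a 2x2 table (hypothetical helper).

    a: questionnaire +, examination +    b: questionnaire +, examination -
    c: questionnaire -, examination +    d: questionnaire -, examination -
    """
    n = a + b + c + d
    q_pos, q_neg = a + b, c + d          # questionnaire (row) totals
    e_pos, e_neg = a + c, b + d          # examination (column) totals

    # Cell counts expected by chance alone: row total * column total / n.
    expected = {
        "a": q_pos * e_pos / n, "b": q_pos * e_neg / n,
        "c": q_neg * e_pos / n, "d": q_neg * e_neg / n,
    }

    p_observed = (a + d) / n                          # global agreement
    p_chance = (expected["a"] + expected["d"]) / n    # chance agreement
    kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "expected": expected,
        "global_agreement": p_observed,
        "kappa": kappa,
        "agreement_cases": a / q_pos,       # positive predictive value
        "agreement_non_cases": d / q_neg,   # negative predictive value
        "sensitivity": a / e_pos,           # examination assumed as reference
        "specificity": d / e_neg,
    }


# Cell counts taken from Table 2.
stats = agreement_stats(a=67, b=18, c=35, d=67)
for name, value in stats.items():
    if name != "expected":
        print(f"{name}: {value:.2f}")
```

Rounded, these values match the figures reported in the text: a Kappa of 0.44, global agreement of 72%, agreement of 79% among cases and 66% among non-cases, sensitivity of 66% and specificity of 79%.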
We investigated whether different questionnaire and physical examination definitions would influence the agreement. Table 3 presents measures of agreement between the five questionnaire definitions and the primary physical examination case definition. Sensitivity and specificity are also presented. Kappa and global percent agreement obtained with the questionnaire definition that required limitations in activities of daily living (ADL) were similar to measures obtained with the primary questionnaire definition. The definition that included limitations in work activities resulted in the lowest Kappa coefficient of the study (k = 0.19). Percent agreement was always higher among cases than non-cases. Percent agreement among cases (positive predictive value) tended to increase with the inclusion of the functional limitation criteria (Table 3). For the non-cases, percent agreement (negative predictive value) varied little, remaining around 60% for all functional limitation definitions. The inclusion of the functional criteria in the primary questionnaire definition increased specificity but decreased sensitivity.
Table 3
Agreement between the five questionnaire definitions and the primary physical examination definition
Questionnaire definition | Kappa | 95% CI | Global agreement (%) | Agreement among cases (%) | Agreement among non-cases (%) | Sensitivity (%) | Specificity (%) |
1. Primary (n = 187)(3) | 0.44 | 0.31–0.56 | 72 | 79 | 66 | 66 | 79 |
2. Limitations in activities of daily living (n = 153)(3) | 0.38 | 0.25–0.52 | 69 | 84 | 64 | 47 | 91 |
3. Limitations in work activities (n = 128)(3) | 0.19 | 0.08–0.30 | 63 | 92 | 59 | 19 | 99 |
4. Limitations in household activities (n = 135)(3) | 0.29 | 0.17–0.42 | 66 | 95 | 61 | 30 | 99 |
5. Limitations in leisure activities (n = 138)(3) | 0.27 | 0.14–0.40 | 65 | 83 | 61 | 31 | 95 |
When the primary questionnaire definition was compared with the three physical examination definitions, the Kappa varied from 0.30 to 0.48 (Table 4). The Kappa was lowest (0.30) when the physical definition was based only on decreased range of motion or muscular strength. The global percent agreement (66%), sensitivity (64%) and specificity (67%) were also somewhat lower with this definition. The global percent agreement tended to be similar for the physical examination definition based solely on pain manifested during maneuvers (74%) compared to the primary definition (72%). In this comparison, the Kappa values also tended to be similar (0.48 vs 0.44). Among cases, the percent agreement decreased with both alternative physical examination definitions compared to the primary definition. The definition based solely on decreased range of motion or muscular strength yielded a value for agreement among cases of 55%. Among non-cases, the percent agreement increased with both alternative definitions (75% and 82% compared to 66% for the primary physical examination definition). The percent agreement was higher among questionnaire cases compared to non-cases with the primary definition and was higher among non-cases for the two alternative definitions.
Table 4
Agreement between the three physical examination definitions and the primary questionnaire definition (n = 187)
Physical examination definition | Kappa | 95% CI | Global agreement (%) | Agreement among cases (%) | Agreement among non-cases (%) | Sensitivity (%) | Specificity (%) |
1. Primary Definition | 0.44 | 0.31–0.56 | 72 | 79 | 66 | 66 | 79 |
2. Definition based on decreased range of motion or muscular strength | 0.30 | 0.17–0.44 | 66 | 55 | 75 | 64 | 67 |
3. Definition based solely on pain | 0.48 | 0.35–0.60 | 74 | 65 | 82 | 75 | 74 |
Finally, we investigated whether the time elapsed between the administrations of the two tests influenced the agreement. An average of 38 days (range: 2 to 187) elapsed between the administrations of the questionnaire and the physical examination. A global agreement of 77% was observed for the shorter interval (21 days or less) and of 66% for the longer interval (more than 21 days) (Table 5). The highest Kappa value of the study (k = 0.54) was obtained when the questionnaire and the physical examination were administered 21 days or less apart. The better agreement observed with the shorter interval (21 days or less) between the administrations of the two tests was reflected in both cases and non-cases; however, none of the comparisons reached statistical significance because of the limited sample size (p-values were 0.10 for global agreement, 0.30 for agreement among cases and 0.31 for agreement among non-cases). For both periods, the percent agreement was higher among cases than among non-cases. A higher sensitivity was also observed when the questionnaire and the physical examination were administered within 21 days (sensitivity = 75%) than over 21 days (sensitivity = 56%).
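As a rough check on the p-value reported for global agreement, the comparison between the two intervals can be sketched as a test on a 2x2 table of agreeing and disagreeing pairs. The source gives neither the test used nor the underlying counts, so both are assumptions here: the counts (71 of 92 and 63 of 95 agreeing) are back-calculated from the rounded percentages of 77% and 66%, and a Pearson chi-square test without continuity correction is assumed. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    Assumed test; the article does not say which one was applied.
    Returns the statistic and its two-sided p-value.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts, back-calculated from the rounded percentages:
# rows = interval (<= 21 days, > 21 days), columns = (agree, disagree).
table = [
    [71, 92 - 71],   # about 77% agreement among 92 participants
    [63, 95 - 63],   # about 66% agreement among 95 participants
]

chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value comes out close to the reported 0.10, which suggests the back-calculated counts are plausible, although it does not confirm the test the authors applied.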
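Table 5
Effect of time elapsed between the administrations of the questionnaire and the physical examination(1) (n = 187)
Time interval | Kappa | 95% CI | Global agreement (%) | Agreement among cases (%) | Agreement among non-cases (%) | Sensitivity (%) | Specificity (%) |
≤ 21 days (n = 92) | 0.54 | 0.37–0.716 | 77 | 83 | 71 | 75 | 80 |
>21 days (n = 95) | 0.33 | 0.15–0.52 | 66 | 74 | 61 | 56 | 78 |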
Discussion
In this study of VDU users, the agreement between a self-administered questionnaire on musculoskeletal disorders of the neck-shoulder region and a physical examination of the same region was examined in a sample of university clerical workers. Prevalence figures observed with questionnaire definitions were lower than those obtained from physical examination definitions. Results show an overall Kappa of 0.44 and a global agreement of 72% between the two instruments. The agreement was not substantially improved by the addition of questionnaire criteria related to functional limitations. The agreement diminished when the physical examination definition excluded the manifestation of pain. The percent agreement tended to be higher among cases than among non-cases. Higher agreement was observed with shorter time lapses between the administrations of the tests.
In order to be valid, a measure must first be reliable [44]. The questionnaire used here was adapted from questionnaires used in previous studies [9, 28–31]. Some items were taken from the Standardized Nordic Questionnaire, which showed an acceptable degree of reliability for the neck-shoulder region [27, 28]. Furthermore, previous studies suggested that questions related to the presence, duration and intensity of symptoms provide reliable information on musculoskeletal symptoms [27, 28, 45]. Thus, it is reasonable to consider that the questionnaire used in the present study had an acceptable level of reliability.
Previous studies also provide evidence of construct validity of subjective symptoms reported in questionnaires [46]. Also, visual analogue scales (VAS) are considered among the best instruments to measure pain [32]. To reduce the impact of potential error in recall in this study [44], only symptoms in the last seven days were considered. Furthermore, the fact that the questionnaire prevalence of musculoskeletal disorders in the neck-shoulder region (17%) was comparable to what was observed in previous studies on VDU workers [15, 21] provides further support for the validity of outcome measures obtained from the questionnaire.
The results of the current study suggest a fair to good agreement between the presence of neck-shoulder disorders ascertained by self-administered questionnaire and physical examination. This finding is in accordance with those obtained in previous studies comparing questionnaire data with clinical examination to identify cases of neck-shoulder disorders [16, 19, 23, 24]. These earlier studies concluded that neck-shoulder symptoms self-reported by questionnaire gave a fairly good to good picture of the prevalence of neck-shoulder disorders.
According to previous studies, tests used in physical examination, especially measurement of range of motion and manual muscle testing, have poor to good reliability [47–53]. However, the use of a rigorous standardized protocol, pretested by the examiner at the beginning of the current study, and the fact that only one person examined all the workers favored reliability. In their literature review, Gajdosik and Bohannon (1987) concluded that there was acceptable content validity for the measurement of range of motion [47]. Nevertheless, the comparisons in the present study might have been compromised at least in part by measurement error which could explain some lack of association with symptoms.
The Kappa statistic provides a measure of agreement that corrects for the agreement that would be expected by chance alone [54]. Global percent agreement was presented as well. According to suggested classifications [37, 41], all Kappa values reported in this study are relatively low. However, the Kappa statistic is strongly influenced by the prevalence of the phenomenon under study, which is determined by the observed proportion of individuals who fall in each category of the classification table. For a given observed proportion of individuals, Kappa gets its highest value when the expected proportion of positive individuals is small [55]. In this study, the expected proportions were high. This may have led to an underestimation of the true agreement beyond chance [55, 56].
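The dependence of Kappa on the chance-expected agreement can be illustrated with the standard formula, Kappa = (observed agreement - expected agreement) / (1 - expected agreement). In the sketch below, the observed agreement is fixed at the 72% reported for the primary definitions, while the expected-agreement values are hypothetical and chosen only to show the trend; `kappa_from_agreement` is an illustrative name, not part of the study's analysis.

```python
def kappa_from_agreement(p_observed, p_expected):
    """Cohen's kappa from observed and chance-expected agreement proportions."""
    return (p_observed - p_expected) / (1 - p_expected)


P_OBSERVED = 0.72  # global agreement reported for the primary definitions

# Hypothetical chance-expected agreement levels, for illustration only.
expected_levels = [0.30, 0.50, 0.70]

kappas = [kappa_from_agreement(P_OBSERVED, p_e) for p_e in expected_levels]
for p_e, kappa in zip(expected_levels, kappas):
    print(f"expected agreement = {p_e:.2f} -> kappa = {kappa:.2f}")
```

With the same observed agreement, Kappa falls from 0.60 to about 0.07 as the expected agreement rises from 0.30 to 0.70, which is the sense in which high expected proportions can lead to low Kappa values despite reasonable percent agreement.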
The different questionnaire definitions permitted the assessment of the influence of functional limitations on the agreement. The definition that included limitations in ADL gave similar agreement values when compared to the primary definition. On the other hand, definitions that included limitations in work, household and leisure activities resulted in poorer agreement. The lack of improvement in the agreement observed with the addition of functional limitation criteria may be explained by the fact that the questionnaire definition was already somewhat restrictive (pain reported in the neck-shoulder region for at least three days during the last seven days, with the worst pain intensity greater than 50 millimeters on the 100-millimeter VAS). Under these circumstances, the addition of the ADL limitations may not have contributed more information than the primary definition. Alternatively, the physical examination findings may not correspond closely enough to the domains that limit ADL. Furthermore, limitations measured in a dichotomous format (yes/no items) may not have been sufficiently sensitive in comparison to the more refined ADL limitations question. Finally, low prevalence figures (with more restrictive definitions) lead to lower Kappa values.
The inclusion of criteria related to functional limitations enhanced agreement among cases and reduced agreement among non-cases. The questionnaire definitions that included limitations in work and household activities resulted in as much as 92% and 95% agreement among cases, respectively. These results suggest that the combined use of physical examination and questionnaire items that include functional limitations is useful when one wants to specifically identify cases that would be confirmed with physical examination. Results showed more workers with limitations in activities of daily living than workers with limitations in work activities. This might suggest that, in order to maintain themselves at work, workers with musculoskeletal disorders reduce their usual daily activities or they may learn to compensate in order to maintain ADL until much later in the disease process. It might also suggest that workers with musculoskeletal problems that manifested at work have already left work, due to the healthy worker effect [57]. Individuals most likely to show limitations in range of motion or in muscular strength on physical examination and to report limitations in work activities on questionnaire were thus not included in this study.
According to our results, the measure of pain intensity provoked by specific maneuvers during the physical examination offered the best agreement when compared with the self-administered questionnaire. A low agreement was obtained with the physical examination definition based solely on decrease in range of motion or muscular strength. These results are consistent with the hypothesis that musculoskeletal disorders are progressive and that patients may have symptoms before objective physical findings appear [58]. Also, cases defined by physical examination of range of motion and muscular strength may have been overlooked by the questionnaire; this would be consistent with previous studies that showed a low correlation between pain intensity and extent of tissue damage [59, 60].
The questionnaire-based definition may not measure the same concept as the physical examination. While the physical examination measures the integrity and the absolute performance of the structures and tissues, self-reported symptoms are based on actual performance and sensation, much affected by pain perception. This distinction is supported by the large impact that pain has on the agreement. The results of this study suggest that pain intensity is an important feature in the agreement between a questionnaire on musculoskeletal disorders and a physical examination and support the construct validity of a case definition based on symptoms.
The higher prevalence of findings in the physical examination than in the questionnaire might be due to the selection criteria used to define non-cases according to the questionnaire. Given that the questionnaire definition was somewhat restrictive, some non-cases were not totally free of symptoms. Indeed, 26 of the 102 workers classified as non-cases according to the primary questionnaire definition had symptoms in the week prior to the questionnaire. This could have led to a classification bias and could have attenuated the true associations with physical examination.
The time interval elapsed between the administrations of the two tests ranged from two days to six months. Better agreement (k = 0.54) was observed with a smaller time interval (21 days or less). The temporal variability present in musculoskeletal disorder symptoms and the fact that severity of pain in musculoskeletal disorders can vary from day to day depending upon the types of activities the person has engaged in [45] are inherent difficulties for the measure of agreement between two tests [46, 61]. The longer interval between the tests might have allowed time for real changes in symptoms and consequently may have contributed to the relatively limited agreement found in this study. These results are consistent with those of Björkstén et al. (1999), who observed that a shorter reference period for reporting musculoskeletal problems yielded better agreement between a questionnaire and a physical examination [19].
Finally, the current study's population consisted mainly of employed female clerical workers; thus, the generalizability of the results is limited to similar populations.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
This work was conducted as part of the Master's thesis of NP, under the supervision of CB and CED. NP, CB, CED, SM and LP contributed to the conception and design of the research, analysis and interpretation of data, as well as to writing the article. Data collection was conducted under the direction of CB, with participation of NP. All authors read and approved the final manuscript.