Background
Neck pain is a common musculoskeletal complaint among adults. Worldwide estimates show that the 12-month prevalence of neck pain among adults ranges between 30% and 50%, depending on the definition of neck pain and the geographic spread of respondents [
1]. At any given time, approximately 12-14% of the adult population reports having neck pain [
1] and neck pain is now the second most common musculoskeletal disorder [
2,
3]. Likewise, neck pain often causes impairment, work disability and contributes to increased sickness absence [
4,
5] – thus millions of dollars are spent annually on treatment, compensation and lost earnings [
6], and neck pain is a contributory cause of reduced health-related quality of life [
7,
8]. Neck pain has been associated with impaired performance of muscles in the cervical spine [
9‐
13], as well as reduced proprioception and changes in the cervical motion patterns [
14‐
17]. For this reason, treatment often includes exercise therapy aimed at restoring these neuromuscular deficits [
18‐
23].
In order to assess any neuromuscular deficits present, it is of clinical importance to use reliable and valid assessment tools. Several performance tests have been developed with the aim of quantifying different aspects of muscle performance [
24‐
33]. The present study focuses specifically on five muscle performance tests, which are often used in clinical practice.
The Cranio-Cervical Flexion Test (CCFT) is a clinical assessment test of the deep cervical flexor muscle function [
28,
30]. It targets activation and endurance of the deep cervical flexors in progressive inner range positions. The individual is placed in supine crook lying with the head in a neutral starting position, followed by an active head nodding action (cranio-cervical flexion) during which the patient tries to sequentially target five progressive stages (measured as an increased downward pressure of 22, 24, 26, 28 and 30 mmHg) [
29,
30]. The reliability of the CCFT has previously been assessed and it has shown promising psychometric properties [
29,
34‐
37]. Intraclass Correlation Coefficient (ICC) values have revealed substantial to almost perfect intra-rater reliability for the CCFT, with ICC values ranging from 0.78 to 0.98 (95% Confidence Interval (CI) ratings between 0.47-0.99) [
24,
29,
35‐
37]. In addition, moderate to almost perfect inter-rater reliability has been reported, with ICC values from 0.57 to 0.91 (95% CI ratings between 0.37-0.96) [
24,
34,
36].
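The verbal labels used above (moderate, substantial, almost perfect) match the agreement bands commonly attributed to Landis and Koch. The following minimal sketch shows that mapping; the function name `agreement_label` is hypothetical, and the use of these exact cut-offs is an assumption that is merely consistent with the ICC ranges quoted in this section.

```python
def agreement_label(coefficient: float) -> str:
    """Map a reliability coefficient to a Landis-and-Koch style label.

    Assumed bands (not stated explicitly in the text): below 0 poor,
    0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate,
    0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if coefficient < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    # Return the label of the first band whose upper bound covers the value.
    for upper_bound, label in bands:
        if coefficient <= upper_bound:
            return label
    raise ValueError("a reliability coefficient cannot exceed 1.0")


# Intra-rater range quoted for the CCFT (0.78 to 0.98).
print(agreement_label(0.78), "to", agreement_label(0.98))
```

Under these assumed bands, the quoted inter-rater range of 0.57 to 0.91 maps to moderate and almost perfect, in line with the wording above.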
Grimmer et al. [
26] described a muscle performance test targeting neck flexor muscle endurance. The test is performed with the subject in a supine crook lying position and measures the subject’s ability to maintain cranio-cervical flexion (chin tuck) while performing an active head lift [
26]. The maximal holding time is recorded in seconds. Recording stops when head movement indicating fatigue occurs (i.e., inability to maintain upper cervical flexion, an increase in neck flexion or lowering of the head). Reliability studies conducted on this muscle endurance test, as well as on several modified versions, have found substantial to almost perfect intra-rater reliability (ICC values from 0.71 to 0.96) [
25‐
27,
38‐
41]. Likewise, moderate to almost perfect inter-rater reliability has been reported (ICC values from 0.54 to 1.0) [
27,
39,
40,
42‐
44]. As patients with neck pain are often unable to perform the supine crook lying version, due to neck pain or reduced muscle strength, a modified version of the Neck Flexor Muscle Endurance (NFME) test is frequently used in clinical practice. The modified NFME test is performed in the same manner as the supine version [
26,
27], except that the individual sits in a 45° upright position, which decreases the load on the neck. Nevertheless, little is known about the psychometric properties of the modified version.
Cervical Joint Position Error (JPE), measured as the ability to relocate the head to a starting position following active cervical range of motion, has been examined in patients with neck pain using several different measurement methods [
16,
32,
33,
45‐
48]. The test measures alterations in kinaesthetic awareness, expressed as, for example, errors in head and neck repositioning. Studies using movement analysis devices, such as an ultrasound-based measuring device (Zebris) or electromagnetic tracking devices (3-Space Fastrak), have reported substantial to almost perfect intra- and inter-session reliability (ICC values from 0.61 to 0.84) [
47,
49‐
51], while others have failed to do so (ICC values from −0.01 to 0.51) [
49,
50,
52,
53]. Based on the results from e.g. Revel et al. [
32] and Heikkilä et al. [
45] it has been suggested that clinicians can use simple equipment such as a paper target and a head-mounted laser pointer to assess a subject’s ability to relocate the head to a neutral position following active cervical range of motion [
54]. However, the reliability of such clinical performance tests is still unknown.
Over the last decade there has been an increased interest in muscle performance of the cervical flexors in patients with neck pain [
12,
21,
30,
55]. Muscle performance tests have focused predominantly on the cervical flexor muscles and only a limited number of tests targeting the posterior neck muscles exist [
25,
56]. However, recent research indicates that significant changes also occur in the posterior neck muscles [
57‐
60], and there is a clinical need for the development of muscle performance tests targeting the posterior neck muscles. Drawing on the existing literature and clinical practice, we developed a new dynamic muscle performance test, which targets neck extensor muscle endurance.
When conducting reliability studies, great effort goes into standardising test procedures in order to reduce sources of variation and facilitate a stable outcome. One way to reduce test variation is to increase the number of test trials and use their average to calculate, for example, ICC values. Studies of muscle performance tests used for patients with neck pain have shown that an increased number of test trials (minimum of five trials) increases the test’s reliability (i.e., increased ICC values and decreased Limits Of Agreement (LOA)) [
50,
51] by reducing measurement error [
61]. However, when muscle performance tests are applied in clinical practice, clinicians often conduct a test only once or twice, partly because of time constraints and partly to avoid provoking pain or fatigue in the tested muscles. This may reduce test reliability (cf. increased measurement error).
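The effect of averaging several trials on reliability can be illustrated with the Spearman-Brown formula, which predicts the reliability of the mean of k trials from the reliability of a single trial. The sketch below is illustrative only: the function name `reliability_of_mean` and the single-trial value of 0.50 are hypothetical and are not taken from this study, and the formula assumes that trials are interchangeable with independent errors, an assumption that pain or fatigue across repeated trials may violate.

```python
def reliability_of_mean(single_trial_reliability: float, n_trials: int) -> float:
    """Spearman-Brown prediction for the reliability of the mean of n trials.

    R_k = k * r / (1 + (k - 1) * r), where r is the single-trial reliability.
    Assumes parallel trials with uncorrelated measurement errors.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    r = single_trial_reliability
    return n_trials * r / (1.0 + (n_trials - 1) * r)


# Hypothetical single-trial reliability of 0.50 (illustrative value only).
for k in (1, 2, 3, 5):
    print(k, round(reliability_of_mean(0.50, k), 2))
```

Under these assumptions, the gain from each additional trial shrinks as k grows, which is one way to weigh extra trials against consultation time and patient discomfort.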
Therefore, we aimed to investigate whether muscle performance tests, which have shown promising psychometric properties, remain reliable when examined under conditions similar to those of daily clinical practice in physiotherapy. Likewise, we aimed to target some of the areas where limited evidence exists. In order to standardise test procedures, we used inexpensive, simple equipment, which can easily be applied in a clinical setting and which has previously been found useful in tests of lumbar motor control [
62].
The aim of this study was to determine the clinical reliability of five muscle performance tests in patients with and without neck pain.
Discussion
This study was conducted in accordance with the COSMIN checklist and investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in daily clinical practice in physiotherapy. Generally, across all tests the study showed large variability with intra- and inter-rater reliability ranging from moderate to almost perfect agreement with the exception of the NET, which ranged from slight to moderate agreement. In addressing why such significant variability was observed, several methodological issues and study limitations need to be considered.
Joint position sense
Firstly, for head repositioning, the number of trials performed for each movement direction has been reported to affect the estimation of precision and accuracy, with an increasing test stability (i.e., higher ICC values) attained when a larger number of trials are performed (five trials or more) [
50,
51]. However, our results indicate that inter- and intra-rater reliability of neck rotation did not differ significantly from neck flexion or neck extension, despite the fact that calculations of ICC values for neck rotation were based on six trials (left and right), while ICC values for neck extension and neck flexion were only based on three trials each. A direct comparison to earlier studies should, however, be made with caution, since the methods of measurement are not directly comparable [
50,
51]. Secondly, age has been reported as one factor that can affect an individual’s ability to accurately reposition their head to a neutral position [
71]. In the present study the patients were significantly older than the healthy subjects, which could have amplified any difference in results between the groups. In spite of this, the majority of our findings indicate that there are no significant differences between patients with neck pain and healthy subjects. Thirdly, a tendency to overshoot the target position has been found in patients with neck pain [
32,
45,
71]. Unfortunately, data collection in the present study does not allow for investigation of consistent over- or undershooting as part of the observed outcome variability. Fourthly, Treleaven et al. reported significantly larger errors in neck extension and rotation (to the right) in patients with whiplash when compared with controls [
48]. However, our findings do not show a similar pattern. Only data from examiner B show significant differences between patients with neck pain and healthy subjects. Likewise, the differences observed for neck extension were only present at the second assessment session, not at the first assessment session. Possible explanations for these inconsistent findings include inadequate sample size and measurement error, since our study was not designed to detect differences between groups. Even though significant differences were found, the mean differences are all smaller than the tests’ measurement errors (Tables
2–
3), which indicates that the differences observed may not be evidence of a true difference, but rather may be explained by measurement error. Therefore, our results should be interpreted with caution.
The cranio-cervical flexion test
For the CCFT, our findings demonstrated substantial to almost perfect intra-rater reliability and almost perfect inter-rater reliability. These findings are consistent with the existing literature [
29,
34,
36,
37]. However, there is a tendency for higher ICC values to be reported with an increased number of trials performed [
34,
36,
37]. When performing the CCFT, progressive nodding action increased the pressure from the baseline of 20 mmHg to 22, 24, 26, 28 and 30 mmHg. Despite the fact that the CCFT was found to be fairly reliable, the LOA and SDC were substantial (ranging between 4.11 and 5.11 mmHg). As a result, a change in score has to be at least 5 mmHg to be interpreted as a real change [
61,
72]. As previously reported [
12,
28,
29,
35], patients with neck pain demonstrated a reduced ability to activate the deep neck flexors, when compared with healthy subjects (Tables
4–
5).
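To make the relation between measurement error and a "real change" concrete, the sketch below uses two commonly cited formulas: SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. Whether the present study derived its SEM and SDC in exactly this way is an assumption, and the standard deviation and ICC passed in are hypothetical values chosen only for illustration.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a pooled SD and an ICC.

    SEM = SD * sqrt(1 - ICC). One common formulation; assumed here.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC for an individual at the 95% level: z * sqrt(2) * SEM.

    The sqrt(2) accounts for measurement error in both the first
    and the second measurement.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs (not study data): SD of 4.0 mmHg and ICC of 0.75.
sem = sem_from_icc(sd=4.0, icc=0.75)
print(round(sem, 2), round(smallest_detectable_change(sem), 2))
```

Because the CCFT is scored in 2 mmHg stages, an SDC of this size would span more than two stages, which illustrates why a high ICC alone does not guarantee that small changes in score are interpretable.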
Muscle endurance tests
The NFME test (supine version) has previously been found reliable [
25‐
27,
38‐
42]. Similarly, we found this test to have substantial inter- and intra-rater reliability. However, broad LOAs were found for both inter- and intra-rater reliability, indicating limited agreement between the examiners. The SEM also revealed large measurement error, with the minimum detectable change estimated at 40 sec. Edmondston et al. reported almost perfect intra-rater reliability with a minimum change of 17.8 sec representing a true change [
25]. The mean holding time reported (≈50 sec) was almost twice the holding time reported in the current study (Table
6). However, their patient population was somewhat younger (mean age: 36 ±11) than the current patient population, which could explain the differences in holding time [
73]. Previous studies have reported reduced holding time (i.e., reduced isometric neck flexor muscle endurance) in patients with neck pain, when compared with a healthy population (measured with the neck flexor muscle endurance test) [
27,
44]. All three muscle performance tests indicated a tendency towards shorter holding time in patients compared with healthy subjects, although the differences were not statistically significant (Tables
4‐
5). Because patients with neck pain are often unable to perform the supine version of the NFME test, a modified version is often applied in clinical practice. The modified upright sitting version decreases the load on the neck, which enables patients to perform the test. By and large, our results imply that this modified version is not as reliable as the original supine version (Tables
2‐
3). The SDC for the sitting version was above 97 sec (Table
2), which is longer than the actual holding time observed for both healthy subjects and patients with neck pain, implying that changes in scores should be interpreted with caution. Possible confounding factors include the presence or increase of neck pain and the number of trials performed. Olson et al. [
40] and Grimmer et al. [
26] reported a systematic improvement in performance from a first to a second test [
26,
40], even though the tests were performed so close in time that no significant increase in muscle strength was expected. Such a learning effect could have affected the NFME test, increasing the variability of the test results. However, no statistically significant differences were found between the first and the second test, indicating that a learning effect did not in fact occur (Table
6).
The neck extensor test
Despite the use of a standardised protocol, the overall level of reliability for the NET was poor, suggesting that this test is too unstable to be used to evaluate neck extensor muscle endurance. Several factors may have contributed to the discrepant findings. Firstly, some of the patients experienced increased pain during the muscle endurance performance tests, and neck pain has been shown to affect muscle performance in patients [
74,
75]. Secondly, the order of the five muscle performance tests was random. Muscle fatigue has been found to influence muscle performance in patients with neck pain [
76,
77]. Theoretically, if the NET was performed last, muscle fatigue might have affected the outcome in both patients with neck pain and healthy subjects. However, post hoc analysis showed no statistically significant differences between the first and the second assessment performed on the same day (Table
6), which indicates that muscle fatigue did not influence the test results. Thirdly, even though great effort was invested in standardising the test protocol, it cannot be ruled out that discrepancies between test procedures affected the results.
Test procedures
Test procedures for the CCFT, the NFME tests and the NET entailed each test being performed only once. This was done to replicate a clinical setting, where limited consultation time and the patient’s pain condition often limit the number of test trials performed. In order to facilitate standardised test procedures that could be implemented in a clinic, we used inexpensive, easily accessible equipment, which allowed us, for example, to establish easily detectable cut-off points at which muscle fatigue occurred and thereby reduce measurement error. Nevertheless, significant diversity was observed across the four muscle performance tests.
Study strengths and limitations
The order of the examiners was random. This was done in order to avoid introducing measurement bias. However, some of the muscle performance tests were aimed at measuring muscle endurance, which could have induced muscle fatigue. If so, muscle fatigue would have occurred after performing the first set of muscle performance tests. This could theoretically have affected the outcome of the second set of muscle performance tests. Nevertheless, no statistically significant differences were found between the first and the second assessment for any of the muscle endurance tests (Table
6), which indicates that this was in fact not the case.
Despite a nominally sufficient sample size (>50 participants), we found very broad 95% confidence intervals, which suggests that the sample size was nevertheless inadequate. A post hoc analysis was conducted to compare the results from patients with neck pain and healthy subjects. This was done in order to explore whether a lack of variability among healthy subjects could partly explain our present findings. Furthermore, a difference between patients with neck pain and healthy subjects could point to relevant test candidates for future studies of specificity. However, due to the small sample size in the present study, caution should be exercised when interpreting the results.
Inter-rater reliability reflects within-day comparison of the results. This may not mimic clinical practice, as muscle endurance tests are often repeated after several days. Assessment of the between-day inter-rater reliability is likely to result in greater differences. Likewise, the use of recently certified physiotherapists may have contributed to the variation. More experienced clinicians might have achieved more reliable results, since the level of clinical skill needed to conduct the muscle performance tests is somewhat high. On the other hand, recently certified physiotherapists may tend to follow the written protocol of procedures more strictly, as they have no empirical routine to rely on. However, in both cases the findings presented in the present study are only related to test procedures performed in a similar manner. The present study replicated a clinical setting, with a broad range of therapists, including a large group with limited experience. An assessment tool has only limited clinical value if it takes years of practice to be able to reproduce stable results.
Competing interests
The authors declare that they have no financial affiliation (including research funding) or involvement with any commercial organization that has a direct financial interest in any matter included in this manuscript. The authors declare that they have no conflicts of interest.
Authors’ contributions
TJ was involved in the planning of the study design, data acquisition, the data analysis and writing the paper. HL, FE and KS contributed to the analysis and interpretation of the data as well as study conception and design. All authors were involved in drafting the article or revising it critically for important intellectual content and all authors approved the final version of the manuscript.