Background
Intensive medical treatment regimens can significantly improve survival in patients with haematological malignancies [
1‐
3]. The cancer therapy itself, including chemotherapy or radiotherapy, damages healthy cells throughout the body, resulting in side-effects such as nausea, emesis, decreased nutritional intake and anaemia. Higher fatigue levels, which are associated with decreased activity and prolonged bed rest, contribute to muscular catabolism and atrophy [
4]. As a result, functional limitations and muscle weakness may persist even well beyond the period of active treatment [
5‐
7].
Patients with haematological malignancies may benefit from physical exercise programs in terms of maintenance or even improvement in physical activity levels [
7], fitness levels [
8,
9], and muscular strength [
5,
10,
11]. Assessment of muscle strength is an important part of the management of cancer patients, particularly in determining the response to a muscular strength training program [
12‐
14]. It is thus important to be able to accurately quantify the muscle strength of patients who are recovering from intensive medical treatment.
Muscular strength can be assessed both in research settings and in clinical practice settings by means of isokinetic and hand-held dynamometers (HHD). One of the advantages of using isokinetic dynamometers in patients with chronic diseases is the ability to assess muscle strength dynamically through a range of movements at various velocities, which may more accurately reflect functional performance [
15,
16]. However, isokinetic strength testing protocols may be too time consuming in typical clinical settings, and the size of the equipment can also be problematic (i.e., lack of portability). Clinically, HHD represents a simple, portable and relatively inexpensive alternative to isokinetic machines for assessing muscle strength [
16]. Moreover, hand-held dynamometers provide quantification of muscle strength, and are more sensitive to change in muscle strength than simple manual muscle tests [
16,
17].
Evidence of the validity of HHD has been provided in several studies, including a comparison of HHD with isokinetic strength measurements to assess lower limb strength in the elderly (r = 0.91) [
18], a comparison of HHD and manual muscle testing (r = 0.77) [
17], and of HHD and the Timed-Up-and-Go-test (r = 0.64 to -0.94) [19]. Nollet et al. also provided evidence for the validity of an HHD in lower strength ranges in patients with post-polio syndrome [
20].
To be clinically meaningful, however, the muscle strength assessment procedure must be reliable enough to evaluate outcomes of a therapeutic intervention [
21]. Reliability can be reported in relative or absolute terms [21]. Relative reliability statistics, such as intraclass correlation coefficients (ICCs), indicate the degree of association between 2 or more measures [
22] but they do not provide clinical guidance for assessing real changes at an individual patient level [
23,
24]. The relative reliability of hand-held dynamometers for knee extension has been examined in numerous populations. ICCs of 0.75 or higher have been reported in studies of healthy young and elderly adults [
25,
26], community-dwelling elderly fallers [
19,
27], people with acquired brain injury [
28], elderly patients after hip fracture or elective hip and knee arthroplasty [
29,
30], adults with cerebral palsy [
31], and patients with chronic obstructive pulmonary disease (COPD) [
16].
Absolute reliability reflects the magnitude of the differences between two measures [
32]. Examples of these statistics are the standard error of measurement (SEM), the corresponding 95% confidence interval, the smallest detectable difference (SDD), and the limits of agreement (LA). To be clinically useful, an assessment with an HHD must have only a small amount of measurement error in detecting real change over time [33]. A retest difference in a patient that is smaller than the SEM is likely to be the result of 'measurement noise' and is unlikely to be detected reliably in practice; a difference greater than the SDD is likely to be a real difference with 95% certainty [
21]. The absolute reliability of HHD has been reported by several authors [
16,
26,
27,
31,
33,
34]. However, measures of reliability are specific to the populations and testing procedures used. This implies that the findings of previous studies may not be applicable to patients with haematological malignancies. Disease- and treatment-related symptoms, including de-conditioning, muscle weakness, and fatigue may affect not only the reliability, but also the safety of performing HHD [
16,
22]. Therefore, the investigation of the measurement error of an HHD in patients with haematological malignancies is warranted.
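The interpretation rule for a retest difference described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function name `interpret_retest_difference`, the category labels, and the example values of 6 Nm (SEM) and 17 Nm (SDD) are hypothetical and chosen for demonstration. In practice, the SEM and SDD must come from a reliability study in the relevant population.

```python
def interpret_retest_difference(difference_nm, sem_nm, sdd_nm):
    """Classify a retest difference (Nm) against the SEM and the SDD.

    Hypothetical helper: differences smaller than the SEM are treated as
    measurement noise, differences greater than the SDD as real change,
    and anything in between as indeterminate.
    """
    if sem_nm <= 0 or sdd_nm < sem_nm:
        raise ValueError("expected 0 < SEM <= SDD")
    magnitude = abs(difference_nm)
    if magnitude < sem_nm:
        return "measurement noise"
    if magnitude > sdd_nm:
        return "real change"
    return "indeterminate"


# Illustrative values only (not taken from any study): SEM 6 Nm, SDD 17 Nm.
for change in (4.0, 12.0, -20.0):
    print(change, interpret_retest_difference(change, sem_nm=6.0, sdd_nm=17.0))
```

The sign of the difference is ignored, so the same thresholds apply to gains and losses in strength.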
In daily physiotherapy or rehabilitation practice, strength measurements for the same patient are often performed by several examiners. However, the measurement error associated with the assessment of strength by one observer (intra-observer reliability) may be different from that associated with the assessment of strength by several observers (inter-observer reliability) [
35]. For this reason, it is important to determine both the intra- and inter-observer reliability of the measurements obtained with an HHD. This study aimed to determine the relative and absolute reliability (measurement error) of intra- and inter-observer strength measurements with an HHD in a sample of patients with haematological malignancies.
Discussion
This study evaluated the relative and absolute reliability of a strength assessment protocol using an HHD among a sample of haematological cancer patients recovering from high-dose treatment. We used the ICC (with accompanying 95%CI) to estimate relative reliability. Relative reliability is highly dependent on the variability observed in the patient sample, and relates to the ability to classify patients' strength measurements in the same rank. Thus, relative reliability is most relevant for assessing instruments that are to be used for discriminative purposes [
23]. Guyatt et al. [
24] demonstrated that discriminative instruments require a high level of relative reliability. That is, the measurement error should be small in comparison to the variability between the observers. In other words, if the difference between the observers is large, a certain amount of measurement error is acceptable [
23,
24].
However, if the aim is to measure change in health status, which is often the case in clinical practice, absolute reliability is more relevant [
23,
24]. Absolute reliability describes the agreement between repeated measurements and is concerned with measurement error [
23,
24]. For an evaluative instrument, it is not the variability between the observers that is of primary concern, but rather measurement error [
23,
24]. The measurement error should be smaller than the changes that the observer wishes to detect [
23,
58]. We calculated the SEM, the SDD and the limits of agreement to estimate absolute reliability.
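As a minimal sketch of one of these statistics, the limits of agreement between two measurement sessions can be computed in the usual Bland and Altman manner as the mean difference ± 1.96 times the standard deviation of the differences. The function name `limits_of_agreement` and the torque values below are hypothetical and serve only to illustrate the calculation. They are not data from this study.

```python
import statistics


def limits_of_agreement(session_1, session_2):
    """Return (lower, upper) 95% limits of agreement for paired measurements.

    Computed as mean(d) -/+ 1.96 * SD(d), where d is the list of paired
    differences (session 2 minus session 1).
    """
    if len(session_1) != len(session_2) or len(session_1) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    differences = [b - a for a, b in zip(session_1, session_2)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD (n - 1)
    margin = 1.96 * sd_difference
    return mean_difference - margin, mean_difference + margin


# Hypothetical knee extension torques (Nm) for four patients, two sessions.
first = [100.0, 120.0, 90.0, 150.0]
second = [104.0, 118.0, 95.0, 149.0]
lower, upper = limits_of_agreement(first, second)
print(f"limits of agreement: {lower:.2f} to {upper:.2f} Nm")
```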
To be of practical use, the results should be interpreted as follows: the intra-observer assessment of the average of 3 MVPT 'knee strength' measurements provided acceptable relative reliability (ICC3.3 = 0.94). The reliability of this parameter is affected by the variance of the assessments from 'intra session 1', which was 644.65 Nm² (calculated as the square of the standard deviation [25.39 Nm]), and of the assessments from 'intra session 2', which was 847.39 Nm² (SD 29.11 Nm) (see the distribution from Bland and Altman in Figure
2). When taking the measurement error into account, a difference between two measurements equal to or greater than the SDD of 17.23 Nm should be used as the threshold for a true clinical change in knee extension strength. The results of the other examination models in this study, namely the inter-observer reliability for the average of 3 MVPT measurements (ICC2.3), the intra-observer reliability for the highest value of 3 MVPT measurements (ICC3.1), and the inter-observer reliability for the highest value of 3 MVPT measurements (ICC2.1), should be interpreted in the same way (see Tables 2 and 3, and Figures 3, 4, 5). Thus, when evaluating knee strength measurements (e.g. after a muscle strength program), it is recommended to use the 3-repetition average strength measurement by one or more examiners.
We performed intra- and inter-observer re-test measurements on the same day. However, no learning effect was found in the present study between the first and the third strength measurement. This is probably due to the familiarization session [
21]. Although the highest value is probably a more valid measurement for assessing muscle strength [
59] (even though it is less reliable), the average of three MVPT strength measurements can be used in determining whether a result is a real change or is within the range of measurement error.
The protocol used for assessing isometric knee strength in this study had acceptable re-test reliability, as evidenced by ICCs equal to or greater than 0.75. The ICCs in the current study are similar to test-retest reliability coefficients reported in other, related studies [
16,
19,
25‐
31].
The measurement error of HHD for knee extension strength in haematological patients can be compared to that observed in other studies. In a study in orthopaedic knee patients, the intra-observer assessment of the SDD was 21.5 Nm for the single value, and 13.8 Nm for the average value. For inter-observer assessment, the SDD was 28.2 Nm for the single value and 18.7 Nm for the average value [
33]. However, one should keep in mind that the authors used the 'make' method to assess knee extension strength.
To compare our absolute reliability results for knee extension strength with those observed in COPD patients [
16], we estimated the SEM from their results. The SEM was estimated from the ICC and the total variance, using the formula SEM = SD × √(1 − ICC) [48]. An SDD (= SEM × 1.96 × √2) of approximately 49 Nm for knee extension was calculated from their study results (ICC .87, Sd 14.5 Nm; strength value originally expressed in kg, converted to Nm and corrected to an average lever arm of 34 cm, which was the average 80% shank length of the included and excluded participants in our study; see Table 2) [
16]. An important difference from our measurement protocol was that the measurements in that study were performed with the knee in 90 degrees of flexion.
From the study of Taylor et al. [
31] among patients with cerebral palsy, we were able to calculate an SDD of approximately 43 Nm (ICC .81, Sd 10.7; strength value originally expressed in kg, converted to N and then to Nm using an average lever arm of 34 cm).
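The conversion and estimation steps used for these comparisons can be sketched as follows. This is a minimal illustration, not the original analysis script: the function names are hypothetical, and we assume that the reported standard deviation of 10.7 is expressed in kg and that kg are converted to N using g = 9.81 m/s². Under these assumptions, the figures reported for the cerebral palsy study reproduce an SDD of approximately 43 Nm.

```python
import math

G = 9.81  # m/s^2; assumed conversion factor from kg (force) to N


def kg_to_nm(force_kg, lever_arm_m):
    """Convert a force reading in kg to a torque in Nm for a given lever arm."""
    return force_kg * G * lever_arm_m


def sem_from_icc(sd, icc):
    """SEM = SD x sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sdd_from_sem(sem):
    """SDD = SEM x 1.96 x sqrt(2)."""
    return sem * 1.96 * math.sqrt(2.0)


# Values quoted in the text: ICC .81, SD 10.7 (assumed kg), lever arm 34 cm.
sd_nm = kg_to_nm(10.7, 0.34)
sem_nm = sem_from_icc(sd_nm, 0.81)
sdd_nm = sdd_from_sem(sem_nm)
print(round(sdd_nm))  # 43
```

Keeping the three steps in separate functions makes it easy to substitute a different lever arm or a standard deviation that is already expressed in Nm.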
Excellent SEMs in knee arthroplasty patients were described by Gagnon et al. The average SEM from 3 trials was 1.84 Nm (SDD 5.10 Nm) [
34]. However, in this latter study, a chair-fixed device was used, and the results are therefore not fully comparable with those of hand-held dynamometry. In contrast to chair-fixed dynamometry, the reliability of strength measurements with an HHD is influenced by the experience of the examiners, the amount of strength that examiners are able to resist, and the standardization of measurements [
33].
Currently, there is no criterion for the SDD of hand-held dynamometry. Therefore, the SDD in knee extension strength was compared to studies that obtained quadriceps strength measures after a resistive strength exercise program. A relatively small improvement of 18 Nm (95%CI 7–30 Nm, GM 144, Sd 45 Nm) was found in patients with COPD [
60]. Conversely, we estimated a mean change of 29.92 Nm (95%CI 24 Nm to 35 Nm) from the results of a study of breast cancer patients [
12]. Although muscle strength in this study was assessed with an eight repetition maximum, which is not fully comparable to HHD, the findings indicated that cancer patients may benefit from muscle strength training during chemotherapy treatment. Taken together, if measurements are obtained by the same observer, the SDD threshold of 17 Nm (see Table 2) that corresponds to the average of 3 MVPT strength measurements will probably be surpassed.
For the average inter-examiner MVPT measurements with the HHD, it is questionable whether the threshold of 26 Nm (see Table 2) will be surpassed in all haematological patients after a resistive strength training program. This is probably the case only in patients who recover steadily from the side effects of the medical treatment, and who are good responders to resistive strength training.
Several limitations of the current study should be mentioned. First, the resultant moment at the knee joint and the moment measured by the dynamometer are different. When measuring isometric strength, one should keep in mind that the differences between the measured and the resultant joint moments might influence the estimation of muscle torque parameters. Although the test protocol can be standardized to a reasonable degree, deformation of the soft tissue of the leg, especially at the thigh, where the muscle mass is considerable, plays an important role in changing the alignment between the HHD axis of rotation and the axis of the knee joint [
61]. Therefore, future studies need to examine the 'real' joint angles of hand-held dynamometry measurements.
Second, the measurements in this study were performed by female examiners without prior experience in muscle strength assessment with HHD. This may have influenced the upper boundary of the muscle strength assessments. Knee extension strength measurements performed by stronger examiners with experience in hand-held dynamometry may result in measurement values that are higher than 218 Nm. Moreover, the use of an isokinetic dynamometer has been recommended if the muscle strength of the patients exceeds the strength of the examiners [
21]. In several studies, isokinetic dynamometers yielded reproducible measurements with low measurement error [
21,
61‐
63]. However, isokinetic dynamometers also have several disadvantages. They require a good deal of space, and are costly, hampering their widespread use in clinical settings. The reliability of a HHD measurement may depend on the strength and the body mass of the examiner. The female examiners in this study were of varying weight. Examiner 2 achieved the highest (mean) MVPT measurements.
Third, the point in time at which the assessments took place varied considerably (see Table
1), and therefore some patients may have had the possibility to recover more from the side-effects of high-dose chemotherapy than others. This may have influenced the inter-subject variability, which in turn increases relative reliability (ICCs). However, this inter-subject variability does not affect absolute reliability (SEM, SDD). It is also possible that the patients in our study were healthier than other haematological cancer patients at the same stage of recovery. The primary reason that 12 patients did not participate was that they felt too fatigued or too weak to do so.
Fourth, although we could not detect a learning effect between the MVPT measurements, one should keep in mind that the results of this reliability study are based on an intra-day reliability assessment. A more complete picture of the reliability would require a between-day reliability study, in which day-to-day variation is allowed to influence the measurements. Learning effects for strength measurements can potentially be of more concern for between-day than for within-day measurements [
64,
65]. In addition, if truly maximal exertions of muscle strength are desired, visual feedback should be employed during the measurements [
66]. A factor that may also influence the reliability of strength measurements is the circadian rhythm. A time-of-day effect for leg and back strength measurements was reported in one study in which maximum strength values increased consistently during daytime [
67]. Gauthier et al. [
68] reported similar findings for elbow flexion torque and body temperature, which varied concomitantly during the day. One should keep in mind that circadian rhythm disruption is hypothesized as a mechanism underlying fatigue in cancer patients [
69]. Fatigue is one of the most prevalent symptoms that cancer patients experience and it has a considerable effect on physical performance [
70]. Therefore, fatigue may also influence the reliability of the measurements in cancer patients.
Fifth, at the end-phase of the training period, the upper limit for the examiners' torque was fixed at 218 Nm, because the weakest examiner was able to break through the knee extension movement of the 3 pre-test patients at 218 Nm, but not higher. Thus, only haematological cancer patients with knee extension measurements lower than this value were included in the analysis.
Finally, this study had a relatively small sample size. Although the sample size was adequate for studies of this nature [
71], a larger study might narrow the confidence intervals around the reliability coefficients (without necessarily affecting the reliability estimates themselves).
Clinical implications for the use of a HHD in patients with haematological malignancies
In this reliability study, both participating assessors were students of the Institute of Human Movement Sciences and Sport. They learned the requisite manual muscle testing skills in 8 training sessions of 1.5 hours each. The data for the average intra-examiner MVPT measurements in 24 patients with haematological malignancies yielded acceptable results for relative (ICC 0.94) and absolute reliability (SDD 17 Nm).
The conflicting finding on inter-examiner reliability, where the experience of the assessing examiners seemingly plays an important role, has important clinical implications. If more than one examiner is to evaluate the muscle strength of a haematological patient, then it is important that all examiners concerned apply the tests reliably and consistently. If this cannot be achieved, then the resulting data will be of little use in a clinical setting. Clinicians specialized in the treatment of chronic diseases, and with comparable levels of practical experience with an HHD, can, however, use the average MVPT value for intra-examiner measurements in their everyday practice with confidence. The HHD may be used in patients with haematological malignancies who have recovered from the direct side-effects of their medical treatment and who are in a stable physical condition to: 1) compare muscle strength with normative reference values (e.g. for discriminative purposes); or 2) evaluate the effect of resistive exercise training in an individual patient (e.g. measure change in health status over time).
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
RHK is the guarantor of the study. He designed the study and was the main writer of the manuscript. GA and EDB designed and wrote the study, and critically revised the study for its content. DU initiated and monitored the study. NKA supervised and critically revised the study for its content. All authors read and approved the final manuscript.