This study assessed the intra- and inter-examiner reliability of standing posture measurement with a new computerized postural digitizer, PosturePrint®, using three examiners who each evaluated forty subjects on two different occasions. It had been hypothesized that the PosturePrint® would be a reliable method to evaluate head, rib cage, and pelvic posture as three rotations and two translations, or five degrees of freedom (DoF). In fact, for 11 of the 15 variables (a total of 44 intra- and inter-examiner ICCs), 14 ICCs (32%) were in the good range (0.50 < ICC < 0.75) and 30 (68%) were in the excellent range (ICC > 0.75) [22]. For the four postural variables for which ICCs were inappropriate, small SEMs (1.3° or less for these axial and lateral flexion rotations) indicate excellent reliability. Additionally, for all variables, small SEMs and small mean absolute errors (two types) indicate close examiner agreement. Thus, the data indicate that the PosturePrint® is rated good to excellent for reliability of measuring standing posture.
Study limitations
One possible limitation of this study is that our participants were a relatively asymptomatic population, with an average NRS of 1.1 ± 1.7. However, postural analysis has been shown to be repeatable in a variety of pain populations as well as in asymptomatic groups [19]. Some evidence from subjects with recent acute whiplash injuries suggested that head position sense is not repeatable [25], but certain measures (forward head posture) in this group have been found to be reliable [9].
Sources of error in the PosturePrint® system's analysis of posture included: possible variation in upright stance from day to day, inherent errors due to placing markers from palpation of bony landmarks [26], errors involved in each examiner's choosing of sixteen points on the photographs via the computer mouse, and errors in positioning the participants in the same manner relative to the reference wall grid and camera [27]. However, the high ICCs, small SEMs, and low mean absolute differences between and within examiners' measurements indicate that these sources of error were kept to a minimum.
Another limitation might be the choice of the ICC method used [13,20,28,29]. The definition of the ICC method depends on the assumptions of (a) whether each of examiners, time, and participants is a fixed or random factor and (b) the type of error included (true score variance, systematic and/or random) [28]. In the equation for calculating ICCs, these assumptions change the denominator [20]. For the conservative method, it was assumed that measurement was crossed with examiner and participant, and that examiner, participant, and occasion were all random factors. This enlarged the denominator in the definitions of the conservative ICCs, making those ICCs smaller. Additionally, the magnitude of an ICC depends on the between-participants variability [20]. By providing both a liberal and a conservative method, and SEMs for each method, we have reduced any limitations due to the choice of an ICC method.
Depending on the type of ICC equations used, between 30 and 60 participants would be necessary for a conclusion of reliability to be made [21,30]. Estimations from Eliasziw et al. [29] suggest that for 0.9 reliability and two repeated measurements, 40 participants were more than adequate for a 5% significance level with 80% power. Because of this, the current investigators used 40 participants, with three examiners assessing each participant twice with a one-day interval between measurements.
According to Weir [20], "there are six common versions of the ICC (and four others as well), and the choice of which version to use is not intuitively obvious." Additionally, there are 10 ICC versions presented by McGraw and Wong [28]. For this reason, we calculated and reported two types of ICCs for each of the fifteen variables: a more liberal method and a more conservative method.
The two sets of ICCs were calculated under slightly different tenable model assumptions. For the conservative type, measurement was crossed with examiner and participant, and examiner was a random factor [21]. Results from this type of ICC (a generalization of ICC(2,1)) can be generalized to subject and examiner populations [20].
The liberal ICC method assumed that the three factors (examiner, participant, occasion) were fixed and used a two-way repeated-measures ANOVA model. Two-way ICC models (this liberal ICC type is ICC(3,1)) require occasions or examiners to be crossed with participants (i.e., each examiner evaluated all participants on each occasion in the present study) [20]. Use of this ICC type restricts how the results can be generalized. However, it can be used to identify the limits and pitfalls of postural analysis (e.g., marker placement).
Therefore, the denominators in the equations used to compute the liberal ICCs were the sum of two terms, while the denominators of the conservative ICCs were the sum of three terms, which made the conservative ICCs smaller than the liberal ICCs.
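The two-term versus three-term denominators can be illustrated with the standard single-measurement forms ICC(3,1) and ICC(2,1) computed from a two-way ANOVA table. The following Python sketch is a simplification and not the analysis code of this study: it assumes one score per participant-examiner cell (the conservative ICC described above additionally includes occasion as a factor), and the function names and the angle values are hypothetical.

```python
def two_way_mean_squares(ratings):
    """Mean squares for an n-participants x k-examiners table (one score per cell)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between examiners
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_rows, ms_cols, ms_error, n, k


def icc_3_1(ratings):
    """Liberal form: the denominator is the sum of two terms."""
    ms_rows, _, ms_error, _, k = two_way_mean_squares(ratings)
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def icc_2_1(ratings):
    """Conservative form: a third term for examiner variance enters the denominator."""
    ms_rows, ms_cols, ms_error, n, k = two_way_mean_squares(ratings)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical rotation angles (degrees): 5 participants x 3 examiners,
# with the third examiner reading systematically higher.
angles = [
    [2.0, 2.5, 3.4],
    [5.1, 4.8, 6.0],
    [-1.0, -0.6, 0.2],
    [7.9, 8.3, 9.1],
    [3.5, 3.1, 4.4],
]

print("ICC(3,1):", round(icc_3_1(angles), 3))
print("ICC(2,1):", round(icc_2_1(angles), 3))
```

The third denominator term is positive whenever the examiner mean square exceeds the error mean square, as it does for the systematic offset in these hypothetical data, and in that case ICC(2,1) is smaller than ICC(3,1).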
Previous studies
A few studies have investigated the repeatability of postural measures using computer-assisted devices [10-16]. Some studies did not report reliability in terms of ICCs [10,15]. However, we noted that some studies have reported small ICCs and claimed poor posture reliability [12-15] when, in fact, their data suggested that ICCs were inappropriate for certain variables. According to Weir [20], there are at least two instances when ICCs are not informative: (a) when many repeated measured values occur in the data and (b) when the data are homogeneous. ICCs, of any type, should not be used on measurements that are mostly one value because this violates a basic ANOVA assumption that the data are approximately normally distributed. This means that the data must be spread out over a continuum, with concentration in the middle and symmetry about the middle. Even if the data are normally distributed, a very small standard deviation is a problem; as Weir stated [20], "if subjects differ little from each other, ICC values are small even if trial-to-trial variability is small." Weir's ideas may apply to two recent studies by Dunk et al. [13,14].
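Weir's point about homogeneous data can be checked numerically. The Python sketch below is an illustration under stated assumptions, not data from any cited study: the two samples are hypothetical, both have the same trial-to-trial differences (1° on average), and only the spread between participants differs. The helper names are illustrative, and ICC(3,1) is used as one common two-way form.

```python
def icc_3_1(ratings):
    """ICC(3,1) from an n-participants x k-trials table (one score per cell)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in ratings)
    ss_cols = n * sum(
        (sum(r[j] for r in ratings) / n - grand) ** 2 for j in range(k)
    )
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def two_trials(true_values, noise):
    """Build [trial 1, trial 2] pairs as true value plus/minus a fixed error."""
    return [[t + e, t - e] for t, e in zip(true_values, noise)]


def mean_abs_trial_difference(ratings):
    """Average absolute difference between the two trials of each participant."""
    return sum(abs(a - b) for a, b in ratings) / len(ratings)


# Identical measurement error in both samples (hypothetical, in degrees).
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]

# Participants who differ widely versus participants who are nearly alike.
spread = two_trials([0.0, 5.0, 10.0, 15.0, 20.0, 25.0], noise)
homogeneous = two_trials([0.0, 0.4, 0.8, 1.2, 1.6, 2.0], noise)

for name, sample in [("spread", spread), ("homogeneous", homogeneous)]:
    print(name,
          "mean |trial difference| =", round(mean_abs_trial_difference(sample), 2),
          "ICC(3,1) =", round(icc_3_1(sample), 2))
```

With these hypothetical values, the trial-to-trial error is the same in both samples, yet the ICC is about 0.99 for the spread sample and about 0.30 for the homogeneous one, so a low ICC alone does not demonstrate large measurement error.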
Dunk et al. [13] performed a reliability study of a photographic technique and consequent digitization of reflective landmarks with 14 participants and reported poor to moderate ICCs for posture reliability. After a letter to the editor [27] critical of their 2004 study [13], Dunk et al. assessed intra-examiner reliability in a follow-up study [14] with more (20) healthy participants. Dunk et al. concluded that their sagittal plane measures were more reliable than their coronal plane measures, but their sagittal plane angles of spinal curvature had a mean error of approximately 6°, while their coronal plane bending had a mean error of less than 2° [14]. Because Dunk et al. [14] had an error of 6° with high ICCs for the sagittal plane, but a very small error of 2° with low ICCs for the coronal plane, it may be that either multiple repeated values occurred in Dunk et al.'s data or their participants were quite similar. Thus, their conclusions of poor reliability for coronal plane bending may be incorrect.
Using an electromagnetic device, Swinkels and Dolan examined the ability of healthy individuals to reposition their thoraco-lumbar regions in both sagittal and coronal planes (two DoF) [11,12]. Intra-day and inter-day repeated measures were found to be 5° or less for sagittal displacements and 2.5° or less for coronal displacements. Although Swinkels and Dolan [11,12] found some ICCs to be in the poor range, they commented that several of their displacement values were very small and approached the limit of accuracy of their measurement device. Here, the use of ICCs on these variables was inappropriate, as explained above (see Weir [20]). Consequently, considering the small repositioning errors, they concluded that "healthy volunteers were able to reposition their spine with considerable accuracy as measured with the 3-Space Fastrak" [11].
In another posture reliability study, which used an ultrasonic digitizer (Zebris) to measure cervical range of motion, Strimpakos et al. stated that the method they employed for measuring cervical joint position sense was unreliable [16].
Posture reliability design suggestions
According to the above review, there are a variety of methodological concerns with the reliability studies reported in the literature. For example, many investigations utilized only one examiner, and it is possible that this examiner made gross mistakes from one examination to the next, causing poor intra-examiner reliability. Statistically, therefore, multiple examiners are needed to average out any artificially low or high intra-examiner data, which provides a more reasonable mean. It has been suggested that a minimum of three examiners, each performing an analysis at least twice, is needed for any conclusions to be drawn about inter- and intra-examiner reliability [30]. In the current investigation of the PosturePrint® system, we have followed this recommendation.
Lastly, depending upon the mean value and distribution of the specific postural displacement recorded, ICCs may be inappropriate as they cannot give a clinically relevant picture of the true error. Because of this, in the current investigation, we analyzed the Standard Error of Measurement (SEM) and mean absolute differences within (MADOM) and between (MADBO) examiners' measurements for each postural degree of freedom. The SEMs were small (2.7° or less for all rotations and 5.9 mm (≈ 1/4 inch) or less for all translations). The MADOM values were found to be 4 mm or less for lateral translations and 7.1 mm or less for forward translations. The MADOM values were 3.2° or less for flexion-extension rotational measurements and 1.4° or less for all axial rotations and lateral bending rotations. The MADBO values were found to be 6 mm or less for lateral translations and 8.4 mm or less for forward translations. The MADBO values were 3.5° or less for flexion-extension rotational measurements and 1.9° or less for all axial rotations and lateral bending rotations.
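For readers who wish to compute comparable error statistics, the following Python sketch shows one way to obtain an SEM and the two mean absolute differences. It rests on assumptions that are not taken from this study: the SEM uses the common form SD × sqrt(1 − ICC), the data layout `data[examiner][occasion][participant]` and all function names are hypothetical, and the exact averaging used for MADOM and MADBO here may differ from this simple pooling.

```python
import math
from itertools import combinations


def sem_from_sd_and_icc(sd, icc):
    """Standard error of measurement from a between-participant SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mean_abs_diff_within(data):
    """Mean |occasion 1 - occasion 2| for the same examiner and participant."""
    diffs = [
        abs(first - second)
        for examiner in data
        for first, second in zip(examiner[0], examiner[1])
    ]
    return sum(diffs) / len(diffs)


def mean_abs_diff_between(data):
    """Mean |examiner A - examiner B| on the same occasion and participant."""
    diffs = []
    for ex_a, ex_b in combinations(data, 2):      # every pair of examiners
        for occ_a, occ_b in zip(ex_a, ex_b):      # same occasion for both
            diffs.extend(abs(a - b) for a, b in zip(occ_a, occ_b))
    return sum(diffs) / len(diffs)


# Hypothetical translations in mm: 2 examiners x 2 occasions x 3 participants.
measurements = [
    [[10.0, 12.0, 8.0], [11.0, 12.0, 9.0]],   # examiner A, occasions 1 and 2
    [[9.0, 13.0, 8.0], [9.0, 13.0, 9.0]],     # examiner B, occasions 1 and 2
]

print("within-examiner MAD (mm):", mean_abs_diff_within(measurements))
print("between-examiner MAD (mm):", round(mean_abs_diff_between(measurements), 2))
print("SEM for SD = 4.0, ICC = 0.75:", sem_from_sd_and_icc(4.0, 0.75))
```

Unlike an ICC, these quantities are expressed in the units of the measurement (degrees or millimeters), which is why they remain interpretable when participants differ little from one another.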
Since the PosturePrint® system has adequate reliability, several future studies are possible. A study on healthy subjects could provide a normative database. Studies on patients could identify any differences from normal. Studies correlating different postures with health conditions are possible, as are pre- and post-treatment clinical trials with various techniques.