Previous reviews of the diagnostic performances of physical tests of the hip in orthopedics have drawn limited conclusions because of the low to moderate quality of primary studies published in the literature. This systematic review aims to build on these reviews by assessing a broad range of hip pathologies, and employing a more selective approach to the inclusion of studies in order to accurately gauge diagnostic performance for the purposes of making recommendations for clinical practice and future research. It specifically identifies tests which demonstrate strong and moderate diagnostic performance.
Methods
A systematic search of Medline, Embase, Embase Classic and CINAHL was conducted to identify studies of hip tests. Our selection criteria included an analysis of internal and external validity. We reported diagnostic performance in terms of sensitivity, specificity, predictive values and likelihood ratios. Likelihood ratios were used to identify tests with strong and moderate diagnostic utility.
Results
Only a small proportion of tests reported in the literature have been assessed in methodologically valid primary studies. Sixteen studies were included in our review, producing 56 independent test-pathology combinations. Two tests demonstrated strong clinical utility: the patellar-pubic percussion test for excluding radiologically occult hip fractures (negative LR 0.05, 95% confidence interval [CI] 0.03-0.08) and the hip abduction sign for diagnosing sarcoglycanopathies in patients with known muscular dystrophies (positive LR 34.29, 95% CI 10.97-122.30). Fifteen tests demonstrated moderate diagnostic utility for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-total hip arthroplasty.
Conclusions
We have identified a number of tests demonstrating strong and moderate diagnostic performance. These findings must be viewed with caution as there are concerns over the methodological quality of the primary studies from which we have extracted our data. Future studies should recruit larger, representative populations and allow for the construction of complete 2×2 contingency tables.
The online version of this article (doi:10.1186/1471-2474-14-257) contains supplementary material, which is available to authorized users.
Sam Adie and Justine Maree Naylor contributed equally to this work.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
LAR contributed to the design of the review; acquisition, analysis and interpretation of data; and drafting and revising of the manuscript. SA contributed to the conception and design of the review; analysis and interpretation of data; and revising of the manuscript. JMN contributed to the conception and design of the review; analysis and interpretation of data; and drafting and revising of the manuscript. RM contributed to the conception and design of the review; analysis and interpretation of data; and revising of the manuscript. SS contributed to the acquisition, analysis and interpretation of data, and revising of the manuscript. IAH contributed to the conception and design of the study; analysis and interpretation of the data; and revision of the manuscript. All authors read and approved the final manuscript.
Abbreviations
THA: Total hip arthroplasty
+LR: Positive likelihood ratio
-LR: Negative likelihood ratio
CI: Confidence interval
CINAHL: Cumulative Index to Nursing and Allied Health Literature
FN: False negatives
FP: False positives
LR: Likelihood ratio
NPV: Negative predictive value
PPV: Positive predictive value
TN: True negatives
TP: True positives
Background
The diagnostic value of many physical tests in orthopedic practice has been called into question, and a number of these tests have been found to correspond poorly with anatomical models [1, 2]. In some cases, clinicians proceed directly to more invasive or technologically involved ‘definitive’ investigations; however, this is not always desirable, practical or economical [3]. For example, the more direct approach has been blamed for diagnostic delays and misclassification of hip joint pathologies [4].
Recently, several diagnostic reviews of physical tests of the hip have been published [5‐8] and they generally support the view that most studies are of low to moderate quality. Three of these reviews examined labral pathologies and/or femoroacetabular impingement [5, 6, 8] while a fourth looked at a wider range of pathologies [7]. This systematic review aims to build on these reviews by assessing a broad range of hip pathologies, and employing a more selective approach to the inclusion of studies in order to accurately gauge diagnostic performance for the purposes of making recommendations for clinical practice and future research. We aim to determine:
i) which physical tests of the hip or physical clinical prediction rules have valid evidence from which their diagnostic performance in clinical practice can be calculated; and
ii) whether any physical tests or clinical prediction rules have strong diagnostic utility; and
iii) whether any physical tests or clinical prediction rules have moderate diagnostic utility.
Methods
In this systematic review, a preliminary search of various textbooks, medical journal databases, websites and grey literature sources was conducted to identify physical tests of the hip. Subsequently, an electronic database search strategy was developed, aided by a medical librarian (see Additional file 1), and applied to Medline (1950-July 2010), Embase (1980-July 2010), Embase Classic (1947–1979) and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) (1982-July 2010). A follow up search was performed in March 2013 using Medline, Embase and CINAHL to identify studies published in the interim period following the original search (see Additional file 1).
Studies included in our review were required to:
i) compare a physical (index) test for the diagnosis of a particular hip pathology against a ‘gold standard’ (reference) test representing the true diagnostic result. Physical tests were defined as non-invasive bedside maneuvers, beyond inspection, point tenderness and palpation alone, which were intended to increase the probability of a particular diagnosis; and
ii) report sufficient information to construct complete 2×2 contingency tables; and
iii) recruit predominantly adult populations (where ages were indicated); and
iv) be written in English.
Studies were excluded if they:
i) used physical tests under anesthesia or intra-operatively; or
ii) used physical tests to diagnose vascular or neurologic pathologies.
Studies were also excluded if they did not meet our criteria for internally and externally valid methodology. These criteria are listed below.
iii) For the purposes of internal validity, reference tests could not: (1) be dependent upon the index test result for interpretation, (2) be discredited for diagnosing the chosen pathology, or (3) allow for only partial construction of 2×2 contingency tables (e.g. by excluding persons with negative index test results from the study).
iv) For the purposes of external validity, (1) the sample population had to reasonably represent a typical population presenting for diagnosis in clinical practice (e.g. they could not use healthy or asymptomatic controls who had no indications for testing), and (2) the index test needed to provide a threshold for dichotomizing results.
Assessments of validity were made independently by two authors and disputes arbitrated by a third author. No further restrictions were placed on study design, date of publication or clinical setting.
For the literature search in 2010, one author screened citations for inclusion on the basis of their title. The remaining citations were assessed independently by two authors, first by title and abstract and then by full text. Opposing views regarding inclusion were resolved by arbitration with the remaining authors. When new tests were identified, new search strategies were executed for them using Medline, Embase and Embase Classic (see Additional file 1). The follow up literature search and sorting process in March 2013 were conducted entirely by a single author.
The diagnostic performances of included physical tests are presented in terms of sensitivity, specificity, predictive values and likelihood ratios (LRs), with the latter being used to further identify tests demonstrating “strong” and “moderate” diagnostic utility. We favor the use of likelihood ratios because they offer the most valuable and comprehensive diagnostic information in the individual patient [9, 10]. Roughly speaking, tests with positive LRs greater than or equal to 10 or negative LRs less than or equal to 0.1 will cause almost conclusive, “strong” changes in post-test probability of disease. Positive LRs between 5 and 9.99 and negative LRs between 0.11 and 0.2 cause “moderate” changes in post-test probability [9]. In order to limit the uncertainty caused by studies recruiting small sample populations, we required “strong” tests to meet our likelihood ratio criteria within their entire 95% confidence intervals (otherwise the test was classified as “moderate”). When diagnostic data were only presented in the form of percentages or fractions, we attempted to convert them back to integer form to determine the original population numbers in each diagnostic category of a 2×2 contingency table. We only pooled data from studies involving the exact same index test and target pathology.
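The measures and thresholds described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the review's own analysis code: the function names, the "limited" label for tests below the moderate thresholds, and the counts in the usage note are hypothetical, and the confidence interval bounds are taken as inputs because the text does not state how they were calculated. The moderate cut-offs are read here as +LR of at least 5 and -LR of at most 0.2.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard accuracy measures from a complete 2x2 table.

    Assumes no cell is zero; zero cells make the likelihood ratios undefined.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


def classify_utility(lr, ci_low, ci_high, positive):
    """Classify a likelihood ratio as 'strong', 'moderate' or 'limited'.

    'strong' requires the point estimate AND the whole 95% CI to meet the
    threshold (+LR >= 10 or -LR <= 0.1). 'limited' is a hypothetical label
    for everything below the moderate thresholds.
    """
    if positive:
        if lr >= 10 and ci_low >= 10:
            return "strong"
        return "moderate" if lr >= 5 else "limited"
    if lr <= 0.1 and ci_high <= 0.1:
        return "strong"
    return "moderate" if lr <= 0.2 else "limited"
```

For example, `classify_utility(0.05, 0.03, 0.08, positive=False)` returns "strong", whereas a negative LR of 0.10 with a 95% CI of 0.01-0.98 is downgraded to "moderate" because the upper bound exceeds 0.1. With hypothetical counts of 45 true positives, 10 false positives, 5 false negatives and 40 true negatives, `diagnostic_metrics` gives a positive LR of about 4.5 and a negative LR of about 0.125.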
Results
Only a small proportion of hip tests identified in our preliminary search had their diagnostic performance assessed in methodologically valid primary studies. We identified sixteen studies containing data that satisfied our inclusion and exclusion criteria [11‐26] (Figure 1). This produced a total of 56 independent test-pathology combinations (Additional file 2).
Two physical tests demonstrated strong diagnostic utility: the patellar-pubic percussion (PPP) test strongly excluded radiologically occult hip fractures (negative LR 0.05, 95% CI 0.03-0.08) [26], and the hip abduction sign strongly diagnosed sarcoglycanopathies in patients with known muscular dystrophies (positive LR 34.29, 95% CI 10.97-122.30) [20] (Table 1). The original descriptions of these tests from the primary studies can be found in Additional file 2.
Table 1
Diagnostic performances of independent physical test-hip pathology combinations with strong clinical diagnostic utility [a]
Positive Predictive Value (PPV), Negative Predictive Value (NPV), Positive Likelihood Ratio (+LR), Negative Likelihood Ratio (-LR), 95% Confidence Interval (95% CI), True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN). All values rounded to 2 decimal places.
[a] Strong diagnostic utility defined as either +LR ≥ 10 or -LR ≤ 0.1 where entire 95% confidence interval satisfies these thresholds. Moderate diagnostic utility defined as +LR > 5 or -LR < 0.2 without satisfying the criteria for strong diagnostic utility.
[b] Ten healthy controls who tested negative with the index test were removed from our calculations.
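To illustrate what a "strong" likelihood ratio means for an individual patient, the sketch below converts a pre-test probability into a post-test probability via odds. The function name and the 30% pre-test probability are hypothetical values chosen only for illustration; they do not come from the primary studies. The likelihood ratios are the point estimates reported for the two strong tests.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability (0 < p < 1)."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability of 30%, used for illustration only.
pre_test = 0.30

# Negative PPP test (negative LR 0.05): probability of fracture falls sharply.
after_negative_ppp = post_test_probability(pre_test, 0.05)

# Positive hip abduction sign (positive LR 34.29): probability rises sharply.
after_positive_sign = post_test_probability(pre_test, 34.29)

print(f"{after_negative_ppp:.3f} {after_positive_sign:.3f}")
```

Under this assumed 30% starting point, the negative test lowers the probability to roughly 2% and the positive test raises it to roughly 94%. These figures ignore the width of the confidence intervals.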
Fifteen independent test-pathology combinations demonstrated, at most, moderate diagnostic utility (Table 2). These included five tests for diagnosing symptomatic osteoarthritis [25], seven tests for diagnosing loosening of various components post-total hip arthroplasty [23] and three tests for diagnosing and excluding various hip fractures [11, 13, 24].
Table 2
Diagnostic performances of independent physical test-hip pathology combinations with moderate clinical diagnostic utility [a]
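Femoral Neck Stress Fracture (radiologically occult but suggestive bone scintigraphy); reference test: 6-week follow-up radiography; sensitivity 1.00 (95% CI 0.90-1.00; 13/13); specificity 0.33 (95% CI 0.12-0.33; 2/6); PPV 0.76; NPV 1.00; +LR 1.50 [a] (95% CI 1.00-1.72 [a]); -LR 0.10 (95% CI 0.01-0.98 [a])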
Positive Predictive Value (PPV), Negative Predictive Value (NPV), Positive Likelihood Ratio (+LR), Negative Likelihood Ratio (−LR), 95% Confidence Interval (95% CI), True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN), Range of Motion (ROM). All values rounded to 2 decimal places. When one of the cells of the 2×2 contingency table contained the value ‘zero’, we added 0.5 to each cell in order to calculate likelihood ratio values and their confidence intervals.
[a] Strong diagnostic utility defined as either +LR ≥ 10 or -LR ≤ 0.1 where entire 95% confidence interval satisfies these thresholds. Moderate diagnostic utility defined as +LR > 5 or -LR < 0.2 without satisfying the criteria for strong diagnostic utility.
[b] Clinical Prediction Rule consisted of 5 variables: (1) self-reported squatting as an aggravating factor, (2) scour test with adduction causing groin or lateral pain, (3) active hip flexion causing late pain, (4) active hip extension causing hip pain, and (5) passive hip internal rotation less than or equal to 25°.
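The zero-cell adjustment described in the notes to Table 2 can be sketched as follows. This is an illustrative reading, not the authors' code. The function name is hypothetical, and the counts are taken from the femoral neck stress fracture row on the assumption that 13/13 and 2/6 denote TP/(TP+FN) and TN/(TN+FP), which implies TP = 13, FN = 0, TN = 2 and FP = 4. Confidence intervals are omitted because the method used to calculate them is not stated.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (+LR, -LR), adding 0.5 to every cell if any cell is zero."""
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        # Continuity correction: shift all four cells, not just the empty one.
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens / (1 - spec), (1 - sens) / spec


# Counts inferred from the femoral neck stress fracture row (an assumption).
lr_pos, lr_neg = likelihood_ratios(tp=13, fp=4, fn=0, tn=2)
print(round(lr_pos, 2), round(lr_neg, 2))  # prints: 1.5 0.1
```

Under that assumption the adjusted values agree with the +LR of 1.50 and -LR of 0.10 shown for this row. Without the adjustment the -LR would be exactly zero and its confidence interval could not be calculated on the log scale.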
Discussion
Previous reviews of physical tests have found much of the existing literature to be methodologically flawed and insufficient for guiding clinical practice. This review sought to identify clinically useful physical tests or combinations of tests that demonstrated strong and moderate diagnostic performance. This information could potentially be used to form future clinical prediction rules or guide future research. We found the PPP test strongly excluded radiologically occult hip fractures and the hip abduction sign strongly diagnosed sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA.
While some of our results are promising at face value, the raw data needs to be considered in more detail.
Firstly, it is possible that we have overstated the utility of the PPP test since we have based our conclusions primarily on a single study by Tiru et al. [26]. Two other studies recruiting smaller populations [11, 13] also employed the principle of osteophony when testing for hip fractures and found only moderate diagnostic utility. We did not pool the data from these studies because they tested for radiologically apparent fractures, and the Bartford test employed by Bache and Cross [13] auscultated for sound transmitted by a tuning fork rather than percussion.
The hip abduction sign may also not perform as strongly as we suggested because Khadilkar and Singh [20] relied on retrospective testing of patients with known diagnoses of variable duration and severity. It is therefore possible that some of the recruited sample population may not have reflected clinical practice. Khadilkar and Singh’s [20] findings need to be confirmed prospectively in a pre-diagnosis setting.
There was significant uncertainty about the true diagnostic performance of some of the moderately useful physical tests because of the small sample populations recruited in the primary studies [11, 13, 24‐26]. We suggest further testing with large sample populations would be of benefit to better assess if these tests should be considered for inclusion in future clinical prediction rules.
While we acknowledge that previous hip test reviews have found much of the literature to be methodologically flawed, we did not use cumulatively-scored quality assessment tools to analyze our data as the implications of these numerical values are not clear [27]. Instead, we used our methodological validity criteria to provide a minimum standard to serve our primary purpose, which was to identify tests with strong and moderate diagnostic performance for use in clinical practice. Although our criteria are generally consistent with quality assessment tools and have been empirically associated with design-related bias [28], we acknowledge that this does not eliminate all bias and that there remain significant shortcomings in the literature. We believe our criteria represent a reasonable compromise for the sake of drawing basic conclusions. That said, since our criteria have not been independently validated, we have reported data from excluded studies in Additional file 3 when complete 2×2 contingency tables could be formed and Additional file 4 for the remaining studies and case reports. There were some discrepancies between this review and those that have been previously published. In some instances this was explained by calculation errors and in others this was because we found there was insufficient information in the primary study to construct 2×2 contingency tables for calculation of diagnostic performance.
Conclusions
There is valid evidence for the diagnostic performance of only a small proportion of physical tests of the hip in routine clinical practice. Two tests demonstrated strong diagnostic utility: the patellar-pubic percussion test for excluding radiologically occult hip fractures and the hip abduction sign for diagnosing sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA. The primary studies from which our data are derived contain methodological flaws that bias their results. Future studies should recruit larger and more representative populations and allow for construction of complete 2×2 contingency tables.
Acknowledgements
We thank the staff at the Ken Merten Library at Liverpool Hospital, Sydney, Australia, for their assistance in developing the search strategy for this review. We also thank the staff at the Fairfield Hospital Library, Sydney, Australia, for their assistance in retrieving studies for this review.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.