1 December 2012 | Clinical Research
Reliability of Bucholz and Ogden Classification for Osteonecrosis Secondary to Developmental Dysplasia of the Hip
Authors:
Andreas Roposch, MD, MSc, FRCS, John H. Wedge, OC, MD, FRCS(C), Georg Riedl, MD
Published in:
Clinical Orthopaedics and Related Research® | Issue 12/2012
Abstract
Background
Osteonecrosis is perhaps the most important of the serious complications following treatment of developmental dysplasia of the hip (DDH). The Bucholz and Ogden classification is the one most frequently used to grade osteonecrosis in this context, but its reliability has not been established, and an unreliable classification could undermine the validity of studies reporting treatment outcomes.
Questions/Purpose
We established the interrater and intrarater reliabilities of this classification and analyzed the frequency and nature of disagreements.
Methods
Three pediatric hip surgeons, a musculoskeletal pediatric radiologist, and three orthopaedic trainees graded 39 radiographs (hips) according to the Bucholz and Ogden classification, blinded to any clinical data. Ratings were repeated after 2 weeks. Interrater reliability and intrarater reliability were determined using the simple kappa statistic. Grading was compared among raters, the nature and frequency of disagreements established, and subgroup analyses performed.
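To make the reliability measure concrete, the following is a minimal sketch of the simple (unweighted) kappa for one pair of raters. It is an illustration only: the function name `cohen_kappa`, the integer coding of grades, and the example ratings are assumptions, not data or code from the study, and the abstract does not state how the pairwise values were combined across the seven raters.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted (simple) Cohen's kappa for two raters grading the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items that received the identical grade.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over grades of the product of marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical grade throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical grades for ten hips, coded as integers; NOT data from the study.
rater_1 = [1, 2, 2, 3, 1, 4, 2, 1, 3, 2]
rater_2 = [1, 1, 2, 3, 2, 4, 2, 1, 2, 2]
print(round(cohen_kappa(rater_1, rater_2), 2))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, so two raters who agree on 70% of hips can still obtain a kappa well below 0.70 when a few grades dominate the sample.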
Results
Interrater reliability (kappa) was 0.34 (95% CI, 0.28 to 0.40) for all raters and 0.31 (95% CI, 0.20 to 0.43) for the three surgeons. The best interrater reliability was observed between the radiologist and one of the surgeons, with a kappa of 0.51 (95% CI, 0.30 to 0.72). Intrarater reliability estimates ranged from 0.44 to 0.69. Raters disagreed regarding the grade of osteonecrosis in 26 of 39 hips (67%), and seven of these 26 disagreements (27%) involved confusion between Grades I and II.
Conclusions
Interrater reliability was lower than expected given the raters' experience. Distinguishing between Grades I and II was the most frequently observed problem. We believe the low reliability resulted from an ambiguous classification scheme rather than from variability among the raters. Outcome studies of DDH based on this classification should be interpreted with caution. We recommend the development of a new classification with better prognostic ability.
Level of evidence
Level III, diagnostic study. See the Guidelines for Authors for a complete description of levels of evidence.