Elsevier

The Spine Journal

Volume 14, Issue 10, 1 October 2014, Pages 2442-2448

Clinical Study
Interrater and intrarater agreements of magnetic resonance imaging findings in the lumbar spine: significant variability across degenerative conditions

https://doi.org/10.1016/j.spinee.2014.03.010

Abstract

Background context

Magnetic resonance imaging (MRI) is frequently used in the evaluation of degenerative conditions in the lumbar spine. The relative interrater and intrarater agreements of MRI findings across different pathologic conditions are underexplored, as most studies are focused on specific findings.

Purpose

The purpose of this study was to characterize the interrater and intrarater agreements of MRI findings used to assess the degenerative lumbar spine.

Study design

A retrospective diagnostic study at a large academic medical center was undertaken with a panel of orthopedic surgeons and musculoskeletal radiologists to assess lumbar MRIs using standardized criteria.

Patient sample

Seventy-five subjects who underwent routine lumbar spine MRI at our institution were included.

Outcome measures

Each MRI study was assessed for 10 lumbar degenerative findings using standardized criteria. Lumbar vertebral levels were assessed independently, where applicable, for a total of 52 data points collected per study.

Methods

T2-weighted axial and sagittal MRI sequences were presented in random order to four reviewers (two orthopedic spine surgeons and two musculoskeletal radiologists), who read them independently, to determine interrater agreement. The first 10 studies were reevaluated at the end to determine intrarater agreement. Images were graded using standardized, pilot-tested criteria for disc degeneration, stenosis, and other degenerative changes. Interrater and intrarater absolute percent agreements were calculated. To highlight the most clinically important MRI disagreements, a modified agreement analysis was also performed, in which disagreements between the lowest two severity grades for applicable conditions were ignored. Fleiss kappa coefficients for interrater agreement were determined.
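The three agreement measures named above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this abstract: severity grades are coded as integers starting at 0 (so the "lowest two" grades are 0 and 1), and absolute agreement is taken as the share of agreeing rater pairs per data point, averaged over data points. The function names and the toy ratings are hypothetical and are not the study's data or code.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(ratings, collapse_lowest_two=False):
    """Share of agreeing rater pairs per item, averaged over items.

    ratings: list of items, each a list of integer grades (one per rater).
    collapse_lowest_two: assumption-based 'modified' analysis in which
    grades 0 and 1 are treated as the same grade.
    """
    def normalize(grade):
        return max(grade, 1) if collapse_lowest_two else grade

    per_item = []
    for item in ratings:
        pairs = list(combinations([normalize(g) for g in item], 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def fleiss_kappa(ratings):
    """Fleiss kappa for items rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement: mean over items of the agreeing-pair proportion.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items

    # Chance agreement from the overall share of each category.
    p_e = sum(
        (sum(cnt[cat] for cnt in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 5 data points, 4 raters, grades 0-3.
toy = [
    [0, 0, 1, 0],
    [2, 2, 2, 2],
    [1, 2, 2, 1],
    [0, 1, 1, 1],
    [3, 3, 2, 3],
]
absolute = pairwise_agreement(toy)
modified = pairwise_agreement(toy, collapse_lowest_two=True)
kappa = fleiss_kappa(toy)
```

In this toy set the modified agreement is higher than the absolute agreement because the disagreements between grades 0 and 1 no longer count, which mirrors the direction of the difference reported in the Results.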

Results

The overall absolute and modified interrater agreements were 76.9% and 93.5%, respectively. The absolute and modified intrarater agreements were 81.3% and 92.7%, respectively. The average Fleiss kappa coefficient was 0.431, suggesting moderate overall agreement. However, when stratified by condition, absolute interrater agreement ranged from 65.1% to 92.0%. Disc hydration, disc space height, and bone marrow changes exhibited the lowest absolute interrater agreements. The absolute intrarater agreement had a narrower range, from 74.5% to 91.5%. Fleiss kappa coefficients ranged from 0.282 to 0.618, indicating fair to substantial agreement.
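The verbal labels attached to the kappa values above ("fair", "moderate", "substantial") are consistent with the widely used Landis and Koch benchmarks. That the authors used this exact scale is an assumption here, as the abstract does not name it. A small sketch of the mapping, with a hypothetical helper name:

```python
def interpret_kappa(kappa):
    """Map a kappa value to a Landis and Koch style label (assumed scale)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Values reported in the Results paragraph above.
labels = {k: interpret_kappa(k) for k in (0.282, 0.431, 0.618)}
```

Under this scale the reported average (0.431) falls in the moderate band, and the reported range (0.282 to 0.618) spans the fair and substantial bands, matching the wording in the abstract.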

Conclusions

Even in a study using standardized evaluation criteria, there was significant variability in the interrater and intrarater agreements of MRI in assessing different degenerative conditions of the lumbar spine. Clinicians should be aware of the condition-specific diagnostic limitations of MRI interpretation.

Introduction

Degenerative conditions of the lumbar spine are ubiquitous in modern society [1]. When conservative management fails, magnetic resonance imaging (MRI), a noninvasive and radiation-free modality, is frequently considered for this population. Its speed and image quality have continued to improve, but limitations remain.

The interpretation of MRI studies is subject to variability. This may be because of variations in nomenclature [2], [3]. As in clinical medicine, there is no single established, validated grading scheme for many radiographic findings. However, there are also variations inherent to the assessment of the resultant images: a study that one reviewer interprets as showing “severe” stenosis may be read as “moderate” or perhaps “mild” by another [4]. Given that much of the clinical practice of spine surgery is based on correlating clinical symptomatology with imaging findings, these variabilities in MRI interpretation and nomenclature cannot be ignored.

Most studies evaluating the interpretation of lumbar MRI pathologies have focused on various specific grading scales. For example, studies have examined the diagnostic characteristics of MRI with regard to conditions such as spinal cord compression in acute traumatic injury [5], disc abnormalities [6], [7], [8], [9], [10], end-plate signal (Modic) changes [11], [12], lumbar spinal stenosis [4], [13], and disc herniation [14], [15]. There are several studies that have examined a handful of spinal conditions simultaneously [16], [17], [18].

Considering the reported variability in assessing specific lumbar conditions by MRI, it can be expected that this variation would exist between different pathologies in a standardized comparison. Nonetheless, we believe physicians and patients may underappreciate these inherent variabilities in MRI interpretation despite the widespread use of this imaging modality [4], [16], [18]. The purpose of our study was to examine the interrater and intrarater agreements of MRI in the evaluation of 10 degenerative conditions of the lumbar spine, with a panel of orthopedic spine surgeons and musculoskeletal radiologists.

Patient sample

The patient population for this study was drawn from our institution's radiology database of patients who underwent lumbar spine MRI in 2010 by our Department of Musculoskeletal Radiology. Exclusion criteria included prior lumbar instrumentation or fusion. There were no changes in imaging equipment or technique over the study period. The patients were sorted in chronological order based on the imaging study date, and the first 75 patients were included in our study based on a priori power

Results

The study population consisted of 75 patients, with 36 males (48%) and 39 females (52%). The mean age was 50.2 (range, 14–82) years. Each study was evaluated for 52 data points by each of the 4 reviewers, with the first 10 subjects evaluated twice.
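The size of the rating dataset implied by this paragraph can be worked out directly. The sketch below only multiplies the figures given above; the variable names are illustrative, and it assumes every reviewer rated every data point with no missing ratings.

```python
N_STUDIES = 75               # patients, one MRI study each
DATA_POINTS_PER_STUDY = 52   # findings assessed per study
N_REVIEWERS = 4
N_REPEATED_STUDIES = 10      # studies read a second time for intrarater agreement

# Ratings available for the interrater comparison.
interrater_ratings = N_STUDIES * DATA_POINTS_PER_STUDY * N_REVIEWERS

# First-read/second-read pairs available for the intrarater comparison.
intrarater_pairs = N_REPEATED_STUDIES * DATA_POINTS_PER_STUDY * N_REVIEWERS
```

Under that assumption there are 15,600 individual ratings for the interrater analysis and 2,080 repeated-read pairs for the intrarater analysis.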

Overall interrater absolute agreement was 76.9% (95% confidence interval [CI], 72.7–81.0). When stratified by pathology (Fig. 1), interrater absolute agreement ranged from 65.1% to 92.0%. This absolute interrater agreement is the percentage of

Discussion

Observer performance is an important source of inconsistency in imaging-based diagnoses. In the lumbar spine, where the differential for symptoms includes many possible pathologic conditions, there has been a paucity of rigorous studies on the agreement of MRI across multiple pathologies. Our study is an attempt at characterizing the interrater and intrarater agreements of MRI in assessing 10 common conditions of the lumbar spine, using a panel of orthopedic spine surgeons and musculoskeletal

References (24)

  • M.N. Brant-Zawadzki et al. Interobserver and intraobserver variability in interpretation of lumbar disc abnormalities. A comparison of two nomenclatures. Spine (1995)
  • C.W. Pfirrmann et al. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine (2001)

    FDA device/drug status: Not applicable.

    Author disclosures: MCF: Nothing to disclose. RAB: Nothing to disclose. WDL: Nothing to disclose. DJB: Nothing to disclose. AWL: Nothing to disclose. AHH: Consulting: Shire HGT (B), Pfizer (B). JNG: Consulting: Affinergy (D), Alphatec (E), Bioventus, Depuy (C), Harvard Clinical Research Institute (E), Powered Research (A), Stryker (E), Transgenomic, Smith and Nephew (D), Medtronic (B); Grants: Smith and Nephew (Genetic tests done at no charge, but not funds exchanged for a study, Paid directly to institution).

    The disclosure key can be found on the Table of Contents and at www.TheSpineJournalOnline.com.

    There were no sources of funding or conflicts of interest related to this study.