Introduction
Acute appendicitis is a common and important cause of abdominal pain in pediatric patients [1]. Rapid diagnosis and identification of complications are vital for timely and appropriate management of this condition. Imaging plays a crucial role in evaluating children with suspected acute appendicitis and has been shown to reduce negative appendectomy rates [2].
Although there is no universally accepted strategy for imaging pediatric patients with suspected acute appendicitis, some guidance is available. The American College of Radiology (ACR) has recently published updated appropriateness criteria on this topic [3]. These guidelines recommend initial evaluation with ultrasound (US) for patients with an intermediate risk of acute appendicitis based on clinical assessment, followed by further evaluation with computed tomography (CT) or magnetic resonance imaging (MRI) in cases with equivocal US findings. In the setting of a high clinical risk, the three modalities carry equivalent recommendations of “may be appropriate,” while all three are “usually not appropriate” in the setting of a low clinical risk.
Ultrasound has clear benefits in the setting of pediatric right lower quadrant abdominal pain, including its lack of ionizing radiation and intravenous contrast, low cost and wide availability. It can achieve very high sensitivity and specificity for acute appendicitis under ideal conditions [4]. However, the accuracy of US varies with operator skill, patient factors and clinical setting [3], and the 2018 ACR Appropriateness Criteria assume that US is performed by an expert when making their recommendations. CT is more likely to be the initial diagnostic modality in a community hospital setting [5]. The benefits and risks of CT in the diagnosis of pediatric appendicitis have been well characterized; while CT offers high diagnostic accuracy [6, 7], concerns regarding radiation exposure limit its utility as a first-line test in the pediatric population.
There has been rising interest in MRI for pediatric appendicitis. Diagnostic accuracy of the modality is high [8], but its utility was shown in some studies to be limited by long imaging times (often requiring sedation in children) and the need for gadolinium-based intravenous contrast administration. However, advances in MRI protocols have allowed for rapid non-contrast examinations that maintain high sensitivity and specificity [9, 10]. A recent study demonstrated the efficacy of MRI when employed as a first-line modality in pediatric patients suspected of having acute appendicitis [11].
However, the generalizability of these results remains in question. Given that the majority of studies published on MRI have relied on the interpretations of board-certified pediatric radiologists, it remains to be shown that the modality could effectively be employed by trainees or general/emergency radiologists in the community setting. The purpose of this study is to evaluate the diagnostic performance of radiology residents in preliminarily interpreting first-line MRI studies performed in children with clinical concern for acute appendicitis.
Materials and methods
This retrospective chart review study was approved by our institutional review board, with a waiver of informed consent.
Clinical setting
At our institution, pediatric acute abdomen MRI performed outside of normal business hours is supervised and interpreted by second-year (PGY-3) and above diagnostic radiology residents with indirect attending pediatric radiologist supervision. Pediatric radiology attendings are available in-house from 8 a.m. to 5 p.m.; however, during these hours, residents often render preliminary reports for these studies before an official readout with the attending. After hours, the on-call attending is not physically present in-house but is immediately available by phone and able to review studies from home if necessary. In general, MRI examinations are performed according to a standardized protocol and completed by the technologist without real-time radiologist supervision. Further detail regarding workflow in our department, including average times from MRI order to image acquisition and interpretation, can be found in a recently published study [11].
Data collection
A list of consecutive patients 21 years old or younger (the cut-off age for our pediatric emergency department) who presented to our institution with acute abdominal pain between January 2013 and June 2016 and underwent an unenhanced MRI examination of the abdomen and pelvis was generated from the hospital electronic medical records. During this time period, our institution implemented a cross-departmental strategy of performing MRI as the initial imaging modality in cases of suspected acute appendicitis in pediatric patients. As many patients as possible were imaged using MRI as the first-line test, and patients as young as 3 years old were successfully examined.
A number of patients from the initial list were excluded from the study. Exclusion criteria included: (1) age 18 years or older (in order to better adhere to the commonly held definition of a pediatric cohort), (2) MRI following either a CT or US examination, (3) incomplete MRI, (4) previous history of appendectomy and (5) lost to follow-up (defined by a lack of subsequent visits recorded in the electronic medical record). This cohort of patients has been examined in a previous study [11]; additional exclusion criteria were applied in the current study, such that the 377 patients included in the current study overlap with the 402 patients in the previous study. The additional criteria excluded patients whose reports were rendered exclusively by an attending radiologist or generated by a first-year resident under direct attending supervision. These criteria were used to exclude cases in which the preliminary resident report was not rendered by an “independent” resident.
Magnetic resonance imaging protocol
All MRI examinations were performed on one of two available scanners, a 1.5 tesla (T) Magnetom Aera or a 3.0-T Magnetom Skyra (Siemens Healthcare, Erlangen, Germany). Multisequence, multiplanar imaging was performed without gadolinium-based contrast. The protocol, which was standardized for all patients, included the following sequences: T2-weighted single-shot fast spin echo (axial, coronal and sagittal planes), fat-suppressed T2-weighted single-shot fast spin echo (axial, coronal and sagittal planes), T1-weighted three-dimensional (3-D) dual-echo spoiled gradient recalled echo (axial plane), diffusion-weighted imaging (axial plane), and 3-D T2-weighted turbo spin echo (axial plane, pelvis only). A complete technical description of imaging parameters has been previously published [11]. Moderate sedation was administered by the treating emergency department physician in rare cases when deemed clinically necessary. A small minority of cases were performed under general anesthesia.
Magnetic resonance imaging interpretation
The imaging criteria used for acute appendicitis in this retrospective study were not strictly standardized and diagnosis was made at the discretion of each reader. However, commonly reported MRI findings indicative of acute appendicitis included a fluid-filled appendix, appendiceal wall edema, periappendiceal inflammatory changes and an appendicolith. For the purposes of this study, cases of complicated appendicitis (perforation, abscess) were not specifically delineated and were simply included as cases of positive acute appendicitis.
Reference standards
The electronic medical record for each patient was reviewed to serve as the reference standard for acute appendicitis (true positive or true negative). For patients who underwent surgery, surgical histopathological findings served as the reference standard. For patients who did not undergo surgery, follow-up clinical evaluations documenting favorable response to conservative management served as the reference standard. These evaluations occurred in the form of follow-up to the patient’s emergency department visit or as part of the patient’s next well-child visit.
Concordance and diagnostic performance
The preliminary (resident) and final (attending) interpretations of each MRI examination were reviewed. Interpretations were categorized as either positive or negative for acute appendicitis. For the purposes of this study, only interpretations that were unequivocally negative (e.g., reports stating “negative for acute appendicitis,” “no evidence of acute appendicitis” or “normal appendix”) were categorized as negative. Reports that indicated a low but nonzero degree of suspicion for acute appendicitis (e.g., “equivocal for acute appendicitis” or “could represent early appendicitis”) were categorized as a positive interpretation.
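The binarization rule described above can be sketched in a few lines. In the study the reports were reviewed manually; the phrase lists below contain only the example wordings quoted in the text, and the function name `categorize_report` and the phrase-matching approach are illustrative assumptions rather than the authors' procedure.

```python
import re

# Example wordings quoted in the text; a real review would need a fuller list.
EQUIVOCAL_PHRASES = (
    "equivocal for acute appendicitis",
    "could represent early appendicitis",
)
NEGATIVE_PHRASES = (
    "negative for acute appendicitis",
    "no evidence of acute appendicitis",
    "normal appendix",
)


def _contains(text: str, phrase: str) -> bool:
    """Whole-phrase match, so 'abnormal appendix' does not match 'normal appendix'."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def categorize_report(impression: str) -> str:
    """Binarize a report impression (hypothetical helper).

    Only unequivocally negative wording counts as negative; any hedged or
    otherwise non-negative wording is counted as positive.
    """
    text = impression.lower()
    if any(_contains(text, p) for p in EQUIVOCAL_PHRASES):
        return "positive"
    if any(_contains(text, p) for p in NEGATIVE_PHRASES):
        return "negative"
    return "positive"
```

Checking the equivocal phrases first reflects the conservative design of the rule: a report that hedges is treated as positive even if it also contains reassuring language.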
Concordance between resident and attending reports was determined. Additionally, diagnostic performance (sensitivity, specificity, and positive and negative predictive value) of both residents and attending radiologists was calculated by comparing the resident and attending reports to the reference standards described above.
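As a minimal sketch of how these quantities follow from a 2×2 table of reader interpretation against the reference standard, the snippet below computes the four metrics and a simple concordance rate. The class and function names are hypothetical, and the counts in the usage lines are made-up illustrative values, not study data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts for one reader against the reference standard."""
    tp: int  # reader positive, reference positive
    fp: int  # reader positive, reference negative
    fn: int  # reader negative, reference positive
    tn: int  # reader negative, reference negative


def diagnostic_performance(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def concordance_rate(reader_a, reader_b) -> float:
    """Fraction of paired examinations on which two readers agree."""
    if len(reader_a) != len(reader_b):
        raise ValueError("paired interpretations must have equal length")
    agree = sum(a == b for a, b in zip(reader_a, reader_b))
    return agree / len(reader_a)


# Illustrative (hypothetical) counts only.
example = Confusion(tp=45, fp=5, fn=5, tn=145)
for name, value in diagnostic_performance(example).items():
    print(f"{name}: {value:.3f}")
```

Note that concordance compares the two readers with each other, whereas the four performance metrics compare each reader with the reference standard; the two can diverge when both readers make the same error.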
Alternative diagnoses
The studies in the data set that were negative for acute appendicitis were reviewed to determine whether an alternative diagnosis explaining the patient’s abdominal pain was identified. In cases where an alternative diagnosis was identified in the attending report, a comparison was made between the resident and attending reports to determine concordance between the resident and attending radiologists.
Statistical methods
Categorical variables were numerically coded. Continuous variables were tested for normality (Shapiro-Wilk W test) and appropriate summary statistics were reported. For the paired interpretations by the residents and the attending radiologists, McNemar’s test of correlated proportions was used to determine whether the positive interpretation rate for acute appendicitis differed. Fisher exact tests were used to determine whether the concordance rates differed with the resident’s level of training and with the academic year. For the metrics reported as a proportion or a ratio, the exact (Clopper-Pearson) 95% confidence intervals were obtained using simple binomial proportions. All analyses were performed using statistical software (SAS version 9.4; SAS Institute Inc., Cary, NC). Effects associated with P<0.05 were considered statistically significant.
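The two central calculations can be illustrated with a short standard-library sketch. The study used SAS; the code below is an independent re-implementation for illustration only. The function names are hypothetical, the Clopper-Pearson bounds are found by bisection on the binomial tail probabilities, and the McNemar test is shown in its exact binomial form, which is an assumption because the text does not state whether the exact or the chi-square version was used.

```python
from math import comb, exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _bisect(f, target, increasing, iters=100):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""
    if k == 0:
        lower = 0.0
    else:
        # P(X >= k | p) increases with p; lower bound solves it = alpha / 2.
        lower = _bisect(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    if k == n:
        upper = 1.0
    else:
        # P(X <= k | p) decreases with p; upper bound solves it = alpha / 2.
        upper = _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Here `b` and `c` are the numbers of examinations called positive by one reader but not the other; concordant pairs do not enter the McNemar statistic at all.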
Discussion
This study demonstrates a high degree of concordance between resident and attending radiologist interpretations of abdominopelvic MRI performed in our emergency department for the evaluation of pediatric acute appendicitis. In addition, subjective analysis of the 11 cases of discrepant interpretations suggests that the majority of trainee errors occurred on studies that demonstrated subtle findings or were otherwise difficult to interpret.
There was no significant difference in the concordance rates when comparing residents by postgraduate year of training, suggesting that residents with as little as 1 year of radiology training before they started on-call responsibilities demonstrated diagnostic performance comparable to more experienced residents. At least one previous study has similarly shown no difference between resident postgraduate years [12], but the majority have demonstrated improvements in resident performance with increasing experience [13–18]. The high performance of junior residents in this study may be due, in part, to early focused training. Before undertaking overnight call, first-year residents at our institution undergo focused lectures and case reviews dedicated to familiarizing them with essential sequences, common pathologies and pitfalls. The residents have access to a teaching file of interesting cases to review. The first-year residents also undergo a mock pre-call test that includes emergency body MRI cases to assess their proficiency in interpreting these studies. Detailed educational and training approaches to interpreting MRI for pediatric acute appendicitis are available in the literature [19–21].
The accuracy of resident preliminary interpretations in emergent imaging studies has been widely investigated in the literature. Previous studies have shown that the concordance rates between resident and attending interpretations are generally high across a wide variety of modalities and clinical scenarios [12, 13, 15, 17, 22–24]. However, higher rates of discrepancy have been noted for specific modalities/diagnostic scenarios, including neuroradiology MRI [14] and CT angiography of the head/neck [16]. These studies suggest that concordance between residents and attending radiologists should be evaluated for specific modalities and diagnoses to identify scenarios in which trainees with limited experience may not perform as well.
Analysis of the diagnostic performance of both residents’ and attending radiologists’ interpretations with respect to “ground truth” showed that the differences were not statistically significant. However, trends in these data may indicate specific deficiencies that lead to discrepant interpretations. Resident sensitivity (91.2%) was lower than that for attending radiologists (97.8%) with overlapping 95% confidence intervals, suggesting that residents may miss subtle positive cases of acute appendicitis. Specificity was high for both groups. However, when applied to the study cohort (in which there was a relatively low rate of positive cases), resident positive predictive value (92.2%) trended lower in comparison to attending radiologists (96.7%) with overlapping 95% confidence intervals. Negative predictive value was high for both groups, but slightly lower for the residents. Considering the positive interpretation rates were similar (90/377, 23.9% for residents and 92/377, 24.4% for attending radiologists), in the context of our patient population, there appears to be a trend toward resident “overcalls,” rather than “undercalls.”
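The percentages quoted in this paragraph can be related to one another with simple arithmetic. The sketch below back-calculates an implied 2×2 table from the positive interpretation count, the positive predictive value and the sensitivity. The function name is hypothetical, and the resulting counts are derived by rounding from the rounded percentages above; they are an illustration of how the metrics connect, not figures taken from the study's results tables.

```python
def implied_counts(n_total, n_called_positive, ppv, sensitivity):
    """Back-calculate (tp, fp, fn, tn) from rounded summary metrics.

    Assumes every examination has a binary call and a binary reference
    result, so the four cells sum to n_total.
    """
    tp = round(ppv * n_called_positive)       # PPV = tp / positive calls
    fp = n_called_positive - tp
    n_reference_positive = round(tp / sensitivity)  # sensitivity = tp / (tp + fn)
    fn = n_reference_positive - tp
    tn = n_total - n_reference_positive - fp
    return tp, fp, fn, tn


# Inputs are the figures quoted in the paragraph above.
resident = implied_counts(377, 90, ppv=0.922, sensitivity=0.912)
attending = implied_counts(377, 92, ppv=0.967, sensitivity=0.978)

for label, (tp, fp, fn, tn) in (("resident", resident), ("attending", attending)):
    print(label, "tp/fp/fn/tn:", tp, fp, fn, tn,
          "implied specificity: %.3f" % (tn / (tn + fp)),
          "implied NPV: %.3f" % (tn / (tn + fn)))
```

Under these assumptions both readers imply the same number of reference-positive patients (tp + fn), which serves as an internal consistency check on the quoted percentages.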
A particular benefit of MRI is its ability to suggest alternative causes for a patient’s abdominal pain. In this data set, 93 cases (24.7%) were found to demonstrate alternative diagnoses, most commonly in the form of enteritis/colitis, pyelonephritis and ovarian pathology. The overall rate of resident to attending radiologist concordance for these alternative diagnoses was lower than that for acute appendicitis, at 80.6%. Although these results are likely influenced by the nearly unlimited number of possible alternative diagnoses (as compared with the binary interpretation of positive or negative for acute appendicitis), they demonstrate additional benefit gained from expert interpretation of these examinations.
This study is limited by its retrospective nature. Additionally, images were not retrospectively evaluated, and the statistical analysis was performed using the radiology reports. We acknowledge that a retrospective image analysis in which each MRI examination was evaluated by multiple residents of varying levels of training and by attending radiologists would have resulted in a more robust scientific analysis; however, the aim of the study was to assess diagnostic performance as reflected in a real-life scenario. The reference standards used in the study (histology and clinical follow-up) represent imperfect gold standards. In particular, clinical follow-up could not be standardized due to the retrospective nature of the study. Diagnostic performance for identifying features of complicated appendicitis was not examined and could be further investigated in future work. We did not compare MRI with other imaging modalities, specifically US.